> It is totally plausible but do we really think just in words?
I find that proposition totally implausible. Some people certainly report only thinking in words & having a continuous inner monologue, but I'm not one of them. I think, then I describe my thoughts in words if I'm speaking or writing or thinking about speaking or writing.
Also: I simply asked the 7B model "hi, who are u":
Thought for 1 seconds
Okay, the user is asking "Who are u?" with a casual tone ("u" instead of "you", exclamation mark). Let me break this down.
First, I need to address the question itself. They want to know who I am as an AI assistant. My standard response is that I'm DeepSeek R1, designed to be helpful and handle reasoning step by step. But maybe they're not familiar with AI terms in general. The "u" part might just be their way of speaking, so I should reply politely using "you" form.
Next, considering the context. They started with an HTML tag " "
----
And it hung there without a reply. It might be an OpenWebUI bug, but it's not a promising start.
7B: Hi! I'm Olmo 3, an AI assistant created by the non-profit organization Ai2. I'm here to help with questions, ideas, or tasks you have—just let me know what you need! How can I assist you today? Rawr!
32B: Hi! I'm Olmo, a helpful AI assistant built by the Allen Institute for AI (Ai2). My knowledge is up to December 2024, and I'm designed to assist with a wide range of tasks. How can I help you today?
Every time a new model is released, there are a bunch of reports or written experiences about people using the model with software that seemingly doesn't support it.
GPT-OSS really made that clear, where 90% of the ecosystem declared it broken, but most people were using dumb quants and software that didn't properly support it. Guess we'll repeat the same thing with OLMo now.
I'm really glad to read this, as this was my experience with OLMo in LM Studio. It worked for the first message but got progressively more unstable. It also doesn't seem to reset model state for a new conversation: every response after the model load gets progressively worse, even in new chats.
There are a bunch (currently 3) of examples of people getting funny output, two of which say it's in LM Studio (I don't know what that is). It does seem likely that the model is somehow being misused here and that the results aren't representative.
Definitely. Usually I'd wait 2-3 weeks for the ecosystem to catch up and iron out the kinks, or do what I did for GPT-OSS: fix it in the places where it's broken, then judge it once I'm sure it's actually being used correctly.
Otherwise, in that early period, only use the scripts/tools provided by the people releasing the model itself, which is probably the only way in those 2-3 weeks to be sure you're actually getting the expected responses.
Reminds me of an old joke where a guy is walking down the street and another person says “good morning”. The guy starts deconstructing what “good morning” means until he finally reaches the conclusion “that bastard was calling me an asshole”.
I think they should start aiming for 20B models along with 32B and 7B. Usually 7B is enough for an 8GB GPU, while 32B requires a 24GB GPU for decent quants (I can fit a 32B with IQ3_XXS, but it's not ideal), and 20-ish-B models (such as Magistral or gpt-oss) are a perfect fit for 16GB GPUs.
Depends heavily on the architecture too. I think the free-for-all to find the best sizes is still ongoing, and rightly so. GPT-OSS-120B, for example, fits in around 61GB of VRAM for me with MXFP4.
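As a rough sanity check on those numbers, weight memory is roughly parameter count times bits-per-weight divided by 8. This is only a back-of-the-envelope sketch: it ignores the KV cache, activations, and runtime overhead, and the bits-per-weight values below (IQ3_XXS ≈ 3.06, a mid Q4 quant ≈ 4.5, MXFP4 ≈ 4.25) and the ~117B size for GPT-OSS-120B are approximate.

```javascript
// Back-of-the-envelope weight memory: params * bits-per-weight / 8 bytes.
// Ignores KV cache, activations, and runtime overhead, so real usage is higher.
function estimateWeightGB(paramsBillions, bitsPerWeight) {
  return (paramsBillions * 1e9 * bitsPerWeight) / 8 / 1e9; // decimal GB
}

console.log(estimateWeightGB(7, 4.5).toFixed(1));    // ~3.9 GB  -> comfortable on an 8GB card
console.log(estimateWeightGB(32, 3.06).toFixed(1));  // ~12.2 GB -> the tight IQ3_XXS fit
console.log(estimateWeightGB(32, 4.5).toFixed(1));   // ~18.0 GB -> why 32B wants a 24GB card
console.log(estimateWeightGB(117, 4.25).toFixed(1)); // ~62.2 GB -> close to the ~61GB MXFP4 figure
```

Once you add context and KV cache on top of the weights, the 8GB / 16GB / 24GB pairings above line up with these estimates.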
Personally, I hope GPU makers instead start adding more VRAM, or if one can dream, expandable VRAM.
Well, GPUs are getting more VRAM, although it's pricey. We didn't use to have 96GB VRAM GPUs at all; now they exist :) For those who can afford it, it is at least possible today. Slowly it increases.
Hehe, me too… I went all out on an MBP in 2022 and did it again in April. The only upgrade I didn't bother with was topping out at 128 GB of RAM instead of 64. Then GPT-OSS 120B came out and quickly made me very sad that I can't use it locally.
Same. I repeatedly kick myself for not getting the 128GB version, although not for the GPT-OSS model because I really haven’t been too impressed with it (through cloud providers). But now it’s best to wait until the M5 Max is out due to the new GPU neural accelerators that should greatly speed up prompt processing.
I was very hyped: maybe Google finally did something new and complete, unifying CLI and IDE, a sort of Claude Code Web but as an efficient, IDE-like, local thing.
I like this. I'm imagining a companion extension for Chrome/Firefox that uses you-get as a backend to implement it seamlessly. Forward-thinking idea: imagine going on YouTube and having the you-get extension bypass the YouTube player and play the content directly, without ads. And where I say YouTube, the same goes for any other platform.
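A minimal sketch of how that companion extension could hang together, assuming Chrome's native messaging API: the background worker forwards the page URL to a hypothetical native host (here called "com.example.you_get_host") that wraps `you-get` and replies with a direct stream URL, and the content script swaps the site's player for a plain video element. The host name, message shape, and the "#player" selector are all made up for illustration.

```javascript
// background.js (MV3 service worker) -- needs the "nativeMessaging" permission.
// "com.example.you_get_host" is a hypothetical native host that runs
// `you-get --json <url>` and answers with { streamUrl: "..." }.
chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
  if (msg.type !== "resolve-stream") return;
  chrome.runtime.sendNativeMessage("com.example.you_get_host", { url: msg.url }, sendResponse);
  return true; // keep the channel open for the async reply
});

// content.js -- ask the background worker for a direct URL and bypass the site player.
chrome.runtime.sendMessage({ type: "resolve-stream", url: location.href }, (reply) => {
  if (!reply || !reply.streamUrl) return;                  // fall back to the normal player
  const player = document.querySelector("#player") || document.body; // site-dependent
  const video = document.createElement("video");
  video.src = reply.streamUrl;
  video.controls = true;
  player.replaceChildren(video);
});
```

The native host would be doing the real work; the extension stays a thin messenger, which is also the only way a content script can reach a local binary like you-get.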
This is surely useful right now. I wonder what will happen to all the nice X11 tools once Wayland (hopefully soon) becomes the golden standard. There are options to enable X11 behaviors in Wayland, but I guess that is just a fallback to the insecure implementation.
### Added
- New `model_tokens.json` file containing token limits for various Ollama models.
- Dynamic token limit updating based on selected model in options.
- Automatic loading of model-specific token limits from `model_tokens.json`.
- Chunking and recursive summary for long pages (see the sketch after this list)
- Better handling of markdown returns
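A minimal sketch of the chunking / recursive-summary idea from the list above, assuming the extension talks to Ollama's standard `POST /api/generate` endpoint; the prompts, chunk size, and model handling are placeholders rather than the extension's actual code.

```javascript
// Split long text into chunks, summarize each, then summarize the merged
// summaries, recursing until the text fits in a single pass.
async function ollamaGenerate(model, prompt) {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  return (await res.json()).response;
}

async function summarize(text, model, chunkChars = 8000) {
  if (text.length <= chunkChars) {
    return ollamaGenerate(model, `Summarize the following text:\n\n${text}`);
  }
  const chunks = [];
  for (let i = 0; i < text.length; i += chunkChars) {
    chunks.push(text.slice(i, i + chunkChars));
  }
  const partials = [];
  for (const chunk of chunks) {
    // Assumes each summary comes back shorter than its chunk, so the recursion shrinks.
    partials.push(await ollamaGenerate(model, `Summarize this part of a longer page:\n\n${chunk}`));
  }
  return summarize(partials.join("\n\n"), model, chunkChars);
}
```

In the extension itself the chunk size would presumably be derived from the per-model limits in `model_tokens.json` rather than a fixed character count.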
### Changed
- Updated `manifest.json` to include `model_tokens.json` as a web accessible resource.
- Modified `options.js` to handle dynamic token limit updates (see the sketch after the changelog):
- Added `loadModelTokens()` function to fetch model token data.
- Added `updateTokenLimit()` function to update token limit based on selected model.
- Updated `restoreOptions()` function to incorporate dynamic token limit updating.
- Added event listener for model selection changes.
### Improved
- User experience in options page with automatic token limit updates.
- Flexibility in handling different models and their respective token limits.
### Fixed
- Potential issues with incorrect token limits for different models.
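I haven't read the actual source, but here is roughly how I picture the `options.js` changes described above fitting together. The function names come from the changelog; the JSON shape, element IDs, and the 4096 fallback are guesses.

```javascript
// options.js (sketch) -- load per-model token limits and keep the options form in sync.
let modelTokens = {};

// model_tokens.json is listed as a web accessible resource in manifest.json.
// Assumed shape: { "llama3.1:8b": 131072, "mistral:7b": 32768, ... }
async function loadModelTokens() {
  const res = await fetch(chrome.runtime.getURL("model_tokens.json"));
  modelTokens = await res.json();
}

// Update the token-limit field whenever the selected model changes.
function updateTokenLimit() {
  const model = document.getElementById("model").value;
  document.getElementById("tokenLimit").value = modelTokens[model] ?? 4096;
}

// restoreOptions() would load the data, restore saved settings, and wire the listener.
async function restoreOptions() {
  await loadModelTokens();
  // ...restore the saved model/endpoint from chrome.storage here...
  updateTokenLimit();
  document.getElementById("model").addEventListener("change", updateTokenLimit);
}

document.addEventListener("DOMContentLoaded", restoreOptions);
```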
It almost seems like we got inspiration from our brain to build neural networks!