My understanding is that 8GB is plenty to play with small models (7B or so) in a lightly quantized form. (Especially since model params are kept on disk and only really "mapped" into RAM, with RAM acting as a cache.) 4GB is where it gets dicey, as you're constrained to tiny models that just don't do anything very interesting.
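
To illustrate the "mapped, with RAM as a cache" point, here's a minimal sketch (not any particular runtime's code, and "model.gguf" is just a hypothetical path): memory-mapping a weights file reserves address space, but physical RAM is only consumed for the pages you actually touch, and the OS can evict them under pressure like any file cache.

    import mmap
    import os

    path = "model.gguf"  # hypothetical path to a quantized model file

    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        # Read-only mapping: the OS pages data in from disk on demand
        # and can drop it under memory pressure, like a file cache.
        mm = mmap.mmap(f.fileno(), length=size, access=mmap.ACCESS_READ)

        # Nothing is resident yet beyond metadata; touching bytes faults pages in.
        header = mm[:4096]                               # loads only the first page(s)
        some_tensor = mm[size // 2 : size // 2 + 1024]   # loads pages near the middle

        mm.close()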


Yes, but if it's also running system-wide services, the question is how much is left over.

They should not be selling 8GB machines; it was always about being greedy with upgrade pricing. Now they've painted themselves into a corner.


Thing is, you usually do need all the params for every token, so if the model is only partially mapped into RAM, it's the equivalent of an app swapping in and out as it runs. Which is to say, inference gets much slower.
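
A back-of-the-envelope sketch of why that hurts (illustrative numbers I'm assuming, not benchmarks): each generated token reads every parameter once, so whatever doesn't fit in RAM has to be re-read from disk per token.

    # Assumed figures: ~5 GB of quantized 7B weights, 3 GB resident,
    # ~3 GB/s SSD reads, ~100 GB/s memory bandwidth.
    weights_gb = 5.0
    resident_gb = 3.0
    ssd_read_gb_per_s = 3.0
    ram_read_gb_per_s = 100.0

    disk_per_token = weights_gb - resident_gb  # streamed from disk every token
    token_time = resident_gb / ram_read_gb_per_s + disk_per_token / ssd_read_gb_per_s
    print(f"~{1 / token_time:.1f} tokens/sec with {disk_per_token:.0f} GB coming from disk")

    token_time_all_ram = weights_gb / ram_read_gb_per_s
    print(f"~{1 / token_time_all_ram:.1f} tokens/sec if everything fits in RAM")

With those assumptions it's roughly 1-2 tokens/sec when 2 GB has to stream off the SSD each token, versus ~20 tokens/sec fully resident.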

Local Apple models are likely in the 2-3B range, but fine-tuned for specific tasks.


7B Llama 3.1 takes up 5GB of RAM loaded in LM Studio. I've never seen macOS idle below 5GB on its own, but maybe it can pull some swap magic.
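
The 5GB figure lines up with rough size arithmetic (a sketch of parameter count times bits per weight, not LM Studio's actual accounting; the ~4.8 bits/weight for a Q4_K_M-style quant is my assumption):

    params = 7e9

    for name, bits in [("Q4_K_M (~4.8 bit)", 4.8), ("Q8_0 (8 bit)", 8), ("FP16", 16)]:
        weights_gb = params * bits / 8 / 1e9
        print(f"{name:>18}: ~{weights_gb:.1f} GB of weights")

    # ~4.2 GB of weights for the 4-bit build, plus KV cache and runtime
    # buffers, gets you to roughly 5 GB. On an 8 GB machine where macOS
    # already idles around 5 GB, that leaves little headroom before swapping.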



