My understanding is that 8GB is plenty to play with small models (7B parameters or so) at light quantization. (Especially since the model params are kept on disk and only memory-mapped into RAM, with RAM acting as a cache.) 4GB is where it gets dicey, as you're constrained to tiny models that just don't do anything very interesting.
Thing is, you usually do need all the params for every token, so if the model is only partially mapped into RAM, it behaves like an app that's constantly swapping in and out as it runs. Which is to say, inference gets much slower.
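Rough sketch of what I mean, with made-up file name and sizes, and numpy's memmap standing in for whatever the runtime actually uses to map weights:

```python
import numpy as np

# Write a small dummy "weights" file, then memory-map it instead of reading it
# into RAM. Only the pages actually touched get pulled in, and the OS page
# cache can evict them again under memory pressure.
n_params = 10_000_000  # stand-in for the billions of params in a real model
np.full(n_params, 0.01, dtype=np.float16).tofile("weights.bin")

weights = np.memmap("weights.bin", dtype=np.float16, mode="r", shape=(n_params,))

# A forward pass touches essentially every parameter. If the mapped file is
# bigger than free RAM, each pass keeps re-faulting pages in from disk --
# the same thrashing as an app that's swapping, hence the slow inference.
total = float(weights.sum())
print(total)
```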
Local Apple models are likely in the 2-3B range, but fine-tuned for specific tasks.