Out of curiosity, do you have any theories of why it works so well at such aggre...

antirez · 2026-05-08T05:39:21 1778218761

It's a mix of extreme sparsity, but with the routed expert doing a non-trivial amount of work (and it is q8), and with the projections and routing also left unquantized. The fact that it's a QAT model must play a role too, I guess, and I quantized the routed experts' out layers with Q2 instead of IQ2_XXS to retain quality.
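
A rough sketch of that kind of mixed-precision policy, not the actual quantization script. The tensor names and patterns (`q8_expert`, `experts.out`, `experts.`) are made up for illustration. Reading "Q2" as GGUF `Q2_K`, putting the remaining routed-expert tensors at `IQ2_XXS`, and treating "not quantized" as 16-bit are all assumptions. The bits-per-weight figures are the nominal block sizes of those GGUF formats.

```python
# Nominal bits per weight: block size in bytes * 8 / weights per block.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,        # 34 bytes per 32 weights
    "Q2_K": 2.625,      # 84 bytes per 256 weights
    "IQ2_XXS": 2.0625,  # 66 bytes per 256 weights
}

# Hypothetical rules, checked in order; the first matching substring wins.
EXAMPLE_RULES = [
    ("q8_expert", "Q8_0"),       # placeholder for the expert kept at q8
    ("experts.out", "Q2_K"),     # routed experts' out layers (assumed Q2_K)
    ("experts.", "IQ2_XXS"),     # other routed-expert tensors (assumption)
]


def pick_quant(tensor_name, rules=EXAMPLE_RULES, default="F16"):
    """Return the quant type for a tensor; unmatched tensors stay at 16-bit."""
    for pattern, quant in rules:
        if pattern in tensor_name:
            return quant
    return default


def effective_bpw(param_counts, rules=EXAMPLE_RULES):
    """Average bits per weight over a {tensor_name: n_params} mapping."""
    total_bits = sum(
        n * BITS_PER_WEIGHT[pick_quant(name, rules)]
        for name, n in param_counts.items()
    )
    return total_bits / sum(param_counts.values())
```

With sparse models most of the parameters sit in the routed experts, so the average lands close to the 2-bit formats even though routing and projections stay at full precision.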

happyPersonR · 2026-05-08T16:09:13 1778256553

Not trying to give anyone homework, just thinking out loud:

One thing I would love to see is whether this can dogfood itself.

Like, would dsv4 with q2 be able to do this task itself on this hardware?

Sidenote: I wish I had an M4-m3 … I'm thinking about getting an ASUS ROG Flow Z13 Gaming Laptop (Model GZ302EA-XS99). It uses PCIe 4.0, so disk might be a little slower, but I want to see how this does on, like, Vulkan :)