Hacker News | lostmsu's comments

I don't know what's behind the wall I'm sitting next to right now, but I'm reasonably sure there's a street. I'm also reasonably sure the comment about "you've been dead" is a very accurate prediction.

That wall is concrete and material. Death is not. I am reasonably sure you can predict the street with great accuracy while still having zero idea what lies in wait for us after we die. A false equivalence.


I never saw that happen in Codex, so there's a good chance that OpenClaw does something wrong. My main suspicion would be that it does not pass back thinking traces.

Anecdata, but I see this in Codex all the time. It takes about two rounds before it realises it's supposed to continue.

I started seeing this a lot more with GPT 5.4. 5.3-codex is really good about patiently watching and waiting on external processes like CI, or managing other agents async. 5.4 keeps on yielding its turn to me for some reason even as it says stuff like "I'm continuing to watch and wait."

H2 is already a fuel

He meant lossy compression

From a security-minded user perspective it makes sense to destroy keys when instead of a single entity I receive updates from I get another entity that is not equivalent, and half of my previous entity thinks that the other half is sus.

[flagged]


It wasn't an intelligence agency compromise; it was a business partner compromise, by a partner who intended to violate the privacy and security of their users. Nothing about this was done out of spite. I'm not sure where you're getting that from. You just seem to be attacking people's character for making the right choice given the circumstances.

Break it up

> And they know what revolutions mean in Russia.

In retrospect, they would mean saving 400k+ young men from dying, approx. 200k+ on each side. But Navalny wasn't a revolutionary (his mistake: if you have a death wish, there are more effective methods than peaceful protesting).


You can also tell Claude/Codex/whatever to look up previous conversations in respective folders.

Yes, I go even further. In-repo, I have a chats folder that my /done skill fills with ~"what we did, and didn't accomplish in this chat. Blah blah (a few more instructions) - finish with a great hand off to the next chat to continue the work." I run that anytime I approach 50% of the context window, as all models get dumb at that point. Then /clear, then /effort max just to be safe, then "please ingest chats/2026-01-01-00-00-what-we-did.md and proceed." It's a very purposeful custom /compress that works far better in my experience. If I ever hit auto-compress, I have failed as a Claude jockey.

No you did not. You got 207 tok/s on an RTX 3090 with speculative decoding which, generally speaking, is not the same quality as serving the model without it.

Greedy-only decoding is even worse. There's a reason every public model comes with suggested sampling parameters. When you don't use them, output tends to degrade severely. In your case simply running a 14B model on the same hardware with the tools you compare against would probably be both faster and produce output of higher quality.
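To make the "suggested sampling parameters" point concrete, here is a minimal sketch (Python with NumPy; the function name and defaults are mine, not from any particular runtime) of temperature plus nucleus (top-p) sampling, the kind of thing greedy argmax skips:

```python
import numpy as np

def sample_token(logits, temperature=0.7, top_p=0.9, rng=None):
    """Sample a token id using temperature + nucleus (top-p) sampling.

    temperature < 1 sharpens the distribution; top_p keeps only the
    smallest set of most-probable tokens whose cumulative probability
    reaches top_p, then renormalizes over that set.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())  # stable softmax
    probs /= probs.sum()
    # Nucleus filter: most probable tokens covering top_p of the mass.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]
    kept = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept))
```

Models typically ship with values like these in their model card; using argmax instead collapses the distribution and tends to produce repetitive, degraded text.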


Speculative decoding doesn't degrade output quality. The distribution it produces is exactly the same if you do it correctly. The original paper on it clearly talks about this. [0]

Speculative decoding is the same as speculative execution on CPUs. As long as you walk back an incorrect prediction (i.e. the speculated tokens weren't accepted), everything is mathematically exactly the same. It just uses more parallelism (specifically, higher arithmetic intensity).

[0] https://arxiv.org/abs/2211.17192
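For reference, the losslessness comes from the accept/resample rule in that paper. A minimal sketch (Python/NumPy, with both models' distributions given as plain arrays; the function name is mine) of one verification step:

```python
import numpy as np

def verify_draft_token(p, q, draft_token, rng):
    """One accept/resample step of speculative sampling.

    p: target-model distribution over the vocabulary,
    q: draft-model distribution the token was sampled from.
    Accept the draft token with probability min(1, p/q); otherwise
    resample from the residual max(0, p - q), renormalized. The
    returned token is distributed exactly according to p.
    """
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return int(rng.choice(len(p), p=residual))
```

However bad the draft model is, the output distribution matches the target model exactly; only the acceptance rate (and hence the speedup) suffers.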


Why is it that speculative decoding lowers quality? My understanding of it is that you use a small/distilled fast model to predict the next token; when it doesn't match, you generate more. Checking against the large model is quick.

This should maintain exactly the quality of the original model, no?


AFAIU it's not that checking against the large model is quick (in the usual P!=NP sense that checking an answer is easier than finding one). It's that you can batch your checks. So you speculate the next 5 tokens, and then the large model can process the batch of prefixes [...,n+1], [...,n+2], [...,n+3], [...,n+4], [...,n+5] in one go. If you guessed right for a prefix, you turned a sequential problem (computing the next token from the current prefix) into a parallel one (processing multiple prefixes together) that the GPU likes. If you guessed wrong, you have to throw away the suffix starting at the wrong guess, and you wasted some extra energy computing.
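A toy sketch of that batched check (Python; `target_logits_fn` is a hypothetical stand-in for one parallel forward pass of the large model over all positions, and greedy acceptance is used for simplicity):

```python
import numpy as np

def verify_batch_greedy(target_logits_fn, prefix, speculated):
    """Greedy batched verification for speculative decoding.

    Runs the target model once over prefix + all speculated tokens,
    then accepts the longest prefix of speculated tokens that the
    target model itself would have produced greedily.

    target_logits_fn(tokens) -> logits of shape (len(tokens), vocab),
    where logits[j] scores the token at position j + 1.
    """
    tokens = prefix + speculated
    logits = target_logits_fn(tokens)  # one call covers all positions
    accepted = []
    for i, tok in enumerate(speculated):
        # The target's greedy pick for this position.
        predicted = int(np.argmax(logits[len(prefix) + i - 1]))
        if predicted != tok:
            break  # discard this token and everything after it
        accepted.append(tok)
    return accepted
```

If all five guesses are accepted you got five tokens for the price of one sequential step; a mismatch at position k still yields k correct tokens plus the target model's own token for position k.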

I looked it up, and you are correct with regard to the specific algorithm used. In general, there are also approximate algorithms for speculative decoding.

Greedy decoding means it is still not ready though.


> speculative decoding which, generally speaking, is not the same quality as serving the model without it.

I've never heard of ANY speculative decoding that wasn't lossless. If it was lossy it'd be called something else.

This page is just a port of DFLASH to GGUF format; it only implements greedy decoding like you said, so the outputs will be inferior, but not inferior to greedy decoding on the original model. Though that's just a matter of implementing temperature, top_k, etc.

