dools · 2026-06-25T00:18:50 1782346730

The reasoning traces always look terrible and they’re frustrating to watch. It’s the same with Kimi. What’s interesting is that the end result is then good. I think it’s just some sort of devil’s advocate trick to get better output.

rufo · 2026-06-25T03:11:34 1782357094

The reasoning tokens are really just there to extend the amount the LLM can "compute" the problem; put another way, the only way a given model can "think" more about a problem is to fill more of its context with predicted tokens, which has the effect of increasing the accuracy of each token. The reinforcement learning these models go through generally doesn't care what the chain of thought tokens look like (outside of preventing loops/gibberish/reward hacking), only how good the final answer is - so while it does look something like "reasoning" to us and has a rough correlation with the final answer, treating it as actually representative of what the final answer will be or an actual thought process is giving those tokens too much credit :)

fc417fc802 · 2026-06-25T04:16:55 1782361015

For me what really drove this point home (that reasoning traces aren't "real" by any reasonable definition of the term) was noticing instances of things being out of order and exhibiting various inconsistencies with the final answer. My favorite was an example posted to HN that went something like this: the model first output the conclusion, then performed the supposed derivation after the fact, then stated it needed to verify the earlier conclusion to confirm the derivation was correct, so it hallucinated a tool call, then it remarked positively about the verification matching, and finally it output a slightly different answer. At no point was the answer actually correct, although it was vaguely in the ballpark.

teravor · 2026-06-25T03:31:56 1782358316

as compared to what though? you can't see the actual think traces for opus or gpt.

dools · 2026-06-25T04:37:28 1782362248

Compared to what comes out at the end. Like if you sit there watching Kimi k2.6 "think", you're like "what? no you fucking idiot!" and you get this urge to "steer" it and so on, but very rarely is that steering actually necessary. It just winds up popping out the correct answer, and all of those 'Wait! That's it! I found it! Actually ... Let me just' moments are just whatever internal processing it needed to get to the correct response. Most likely it's just being self-adversarial and exploring a bunch of dumb avenues to isolate the outcome with the highest probability.

try-working · 2026-06-25T03:07:55 1782356875

thinkslop recursion.

dools · 2026-06-25T00:16:38 1782346598

Is z.ai

Is 2 better than x.ai

dools · 2026-06-22T22:37:34 1782167854

The new Google ones don't look too bad.

dools · 2026-06-21T22:10:43 1782079843

If I need context for a session then that is output from a previous session, otherwise I find any “memory” functionality cumbersome.

I saw /graphify recently which cuts down on exploration cost and seems more appealing (although I haven’t tried it yet)

dools · 2026-06-21T21:32:35 1782077555

This was in my staff documentation and it’s now in my AGENTS.md: tell, don’t ask.

If there is a decision that you need to make, don’t ask me for input. Do the thing that you think makes sense and then write down what you did and why.

If it’s the wrong thing I’ll update the docs to make it clear for next time.

Without this I would always wake up in the morning to an inbox full of questions and no work done, rather than an inbox full of finished tasks and maybe a couple of corrections.

With LLMs, if I ask for a code analysis and a plan to fix something, they tend to put a list of questions at the end that they want confirmed.

Then I have to waste time saying yes or no or coming up with the solution. If I tell them to instead just make assumptions and record them all at the end, then I only need to correct 1 or 2 assumptions if required.
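As a rough sketch of how that can be wired up (the policy wording, the "Assumptions:" heading and both function names are made up for illustration, not taken from any real AGENTS.md), you append the rule to every task and then pull the recorded assumptions out of the report so they are the only thing you have to review:

```python
from typing import List

# Hypothetical wording; adjust to whatever your own docs say.
POLICY = (
    "Tell, don't ask. If a decision is needed, make the choice you think is "
    "sensible and keep going. End your report with a section headed "
    "'Assumptions:' listing each choice and why, one per line starting with '- '."
)

def build_prompt(task: str) -> str:
    """Attach the tell-don't-ask policy to a task description."""
    return f"{task}\n\n{POLICY}"

def extract_assumptions(report: str) -> List[str]:
    """Collect the '- ' bullets that follow an 'Assumptions:' line."""
    items: List[str] = []
    in_section = False
    for line in report.splitlines():
        stripped = line.strip()
        if stripped == "Assumptions:":
            in_section = True
        elif in_section and stripped.startswith("- "):
            items.append(stripped[2:])
        elif in_section and stripped:
            break  # first non-bullet, non-blank line ends the section
    return items
```

The parser is deliberately dumb: if the model ignores the format you get an empty list, which is itself a signal to go and read the full report.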

dools · 2026-06-21T01:29:57 1782005397

The idea of giving a non-deterministic automated process direct deployment control is fucking madness to me. That’s why I don’t get the obsession with MCP. Deployment can be scripted. It doesn’t need an LLM: it is a completely deterministic process and you want it to run identically every single time.

The right model for agentic API usage is having LLMs write scripts that use APIs. Connecting agents to MCPs and telling them to go and do stuff over and over not only wastes money but invites catastrophe.
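A minimal sketch of what I mean, assuming a made-up three-step deploy (the registry name, the host and the exact commands are placeholders, not anyone's real setup). The plan is a pure function of its inputs, so the same version always produces the same commands, and the LLM's job ends once the script is written:

```python
import subprocess
from typing import List

def build_plan(version: str, host: str) -> List[List[str]]:
    """Return the exact commands for a deploy; same inputs give the same plan."""
    image = f"registry.example.com/app:{version}"  # hypothetical image name
    return [
        ["docker", "build", "-t", image, "."],
        ["docker", "push", image],
        ["ssh", host, f"docker pull {image} && docker compose up -d"],
    ]

def run_plan(plan: List[List[str]], dry_run: bool = True) -> List[str]:
    """Run each command in order, stopping on the first failure.

    With dry_run=True nothing is executed; the commands are only returned
    as strings so the plan can be reviewed or diffed.
    """
    commands = []
    for cmd in plan:
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        commands.append(" ".join(cmd))
    return commands
```

Running `run_plan(build_plan("1.2.3", "deploy@example.com"))` just lists the three commands, which you can read before ever passing `dry_run=False`.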

dools · 2026-06-21T01:10:17 1782004217

So it’s a twitter plugin?

dools · 2026-06-20T21:33:34 1781991214

You think the UK is in worse shape politically than the US?

firebaze · 2026-06-20T21:36:13 1781991373

Yes, by far.

gambiting · 2026-06-20T21:44:43 1781991883

Now that's a wild statement - UK at least has a leader who can say a coherent sentence in English, so far.

dgellow · 2026-06-20T22:12:16 1781993536

And hasn’t been found guilty of rape, hasn’t attempted a coup, hasn’t triggered the worst oil shock the world ever experienced for literally no reason, etc.

dools · 2026-06-20T22:46:37 1781995597

You have roving, poorly trained gangs of jackboot federal thugs illegally imprisoning citizens in privately owned gulags and murdering protesters extrajudicially. The president is a criminal and a rapist. He and his entire staff are corrupt, self-dealing, incompetent grifters. He’s put a Fox TV host in charge of the army and podcasters in charge of the FBI.

The list goes on.

It’s a total replication of Idiocracy. What right-wing social media slop are you consuming that you think the US is in good shape?

dools · 2026-06-20T00:58:36 1781917116

Computers generally are stupid for schools. There should be a computer room and computer classes, but all other learning should happen offline. Computers are far too distracting.

tacomagick · 2026-06-20T01:58:41 1781920721

Half of my classmates in university failed Compsci. They could not use a computer, but they somehow could install Instagram, do basic video edits there and doomscroll. It is NOT computers! Phones should be the main target.

Squarex · 2026-06-20T06:36:04 1781937364

But computer usage proficiency is not computer science either. These skills should be taught in a separate class. Agree with the gen z lack of computer skills though.

dools · 2026-06-18T08:00:52 1781769652

No, it costs the same. The reason they do it is that it’s slightly more difficult to spoof a real number as the sender ID, because most gateways will verify ownership by sending you a text on that number before letting you send outbound from it, whereas they have no way of doing the same for an alphanumeric sender ID.
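Roughly, the check works like the toy model below. This is only a sketch: the class, its method names and the six-digit code are invented for illustration and don't correspond to any particular gateway's API.

```python
import secrets

class SenderRegistry:
    """Toy model of a gateway that verifies numeric sender IDs by SMS."""

    def __init__(self, send_sms):
        self._send_sms = send_sms  # callable(number, text); hypothetical transport
        self._pending = {}         # (account, number) -> code awaiting confirmation
        self._verified = set()     # (account, number) pairs that passed the check

    def request_numeric_sender(self, account, number):
        # Text a one-time code to the number the account claims to own.
        code = f"{secrets.randbelow(10**6):06d}"
        self._pending[(account, number)] = code
        self._send_sms(number, f"Your verification code is {code}")

    def confirm(self, account, number, code):
        # Only someone who can read texts on that number knows the code.
        expected = self._pending.get((account, number))
        if expected is not None and secrets.compare_digest(expected, code):
            self._verified.add((account, number))
            del self._pending[(account, number)]
            return True
        return False

    def can_send_from(self, account, sender_id):
        if sender_id.lstrip("+").isdigit():
            return (account, sender_id) in self._verified
        # Alphanumeric ID: there is no handset to text, so this toy model
        # has nothing to check and simply allows it.
        return True
```

The last branch is the whole point: a numeric ID is tied to something that can receive a text, an alphanumeric one isn't.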