Hacker News | dools's comments

The reasoning traces always look terrible and they’re frustrating to watch. It’s the same with Kimi. What’s interesting is that the end result is then good. I think it’s just some sort of devil’s advocate trick to get better output.

The reasoning tokens are really just there to extend the amount the LLM can "compute" the problem; put another way, the only way a given model can "think" more about a problem is to fill more of its context with predicted tokens, which has the effect of increasing the accuracy of each token. The reinforcement learning these models go through generally doesn't care what the chain of thought tokens look like (outside of preventing loops/gibberish/reward hacking), only how good the final answer is - so while it does look something like "reasoning" to us and has a rough correlation with the final answer, treating it as actually representative of what the final answer will be or an actual thought process is giving those tokens too much credit :)

For me, what really drove this point home (that reasoning traces aren't "real" by any reasonable definition of the term) was noticing instances of things being out of order and exhibiting various inconsistencies with the final answer. My favorite was an example posted to HN that went something like this: the model first output the conclusion, then performed the supposed derivation after the fact, then stated it needed to verify the earlier conclusion to confirm the derivation was correct, so it hallucinated a tool call, then remarked positively about the verification matching, and finally output a slightly different answer. At no point was the answer actually correct, although it was vaguely in the ballpark.

As compared to what, though? You can't see the actual thinking traces for Opus or GPT.

Compared to what comes out at the end. Like if you sit there watching Kimi k2.6 "think", you're like "what? no you fucking idiot!" and you get this urge to "steer" it and so on, but very rarely is that steering actually necessary; it just winds up popping out the correct answer, and all of those 'Wait! That's it! I found it! Actually ... Let me just' moments are just whatever internal processing it needed to use to get to the correct response. Most likely it's just being self-adversarial and exploring a bunch of dumb avenues to isolate the best outcome with the highest probability.

thinkslop recursion.

Is z.ai

Is 2 better than x.ai


The new Google ones don't look too bad.

If I need context for a session then that is output from a previous session, otherwise I find any “memory” functionality cumbersome.

I saw /graphify recently which cuts down on exploration cost and seems more appealing (although I haven’t tried it yet)


This was in my staff documentation and it’s now in my AGENTS.md: tell, don’t ask.

If there is a decision that you need to make, don’t ask me for input; do the thing that you think makes sense and then write down what you did and why.

If it’s the wrong thing I’ll update the docs to make it clear for next time.

Without this I would always wake up in the morning to an inbox full of questions and no work done, rather than an inbox full of finished tasks and maybe a couple of corrections.

With LLMs, if I ask for a code analysis and a plan to fix something, they tend to put a list of questions at the end about which they want confirmation.

Then I have to waste time saying yes or no or coming up with the solution. If I tell them to instead just make assumptions and record them all at the end then I only need to correct 1 or 2 assumptions if required.


The idea of giving a non-deterministic automated process direct deployment control is fucking madness to me. That’s why I don’t get the obsession with MCP. Deployment can be scripted. It doesn’t need an LLM; it is a completely deterministic process and you want it to run identically every single time.

The right model for agentic API usage is having LLMs write scripts that use APIs. Connecting agents to MCPs and telling them to go and do stuff over and over not only wastes money but invites catastrophe.
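As a rough illustration of the "write a script once, then run it the same way every time" idea, here is a minimal Python sketch. Everything in it is hypothetical: the step list, the `run_deploy` name, and the placeholder commands (which only print) are assumptions for illustration, not anyone's actual deployment process.

```python
import subprocess
import sys

# Hypothetical, fixed list of deployment steps. The commands here are
# harmless placeholders (they only print); a real script would call
# whatever build/upload/restart commands the project actually uses.
DEPLOY_STEPS = [
    [sys.executable, "-c", "print('build')"],
    [sys.executable, "-c", "print('upload')"],
    [sys.executable, "-c", "print('restart')"],
]


def run_deploy(steps):
    """Run each step in order and stop at the first failure.

    Returns the captured stdout of every step that succeeded.
    """
    outputs = []
    for cmd in steps:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed step aborts the whole deployment.
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        outputs.append(result.stdout.strip())
    return outputs


if __name__ == "__main__":
    for line in run_deploy(DEPLOY_STEPS):
        print(line)
```

The point of the sketch is that the steps and their order are fixed in the file, so an LLM could help write or review it, but nothing about a given run depends on model output.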


So it’s a twitter plugin?

You think the UK is in worse shape politically than the US?

Yes, by far.

Now that's a wild statement - UK at least has a leader who can say a coherent sentence in English, so far.

And hasn’t been found guilty of rape, hasn’t attempted a coup, hasn’t triggered the worst oil shock the world ever experienced for literally no reason, etc.

You have roving poorly trained gangs of jackboot federal thugs illegally imprisoning citizens in privately owned gulags and murdering protesters extrajudicially. The president is a criminal and a rapist. He and his entire staff are corrupt self-dealing incompetent grifters. He’s put a Fox TV host in charge of the army and podcasters in charge of the FBI.

The list goes on.

It’s a total replication of Idiocracy. What right-wing social media slop are you consuming that you think the US is in good shape?


Computers generally are stupid for schools. There should be a computer room and computer classes, but all other learning should happen offline. Computers are far too distracting.

Half of my classmates in university failed Compsci. They could not use a computer, but they somehow could install Instagram, do basic video edits there and doomscroll. It is NOT computers! Phones should be the main target.

But computer usage proficiency is not computer science either. These skills should be taught in a separate class. Agree with the Gen Z lack of computer skills though.

No, it costs the same. The reason they do it is that it’s slightly more difficult to spoof a real number sender ID, because most gateways will verify ownership by sending you a text on that number before letting you send outbound from it, whereas they have no way of doing the same for an alphanumeric sender ID.
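To make the distinction concrete, here is a small Python sketch of how a gateway might decide whether a sender ID can be ownership-checked by text message. The function name and the digit-length rule are assumptions for illustration (15 digits is the E.164 maximum; the minimum of 7 is arbitrary), and real gateways may apply different rules.

```python
import re

# Assumption: a numeric sender ID is an optional "+" followed by 7 to 15
# digits. Anything else is treated as alphanumeric.
NUMERIC_SENDER = re.compile(r"\+?\d{7,15}")


def can_verify_by_sms(sender_id: str) -> bool:
    """Return True if a verification text could be sent to this sender ID."""
    # fullmatch requires the whole string to be a phone-number-like value.
    return NUMERIC_SENDER.fullmatch(sender_id) is not None
```

A numeric ID such as `+61400000000` can receive a verification text, while an alphanumeric ID such as `MyBank` cannot, so the same ownership check is not available for it.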
