The trade-off between utilization and latency is rarely understood in organizations. Little’s law should be mandatory (management) reading. Unused capacity is not waste; it is a buffer that absorbs variability and thus keeps latency down.
It reminds me of Kingman's Formula in queueing theory: As server utilization approaches 100%, the wait time approaches infinity.
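For reference, a sketch of Kingman's approximation for the mean queueing wait in a G/G/1 queue (notation varies across textbooks):

```latex
% Kingman's approximation (the "VUT" form) for a G/G/1 queue.
% The 1/(1 - rho) factor is what sends wait times toward infinity
% as utilization approaches 100%.
\mathbb{E}[W_q] \approx
  \underbrace{\frac{\rho}{1-\rho}}_{\text{utilization}}
  \cdot
  \underbrace{\frac{c_a^2 + c_s^2}{2}}_{\text{variability}}
  \cdot
  \underbrace{\tau}_{\text{mean service time}}
```

Here ρ is utilization, c_a and c_s are the coefficients of variation of interarrival and service times, and τ is the mean service time.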
We intuitively understand this for servers (you never run a CPU at 99% if you want responsiveness), yet for some reason, we decided that a human brain—which is infinitely more complex—should run at 99% capacity and still be expected to handle urgent interruptions without crashing.
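To make the point concrete, here is a minimal sketch (not from the original comment, assuming a textbook M/M/1 queue with a hypothetical 10 ms mean service time) of how mean latency scales with utilization:

```python
# Mean time in system for an M/M/1 queue: W = tau / (1 - rho),
# where tau is the mean service time and rho the utilization.
def mm1_mean_time_in_system(service_time: float, utilization: float) -> float:
    assert 0 <= utilization < 1, "the queue is unstable at rho >= 1"
    return service_time / (1 - utilization)

if __name__ == "__main__":
    tau = 10.0  # hypothetical mean service time in milliseconds
    for rho in (0.50, 0.80, 0.90, 0.95, 0.99):
        latency = mm1_mean_time_in_system(tau, rho)
        print(f"utilization {rho:4.0%} -> mean latency {latency:7.1f} ms")
```

Going from 50% to 99% utilization multiplies mean latency by 50x in this simple model; the same 1/(1 - rho) term drives Kingman's formula above.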
Who is responsible for this terrible decision? In the pro/con analysis, occasionally saving 20% in file size vs. updating ALL PDF libraries/apps/viewers ever built SHOULD be a no-brainer.
The one-time purchase version of Microsoft Office is not available worldwide. Where offered, it is reduced to Word, Excel, PowerPoint, and OneNote, with Outlook as a Business edition extra. Individual apps can sometimes be bought separately, but pricing usually makes this impractical. This is to push buyers toward Microsoft 365 subscriptions, which are the primary product.
Upgrade regret here. What used to be solid performance is now random hangs and unresponsiveness. Most things work, but it’s Apple’s least polished OS in many years.
After recently applying Codex to a gigantic, old, and hairy project that is as far from greenfield as it can be, I can assure you this assertion is false. It’s bonkers seeing 5.2 churn through the complexity and understand dependencies that would take me days or weeks to wrap my head around.
Note: At the time of writing, the comments are largely skeptical.
Reading this as an avid Codex CLI user, some things make sense and reflect lessons learned along the way. However, the patterns also get stale fast as agents improve and may be counterproductive. One such pattern is context anxiety, which probably reflects a particular model more than a general problem, and is likely an issue that will go away over time.
There are certainly patterns that need to be learned, and relearned over time. Learning the patterns is sort of an anti-pattern, since it is the model that should be trained to alleviate its shortcomings rather than the human. Then again, a successful mindset over the last three years has been to treat models as another form of intelligence, not as human intelligence, by getting to know them and being mindful of their strengths and weaknesses. This is quite a demanding task in terms of communication, reflection, and perspective-taking, and it is understandable that this knowledge is being documented.
But models change over time. The strengths and weaknesses of yesterday’s models are not the same as today’s, and reasoning models have actually removed some capabilities. A simple example is giving a reasoning model with tools the task of inspecting logs. It will most likely grep and parse out smaller sections, and may also refuse an instruction to load the file into context to inspect it. The model then relies on its reasoning (system 2) rather than its intuitive (system 1) thinking.
This means that many of these patterns are temporary, and optimizing for them risks locking human behavior to quirks that may disappear or even reverse as models evolve. YMMV.
I have a theory that agents will improve a lot when trained on more recent training data. For example, I’ve had agents show context anxiety because they still think an average LLM context window is around 32k tokens. Also, building agents with agents and letting them do prompt engineering etc. is still very unsatisfactory: they keep talking about GPT-3.5 or Gemini 1.5 and try to optimize prompts for those old models, which of course were almost entirely different things. So I’m thinking that if that’s how they think of themselves as well, maybe it artificially limits their agentic behavior too, because they just don’t know how much more capable they are than GPT-3.5.
Because the “strengths” of a model are based not on inherent characteristics but on user perception. It feels like model A is doing something better, the same way it feels like your productivity is high.
Strong point. I’m considering tagging patterns better and adding things like a “model/toolchain-specific” label and a “last validated (month/year)” field. Things change fast, and “Context anxiety,” for example, is likely less relevant now and should be reframed that way (or retired).
Luckily Beelinks are still cheap, work decently, and can run Linux/Windows, so if all someone needs is to browse the Internet and do basic stuff, honestly? They’re fine. We’ll see how long that lasts though.