I just ran the same test on Gemini 2.5 Pro (I assume it enables search by default, because it added a bunch of "sources") and got the exact same result as the author. It claims ".bdi" is the ccTLD for Burundi, which is false; Burundi's is .bi[1]. It also claims ".time" and ".article" are TLDs.
I think the author's point stands.
EDIT: I tried it with "Deep Research" too. Here it invents neither TLDs nor HTML elements, but the resulting list is incomplete.
I wonder if it would work better to ask the LLM to produce a script that extracts the resulting list, and then run that script on the two input lists.
There is also the question of the two input lists themselves: it's not clear whether it is better to ask the LLM to extract them directly, or again to ask the LLM to write a script that extracts them from the raw text data.
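Something like this, as a minimal sketch: it assumes Python, the public IANA TLD list, and a hand-abbreviated set of element names standing in for the full list you'd extract from the MDN reference.

```python
# Minimal sketch: intersect the IANA TLD list with HTML element names.
# The element set below is abbreviated by hand for illustration; a full
# run would first extract all ~110 element names from the MDN reference.
import urllib.request

IANA_TLDS = "https://data.iana.org/TLD/tlds-alpha-by-domain.txt"

with urllib.request.urlopen(IANA_TLDS) as resp:
    lines = resp.read().decode("ascii").splitlines()

# The first line of the file is a comment header; entries are upper case.
tlds = {line.strip().lower() for line in lines if not line.startswith("#")}

html_elements = {"audio", "video", "select", "style", "menu", "data",
                 "link", "map", "nav", "search", "section", "summary"}

print(sorted(html_elements & tlds))
```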
In my experience, reasoning and search come with their own set of tradeoffs. It works great when it works, but the variance can be wider than with the bare LLM.
Search and reasoning use up more context, leading to context rot and subtler, harder-to-detect hallucinations. Reasoning doesn't always focus on evaluating the quality of evidence; it's often just "problem solving" from some root set of axioms found in search.
I've had this happen in Claude Code, for example, where it hallucinated a few details about a library based on a badly written forum post.
> Or, if LLMs are so smart, why doesn't it say "Hmmm, would you like to use a different model for this?"
That is indeed an area where LLMs don't shine.
That is, not only are they trained to always respond with an answer, but they also have no ability to accurately tell how confident they are in that answer. So you can't just filter out low-confidence answers.
Something I think would be interesting for model APIs and consumer apps to expose is the probability of each individual token generated.
I'm presuming that one class of junk/low-quality output happens when the model doesn't have high-probability next tokens and works with whatever poor options it has.
Maybe low-probability tokens that cross some threshold could get a visual treatment, giving feedback the same way word processors flag a spelling or grammatical error.
But maybe I’m making a mistake thinking that token probability is related to the accuracy of output?
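For what it's worth, some APIs already expose this. A rough sketch with the OpenAI Python SDK, which can return per-token logprobs; the model name and the 0.5 cutoff are arbitrary choices for illustration:

```python
# Sketch: surface low-probability tokens from a completion.
# Assumes the OpenAI Python SDK (>= 1.x) and OPENAI_API_KEY in the env.
import math
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # arbitrary model choice for the example
    messages=[{"role": "user", "content": "What is the ccTLD for Burundi?"}],
    logprobs=True,
)

THRESHOLD = 0.5  # arbitrary "low confidence" cutoff, purely illustrative
for tok in resp.choices[0].logprobs.content:
    p = math.exp(tok.logprob)
    flag = "  <-- low probability" if p < THRESHOLD else ""
    print(f"{tok.token!r} p={p:.2f}{flag}")
```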
Then criticize the providers for their defaults instead of claiming that the models can't solve the problem?
> Or, if LLMs are so smart, why doesn't it say "Hmmm, would you like to use a different model for this?"
That's literally what ChatGPT did for me[0], which is consistent with what they shared at the last keynote (a quick, low-reasoning answer by default, with reasoning/search only if explicitly prompted or as a follow-up). It did miss one match though, as it somehow didn't parse the `<search>` element from the MDN docs.
You are pointing out a maturity issue, not a capability problem. It's clear to everyone that LLM products are immature, but saying they are incapable is misleading.
“Defaults are shit”: is that really true, though? Just because it shits the bed on some tasks does not mean it is shit. For people integrating LLMs into any workflow that requires a modicum of precision or determinism, one must always evaluate output closely and have benchmarks. You must treat the LLM as an incompetent but overconfident intern, and thus have fast mechanisms for measuring output and giving feedback.
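As a sketch of what such a measuring mechanism could look like, assuming you can compute the ground truth deterministically (e.g. with a script like the one upthread) and parse the model's reply into a set; the values and the helper name here are hypothetical:

```python
# Hypothetical harness: score a model's answer set against ground truth.
def score_answer(model_items: set[str], truth: set[str]) -> dict:
    return {
        "missing": sorted(truth - model_items),   # real matches the model skipped
        "invented": sorted(model_items - truth),  # items the model made up
        "recall": len(model_items & truth) / len(truth),
    }

# Illustrative values only; "time" and "article" echo the TLDs invented
# in the runs described upthread.
truth = {"audio", "video", "select", "style", "menu", "link", "map"}
model_reply = {"audio", "video", "time", "article"}

print(score_answer(model_reply, truth))
```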
The “full Tailwind experience” is already freely available. What “lost opportunities for deep integration” is a frontend CSS framework missing?
Tailwind Plus (the commercial product) is like buying an off-the-shelf template. It’s just a collection of themes and pre-built components — useful for devs who want to get started quickly on a project, but it’s cookie-cutter and can easily be replicated by anyone with Tailwind itself.
There are devs who think the currently available HTML elements are all we need. But there are many more who believe we are missing primitives that Tailwind (and others) is attempting to solve for.
> It’s just a collection of themes and pre-built components
Any reusable web component could be described as an optionally themed, pre-built component. That's kind of the point.
I no longer see value in prebuilt templates, since LLMs can put things together sufficiently well for prototyping. Even when using templates before, you still needed to customise them. Feels like we are going through a transition period.
I'm not sure I follow; what exactly is your complaint? The Iterator interface is described as:
> Interface for external iterators or objects that can be iterated themselves internally
Note “external iterators or objects”. The Iterator interface is not exactly everyday PHP; it's a specialist utility for making classes iterable so they can be accessed like arrays. Most developers will rarely use it directly, and it's not being used in the parent comment's example either.
Iterating over something requires knowing where you are in the sequence, so of course you would need to implement a method to get the current position of the iteration.
> Iterating over something requires knowing where you are in the sequence, so of course you would need to implement a method to get the current position of the iteration.
No, you don't. Other languages don't require it. There is no problem getting a position outside the iterator, and that's the more generic approach.
Let me guess: to somehow patch iteration on associative arrays? And instead of bringing in pairs or tuples as first-class citizens, it extends iterators with `key` and `value`. And now every Iterator implementation has to track its own sequentially increasing key. Very nice design indeed.
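For contrast, a minimal sketch of how Python, as one of those other languages, handles the same two cases: an iterator needs no key() method, and associative iteration simply yields tuples.

```python
# A Python iterator only needs __iter__ and __next__; there is no key()
# method, and nothing forces it to expose its position in the sequence.
class Countdown:
    def __init__(self, n: int):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self) -> int:
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

print(list(Countdown(3)))  # [3, 2, 1]

# Associative iteration yields (key, value) tuples; the iterator itself
# never has to track a sequentially increasing key.
for key, value in {"a": 1, "b": 2}.items():
    print(key, value)
```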
Great first article, and very interesting to see someone else using a receipt printer for bite-sized task management!
I have a variety of automations running which print actionable tasks to my receipt printer via a Raspberry Pi. It’s nice having a real-life ticket I can take hold of.
One thing to be aware of if you’re handling receipts frequently: make sure to buy phenol-free thermal paper. Phenol is toxic and some types of it are banned in certain countries.
Receipt printers don't use ink; instead they use thermal paper, which darkens when heated. You can test this by scratching it with your nail: the heat is enough to leave a mark.
I agree with you on the first part, but are you sure that the heat from the fingernail is what's leaving that mark? I can take a cold object and run it across the receipt paper to get the same effect, so I think there's a different mechanism at play, but I'm open to being proven wrong.
The developers in the paper require only a small flash of local heat to turn black, which is why thermal printers can print so fast despite the time it takes to heat up and cool down the print head. Friction produces enough heat to do that. You can test this by comparison: press an object straight down, or run it very slowly across the surface.
Yes: you look at it carefully, and if it looks like thermal paper, it may be toxic.
Whether the substances used are known to be toxic is another matter, but you won't know that even with a correct label, because it takes time for us to find out that new substances are toxic.
I think this is the right approach, speaking as someone who went down the rabbit hole of looking at alternative non-bisphenol or non-phenol image developers. The very little research on the new ones tends to conclude "we don't know if it's toxic in the long term" or, in the case of urea-based papers, "it's highly toxic to aquatic life."
To the GP: if the goal is to avoid phenol papers, note that phenol papers tend to develop a deeper black. And in the US, phenol-free papers are new enough that the backside often advertises it. Some are very misleadingly labeled BPA-free, which usually means they're made with the very similar and likely equally toxic BPS.
That is true. I actually was ambiguous in my post, because I meant code that generates stuff, not code that was generated by AI, even though I don't like the latter, either.
I find it offensive to have any generative AI code on my computer.
Settings → Apple Intelligence and Siri → toggle Apple Intelligence off.
It's not enabled by default. But in case you accidentally turned it on, turning it off gets you a bunch of disk space back as the AI stuff is removed from the OS.
Some people are just looking for a reason to be offended.
The theatrics of being *forced* to use completely optional, opt-in features have been a staple of discussions regarding Apple for years.
Every year, macOS and iPadOS look superficially more and more similar, but they remain distinct in their interfaces, features, etc. But the past 15 years have been "we'll be *forced* to only use Apple-vetted software, just like the App Store!"
And yeah, the Gatekeeper mechanism got less straightforward to get around in macOS 15, but … I don't know, someone will shoot me down for this, but it's been a long 15 years to be an Apple user with all that noise going on around you from people who really don't have the first clue what they're talking about, and on HN, no less.
They can come back to me when what they say actually happens. Until then, fifteen dang years.
I think I know what you meant: you don't want code that runs generative AI on your computer. But what you wrote could also mean you don't want any code running that was generated by AI. Even with open source, your computer will be running code generated by AI, as most open source projects are using it. I suspect it will be nearly impossible to avoid. Most open source projects will accept AI-generated code as long as it's been reviewed.
Good point, and you were right: I was ambiguous. I meant a system that generates stuff, not stuff that was generated by AI. But I'd rather not use stuff that was generated by AI, either. And you're also right that avoiding it will become impossible, and probably already is. Not a very nice world, I think. The best thing to do, then, is to minimize it, and avoid computers as much as possible...
I didn't say "generating code"; I meant I find it offensive to have any code sitting on my computer that generates code, whether I use it or not. I prefer minimalism: have on my computer only what I will use. And I have a limited data connection, which makes updates bloated with yet more code I won't use even more costly.
I thoroughly enjoyed reading this. I peaked at 137wpm[0] and I’m not ashamed to say I love typing long sentences. I type fast enough that my thoughts normally can’t keep up with my fingers, but when I plan a sentence out in my head, the satisfaction I get from watching my inner monologue transmogrify into words onscreen is palpable. It’s a rush. And I totally get it. Great article!
> If we […] tried to use these systems to solve the kinds of problems we need Einsteins, Hawkings and Taos for, then we would be in for one miserable disappointment after another
We can literally watch Terence Tao himself vibe coding formal proofs using Claude and o4. He doesn’t seem too disappointed.
He's the only person I know of who can actually get good results out of these systems (though I know several people who claim they can). What he's doing is fundamentally not the same thing as what most "vibe coders" are doing: take the autocomplete away, and he's still a talented mathematician.
Sure, but what he's doing is very much not using Claude or o4 to do things we need Terence Tao for.
I'm not saying today's AI systems aren't useful for anything. I'm not saying they aren't impressive. I'm just saying they're nowhere close to the "Einstein, Hawking and Tao in your house" hyperbole in the OP. I would be very, very surprised if Terence Tao disagreed with me about that.
Do you not think the AI output looks far more polished and print-ready? Canny edges have a lot of noise and don't look at all clean for coloring book purposes.
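For reference, the Canny baseline being compared against is roughly this, assuming OpenCV and a local photo.jpg; even with pre-blurring, the edge map keeps a lot of speckle:

```python
# Sketch of a Canny-edge "coloring page": real OpenCV calls, hypothetical
# input file. Blurring reduces, but does not eliminate, the noise.
import cv2

img = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.GaussianBlur(img, (5, 5), 0)
edges = cv2.Canny(img, 100, 200)       # hysteresis thresholds, tunable
cv2.imwrite("edges.png", 255 - edges)  # invert: black lines on white
```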
> yoUr'E PRoMPTiNg IT WRoNg!
> Am I though?
Yes. You’re complaining that Gemini “shits the bed”, despite using 2.5 Flash (not Pro), without search or reasoning.
It’s a fact that some models are smarter than others. This is a task that requires reasoning, so the article is hard to take seriously when the author uses a model optimised for speed (not intelligence) and doesn’t even turn reasoning on (nor suggests they’re even aware it's a feature).
I gave the exact prompt to ChatGPT 5 Thinking and got an excellent answer with cited sources, all of which appear to be accurate.