I just ran the same test on Gemini 2.5 Pro (I assume it enables search by default, because it added a bunch of "sources") and got the exact same result as the author. It claims ".bdi" is the ccTLD for Burundi, which is false; Burundi's is .bi[1]. It also claims ".time" and ".article" are TLDs.
I think the author's point stands.
EDIT: I tried it with "Deep Research" too. Here it invents neither TLDs nor HTML elements, but the resulting list is incomplete.
I wonder if it would work better to ask the LLM to produce a script that extracts the resulting list, and then run that script on the two input lists.
There is also the question of the two input lists themselves: it's not clear whether it is better to ask the LLM to extract them directly, or again to ask the LLM to write a script that extracts them from the raw text data.
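Something like this, as a minimal sketch: it assumes Python, the public IANA TLD list, and a hand-abbreviated set of element names standing in for the full list you'd extract from the MDN reference.

```python
# Minimal sketch: intersect the IANA TLD list with HTML element names.
# The element set below is abbreviated by hand for illustration; a full
# run would first extract all ~110 element names from the MDN reference.
import urllib.request

IANA_TLDS = "https://data.iana.org/TLD/tlds-alpha-by-domain.txt"

with urllib.request.urlopen(IANA_TLDS) as resp:
    lines = resp.read().decode("ascii").splitlines()

# The first line of the file is a comment header; entries are upper case.
tlds = {line.strip().lower() for line in lines if not line.startswith("#")}

html_elements = {"audio", "video", "select", "style", "menu", "data",
                 "link", "map", "nav", "search", "section", "summary"}

print(sorted(html_elements & tlds))
```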
In my experience, reasoning and search come with their own set of tradeoffs. It works great when it works, but the variance can be wider than with the bare LLM.
Search and reasoning use up more context, leading to context rot and subtler, harder-to-detect hallucinations. Reasoning doesn't always focus on evaluating the quality of evidence; it's often just "problem solving" from some root set of axioms found in search.
I've had this happen in Claude Code, for example, where it hallucinated a few details about a library based on a badly written forum post.
> Or, if LLMs are so smart, why doesn't it say "Hmmm, would you like to use a different model for this?"
That is indeed an area where LLMs don't shine.
That is, not only are they trained to always respond with an answer, but they also have no ability to accurately tell how confident they are in that answer. So you can't just filter out low-confidence answers.
Something I think would be interesting for model APIs and consumer apps to expose is the probability of each individual token generated.
I'm presuming that one class of junk/low-quality output happens when the model doesn't have high-probability next tokens and works with whatever poor options it has.
Maybe low-probability tokens that cross some threshold could get a visual treatment, giving feedback the same way word processors flag a spelling or grammatical error.
But maybe I’m making a mistake thinking that token probability is related to the accuracy of output?
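For what it's worth, some APIs already expose this. A rough sketch with the OpenAI Python SDK, which can return per-token logprobs; the model name and the 0.5 cutoff are arbitrary choices for illustration:

```python
# Sketch: surface low-probability tokens from a completion.
# Assumes the OpenAI Python SDK (>= 1.x) and OPENAI_API_KEY in the env.
import math
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # arbitrary model choice for the example
    messages=[{"role": "user", "content": "What is the ccTLD for Burundi?"}],
    logprobs=True,
)

THRESHOLD = 0.5  # arbitrary "low confidence" cutoff, purely illustrative
for tok in resp.choices[0].logprobs.content:
    p = math.exp(tok.logprob)
    flag = "  <-- low probability" if p < THRESHOLD else ""
    print(f"{tok.token!r} p={p:.2f}{flag}")
```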
Then criticize the providers for their defaults instead of claiming that the models can't solve the problem?
> Or, if LLMs are so smart, why doesn't it say "Hmmm, would you like to use a different model for this?"
That's literally what ChatGPT did for me[0], which is consistent with what they shared at the last keynote (a quick, low-reasoning answer by default, with reasoning/search only if explicitly prompted or as a follow-up). It did miss one match though, as it somehow didn't parse the `<search>` element from the MDN docs.
You are pointing out a maturity issue, not a capability problem. It's clear to everyone that LLM products are immature, but saying they are incapable is misleading.
“Defaults are shit”: is that really true, though? Just because it shits the bed on some tasks does not mean it is shit. For people integrating LLMs into any workflow that requires a modicum of precision or determinism, one must always evaluate output closely and have benchmarks. You must treat the LLM as an incompetent but overconfident intern, and thus have fast mechanisms for measuring output and giving feedback.
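As a sketch of what such a measuring mechanism could look like, assuming you can compute the ground truth deterministically (e.g. with a script like the one upthread) and parse the model's reply into a set; the values and the helper name here are hypothetical:

```python
# Hypothetical harness: score a model's answer set against ground truth.
def score_answer(model_items: set[str], truth: set[str]) -> dict:
    return {
        "missing": sorted(truth - model_items),   # real matches the model skipped
        "invented": sorted(model_items - truth),  # items the model made up
        "recall": len(model_items & truth) / len(truth),
    }

# Illustrative values only; "time" and "article" echo the TLDs invented
# in the runs described upthread.
truth = {"audio", "video", "select", "style", "menu", "link", "map"}
model_reply = {"audio", "video", "time", "article"}

print(score_answer(model_reply, truth))
```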
The “full Tailwind experience” is already freely available. What “lost opportunities for deep integration” is a frontend CSS framework missing?
Tailwind Plus (the commercial product) is like buying an off-the-shelf template. It’s just a collection of themes and pre-built components — useful for devs who want to get started quickly on a project, but it’s cookie-cutter and can easily be replicated by anyone with Tailwind itself.
There are devs who think the currently available HTML elements are all we need. But there are many more who believe we are missing primitives that Tailwind (and others) is attempting to solve for.
> It’s just a collection of themes and pre-built components
Any reusable web component could be described as an optionally themed, pre-built component. That's kind of the point.
I no longer see value in prebuilt templates, since LLMs can put things together sufficiently well for prototyping. Even when using templates before, you still needed to customise them. Feels like we are going through a transition period.
I'm not sure I follow; what exactly is your complaint? The Iterator interface is described as:
> Interface for external iterators or objects that can be iterated themselves internally
Note “external iterators or objects”. The Iterator interface is not exactly everyday PHP; it's a specialist utility for making classes iterable so they can be accessed like arrays. Most developers will rarely use it directly, and it's not being used in the parent comment's example either.
Iterating over something requires knowing where you are in the sequence, so of course you would need to implement a method to get the current position of the iteration.
> Iterating over something requires knowing where you are in the sequence, so of course you would need to implement a method to get the current position of the iteration.
No, you don't. Other languages don't require it. There is no problem getting a position outside the iterator, and that's the more generic approach.
Let me guess: to somehow patch iteration on associative arrays? And instead of bringing in pairs or tuples as first-class citizens, it extends iterators with `key` and `value`. And now every Iterator implementation has to track its own sequentially increasing key. Very nice design indeed.
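For contrast, a minimal sketch of how Python, as one of those other languages, handles the same two cases: an iterator needs no key() method, and associative iteration simply yields tuples.

```python
# A Python iterator only needs __iter__ and __next__; there is no key()
# method, and nothing forces it to expose its position in the sequence.
class Countdown:
    def __init__(self, n: int):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self) -> int:
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

print(list(Countdown(3)))  # [3, 2, 1]

# Associative iteration yields (key, value) tuples; the iterator itself
# never has to track a sequentially increasing key.
for key, value in {"a": 1, "b": 2}.items():
    print(key, value)
```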
Great first article, and very interesting to see someone else using a receipt printer for bite-sized task management!
I have a variety of automations running which print actionable tasks to my receipt printer via a Raspberry Pi. It’s nice having a real-life ticket I can take hold of.
One thing to be aware of if you’re handling receipts frequently: make sure to buy phenol-free thermal paper. Phenol is toxic and some types of it are banned in certain countries.
Receipt printers don't use ink; instead they use thermal paper, which darkens when heated. You can test this by scratching it with your nail: the heat is enough to leave a mark.
I agree with you on the first part, but are you sure that the heat from the fingernail is what's leaving that mark? I can take a cold object and run it across the receipt paper to get the same effect, so I think there's a different mechanism at play, but I'm open to being proven wrong.
The developers in the paper require only a small flash of local heat to turn black, which is why thermal printers can print so fast despite the time it takes to heat up and cool down the print head. Friction produces enough heat to do that. You can test this by comparison: press an object straight down, or run it very slowly across the surface.
Yes: you look at it carefully, and if it looks like thermal paper, it may be toxic.
Whether the substances used are known to be toxic is another matter, but you won't know that even with a correct label, because it takes time for us to find out that new substances are toxic.
I think this is the right approach, speaking as someone who went down the rabbit hole of looking at alternative non-bisphenol or non-phenol image developers. The very little research on the new ones tends to conclude "we don't know if it's toxic in the long term" or, in the case of urea-based papers, "it's highly toxic to aquatic life."
To the GP: if the goal is to avoid phenol papers, note that phenol papers tend to develop a deeper black. And in the US, phenol-free papers are new enough that the backside often advertises it. Some are very misleadingly labeled BPA-free, which usually means they're made with the very similar and likely equally toxic BPS.
That is true. I actually was ambiguous in my post, because I meant code that generates stuff, not code that was generated by AI, even though I don't like the latter, either.
I find it offensive to have any generative AI code on my computer.
Settings → Apple Intelligence and Siri → toggle Apple Intelligence off.
It's not enabled by default. But in case you accidentally turned it on, turning it off gets you a bunch of disk space back as the AI stuff is removed from the OS.
Some people are just looking for a reason to be offended.
The theatrics of being *forced* to use completely optional, opt-in features have been a staple of discussions regarding Apple for years.
Every year, macOS and iPadOS look superficially more and more similar, but they remain distinct in their interfaces, features, etc. But the past 15 years have been "we'll be *forced* to only use Apple-vetted software, just like the App Store!"
And yeah, the Gatekeeper mechanism got less straightforward to get around in macOS 15, but … I don't know, someone will shoot me down for this, but it's been a long 15 years to be an Apple user with all that noise going on around you from people who really don't have the first clue what they're talking about, and on HN, no less.
They can come back to me when what they say actually happens. Until then, fifteen dang years.
I think I know what you meant: you don't want code that runs generative AI on your computer. But what you wrote could also mean you don't want any code running that was generated by AI. Even with open source, your computer will be running code generated by AI, as most open source projects are using it. I suspect it will be nearly impossible to avoid. Most open source projects will accept AI-generated code as long as it's been reviewed.
Good point, and you were right: I was ambiguous. I meant a system that generates stuff, not stuff that was generated by AI. But I'd rather not use stuff that was generated by AI, either. And you're also right that avoiding it will become impossible, and probably already is. Not a very nice world, I think. The best thing to do, then, is to minimize it, and avoid computers as much as possible...
I didn't say "generating code"; I meant I find it offensive to have any code sitting on my computer that generates code, whether I use it or not. I prefer minimalism: have on my computer only what I will use. And I have a limited data connection, which makes updates bloated with yet more code I won't use even more costly.
I thoroughly enjoyed reading this. I peaked at 137wpm[0] and I’m not ashamed to say I love typing long sentences. I type fast enough that my thoughts normally can’t keep up with my fingers, but when I plan a sentence out in my head, the satisfaction I get from watching my inner monologue transmogrify into words onscreen is palpable. It’s a rush. And I totally get it. Great article!
> If we […] tried to use these systems to solve the kinds of problems we need Einsteins, Hawkings and Taos for, then we would be in for one miserable disappointment after another
We can literally watch Terence Tao himself vibe coding formal proofs using Claude and o4. He doesn’t seem too disappointed.
He's the only person I know of who can actually get good results out of these systems (though I know several people who claim they can). What he's doing is fundamentally not the same thing as what most "vibe coders" are doing: take the autocomplete away, and he's still a talented mathematician.
Sure, but what he's doing is very much not using Claude or o4 to do things we need Terence Tao for.
I'm not saying today's AI systems aren't useful for anything. I'm not saying they aren't impressive. I'm just saying they're nowhere close to the "Einstein, Hawking and Tao in your house" hyperbole in the OP. I would be very, very surprised if Terence Tao disagreed with me about that.
Do you not think the AI output looks far more polished and print-ready? Canny edges have a lot of noise and don't look at all clean for coloring book purposes.
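For reference, the Canny baseline being compared against is roughly this, assuming OpenCV and a local photo.jpg; even with pre-blurring, the edge map keeps a lot of speckle:

```python
# Sketch of a Canny-edge "coloring page": real OpenCV calls, hypothetical
# input file. Blurring reduces, but does not eliminate, the noise.
import cv2

img = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.GaussianBlur(img, (5, 5), 0)
edges = cv2.Canny(img, 100, 200)       # hysteresis thresholds, tunable
cv2.imwrite("edges.png", 255 - edges)  # invert: black lines on white
```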
> yoUr'E PRoMPTiNg IT WRoNg!
> Am I though?
Yes. You’re complaining that Gemini “shits the bed”, despite using 2.5 Flash (not Pro), without search or reasoning.
It’s a fact that some models are smarter than others. This is a task that requires reasoning, so the article is hard to take seriously when the author uses a model optimised for speed (not intelligence) and doesn’t even turn reasoning on (nor suggests they’re even aware it's a feature).
I gave the exact prompt to ChatGPT 5 Thinking and got an excellent answer with cited sources, all of which appear to be accurate.