Hacker News | jw1224's comments

> “To stave off some obvious comments:

> yoUr'E PRoMPTiNg IT WRoNg!

> Am I though?”

Yes. You’re complaining that Gemini “shits the bed”, despite using 2.5 Flash (not Pro), without search or reasoning.

It’s a fact that some models are smarter than others. This is a task that requires reasoning, so the article is hard to take seriously when the author uses a model optimised for speed (not intelligence) and doesn’t even turn reasoning on (nor suggests they’re even aware it’s a feature).

I gave the exact prompt to ChatGPT 5 Thinking and got an excellent answer with cited sources, all of which appear to be accurate.


I just ran the same test on Gemini 2.5 Pro (I assume it enables search by default, because it added a bunch of "sources") and got the exact same result as the author. It claims ".bdi" is the ccTLD for Burundi, which is false; Burundi's ccTLD is .bi [1]. It claims ".time" and ".article" are TLDs.

I think the author's point stands.

EDIT: I tried it with "Deep Research" too. Here it doesn't invent TLDs or HTML elements, but the resulting list is incomplete.

[1]: https://en.wikipedia.org/wiki/.bi


I wonder if it would work better to ask the LLM to produce a script that extracts the resulting list, and then run the script on the two input lists.

There is also the question of the two input lists: it's not clear whether it's better to ask the LLM to extract them directly, or, again, to ask it to write a script that extracts them from the raw text data.
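
Something like this, maybe (a minimal Python sketch; the HTML element list here is truncated for brevity, and in practice you'd extract both full lists from the raw text as discussed):

```
import urllib.request

# Authoritative TLD list from IANA: one TLD per line, comment lines start with '#'.
IANA_TLDS = "https://data.iana.org/TLD/tlds-alpha-by-domain.txt"

with urllib.request.urlopen(IANA_TLDS) as resp:
    tlds = {
        line.strip().lower()
        for line in resp.read().decode().splitlines()
        if line and not line.startswith("#")
    }

# Truncated for brevity; a real run would use the full element list from the spec or MDN.
html_elements = {"a", "audio", "data", "link", "menu", "nav", "search", "style"}

print(sorted(html_elements & tlds))
```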


> It claims ".time" and ".article" are TLDs.

Maybe they will be, within the time frame in which the model is still in use.


In my experience, reasoning and search come with their own set of trade-offs. It works great when it works. But the variance can be wider than with a plain LLM.

Search and reasoning use up more context, leading to context rot and subtler, harder-to-detect hallucinations. Reasoning doesn’t always focus on evaluating the quality of evidence, just on “problem solving” from some root set of axioms found in search.

I’ve had this happen in Claude Code, for example, where it hallucinated a few details about a library based on a badly written forum post.


> … all of which appear to be accurate.

Isn’t that the whole goddamn rub? You don’t _know_ if they’re accurate.


OP here. I literally opened up Gemini and used the defaults. If the defaults are shit, maybe don't offer them as the default?

Or, if LLMs are so smart, why doesn't it say "Hmmm, would you like to use a different model for this?"

Either way, disappointing.


> Or, if LLMs are so smart, why doesn't it say "Hmmm, would you like to use a different model for this?"

That is indeed an area where LLMs don't shine.

That is, not only are they trained to always respond with an answer, but they also have no ability to accurately tell how confident they are in that answer. So you can't just filter out low-confidence answers.


Something I think would be interesting for model APIs and consumer apps to expose would be the probability of each individual token generated.

I’m presuming that one class of junk/low-quality output occurs when the model doesn’t have any high-probability next tokens and works with whatever poor options it has.

Maybe low-probability tokens that cross some threshold could be given a visual treatment, the same way word processors give feedback on a spelling or grammatical error.

But maybe I’m making a mistake in thinking that token probability is related to the accuracy of the output?
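
The raw data for this does exist, at least. A minimal sketch using the OpenAI Python SDK's `logprobs` option (the model choice and the 0.5 cutoff are arbitrary placeholders):

```
import math

from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the ccTLD for Burundi?"}],
    logprobs=True,  # return the log-probability of each generated token
)

# Flag any token whose probability falls below an arbitrary cutoff.
for tok in resp.choices[0].logprobs.content:
    p = math.exp(tok.logprob)
    flag = "  <-- low probability" if p < 0.5 else ""
    print(f"{tok.token!r}\t{p:.2f}{flag}")
```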


Lots of research has been done here. e.g. https://aclanthology.org/2024.findings-acl.558.pdf


> Something I think would be interesting for model APIs and consumer apps to expose would be the probability of each individual token generated.

Isn't that what logprobs is?


Then criticize the providers on their defaults instead of claiming that they can't solve the problem?

> Or, if LLMs are so smart, why doesn't it say "Hmmm, would you like to use a different model for this?"

That's literally what ChatGPT did for me[0], which is consistent with what they shared at the last keynote (a quick, low-reasoning answer by default, with reasoning/search only if explicitly prompted or as a follow-up). It did miss one match though, as it somehow didn't parse the `<search>` element from the MDN docs.

[0]: https://chatgpt.com/share/68cffb5c-fd14-8005-b175-ab77d1bf58...


You are pointing out a maturity issue, not a capability problem. It's clear to everyone that LLM products are immature, but saying they are incapable is misleading.


In your mind, is there anything an LLM is _incapable_ of doing?


“Defaults are shit”: is that really true, though?! Just because it shits the bed on some tasks does not mean it is shit. For people integrating LLMs into any workflow that requires a modicum of precision or determinism, one must always evaluate output closely and have benchmarks. You must treat the LLM as an incompetent but overconfident intern, and thus have fast mechanisms for measuring output and giving feedback.
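
For list-shaped tasks like the TLD one upthread, even a crude scorer against a known-good set gives you that fast feedback loop. A minimal sketch (the sets here are illustrative, not real model output):

```
def score(claimed: set[str], truth: set[str]) -> dict[str, float]:
    """Precision and recall of a model's claimed set against ground truth."""
    hits = len(claimed & truth)
    return {
        "precision": hits / len(claimed) if claimed else 0.0,
        "recall": hits / len(truth) if truth else 0.0,
    }

# Illustrative sets: 'bdi' and 'time' stand in for the hallucinations upthread.
print(score({"audio", "link", "bdi", "time"}, {"audio", "link", "menu", "style"}))
# {'precision': 0.5, 'recall': 0.5}
```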


The “full Tailwind experience” is already freely available. What “lost opportunities for deep integration” is a frontend CSS framework missing?

Tailwind Plus (the commercial product) is like buying an off-the-shelf template. It’s just a collection of themes and pre-built components — useful for devs who want to get started quickly on a project, but it’s cookie-cutter and can easily be replicated by anyone with Tailwind itself.


There are devs who think the currently available HTML elements are all we need. But there are many more who believe we are missing primitives that Tailwind (and others) are attempting to solve for.

> It’s just a collection of themes and pre-built components

Any reusable web component could be described as an optionally themed, pre-built component. That's kind of the point.


I no longer see value in prebuilt templates, since LLMs can put things together sufficiently well for prototyping. Even when using templates before, you still needed to customise them. Feels like we are going through a transition period.


PHP doesn’t force keys... You can omit the key and simply write `foreach($items as $value)`


BTW: https://www.php.net/manual/en/iterator.key.php

It's literally in the interface.


I’m not sure I follow, what exactly is your complaint? The Iterator interface is described as:

> Interface for external iterators or objects that can be iterated themselves internally

Note “external iterators or objects”. The Iterator interface is not exactly everyday PHP; it’s a specialist utility for making classes iterable so they can be accessed like arrays. Most developers will rarely use it directly, and it’s not being used in the parent comment’s example either.

Iterating over something requires knowing where you are in the sequence, so of course you would need to implement a method to get the current position of the iteration.


> Iterating over something requires knowing where you are in the sequence, so of course you would need to implement a method to get the current position of the iteration.

No, you don't. Other languages don't require it. There's no problem keeping the position outside the iterator, and it's a more generic approach.


I don’t think you understand the purpose of PHP’s Iterator interface.


Let me guess. To somehow patch iteration on associative arrays? And instead of bringing pairs or tuples as first-class citizens, it extends iterators with `key` and `value`. And now any Iterator implementation has to track its own sequentially increasing key. Very nice design indeed.


Associative arrays are iterable internally without needing an interface or key tracking. You can just foreach them.

> Let me guess.

This just proves my point: you call this a “hot mess” whilst completely misunderstanding what it’s even used for.

The Iterator interface isn’t even used in the comment you first replied to.


What's the idiomatic way to get an index (0 to n-1) along with the value? The parent example shows you can't use $key as a generic solution.


You can use the SPL LimitIterator, feed it the generator, and give it the offset and limit.

```
// $generator can be any Iterator (e.g. a Generator); arguments are offset 0, count 3
foreach (new LimitIterator($generator, 0, 3) as $value) {
    echo $value;
}
```

https://www.php.net/manual/en/class.limititerator.php


You can buy phenol-free thermal paper; it’s about 20% more expensive where I live, but much safer for you, and the quality is just as good.


Great first article, and very interesting to see someone else using a receipt printer for bite-sized task management!

I have a variety of automations running which print actionable tasks to my receipt printer via a Raspberry Pi. It’s nice having a real-life ticket I can take hold of.

One thing to be aware of if you’re handling receipts frequently: make sure to buy phenol-free thermal paper. Phenol is toxic and some types of it are banned in certain countries.


Yes, I think having a tangible task is really important!

Since I’m in Europe, we don’t really have paper with bisphenol anymore, but that’s not the case everywhere.


What about the ink? What's the keyword to search for non-toxic printer ink/cartridges?


Receipt printers don't use ink; instead they use thermal paper, which darkens when heated. You can test this by scratching it with your nail: the heat is enough to leave a mark.


I agree with you on the first part, but are you sure that the heat from the fingernail is what's leaving that mark? I can take a cold object and run it on the receipt paper to get the same effect, so I think that's a different mechanism at play but I'm open to being proven wrong.


The developer chemicals in the paper only require a brief flash of localized heat to turn black, which is why thermal printers can print so fast given the time it takes to heat up and cool down the print head. Friction produces enough heat to do that. You can test this by pressing an object straight down, or running it very slowly across the surface, for comparison.


I thought most receipt printers were thermal: no ink, just heat.


AFAICT, BPS is still widely used in Europe.


Is there any way of knowing, just by examining it, whether a given thermal paper is toxic or not?


Yes, you look at it carefully and if it looks like thermal paper it may be toxic.

Whether the substances used are known to be toxic is another matter, but you won't know that even with a correct label, because it takes time for us to find out that new substances are toxic.


I think this is the right approach, speaking as someone who went down the rabbit hole of looking at alternative non-bisphenol or non-phenol image developers. The very little research on the new ones tends to conclude "we don't know if it's toxic in the long term" or, in the case of urea-based papers, "it's highly toxic against aquatic life."

To the GP: if the goal is to avoid phenol papers, phenol papers tend to develop a deeper black. And in the US, phenol-free papers are new enough that the backside often advertises it. Some are very misleadingly labeled BPA-free, which usually means they're made with the very similar and likely equally toxic BPS.


Thank you for your insightful reply, I greatly appreciate it. However, it does not answer my question, unfortunately.


If that’s the one reason, have you considered just… not using the AI features?


Sure, you can for now. But what about when you're forced to use them?


Well if that hypothetical situation ever happens, you can just switch to Linux then.


Why do you care if they switch now?


There is no real need, and the issue is hypothetical?


I find it offensive to have any generative AI code on my computer.


I promise you there is Linux code that has been tab-completed with Copilot or similar, perhaps even before ChatGPT ever launched.


That is true. I was actually ambiguous in my post, because I meant code that generates stuff, not code that was generated by AI, even though I don't like the latter either.


> I find it offensive to have any generative AI code on my computer.

Settings → Apple Intelligence and Siri → toggle Apple Intelligence off.

It's not enabled by default. But in case you accidentally turned it on, turning it off gets you a bunch of disk space back as the AI stuff is removed from the OS.

Some people are just looking for a reason to be offended.


The theatrics of being *forced* to use completely optional, opt-in features have been a staple of discussions regarding Apple for years.

Every year, macOS and iPadOS look superficially more and more similar, but they remain distinct in their interfaces, features, etc. But the past 15 years have been "we'll be *forced* to only use Apple-vetted software, just like the App Store!"

And yeah, the Gatekeeper mechanism got less straightforward to get around in macOS 15, but … I don't know, someone will shoot me down for this, but it's been a long 15 years to be an Apple user with all that noise going on around you from people who really don't have the first clue what they're talking about — and on HN, no less.

They can come back to me when what they say actually happens. Until then, fifteen dang years.


Not forced to use, forced to download and waste 2GB of disk space.


I presume you're talking about Apple Intelligence.

It's not forced. It's completely optional. It has to be downloaded.

And if you activate it, then change your mind, you get the disk space back when you turn it off.


I have a limited connection, and don't want to update my computer with AI garbage.


So don't. You have to tell the computer to download Apple Intelligence. It doesn't just happen on its own.

Just don't push the Yes button when it offers.


Well, I thought it came with the OS update, so I guess I was mistaken then.


I think I know what you meant. You mean you don't want code that runs generative AI on your computer? But what you wrote could also mean you don't want any code running that was generated by AI. Even with open source, your computer will be running code generated by AI, as most open source projects are using it. I suspect it will be nearly impossible to avoid. Most open source projects will accept AI-generated code as long as it's been reviewed.


Good point, and you were right. I was ambiguous. I meant a system that generates stuff, not stuff that was generated by AI. But I'd rather not use stuff that was generated by AI, either. But you are also right. That will become impossible, and probably already is. Not a very nice world, I think. Best thing to do then is to minimize it, and avoid computers as much as possible....


So, then don’t do that? It’s not like it’s automatically generating code without you asking.


I didn't say "generating code"; I meant I find it offensive to have any code sitting on my computer that generates code, whether I use it or not. I prefer minimalism: just have on my computer what I will use. And I have a limited data connection, which means even more updates with useless code I won't use.


I thoroughly enjoyed reading this. I peaked at 137 wpm [0] and I’m not ashamed to say I love typing long sentences. I type fast enough that my thoughts normally can’t keep up with my fingers, but when I plan a sentence out in my head, the satisfaction I get from watching my inner monologue transmogrify into words onscreen is palpable. It’s a rush. And I totally get it. Great article!

[0] https://data.typeracer.com/pit/profile?user=mavis_b


> If we […] tried to use these systems to solve the kinds of problems we need Einsteins, Hawkings and Taos for, then we would be in for one miserable disappointment after another

We can literally watch Terence Tao himself vibe coding formal proofs using Claude and o4. He doesn’t seem too disappointed.

https://youtu.be/zZr54G7ec7A?si=GpRZK5W1LDvWyBBw


He's the only person I know of who can actually get good results out of these systems (though I know several people who claim they can). What he's doing is fundamentally not the same thing as what most "vibe coders" are doing: take the autocomplete away, and he's still a talented mathematician.


Sure, but what he's doing is very much not using Claude or o4 to do things we need Terence Tao for.

I'm not saying today's AI systems aren't useful for anything. I'm not saying they aren't impressive. I'm just saying they're nowhere close to the "Einstein, Hawking and Tao in your house" hyperbole in the OP. I would be very, very surprised if Terence Tao disagreed with me about that.


You can literally watch Terence Tao stream himself formalizing existing proofs that he already formalized before.


This is some excellent trivia. Thanks!


Do you not think the AI output looks far more polished and print-ready? Canny edges have a lot of noise and don't look at all clean for coloring book purposes.
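
For context, a minimal sketch of the kind of edge-extraction pipeline being compared here (OpenCV; the filenames and hysteresis thresholds are placeholders and typically need per-image tuning):

```
import cv2

img = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(img, (5, 5), 0)  # pre-blur to suppress speckle noise
edges = cv2.Canny(blurred, 100, 200)        # hysteresis thresholds: tune per image
cv2.imwrite("outline.png", 255 - edges)     # invert to black lines on white
```

Even with the pre-blur, textured photos tend to produce speckle and broken contours, which is the noise problem described above.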

