


This is definitely closer to what I had in mind, but it's still rather useless because it just shows what winning the lottery looks like. What I'm really looking for is neither the "Claude one-shot this" case nor the "I gave up and wrote everything by hand" case, but a realistic, "dirty" day-to-day work example. I wouldn't even mind if it were a long video (though some commentary would be nice in that case).


I don't think you should consider this "winning the lottery"; the author has been using these tools for a while.

The sibling comment with the writeup by the creator of Ghostty shows stuff in more detail and has a few cases of the agent breaking, though it also involves more "coding by hand".


I think the point is that you want to see typical results or a typical process. How does it perform when you use it 10 times, or 100 times? What results can you expect in general?

There's a lot of wishful thinking going around in this space and something more informative than cherrypicking is desperately needed.

Not least because lots of capable, smart people have no idea which way to jump when it comes to this stuff. They've trained themselves not to blindly hack solutions together through trial and error, but this essentially requires that approach to work.


Yeah, that's a good point, and the sibling comment seems to be pointing in the same direction. You could take a look at Steve Yegge's beads (https://steve-yegge.medium.com/introducing-beads-a-coding-ag..., https://github.com/steveyegge/beads), but the writeup is not super detailed.

I think your last point is pretty important: everything we see is done by experienced people, and today we don't have a good way of teaching "how to effectively use AI agents" other than telling people "use them a lot, and apply software engineering best practices like testing". That is a big issue, compounded by the fact that this stuff is new, there are lots of different tools, and they evolve all the time. I don't have a better answer here than "many programmers I respect have tried these tools and are sticking with them rather than going back" (with exceptions, like Karpathy's nanochat), and "the best way to learn today is to use them, a lot".

As for "what are they really capable of", I can't give a clear answer. They do make easy stuff easier, especially outside your comfort zone, and they seem to make hard stuff come up more often and earlier. I think that's because you do stuff outside your comfort zone/core experience zone; or because you now have to think more carefully about design over a shorter period of time, with less direct experience with the code, kind of like in Steve Yegge's case; or because when hard stuff does come up, it's the stuff they are worse at handling, which means you can't use them for it.

The lower bound seems to be "small CLI tool". The upper bound seems to be things like: a language learning app with paid users (sottaku I think? the dev talks about it on Twitter; lots of Japanese domain knowledge needed to judge the app itself); implementing a model in PyTorch by someone who didn't know how to code before (00000005 seconds or something like this on Twitter; they have used all these models and tools a lot); and reporting security issues that were missed in cURL. The middle is "a very experienced dev shipping a feature faster, while doing other things, on a semi-mature codebase" (Ghostty), and also "useful code reviews". That's about the best I can give you, I think.


I'm not sure you understood what I'm looking for. If I'm searching for a good Rails screencast to get a feeling for how it's used, a blog post consisting of "rails new" is useless to me. I know that these tools can one-shot tasks, but that doesn't help me when they can't.



