Hacker News | skerit's comments

Do we also have stills of all the episodes? Or only audio?

There are production stills that are used like a slide show and combined with the recorded audio. Certain episodes have been reconstructed using animation such that the basic scene blocking and events are played out alongside the recorded audio.

This is kind of what LoopLM is doing, no? https://arxiv.org/abs/2510.25741

Thanks. This is cool

Gemini might be great at benchmarks, but it is terrible at actual agentic coding. So Anthropic seems like a more logical choice.

The particulars don't matter. OpenAI will never do this.

I hate the new way scrolling works in the Youtube & Netflix apps. So janky.

> They were rare, and special, and you'd have a few photos per YEAR to look back on. The feel of photos back then, was at least 100x stronger than now. [...] But once they became freely available that same amount of emotion is now split across many thousands of photos

I don't think I fully agree. Sure, people take so many photos that they don't have the time or the will to start looking through them all.

You can't just whip out your phone and start scrolling through thousands of photos with friends. It would get so boring, so fast.

But if you put some effort into making a nice little selection of the best photos, that emotion is 100% still there.


And there’s software to help you with that. For example, using faces, time stamps and GPS info, iOS creates collections for you.

Yes, it’s crude, and you have to do the face tagging, but I think it’s a huge improvement over not having that.


So now the value is created through curation. Before it was inherent at creation. If you never curate it might seem like it lost value in comparison.


Curation was implicit when the cost of image creation was high and authors had to consider the photos they were taking beforehand. Now curation comes afterward.


In my childhood, slide shows were very deliberately curated, in no small part because the presentation of the slides was a relatively elaborate, shared family event.


But curation was done mainly by the creators, who were the people who were able to do the creation in the first place (professional photographers, people who could afford to buy the expensive camera, people who could afford the software for editing photos/slideshows in mass etc.). Now everyone can curate, and consumers can actually pick which curated collection is truly the best.


But what does 'best' even mean in this context? A photographer sharing their 'best' photos was some combination of sharing their personal perspective and their effort to capture shared memories on behalf of others. So yeah it was a limited/privileged (often patriarchal) role. What they picked was interpretive, but that curation was part of the expression/information the viewer was experiencing.

We can mix and match the media we choose to view or keep so easily, when previously there was so much more material and opportunity cost to choosing what to shoot, develop, keep, and share. I think that inevitably loses some meaning.


> Saying it's "fully conscious" is silly, and anyone with this background should know better

I'm surprised that anyone who truly knows how LLMs work would ever think they're sentient.

I made a little presentation for my colleagues last year to explain how LLMs really work (in an effort to stop them from asking it too many stupid questions) and it made so much more sense to them afterwards.


> Then a brick hits you in the face when it dawns on you that all of our tools are dumping crazy amounts of non-relevant context into stdout thereby polluting your context windows.

I've found that letting the agent write its own optimized script for dealing with some things can really help with this. Claude is now forbidden from using `gradlew` directly, and can only use a helper script we made. It clears, recompiles, publishes locally, tests, ... all with a few extra flags. And when a test fails, the stack trace is printed.

Before this, Claude had to do A TON of different calls, all messing up the context. And when tests failed, it started to read gradle's generated HTML/XML files, which damaged the context immensely, since they contain a bunch of inline javascript.

And I've also been implementing this "LLM=true"-like behaviour in most of my applications. When an LLM is using it, logging is less verbose, and it's deduplicated so it doesn't show the same line a hundred times, ...
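
The deduplication part is only a few lines. A minimal sketch in Python (the LLM=true variable name and the exact behaviour are just how I happen to do it, not any standard):

    import logging
    import os

    class DedupFilter(logging.Filter):
        """Drop consecutive duplicate log lines."""
        def __init__(self):
            super().__init__()
            self._last = None

        def filter(self, record):
            msg = record.getMessage()
            if msg == self._last:
                return False  # same as the previous line: drop it
            self._last = msg
            return True

    handler = logging.StreamHandler()
    if os.environ.get("LLM") == "true":
        handler.setLevel(logging.WARNING)  # less verbose when an agent is driving
        handler.addFilter(DedupFilter())   # handler-level filter sees all records
    logging.getLogger().addHandler(handler)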

> He sees something goes wrong, but now he cut off the stacktraces by using tail, so he tries again using a bigger tail. Not satisfied with what he sees HE TRIES AGAIN with a bigger tail, and … you see the problem. It’s like a dog chasing its own tail.

I've had the same issue. Claude was running the 5+ minute test suite MULTIPLE TIMES in succession, just with a different `| grep something` tacked on the end. Now, the scripts I made always log the entire (simplified) output and just print the path to the temporary file. This works so much better.
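
A minimal sketch of that wrapper pattern in Python (the `./gradlew test` command and the "FAILED" pattern are assumptions; adapt both to your build tool):

    #!/usr/bin/env python3
    # Run the test suite ONCE, save the full output, print only a summary.
    import subprocess
    import sys
    import tempfile

    result = subprocess.run(
        ["./gradlew", "test"],          # assumed build command
        capture_output=True, text=True,
    )

    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".log", delete=False
    ) as log:
        log.write(result.stdout)
        log.write(result.stderr)

    # Surface only the failure lines; the agent can read the full log on demand.
    for line in result.stdout.splitlines():
        if "FAILED" in line:
            print(line)

    print(f"full output: {log.name}")
    sys.exit(result.returncode)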


> Claude is now forbidden from using `gradlew` directly, and can only use a helper script we made. It clears, recompiles, publishes locally, tests, ... all with a few extra flags. And when a test fails, the stack trace is printed.

I think my question at this point is what about this is specific to LLMs. Humans should not be forced to wade through reams of garbage output either.


Humans have the ability to ignore and generally not remember things after a short scan, prioritize what's actually important, etc. But to an LLM, a token is a token.

There are attempts at effectively doing something similar with analysis passes over the context - kinda what things like auto-compaction are doing - but I'm sure anyone who has used the current generation of those tools will tell you they're very much imperfect.


The “a token is a token” effect makes LLMs really bad at some things humans are great at, and really good at some things humans are terrible at.

For example, I quickly get bored looking through long logfiles for anomalies but an LLM can highlight those super quickly.


Isn’t the purpose of self attention exactly to recognize the relevance of some tokens over others?


That may help with tokens being "ignored" while still being in the context window, but not context window size costs and limitations in the first place.


> I think my question at this point is what about this is specific to LLMs. Humans should not be forced to wade through reams of garbage output either.

Beware: I'm a complete AI layman. All of this is from background reading of popular articles. It may well be wrong. It's definitely out of date.

It has to do with how the attention heads work. The attention heads (the idea originated from the "Attention is all you need" paper, arguably the single most important AI paper to date) direct the LLM to work on the most relevant parts of the conversation. If you want a human analogue, it's your attention heads that are tracking the interesting points in a conversation.

The original attention heads output a relevance score for every pair of words in the context window. Thus in "Time flies like an arrow", it's the attention heads that spot the word "Time" is very relevant to "arrow", but not to "flies". The implication of this is that an attention head does O(N*N) work. It does not scale well to large context windows.
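
To make the O(N*N) point concrete, here is roughly what one attention head's score computation looks like in bare numpy (a simplified sketch: a single head, no masking, random untrained weights):

    import numpy as np

    def attention_weights(X, Wq, Wk):
        # X: (N, d) token embeddings; Wq, Wk: (d, d_k) learned projections
        Q = X @ Wq                               # queries, shape (N, d_k)
        K = X @ Wk                               # keys,    shape (N, d_k)
        scores = Q @ K.T / np.sqrt(K.shape[-1])  # (N, N): one score per token PAIR
        e = np.exp(scores - scores.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True) # softmax over each row

    N, d, d_k = 5, 64, 16   # "Time flies like an arrow" tokenizes to ~5 tokens
    rng = np.random.default_rng(0)
    W = attention_weights(rng.normal(size=(N, d)),
                          rng.normal(size=(d, d_k)),
                          rng.normal(size=(d, d_k)))
    print(W.shape)  # (5, 5) -- at N = 1,000,000 this matrix would have 10^12 entries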

Nonetheless, you see claims of "large" context windows in LLM marketing. (Large is in quotes because even a 1M context window begins to feel very cramped in a write / test / fix loop.) But a 1M context window would require an attention matrix with a trillion elements. That isn't feasible. The industry even has a name for the size of the window they give in their marketing: the Effective Context Window. Internally they have another metric that measures the real amount of compute they throw at attention: the Physical Context Window. The bridge between the two is some proprietary magic that discards tokens in the context window that are likely to be irrelevant. In my experience, that bridge is pretty good at doing that, where "pretty good" is up to human standards.

But eventually (actually quickly, in my experience), you fill up even the marketed size of the context window, because it is remembering every word said, in the order it was said. If it reads code it's written in order to debug it, that code appears twice in the context window. All compiler and test output also ends up there. Once the context window fills up, they take drastic action, because it's like letting malloc fail: even reporting a malloc failure is hard, because the reporting usually needs more malloc. Anthropic calls it compacting. It throws away 90% of your tokens. It turns your helpful LLM into a goldfish with dementia. It is nowhere near as good as a human is at remembering what happened. Not even close.


In my experience, it's the old time-invested vs. time-saved trade-off. If you're not looking at these reams of output often enough, the incentive to figure out all the flags and configs for verbosity and to write these scripts is lower: https://xkcd.com/1205/

And because these issues are often sporadic, doing all this would be an unwanted sidequest, so humans grit their teeth and wade through the garbage manually each time.

With LLMs, the cost is effectively 0 compared to a human, so it doesn't matter. Have them write the script. In fact, because it benefits the LLM by reducing context pollution, which increases their accuracy, such measures should be actively identified and put in place.


Lots of tools have a --quiet or --output json type option, which is usually helpful


The way I've solved this issue with a long-running build script is to have a logging script which redirects all output into a file. It can be included at the start of a script with:

    # Redirect all output to a log file (re-execs script with redirection)
    source "$(dirname "$0")/common/logging.sh"

Then when the script runs, the output is put into a file, and the LLM can search that. Works like a charm.


This has been my exact experience with agents using gradle and it’s beyond frustrating to watch. I’ve been meaning to set up my own low-noise wrapper script.

This post just inspired me to tackle this once and for all today.


> This works so much better.

Pi coding agent does this by default with all outputs, but Claude (all versions tested, including Opus 4.6) just completely ignores this capability. Even when the tool output explicitly tells the agent that the full output is saved in a particular file, Claude reruns the command.


Wow, I'd love to do this. Any tips on how to build this (or how to help an LLM build this), specifically for ./gradlew?


How is it forbidden? I tell agents to use my wrappers in AGENTS but they ignore it half the time and use the naked tool.


If you get desperate, I've given my agent a custom $PATH that replaces the forbidden tools with shims that either call the correct tool, or at least tell it what to do differently.

~/agent-shims/mvn:

    #!/bin/bash
    echo "Usage of 'mvn' is forbidden. Use build.sh or run-tests.sh"
That way it is prevented from using the wrong tools, and can self-correct when it tries.


Permissions scoping


Then they attempt to download the missing tool or write a substitute from scratch. Am I the only one who runs into this??


> I used Claude Code and Codex for the translation. This was human-directed, not autonomous code generation. I decided what to port, in what order, and what the Rust code should look like. It was hundreds of small prompts, steering the agents where things needed to go. After the initial translation, I ran multiple passes of adversarial review, asking different models to analyze the code for mistakes and bad patterns.

> The requirement from the start was byte-for-byte identical output from both pipelines. The result was about 25,000 lines of Rust, and the entire port took about two weeks. The same work would have taken me multiple months to do by hand. We’ve verified that every AST produced by the Rust parser is identical to the C++ one, and all bytecode generated by the Rust compiler is identical to the C++ compiler’s output. Zero regressions across the board

This is the way. Coding assistants are also really great at porting from one language to the other, especially if you have existing tests.


> Coding assistants are also really great at porting from one language to the other

I had a broken, one-off Perl script, a relic from the days when everyone thought Drupal was the future (long time ago). It was originally designed to migrate a site from an unmaintained internal CMS to Drupal. The CMS was ancient and it only ran in a VM for "look what we built a million years ago" purposes (I even had written permission from my ex-employer to keep that thing).

Just for a laugh, I fed this mess of undeclared dependencies and missing logic into Claude and told it to port the whole thing to Rust. It spent 80 minutes researching Drupal and coding, then "one-shotted" a functional import tool. Not only did it mirror the original design and module structure, but it also implemented several custom plugins based on hints it found in my old code comments.

It burned through a mountain of tokens, but 10/10 - would generate tens of thousands of lines of useless code again.

The Epilogue: That site has since been ported to WordPress, then ProcessWire, then rebuilt as a Node.js app. Word on the street is that some poor souls are currently trying to port it to Next.js.


> 10/10 - would generate tens of thousands of lines of useless code again.

Me too! A couple days ago I gave claude the JMAP spec and asked it to write a JMAP-based webmail client in rust from scratch. And it did! It burned a mountain of tokens, and it's got more than a few bugs. But now I've got my very own email client, powered by the stalwart email server. The rust code compiles into a 2mb wasm bundle that does everything client side. It's somehow insanely fast. Honestly, it's the fastest email client I've ever used by far. Everything feels instant.

I don't need my own email client, but I have one now. So unnecessary, and yet strangely fun.

It's quite a testament to JMAP that you can feed the RFC into claude and get a janky client out. I wonder what semi-useless junk I should get it to make next? I bet it wouldn't do as good a job with IMAP, but maybe if I let it use an IMAP library someone's already made? Might be worth a try!


Same here. I had Claude write me a web based RSS feed reader in Rust. It has some minor glitches I still need to iron out, but it works great, is fast as can be, and is easy on the eyes.

https://github.com/AdrianVollmer/FluxFeed


Haha glad to see someone else did something like this. A couple weeks ago I asked Claude to recommend a service that would allow me to easily view a local .xml file as an RSS feed. It instead built a .html RSS viewer.


Re "is fast as can be": in my experience generating C/Zig code via Codex, agent generated code is usually several multiples slower than hand optimized code.


Yeah, I’m sure my Claude generated email client could be even faster if I wrote it by hand. Modern computers can retire billions of instructions per second per core. All operations that aren’t downloading or processing gigabytes of data should be instant on modern computers.

Claude’s toy email client gets closer to the speed limit than Gmail does. Why is Gmail so slow? I have no idea.


I find the sweet spot is to take LLM generated code, then profile manually and heavily supervise or hand implement specific improvements.


You can also let the agent use a profiling tool. It probably cannot beat you, but does find and improve a lot of things for sure.


Look, it's an RSS reader, not a numeric solver for PDEs. What I clearly meant was: every interaction is instant, no noticeable delay at all, except the reader view, which makes a network request to an external site.


Hey, sorry, I just have to defuse assumptions people make when they see Rust, LLMs, and "as fast as can be" in short proximity. Your project is obviously cool, and I don't think the fact that it's likely still multiples more resource intensive than an absolutely minimal version takes away from that.


you have to ask it to profile and optimize the code for you. Then have it document the changes and run it in a loop. It’ll do wonders.

I asked a cursor agent to do the same for a geotiff to pmtiles converter. It managed to optimize the performance from tens of days to half a day for the case I wanted to solve.


Given parent and GP are both using Claude... have you tried Claude? (I say this as someone who has not tried Claude recently. I did try Claude Code when it first came out, though.)


First, it is important for these discussions that people include details like I did. We're all better off to not generalize.

RE: Claude Code, no I haven't used it, but I did do the Anthropic interview problem, beating all of Anthropic's reported Claude scores even with custom harnesses etc.

It's not a dunk that agents can't produce "as fast as can be" code; their code is usually still reasonably fast; it's just often 2-10x slower than can be.


There is a lot to be done with good prompting.

Early on, these code agents wouldn't do basic good-hygiene things, like checking if the code compiled, avoiding hallucinated modules, or writing unit tests. And people would say they sucked...

But if you just asked them to do those things: "After you write a file lint it and fix issues. After you finish this feature, write unit tests and fix all issues, etc ..."

Well, then they did that, and it was great! Later, the default prompts of these systems included enough verbiage to do that, so you could get lazy again. Plus the models are being optimized to know to do some of these things, and also to avoid some bad code patterns from the start.

But the same applies to performance today. If you ask it to optimize for performance, to use a profiler, to analyze the algorithms and systemically try various optimization approaches ... it will do so, often to very good results.


Yep. Claude code is best thought of as an overachieving junior / mid. It can run off and do all sorts of work on its own, but it doesn't have great instincts and it can't read your mind about what you want.

Use it as if you're the tech lead managing a fresh hire. Tell it clearly what you want it to focus on and you get a much better result.


That's a given, and agents still come out 2-10x slower in my experience.


Rust is the final language.

Defect free. Immaculate types. Safe. Ergonomic. Beautiful to read.

AI is going to be writing a lot of Rust.

The final arguments of "rust is hard to write" are going to quiet down. This makes it even more accessible.


> Rust is the final language.

> Defect free.

I am an upstream developer on the Rust Project (lang, library, cargo, others), and obviously a big fan of Rust. This kind of advocacy doesn't help us, and in fact makes our jobs harder, because for some people this kind of advocacy is their main experience of people they assume are representative of Rust. Please take it down a notch.

I think Rust is the best available language for many kinds of problems. Not yet all, but we're always improving it to try to work for more people. It gets better over time. I'd certainly never call it, or almost any other software, "defect free".

And I'd never call it "the final language"; we're building it to last the test of time, and we hope things like the edition system mean that the successor to Rust is a future version of Rust, but things can always change, and we're not the only source of great ideas.

If you genuinely care about Rust, please adjust your advocacy of Rust to avoid hurting Rust and generating negative perceptions of Rust.


I’d also add: as a lover of forward progress, I really hope rust isn’t the last good idea programming language designers have. I love rust. But there are dozens of things I find a bit frustrating. Unfortunately I don’t think I’m clever & motivated enough to write a whole new language to try to improve it. But I really hope someone else is!

For a taste: I wish we didn’t need lifetime annotations, somehow. I wish rust had first class support for self borrows, possibly via explicit syntax indicating that a variable is borrowed, and thus pinned. Unpin breaks my brain, and I wish there were ways to do pin projections without getting a PhD first. I wish for async streams. I wish async executors were in std, and didn’t take so long to compile. I could go on and on.

I feel like there’s an even simpler & more beautiful language hiding inside rust. I can’t quite see it. But I really hope someone else can bring it into the world some day.


> For a taste: I wish we didn’t need lifetime annotations, somehow. I wish rust had first class support for self borrows, possibly via explicit syntax indicating that a variable is borrowed, and thus pinned. Unpin breaks my brain, and I wish there were ways to do pin projections without getting a PhD first. I wish for async streams. I wish async executors were in std, and didn’t take so long to compile. I could go on and on.

I would like all of that as well. I think we can do much of that in Rust. I would love to see self-borrows available, and not just via pinning; I would also like relative pointers. I would like people to almost never have to think about pin or unpin; one of my rules of thumb is that if you see Pin or Poll, you've delved too deep, and nobody should need those to write almost any async code, including the interiors of trait implementations and async runtime implementations. And I would absolutely like to see async iterators, async executors, and many kinds of async traits in the standard library.

I also think there are plenty of things we are unlikely to get to even in an edition, and that might never happen without a completely different language. I'm not sure if we'll find a path to doing those in Rust, or if they will be the domain of some future language that makes different trade-offs.


> I also think there are plenty of things we are unlikely to get to even in an edition, and that might never happen without a completely different language.

Yes, this is my feeling too. All programming languages - perhaps all computer programs - must decide how malleable they should be. Fast-moving systems are exciting, but they're very annoying to use or build on top of. I think generally we want a very slow-moving language for infrastructure software, like databases or the Linux kernel. Slow-moving languages are often good for users, because they don't need to learn new things or rewrite existing software. (I think that's one of the reasons python3 was so poorly received.)

It might be too late for large changes in rust. This is a sign of maturity, but it's also a little bit sad. I want all those features you mention too.

Some day. Maybe LLMs will help somehow. Rust is, thankfully, not the last programming language humans will invent.


I'm sorry my autistic elation for Rust is perceived as being over the top, but I truly do mean everything I say. I could have articulated it in a less saccharine tone.

> > Defect free.

There's a Google talk on the matter. "Defect rate" / "defect free" is a term that gets used quite a bit. I've latched onto this because I find it true: measured on a line-by-line basis, Rust has a far lower defect rate than other statically typed languages.

> And I'd never call it "the final language"

I honestly disagree, and I'm sticking to this prediction.

I don't think we're going to be writing code much longer by ourselves. The machines are going to outpace us soon.

Maybe something that's LLM-oriented will take over, but at that point these won't be "human" languages anymore. So I'll revise my claim to "Rust is the last human language".

If I want to serialize my thoughts to code, Rust is the language for it. It's probably the last one I'll be hand-writing or sending my revisions back to the LLM for.

Rust will also be an order of magnitude easier to author, at which point there shouldn't be much holding people back from producing it. If you have a choice between generating Java, C++, Go, or Rust, you're going to pick Rust almost every time unless you have to fit into those other ecosystems.

If you haven't used Claude or Codex with Rust, it's sublime and you should drop what you're doing to try it.


As a member of t-compiler, seconded.


Thank you, thank you, thank you.


> Beautiful to read.

Oh my, there's a new language called Rust? Didn't they know there already is one? The old one is so popular that I can't imagine the nicely readable one to gain any traction whatsoever (even if the old one is an assault on the senses).


> Rust is the final language.

I honestly can't tell if this is a humorous attack or not.

Poe's law is validated once again.


It's honest. If we can serialize our ideas to any language for durability, Rust is the way to go.

It's not the best tool for the job for a lot of things, but if the LLMs make writing it as fast as anything else - whelp, I can't see any reason not to do it in Rust.

If you get any language outputs "for free", Rust is the way to go.

I've been using Claude to go ridiculously fast in Rust recently. In the pre-LLM years I wrote a lot of Rust, but it definitely was a slow to author language. Claude helps me produce it as fast as I can think. I spend most of my time reviewing the code and making small fixes and refactors. It's great.


While Rust is excellent, you must acknowledge that Rust has issues with compilation time. It also has a steep learning curve (especially around lifetimes.) It's much too early to say Rust is the "final" language, especially since AI is driving a huge shift in thinking right now.

I used to think that I would never write C code again, but when I decided recently to build something that would run on ESP32 chips, I realized there wasn't any good reason for me to use Rust yet. ESP-IDF is built on C and I can write C code just fine. C compiles quickly, it's a very simple language on the surface, and as long as you minimize the use of dynamic memory allocation and other pitfalls, it's reliable.


If you're programming for ESP, then embassy is the way to go in most cases. You don't need to learn much about lifetimes in most application code. The steep learning curve people refer to is "things blow up at compile time vs. runtime." It's easy to write JS or C that compiles and passes all tests, and then blows up wonderfully when you start using it. Rust just forces you to learn the things you need to know anyway, at what is IMO the right time.

My biggest problem with rust right now is enormous target/ dirs.


> My biggest problem with rust right now is enormous target/ dirs.

We're working on that and it should get better soonish. We're working on shared caches, as well as pruning of old cached builds of dependencies that are unlikely to be reused in a future build.


thanks beejesus! (aka the devs) I'm tired of forcing shit into workspaces just to slightly mitigate these issues


I feel compilation time is unbearable in Rust, even on a MacBook Pro M3 Max; don't get me started on disk space.


I'll just stick with C as my lingua franca, and won't be involving Microsoft in my programming life, thanks.


are you implying that using Rust involves using MS products?


[flagged]


Anthropic? ChatGPT is the one affiliated with Microsoft.


Not Microsoft.


You’re thinking of OpenAI and ChatGPT, which has a (now-rocky) partnership with Microsoft.

Claude is an Anthropic offering.


> It's honest.

It's not, nor is it well informed. People are currently developing languages specifically for use by LLMs.

> It's not the best tool for the job for a lot of things

Then how could it possibly be the final language?

> if the LLMs make writing it as fast as anything else - whelp, I can't see any reason not to do it in Rust

This has nothing to do with the claim that it's the final language.


What did I say that is false?


Sometimes I forget programming languages aren't a religion, and then I see someone post stuff like this. Programming languages really do inspire some of us to feel differently.


I would say it's overall the best existing language, probably due to learning from past mistakes. On the whole it wins via the pro/con sum. But ... Still loads of room for improvement! Far from a perfect lang; just edges out the existing alternatives by a bit.


I'd say that it's taking much needed steps to achieve perfection but many more steps are there ahead. The next language closer to perfection would definitely have a much gentler introduction curve, among other things.


Which coding assistant do you think needs a gentle introduction curve?


Rust lacks dependent typing, which advanced languages like Idris2 have.


Needs monads (not joking)


If AI gets sufficiently good what will be the point of rust? I can just whip out some C code, tell the AI to make it safe (or just ask it if the code contains any undefined behavior), done.


Why not go full functional programming at that point? If the main issue with FP has been accessibility, then it should really take off now.


When you do fully value-oriented programming in Rust (i.e. no interior mutability involved) that's essentially functional programming. There's mutable, ephemeral data involved, but it's always confined to a single well-defined context and never escapes from it. You can even have most of your code base be sans-IO, which is the exact same pattern you'd use in Haskell.


I actually like rust more than Haskell, but `You can even have most of your code base be sans-IO, which is the exact same pattern you'd use in Haskell.` glosses over the fact that in Haskell it's enforced at compile time.


Another argument as to why rust isn't the forever-language. My forever language should include effects!

Even rust has need of this. For example, I want a nopanic effect I can put on a function which makes it a compile error for anything that function calls to panic.


Though I think it's the closest language right now, ideally you have something that is close to "zero-overhead" as your forever language.

I really like how flix.dev looks, but there's always a little nagging at the back of my head that something like rust will always produce more performant software.


> Even rust has need of this. For example, I want a nopanic effect I can put on a function which makes it a compile error for anything that function calls to panic.

This!

This apart from build times is my biggest request for the language.

Nopanic, nomalloc, etc.


I wouldn’t because idiomatic Haskell is way slower than idiomatic Rust.


Isn’t Rust a pretty good functional language? It has most of the features that enable safe, correct code without being anal about immutability and laziness that make performance difficult to predict.


Working code talks.

Bullshit walks.


Rust may be the darling of the moment, but Erlang is oft slept on.

As AI makes human-readable syntax less relevant, the Erlang/Elixir BEAM virtual machine is an ideal compilation target because its "let it crash" isolated process model provides system-level fault tolerance against AI logic errors, arguably more valuable than Rust’s strict memory safety.

The native Actor Model simplifies massive concurrency by eliminating shared state and complex thread management. BEAM's hot code swapping capability also enables continuous deployment, where an AI can dynamically rewrite and inject optimized functions directly into live applications with zero downtime.

Imagine a future where an LLM is constantly monitoring server performance, profiling execution times, and dynamically rewriting sub-optimal functions in real-time. With Rust, every optimization requires a recompile and a deployment cycle that interrupts the system.

Finally, Erlang's functional immutability makes deterministic AI reasoning easier, while its built-in clustering replaces complex external infrastructure, making it a resilient platform suited for automated iteration.


I can't comment on production viability today, but if you assume that the language itself is irrelevant, then it becomes clear that the runtime and engine level is the way to go.

We spend quite a lot of time conceptualizing around safe self-modification and building apps that can change at runtime. We ended up using a custom Lua VM, a type system to catch mistakes, declarative homogeneous infrastructure, and an actor model (Erlang-inspired).

The actor model provides not just good isolation; it's also much easier for AI to reason about (since most components are not that large). We're already able to use it to write quite complex systems with ease.

Another upside: in the actor model you don't really need any of this fluff with cron jobs, queues, etc.; all the logic naturally maps to the intended architecture, making implementation of agents _very_ easy.

https://wippy.ai/en/tutorials/micro-agi It takes 4-5 files to create a mini sandboxed AI agent on top of the actor model, with the ability to modify its own toolkit while having system guardrails and no access to the core filesystem.


I guess at a high level I'm thinking about what kind of running systems are the easiest to edit as they execute. Maybe I should have even picked Clojure for being homoiconic and not needing to be parsed into an AST. The LLM can traverse, prune, graft and transform s-expressions directly with perfect structural accuracy.


Please post this. I'd love to play with it and, especially, see how fast it is.


Seconding this comment, as someone who loves JMAP.


Code is here: https://github.com/josephg/claude-mail

The JMAP client itself is hosted here: https://seph.au/claude-webmail/

I can't prove this, but it's a purely static web app. You need a jmap server to use it. If you use stalwart, set:

    server.listener.http.permissive-cors = true
or

    server.listener.https.permissive-cors = true
Then you should be able to put https://localhost:8080/ into the URL box. It should also work with fastmail, but I haven't tested it.


Just curious, does it look anything like this library?

https://docs.rs/jmap-client/latest/jmap_client/


Also curious why one would be proud of having an LLM rewrite something that there is already a library for. I personally feel that proud LLM users' boasting sounds as if they're on amphetamines.


It made a webmail client. Not a jmap library.


Not sure I understand, wouldn’t a webmail client in rust need client code like this or to use a library like this?


Yeah but it’s like saying, “why are you impressed with Claude making a car when there are plans for an engine online?”. Even if Claude used that code (it didn't), it made the whole car. Not just an engine. There’s a lot more stuff going on than simply calling a backend mail server over jmap.

And fyi, jmap is just a protocol for doing email over json & http. It’s not that hard to roll your own. Especially in a web browser.
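
For a sense of how simple it is: a complete JMAP method call is just JSON over an HTTP POST. A minimal Python sketch (the endpoint URL, account id, and token are placeholders; the request shape follows RFC 8620/8621):

    import json
    import urllib.request

    # Fetch the ids of the 10 newest emails.
    body = {
        "using": ["urn:ietf:params:jmap:core", "urn:ietf:params:jmap:mail"],
        "methodCalls": [
            ["Email/query", {
                "accountId": "ACCOUNT_ID",  # placeholder
                "sort": [{"property": "receivedAt", "isAscending": False}],
                "limit": 10,
            }, "0"],
        ],
    }

    req = urllib.request.Request(
        "https://mail.example.com/jmap/api/",  # your server's API endpoint
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer YOUR_TOKEN"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["methodResponses"])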


Your initial claim talked about jmap and this looks to me like a full implementation of the RFC in rust. That is the hard part of an email client IMO so I’m not sure I’d agree with your analogy, but you’re saying it made a web app which called a library like this?

Would be interesting to see it, did you publish it yet?


> looks to me like a full implementation of the RFC in rust

Only the client parts. And only the client parts it's actually using. JMAP clients can be much simpler than servers. A JMAP server needs the whole protocol. JMAP clients only need to implement the parts they use. Servers also need to parse email message envelopes - which is way more difficult to do correctly than people think. JMAP clients can just use pre-parsed messages from the server.

Anyway, the code is here if you wanna take a look:

https://github.com/josephg/claude-mail

Claude put its JMAP API wrapper code in a child crate (confusingly also called jmap-client).


Cool thanks.


Interesting! I am getting tired of looking at Roundcube and having weird issues and was thinking of doing the same. Were you planning on making the result public?


Did you use dioxus? I had claude write up a test web app with it, but when attempting to use a javascript component it built, it couldn't get past memory-access-out-of-bounds errors.


I used leptos. Before I started I made a text file containing the entire leptos manual, and told claude to read it. I don't know if that helped, but claude seems to use it just fine.


Can you release it as open source code?


Sure; I’ll throw it online in a few hours when I’m at my computer.



> It burned through a mountain of tokens, but 10/10 - would generate tens of thousands of lines of useless code again.

This is the biggest bottleneck at this point. I'm looking forward to RAM production increasing, and getting to a point where every high-end PC (workstation & gaming) has a dedicated NPU next to the GPU. You'll be able to do this kind of stuff as much as you want, using any local model you want. Run a ralph loop continuously for 72 hours? No problem.


Wasting electricity to "generate tens of thousands of lines of useless code" at will? Why is that in any way a desirable future?


One person's waste is another's value. Do you have any idea how "wasteful" TikTok or any other streaming platform is? I'll grant that AI is driving unprecedented data center development, but it's far from the root cause, or even a leading cause, of our climate issues. I always find it strange that this is the first response so many have to AI, when it poses other, more imminent existential threats IMO.


It was a reply to what the GP said about running local generation 24/7 for no good reason, just because it's possible (and electricity is too cheap, apparently). There are many more threats, but those are beside the point in this specific context.


The climate change alarms have been sounding for decades and yet vehicles keep getting bigger. Even in formerly "doing it right" countries like Japan. Turns out humans will always choose vanity and status symbols over facts. Oh well


A lot of code is "useless" only in the sense that no one wants to buy it and it will never find its way into an end user product. On the other hand, that same code might have enormous value for education, research, planning, exploration, simulation, testing, and so on. Being able to generate reams of "useless" code is a highly desirable future.


Obviously "useful" doesn't just involve making money. Code that will be used for education and all of these things is clearly not useless.

But let's be honest to ourselves, the sort of useless code the GP meant will never ever be used for any of that. The code will never leave their personal storage. In that sense it's about as valuable for the society at large as the combined exabytes of GenAI smut that people have been filling their drives with by running their 4090s 24/7.


Optimists will imagine it to one day be as taxing and thus as wasteful as firing up MS Paint.

No, that’s a stretch; more like firing up a AAA game.


At least you (hopefully) get hours of entertainment from firing up an AAA game. Whereas generating vast amounts of code that you're never going to use has… some novelty value, I suppose. Luckily the novelty is going to wear off soon, I can't really see many people getting their daily happiness boost from making code machine say brrrrt straight to /dev/null. Even generating smut is a vastly more understandable (and vastly more commonplace, even now) use case for running genAI every day for hours.


The bigger use case for AAA games? Employment for highly talented artists, 3D modellers and sculptors, texture artists, sound and music artists, and even programmers.

At least it gives _something_ back. Until of course we've obsoleted all of them as well.

Most of the AAA games I've paid for sit there in my Steam library and never get played. At least _some_ of the money probably went to those talented people whose work was used for training GenAI and coding models (and yes I say this as someone who has used all of these tools to prototype my own games, and still think human created content is of a much higher quality, just more expensive to produce).


> I can't really see many people getting their daily happiness boost from making code machine say brrrrt straight to /dev/null

How long do we have to wait before these people get bored? Or might they actually find what they generate useful, so it doesn't all go straight to /dev/null? Usage seems to be growing, not dropping.


Wait until you find out about video games.


I bet RAM production will only increase to meet AI demand and there will be none left for you. Or me. Or anyone. Crucial is already gone, probably forever, and I'm sure more will follow...


Yeah, OpenAI's play was to cripple local AI "forever" (until the bubble pops), not just to deny RAM to competitors.


> a relic from the days when everyone thought Drupal was the future (long time ago).

Drupal is the future. I never really used it properly, but if you fully buy into Drupal, it can do most everything without programming, and you can write plugins (extensions? whatever they're called...) to do the few things that do need programming.

> The Epilogue: That site has since been ported to WordPress, then ProcessWire, then rebuilt as a Node.js app. Word on the street is that some poor souls are currently trying to port it to Next.js.

This is the problem! Fickle halfwits mindlessly buying into whatever "next big thing" is currently fashionable. They shoulda just learned Drupal...


I'm not sure if you're serious or not, but while I never liked Drupal (even used to hate it once upon a time), I always liked the pragmatism surrounding it, reaching to the point of saving php code into the mysql database and executing it from there.


> reaching to the point of saving php code into the mysql database and executing from there.

Wordpress loves to shove php objects into the database (been a good long while since I used it, don't remember the mechanism, it'd be the equivalent of `pickle` in python, only readable by php).

Not sure if they've improved it since I last dealt with it about 15 years ago, but at the time there was no way to have a full separated staging and production environment, lots of the data stored in the database that way had hardcoded domain names built into it. We needed to have a staging and production kind of set-up, so we ended up having to write a PHP script that would dump the staging database, fix every reference, and push it to production. Fun times.


There's implode() and explode(), as well as serialize() and unserialize().

No idea what's used in wordpress, but back in D6 and before, it was common to see it when it would store multiple values for an instance.


> It burned through a mountain of tokens, but 10/10 - would generate tens of thousands of lines of useless code again.

Pardon me, and, yes, I know we're on HN, but I guess you're... rich? I imagine a single run like this probably burns through tens or hundreds of dollars. For a joke, basically.

I guess I understand why some people really like AI :-)


It was below $100, but only after burning through the 20x Max session limit.


The subsidized Codex/Claude subscriptions make it not so bad.


It's something you'd do once for shits and giggles, before realizing it's only funny once.


There are plenty of SMEs trapped into that future. :)


Agree, and it's also such a shame that none of the AI companies actually focus on that way of using AI.

All of them are moving in the direction of "less human involved and agents do more", while what I really want is better tooling for working closer with the AI, being better at reviewing/steering it, and being more involved. I don't want "fire one prompt and get somewhat working code"; I want a UX tailored for long sessions of back and forth, letting me leverage my skills, rather than agents trying to emulate what I can already do myself.

It was said a long time ago about computing in general, but more fitting than ever, "Augmenting the human intellect" is what we should aim for, not replacing the human intellect. IA ("Intelligence amplification") rather than AI.

But I'm guessing the target market for such tools would be much smaller, basically would require you to already understand software development, and know what you want, while all AI companies seem to target non-developers wanting to build software now. It's no-code all over again essentially.


Is it any surprise that the cocaine cartels really want you to buy more cocaine, so they don't focus on its usefulness in pain relief and they refine it and cut it with the cheapest substances that will work rather than medical-grade reagents?

Same thing.


It's surprising that the ones who are producing the cocaine don't try to find the best use for it, yes. But these are VC-fueled businesses, so it all goes out the window, unfortunately. Otherwise they'd actually focus on usefulness, not just "usage" or whatever KPI they go by and share with their investors.


LLMs are drugs because they’re addictive and sap your abilities, is it?

(or generally: “Is the cocaine cartel comparison fair or unfair?”)


LLMs are fair to compare to cocaine because while there certainly could be ethical producers who follow reasonable laws and work to develop good uses, the market is completely dominated by organizations that don't.

And in my experience potheads offer you a toke and if you politely refuse, no problem at all. Coke addicts don't want to take no for an answer and insist that everybody should do it, they get so much more done, decisions are faster and better and what the hell is wrong with you if you don't want some?

So, the users are similar too.


Of course there are tools focusing on this. It takes a little getting used to how prevalent it is. My editor now can anticipate the next three lines of code I intend to write complete with what values I want to feed to the function I was about to invoke. It all shows up in an autocomplete annotation for me. I just type the first two or three characters and press tab to get everything exactly how I was about to type it in--including an accurate comment worded exactly in my voice.

Is that what you mean by IA?

For example, I type "for" and my editor guesses I want to iterate over the list that is the second argument of the function for which I am currently building the body. So it offers to complete the rest of the loop condition for me. Not only did it anticipate that I am writing a for loop. It figures out what I want to iterate over, and perhaps even that I want to enumerate the iteration so I have the index and the value. Imagine if I had written a comment to explain my intent for the function before I started writing the function body. How much better could it augment my intellect?


To be honest, I'm not quite sure what the ideal UX looks like yet. AI-assisted autocomplete is too little, but the idea of saying "Build X for purpose Y" is too high-level. Maintaining Markdown documents that the AI implements also feels too high-level, but letting the human fully drive the implementation is probably again too low-level.

I'm guessing the direction I'd prefer, would be tooling built to accept and be driven by humans, but allowed to be extended/corrected by AI, or something like that, maybe.

Maybe a slight contradiction, and very wishy-washy/hand-wavy, but I haven't personally figured out yet what I think would be best either, what the right level actually is, so that's probably the best I can say right now :) Sorry!


The Markdown documents can be at any level. Just keep asking the AI to break each individual step in the plan down into substeps, then ask it to implement after you review. It's great for the opposite flow too - reverse engineering from working legacy code into mid-level and high-level designs, then proposing good refactors.


Yes, I'm talking about a UX that could handle that for the programmer instead, as an example. Zoom out a bit :)


I think this could be a decent interface with one addition, a way to comment on the completion being suggested. You could ask it for a different completion or to extend the completion, do something different, do a specific thing, whatever. An active way to "explain my intent" with the AI (besides leaving comments hinting at what you want) in addition to the passive completion system.


Which editor?

> Imagine if I had written a comment to explain my intent for the function before I started writing the function body.

The loon programming language (a Lisp) has "semantic functions", where the body is just the doc comment.


Still magical a few years in?

>Imagine if I had written a comment to explain my intent for the function before I started writing the function body.

This in particular is not dissimilar from opening a chat with a model and giving it a prompt as usual but then adding at the end:

Begin your response below:

  { func


>Agree, and it's also such a shame that none of the AI companies actually focus on that way of using AI.

This is because, regardless of the current state of things, the endgame which will justify all the upfront investment is autonomous, self-improving, self-maintaining systems.


I think it was Steve Jobs who said computers should be like a bicycle for the mind; I tend to agree.


I love this Jobs quote for two reasons:

(1) It captures the ideal so well

(2) The bitter irony of how thoroughly pre-OS X Macintosh computers failed to live up to it

I feel like there's a similar dichotomy in LLM tools now


Yeah, Douglas Engelbart was also a huge believer in that, and I think from various stuff I've read from him and the Augmentation Research Center put me on this track of really agreeing with it.

"Bicycle for the mind", as always when it involves Jobs, sounds more fitting for the masses though, so thanks for sharing that :)


Agents are a "self-driving car for the mind". I don't enjoy or dislike driving, but lots of Americans love to drive. In the future they will lament their driving skills' decline.


We as the general population have consistently lost lots of skills from just 200 years back. Most likely we will not miss them (though coding used to be my hobby).

Though if apocalypse happens and all of our built tech goes away, we are in for a serious survival issue.


>Most likely we will not miss them

given that we've also lost the faculty to look at the past with anything other than contempt, most people wouldn't even know what they miss. The little problem with losing the 'general cognition' department, just like with broad social or cultural decline, is that you lose the ability to even judge what you're losing, because the thing you just lost was doing the judging.


Though if ~~apocalypse~~ war happens and all of our built tech goes away, we are in for a serious survival issue.

A lot closer than you think, too


"All of them are moving into the direction of "less human involved and agents do more", while what I really want is better tooling for me to work closer with AI and be better at reviewing/steering it, and be more involved."

I want less ambitious LLM powered tools than what's being offered. For example, I'd love a tool that can analyse whether comments have been kept up to date with the code they refer to. I don't want it to change anything I just want it to tell me of any problems. A linter basically. I imagine LLMs would be a good foundation for this.
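
A rough sketch of how small such a linter could be; everything here is hypothetical, and `ask_llm` is a stand-in for whichever model API or CLI you'd wire in:

    import sys

    def comment_code_pairs(source: str, context: int = 10):
        """Yield (comment_block, following_code) pairs. Naive, '#'-comments only."""
        lines = source.splitlines()
        i = 0
        while i < len(lines):
            if lines[i].lstrip().startswith("#"):
                start = i
                while i < len(lines) and lines[i].lstrip().startswith("#"):
                    i += 1
                yield ("\n".join(lines[start:i]),
                       "\n".join(lines[i:i + context]))
            else:
                i += 1

    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("wire in your model of choice here")

    for path in sys.argv[1:]:
        with open(path) as f:
            src = f.read()
        for comment, code in comment_code_pairs(src):
            verdict = ask_llm(
                "Does this comment still accurately describe the code that "
                f"follows it? Answer OK or STALE.\n\n{comment}\n\n{code}"
            )
            if "STALE" in verdict:
                print(f"{path}: possibly stale comment: {comment.splitlines()[0]}")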


Any terminal tool like Claude Code or Codex (I assume OpenCode too, but I haven't tried) can do it, by using as a prompt pretty much exactly what you wrote, and if it still wants to edit, just don't approve the tool calls.

One problem I've noticed is that both claude models and gpt-codex variants make absolutely deranged tool calls (like `cat <<'EOF' >> foo...EOF` pattern to create a file, or sed to read a couple lines), so it's sometimes hard to see what is it even trying to do.


"Any terminal tool like Claude Code or Codex (I assume OpenCode too, but I haven't tried) can do it, by using as a prompt pretty much exactly what you wrote, and if it still wants to edit, just don't approve the tool calls."

I'm sure it can. I'd still like a single use tool though.

But that's just my taste. I'm very simple. I don't even use an IDE.

edit: to expand on what I mean. I would love it if there was a tool that has conquered the problem and doesn't require me to chat with it. I'm all for LLMs helping and facilitating the coding process, but I'm so far disappointed in the experience. I want something more like the traditional process but using LLMs to solve problems that would be otherwise difficult to solve computationally.


I’m glad I’m not the only one who’s noticed these seemingly arbitrary calls to write files using the cat command instead of the native file edit capabilities of the agent.


Lots of `echo "doing a thing"; some --other command` which then prompts the user to permit echo commands like this...


> Agree, and it's also such a shame that none of the AI companies actually focus on that way of using AI.

their valuations are predicated on getting rid of you entirely, along with everyone else

the "humans can use it to increase their productivity" is an interim step


I am learning Rust myself, and one of the things I definitely didn't want to do was let Claude write all the code. But I needed guidance.

I decided to create a Claude skill called "teach". When I enable it, Claude never writes any code. It just gives me hints - progressively more detailed if I am stuck. Then it reviews what I write.

I am finding it very satisfying to work this way - Rust in particular is a language where there's little space to "wing it". Most language features are interlaced with each other and having an LLM supporting me helps a lot. "Let's not declare a type for this right now, we would have to deal with several lifetime issues, let's add a note to the plan and revisit this later".


FYI: Claude has output styles; one of them is called `learning`. Instead of writing the code itself, it will add `TODO(human)` markers and comments explaining how to do it. It also adds `Insights` explaining concepts to you in its output.

This link also has a comparison to Skills further down.

https://code.claude.com/docs/en/output-styles#built-in-outpu...


I had a bash spaghetti-code script that I wrote a few years ago to handle TLS certificates (generate CSRs, bundle up trust chains, match keys to certs, etc.). It was fragile, slow, extremely dependent on specific versions of OpenSSL, etc.

I used Claude to rewrite it in golang and extend its features. Now I have tests, automatic AIA chain walking, support for all the DER and JKS formats, and it’s fast. My bash script could spend a few minutes churning through a folder with certs and keys, my golang version does a few thousand in a second.

So I basically built a limited version of OpenSSL with better ergonomics and a lot of magic under the hood because you don’t have to specify input formats at all. I wasn’t constrained by things like backwards compatibility and interface stability, which let me make something much nicer to use.

I even was able to build a wasm version so it can run in the browser. All this from someone that is not a great coder. Don’t worry, I’m explicitly not rolling my own crypto.


This is also how some of us use Claude, despite what the haters say. You don't just go "build thing"; you architect, review, refine, test and build.


It's how most of us are actually going to end up using AI agents for the foreseeable future, perhaps with increasing degrees of abstraction as we move to a teams-of-agents model.

The industry hasn't come up with a simple meme-format term to explain this workflow pattern yet, so people aren't excited about it. But don't worry, we'll surely have a bullshit term for it soon, and managers everywhere will be excited. In the meantime, we can just continue doing work with these new tools.


This is an opportunity to select some stupid words that you would like to hear repeated a million times. The process is like patiently nurturing a well-contained thing, so how about "egg coding"?


How about “engineering”?


I haven't quite dealt with "teams of agents" yet, outside of Claude Code itself spawning subagents, but I have some ideas for achieving it in a meaningful way without giving a developer 10 Claude Code licenses. The approach that makes more sense to me is to still have humans in the loop, but have their respective agents sync together and divide work towards one goal, while being able to determine which tasks are left to be worked on and tested. I do think for the foreseeable future you will need human validation for AI.


I thought the term was "agentic engineering"


I like "spec driven development" but I honestly don't care what you call it, just let me build things and leave me alone. :)


SDD is more like a subset. There are different ways to manage context in agentic engineering.


I guess. I just know I force my agent to use a ticketing system like Beads (I made my own).


> SDD

Don’t do that! On a two-day-old term?!

No wonder we’re called gatekeepers.


Ok jeez, calm down. I am not shouldering all of the AI discourse lol.


^_^


Yeah that's the top contender at the moment. I think it's pretty good.



This does not spark joy.


I'm not sure there's going to be a term, because there's no difference from normal, good quality engineering. You iterate on design, validate results, prioritise execution. It's just that you hand over the writing code part. It's as boring as it gets.


> how some of us

Operative word being “some”. The issue is that too many aren’t doing it that way.

> You dont just go “build thing”

Tell that to the overwhelming majority of posters discussing vibe coding, including on HN.


Sure, but they're going to be stuck writing software for yesterday's problems. As our tools become more powerful, we're going to unlock new problems and expectations that would be impossible or impractical to solve with yesterday's tooling.


>Sure, but they're going to be stuck writing software for yesterday's problems

As long as they get paid for it (or have fun, if it's a personal project), they couldn't care less about that. Tomorrow's problems are overrated.


I suppose to some extent those people have always existed. The ones who would choose the most expedient solution.

The difference now is they can get much further along.


> despite what the haters say

Thinking people who disagree with you hate you or hate the thing you like is a recipe for disaster. It's much better to not love or hate things like this, and instead just observe and come to useful, outcome-based conclusions.


LLMs really do attract haters in the classic sense though. You'll find them in almost every thread on here.


They also attract grifters, frauds, conmen, snake oil peddlers, and every stripe of bullshit artist. I'm someone you probably would view as a hater, but I truly don't hate LLMs. I hate the lies. Projects like this are interesting, I wish there was a lot more of this and a lot less of the "trust me bro" stuff.


At this point, I say it attracts LLM Luddites.

I have a lot of sympathy for the Luddites after reading about the sudden loss of jobs and lifestyle, and about the new textile factories running roughshod over all sorts of ethical boundaries in pursuit of profit - child labour, horrendous accident rates, brutal working hours.

It's sad in a way the Luddites weren't more successful than they were, as there were decades of harm inflicted by these newfangled ideas before the kinks were ironed out. But some of the blame for that must be laid at the Luddites' feet - they were focused on preserving the past, choosing to remain in denial about the inevitability of the wave of change crashing down on them. They left the job of ironing out the kinks to the workers in the factories that displaced them.

With LLMs, we look to be taking the same broken path as the textile industry and its Luddites, sadly.


Look at any HN thread that has a project that uses AI in any way, shape or form. People quickly remark that it is slop, without even reviewing the code. If that's not blind hatred of AI, I don't know what is.

There's a huge distinction between Vibe Coding, and actual software engineers using AI tooling effectively. I vibe code for fun sometimes too, nothing wrong with it, helps me figure out how the model behaves in some instances, and to push the limits of what I understand.


Vibe Coding is like porn for programmers. It probably isn't good for you, and you'd probably be better off actually doing the thing yourself, but it feels good and satisfies our desire for instant gratification.


Well, take me for example: I have ideas I've had for years but no time for, because by now the requirements are insane. I want to build a backend that could survive nuclear-fallout-type stuff. I braindump to Claude and watch it churn out the vision I've had for the last 12 years; it's insane.

There are other things too, though: my ADD and my impostor syndrome don't matter to Claude. Claude just takes it all in, so as I keep brain-dumping, it keeps chugging along. I don't have to worry about "can I really do this?"; it just does it, and I can focus on "what can I do to make it better", essentially.

For me it's beyond "porn coding"; it's basically fulfilling a vision that's been locked away for years because I've had no time to sit down and do it fully. I can tell Claude to do something, and when my kid comes up and asks me to go draw with them, I can actually just walk away, then come back to look at the output and refine.


I never said it doesn't have use cases (much like porn, a lot of the arguments against it are just fear-mongering), just that it isn't as good as the real thing. I myself like yapping to an LLM about ideas to see how feasible they actually are before taking a crack at it.


[flagged]


Comparing bitching about vibe coding to porn is like watching porn. You come away feeling better, but you didn't engage meaningfully with anyone else.

Who's next?


> People quickly remark that it is slop, without even reviewing the code.

I absolutely hate how "slop" has lost its meaning.

"AI slop" was supposed to mean poor-quality content that's obviously AI-generated. But the anti-AI crowd has co-opted it to mean any AI-generated content, regardless of quality. EDIT: Or even the quantity of AI. Expedition 33 had a ton of critical acclaim and ended up winning tons of awards, yet once it was discovered that AI was used to generate some placeholder art, of which NONE of it was actually used in the final product, some people started labeling the game as AI slop. It's utterly ridiculous.

So now, we can't have conversations about AI slop without starting off with making sure everyone is on the same page on what the term even means.

EDIT: "Vibe coding" is suffering a similar fate. If I use AI to write some code, and I examine the code to make sure it doesn't have any obvious bugs or security issues, is that still vibe coding?


We keep seeing this pattern over and over as well. Despite LLM companies' almost tangible desperation to show that they can replace software engineers, the real value comes from domain experts using the tools to enhance what they're already good at.


I'd guess this is a bet on which market is more lucrative:

* (A) domain experts paying for tooling that will enhance their productivity

* (B) the capital/management class hoping to significantly replace domain experts

Software devs have been a famously tough market to sell tools to for a long time, so the better bet is (B). Plus, the story on (B) is fantastic for fundraising: if there's a 10% chance that it checks out, you want some part of that in your capital portfolio.


I don't think they actually care if it ever materializes. They just have to sell execs on it. As long as they can, the execs will sell it to their higher-ups, mostly by just flat-out lying about it.

I see it all the time at the Director and VP level. Once big money is on the line, there are no failures, just "opportunities for strategic realignment"


I had a script in another language. It was Node and took up >200MB of RAM that I wanted back. "claude, rewrite this in rust". 192MB of memory returned to me.


Solving the big RAM shortage one prompt at a time.


This is sad to see. Node was originally one of the memory-efficient options – its roots are in solving the c10k problem. Mind sharing what libraries/frameworks you were using?


It was an Express server. I don't think c10k is particularly interesting, since it mostly just involves having cooperative scheduling. It doesn't really impact flat memory overhead, etc. I mean, the binary for Node alone, without any libraries, is larger than the produced Rust binary.


Before Node/libuv, holding open connections was really expensive resource-wise; dropping that cost to <30KB per connection was massive.

The node binary is huge due to inclusion of i18n libraries to support the native APIs, and should have little impact on memory consumption. There is a way to opt out.


> its roots are in solving the c10k problem

C10K was a long-solved problem when Node came out; it just wasn't solved in the stacks 99% of people used at the time, i.e. PHP/Python/Ruby.


I used to have a bunch of bespoke Node Express server utilities that I liked to keep running in the background to have access to throughout the day, but 40-50MB per process adds up quickly.

I've been throwing Codex at them, and now they've all been rewritten in Go - cut down to about 10MB per process.


I haven’t done a ton of porting. And when I did, it was more like a reimplementation.

> We’ve verified that every AST produced by the Rust parser is identical to the C++ one, and all bytecode generated by the Rust compiler is identical to the C++ compiler’s output.

Is this a conventional goal? It seems like quite an achievement.


My company helps companies do migrations using LLM agents and rigid validations, and it is not a surprising goal. Of course most projects are not as clean as a compiler is in terms of their inputs and outputs, but our pitch to customers is that we aim to do bug-for-bug compatible migrations.

Porting a project from PHP7 to PHP8, you'd want the exact same SQL statements to be sent to the server for your test suite, or at least be able to explain the differences. Porting AngularJS to Vue, you'd want the same backend requests, etc..


It's a very good way of getting LLMs to work autonomously for a long time: give it a spec and a complete test suite, shut the door, and ask it to call you when all the tests pass.
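A sketch of that kind of harness in Go (binary and corpus names here are hypothetical; the real setup depends on the project) - run both toolchains over a corpus and fail on the first byte-level divergence:

    package main

    import (
        "bytes"
        "fmt"
        "os/exec"
        "path/filepath"
    )

    // compile runs a compiler binary on a source file and returns
    // the bytecode it writes to stdout.
    func compile(compiler, src string) ([]byte, error) {
        return exec.Command(compiler, src).Output()
    }

    func main() {
        srcs, err := filepath.Glob("corpus/*.src")
        if err != nil {
            panic(err)
        }
        for _, src := range srcs {
            // Reference implementation vs. the in-progress port.
            want, err := compile("./compiler-cpp", src)
            if err != nil {
                panic(err)
            }
            got, err := compile("./compiler-rs", src)
            if err != nil {
                panic(err)
            }
            if !bytes.Equal(want, got) {
                panic(fmt.Sprintf("bytecode diverges on %s", src))
            }
        }
        fmt.Printf("%d files, byte-identical output\n", len(srcs))
    }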


> Coding assistants are also really great at porting from one language to the other

No, they are quite terrible at doing that.

They may (I guess?) produce code that compiles, but they will almost certainly not produce the appropriate combination of idioms and custom abstractions that makes the code feel "at home" in the target language.

PS - Please fix your blockquote... HN ignores single linebreaks, so you have to either use pairs of them or possibly italicize the quoted text.


This is the way. This exact workflow is my sweet spot.

In my coding agent std::slop, I've optimized for this workflow: https://github.com/hsaliak/std_slop/blob/main/docs/mail_mode... The basic idea is that you are the 'maintainer', and you get bisect-safe git patches that you review (or ask a code-reviewer skill or another agent to review). Any change re-rolls the whole stack. Git already supports such a flow; I added it to the agent. A simple markdown skill does not work because it 'forgets'. A GitHub-based PR flow felt too externally dependent. So this workflow is enforced by a 'patcher' skill, and once that's active, tools do not work unless they follow the enforced flow.
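Conceptually the flow maps onto plain git commands (a simplified sketch of the idea, not the tool's exact mechanics):

    # agent emits its work as a reviewable patch series
    git format-patch main..agent-work -o patches/
    # maintainer reviews, then applies with full history
    git am --3way patches/*.patch
    # because every patch stands alone, regressions bisect cleanly
    git bisect start HEAD main
    git bisect run ./run-tests.sh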

I think a lot of people are going to feel comfortable using agents this way rather than going full blast. I do all my development this way.


This is broadly how I worked when I was still using chat instead of CLI agents for LLM support. The downside, I feel, is that unless it's a codebase/language/architecture I don't know, it feels faster to just code by hand with the AI as a reviewer rather than a writer.


Your patch-queue approach is very clever. It solves a huge tech-debt problem with LLM code gen. It should probably work with Jujutsu too.

Would be curious to see more about how you save tokens with Lua too.

Do you blog?


Thanks for your interest in this work - I do not blog (maybe I should?), but I have posted a bit more on X about this work.

- A bit more on mail mode https://x.com/hsaliak/status/2020022329154420830

- on the Lua integration https://x.com/hsaliak/status/2022911468262350976 (I've since disabled the recursion; not every code file is long, and it seems simpler not to do it), but the rest of it is still there

- hotwords for skill activation https://x.com/hsaliak/status/2024322170353037788

Also /review and /feedback. /feedback (the non-code version) opens up the LLM's last response in an editor so you can give line-by-line comments. Inspired by the "don't top-post" convention from mailing lists.


I quit X, so I can't read beyond top-level links. I subscribed to your tool on GitHub; I'd appreciate blog-posts-in-release-notes to keep up with future developments. Will try the tool. Rare to find something new amid the AI hype, thank you.


Fair enough. I'll find a way to publish some of this. I try to cover most of the information in the docs/ folder, and keep it up to date. Blog posts in release notes is a good idea!


I did this exact same thing for porting a compiler from one language to another with Codex. I run tests at every step, and verified that bytecode output was byte-for-byte identical. I was very impressed at the results, and this is coming from someone who's always pointing out issues with AI programming.


> This was human-directed, not autonomous code generation.

All my vibe coded projects are human directed, unless explicitly stated otherwise


I am having immense success with the latest models on a personal project that I open-sourced and then got burned out on. I can't write by hand anymore, but I do enjoy writing prompts with my voice. I have been shipping the best code the project has ever seen. The revolution is real.


Coding assistants are great at pattern matching and pattern following. This is why it’s a good idea to point them at any examples or demos that come with the libraries you want to use, too.


Quite good. I ported my codebase from Go to Rust in a fraction of the time it would have taken me to rewrite it.


If every AST is isomorphic, why bother? Don't you miss getting some of the advantages of Rust?


How does he solve the fruit-of-the-poisonous-tree problem? For all he knows, his LLMs included a bunch of copyrighted or patented code throughout the codebase. How is he going to convince serious people that this port is not just a transformation of an _asset_ into a _liability_?

And you might say that this is a hypothetical problem, one that is not occurring in practice. Well, we had a similar problem in the recent past, one that LLMs are close to _making actual_. Software patents were long considered a _hypothetical_ problem (i.e. nobody was going to bother suing you unless you were so big that violating a patent was a near-certainty). We were instructed (at pretty much all jobs) to never read patents, so that we could not incriminate ourselves in the discovery process.

That is going to change soon (within a year). I have a friend, whom I won't name, who is working on a project that uses LLMs to discover whether software (open source and proprietary) is likely to be violating a software patent from a patent database. And it is designed to be used not by programmers, but by law firms, patent attorneys, etc. Even though it is not marketed this way, it is essentially a target-acquisition system for use by patent trolls. It is hard for me to tell whether this means we will have to keep ignoring patents for that plausible deniability, or whether we will have to become hyper-informed about all patents. I suppose we can just subscribe to the patent agent and hope that it guides the other coding agents into avoiding the insertion of potentially infringing code.

(I also have a friend who built a system in 2020 that could translate between C++ and Python, guarantee equivalent results, and produce code that looks human-written. This was a very impressive achievement, especially because of how it guarantees the equivalence (it required no machine learning or GPUs, just CPUs and some classic algorithms from the 80s). The friend informs me that they are very disheartened to see that now any toddler with a credit card can mindlessly do something similar, invalidating around a decade of unpublished research. They tell me that it will remain unpublished, and that if they could go back in time, they would spend that decade extracting as much surplus from society as possible, by hook or by crook (apparently they had the means and the opportunity, but lacked the motive); we should all learn from my friend's mistake. The only people who succeed are, sadly, perversely, those who brazenly and shamelessly steal -- and make no mistake, the AI companies are built on theft. When millionaires do it, they become billionaires -- when Aaron Swartz did it, he was threatened with federal prison. I'm not quite a pessimist yet, but it really is saddening to watch my friend go from a passionate optimist to a cold nihilist.)


One or both of you have the story very wrong.

If there was value (the guarantees) in this tech he sank a bunch of time into, he should be wrapping a natural-language prompt around it and selling it.

Not even the top providers are giving any sort of tangible safety or reliability guarantees in the enterprise…


Lol, and this line:

> Geminin 3.1 Pro can comprehend vast datasets

Someone was in a hurry to get this out the door.


I'm glad someone else is finally saying this. I've been mentioning it left and right, and sometimes I feel like I'm going crazy that more people aren't noticing it.

Gemini can go off the rails SUPER easily. It just devolves into a gigantic mess at the smallest sign of trouble.

For the past few weeks, I've also been using XML-like tags in my prompts more often, sometimes sharing previous conversations with `<user>` and `<assistant>` tags. Opus/Sonnet handles this just fine, but Gemini has a mental breakdown. It'll just start talking to itself.
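i.e. prompts shaped roughly like this (a made-up toy exchange):

    <user>How do I read a file into a string in Rust?</user>
    <assistant>Use std::fs::read_to_string.</assistant>
    <user>Now add error handling instead of unwrap().</user>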

Even in sessions that are nothing out of the ordinary, it goes crazy. After a while, it'll start saying it's going to do something, and then it pretends it has already done that thing, all in the same turn. A turn that never ends. Eventually it just starts spouting repetitive nonsense.

And you would think this is just because models tend to get worse as the context grows. But no! This can happen well below even the 200,000-token mark.


Flash is (was?) better than Pro on these fronts.

