Over the six years you'll be using your computer, do you ever expect to run into versioning issues and conflicts? Homebrew packages conflicting with local packages, something you compile needing a different python/ruby/node/rust/whatever version than the one you have locally installed, wanting to quickly try out a new package or upgrade without changing your system while keeping the option of rolling back safely, needing to quickly install a database, wanting to try out a new shell and shell config without bricking your system and with the option to roll back, etc. Nix gives you all of that and more for a one-time setup cost. Your argument is correct only if you expect to never change anything on your computer for those six years. But if I think about how often I have fought with Homebrew or some kind of versioning/path/binary conflict in the past, the investment in Nix has paid off many times over.
It's also about peace of mind, like you said. Before Nix I sometimes felt anxiety when installing or upgrading certain things on my computer. "Will this upgrade break stuff?" - and often it did, and I'd have to spend the next few hours debugging. With Nix I don't worry about any of that anymore.
> Homebrew packages conflicting with local packages, something you compile needing a different python/ruby/node/rust/whatever version than the one you have locally installed, wanting to quickly try out a new package or upgrade without changing your system while keeping the option of rolling back safely, needing to quickly install a database, wanting to try out a new shell and shell config without bricking your system and with the option to roll back, etc.
Couldn't pretty much all of that be addressed using containers? Keeping your base system clean does sound wonderful, but e.g. distrobox containers sound more approachable - you're using all the same commands that you normally would, and apps are in an environment much closer to what they probably expect. You can still roll back using snapshots, which you can configure to be created automatically on system updates. If you want an atomic rollback guarantee, and a strong reminder not to mess with the base system, you can use an immutable distro (talking about Linux here, not macOS).

The one big advantage I see in Nix is reproducibility. But it's not clear how desirable that is for a desktop use case. You may actually want different software on different machines. Having an overview of all the changes you made to your system sounds cool, but I'm not sure it's worth the effort that comes with Nix. I'm worried that after 8 months I'll decide it's too much hassle, like many commenters here seem to do, and end up switching to a simpler setup with dotfiles and containers, wishing I'd done that from the start.
That's mostly solved with env managers for python/ruby/node/...; they take at most a few minutes to fully set up and learn, and they don't get constantly broken by macOS updates.
Even for things like trying out a new shell, you can temporarily move the dotfiles somewhere and restore them later, and it still takes less time than converting everything to Nix.
But now you're stuck with Python. Nix enables trivially simple dev environments that are completely heterogeneous. This gives you a powerful form of freedom, because it literally opens up the entire software universe to your dev environment in a confidence-inspiring way. Not to mention things like reliably parameterising anything you use, and setting up environment variables, shell scripts, a database service, whatever you want. It also integrates really well with tools such as uv. Yes, the language is terse and difficult, but once you know it, it's liberating, and it makes you a better software developer in my opinion, because you now have a high-end full workshop rather than a small toolbox.
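As a rough illustration (the package names and env vars here are just placeholders, and attribute names like python312/nodejs_22 depend on your nixpkgs revision), a per-project dev shell in a flake can be as small as:

    {
      inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";

      outputs = { self, nixpkgs }:
        let
          pkgs = nixpkgs.legacyPackages.x86_64-linux;
        in {
          devShells.x86_64-linux.default = pkgs.mkShell {
            # whatever toolchain this project needs, isolated from the system
            packages = [ pkgs.python312 pkgs.nodejs_22 pkgs.postgresql ];
            # plain environment variables, visible inside the shell
            DATABASE_URL = "postgres://localhost:5432/dev";
          };
        };
    }

`nix develop` drops you into that environment, nothing leaks into the rest of the system, and "rolling back" is just deleting the file.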
This is my feeling too. Nix is a relatively high time investment for a tool that tries to do everything, when you might not need or want everything and using the specific language’s tooling is more than sufficient and quicker. It takes a few minutes to install and do `uv sync`, or `nvm install`, or whatever, on a repository on a new computer, and it just works. Until Nix gets there, and I’m skeptical it will because of the “purist” mindset a lot of people in the community have, it’s hard to justify it.
I think the comparison is "X-as-code", like with Terraform and other tools.
If you just want a throwaway VM, it's straightforward to create one through the cloud console UI. Terraform is nevertheless still a useful tool for managing VMs.
For stuff like installing development dependencies... it's maybe not difficult to copy and paste instructions from a readme, but solutions like devcontainers or Nix's development shells can be useful even if they add more overhead.
Of course. I wouldn’t say that Nix is a tool without much use or merit, because setting up development environments can be a huge pain and I understand why some people would use it and prefer it.
My biggest complaint is what I mentioned above: it's trying to be everything for package management, and it adds a lot of complexity (and I disagree that this is always necessary/inherent) compared to just installing a tool and occasionally upgrading it. That complexity often means I have to debug Nix rather than the tool I actually want to debug - Nix instead of Node, say - which is not always straightforward. In my limited experience Nix got in my way more than I'd like, and in ways I didn't expect or want to deal with, and until it's as seamless as something like Homebrew or apt, it'll be a hard sell.
Fully spot on. I don't get what is so hard about setting a couple of environment variables, and maybe symbolic links, depending on the OS and language being used.
A simple UNIX script or PowerShell utility takes care of it.
None of the ones I have used over the last decades has ever grown to more than about 20 lines of code, minus comments.
Until you need to start combining things. Docker is conceptually a VM that encapsulates everything nicely, but ironically it doesn't "compose" nearly as well as Nix flakes or shells. With Nix you start out with a base env and can trivially extend it hierarchically and jump up and down the tree super easily, without having to roll your own microservice architecture each time just to get stuff to work together.
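Roughly what I mean, as a sketch (file names and packages are made up): one base shell, and project shells that pull it in via `inputsFrom`:

    # base.nix - tools every project gets
    { pkgs }:
    pkgs.mkShell {
      packages = [ pkgs.git pkgs.just ];
    }

    # shell.nix for one project, extending the base
    { pkgs ? import <nixpkgs> {} }:
    pkgs.mkShell {
      inputsFrom = [ (import ./base.nix { inherit pkgs; }) ];
      packages = [ pkgs.python312 pkgs.poetry ];
    }

Each level just adds to whatever it inherits, so "jumping up and down the tree" is simply picking which shell file you enter.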
Docker OTOH composes whole services nicely: if my project needs a redis cache and postgres instance, I don't have to faff about with local ports, traefik can pick up my web server, and so on. I use a flake to create and lock a local development toolchain, but it's no help in managing the services.
One thing I haven't tried yet is building a container from a flake, which would have obvious benefits for reproducibility. Still don't think it would help with service orchestration though.
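From what I understand (I haven't run this, so treat it as a sketch), nixpkgs ships dockerTools for exactly that, something like:

    # extra flake output, assuming the flake already builds a "my-service" package
    packages.x86_64-linux.image = pkgs.dockerTools.buildLayeredImage {
      name = "my-service";
      tag = "latest";
      contents = [ self.packages.x86_64-linux.default ];
      config.Cmd = [ "/bin/my-service" ];
    };

`nix build .#image` should then spit out a tarball you can `docker load` - but as noted, that only gets you a reproducible image, not orchestration.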
I think what it comes down to, and where many people get confused, is separating the technology itself from how we use it. The technology itself is incredible for learning new skills, but at the same time it incentivizes people not to learn. Just because you have an LLM doesn't mean you can skip the hard parts of doing textbook exercises and thinking hard about what you are learning. It's a bit similar to passively watching youtube videos. You'd think that having all these amazing university lectures available on youtube would make people learn much faster, but in reality it makes people lazy, because they believe they can passively sit there, watch a video, do nothing else, and expect that to replace a classroom education. That's not how humans learn. But it's not because youtube videos or LLMs are bad learning tools, it's because people use them as a mental shortcut where they shouldn't.
I fully agree, but to be fair these chatbots hack our reward systems. They present a cost/benefit ratio where, for much less effort than doing it ourselves, we get a much better result than we could produce ourselves (assuming this is a skill not yet learned). I think the analogy to calculators is a good one if you're careful about what you're comparing: calculators did indeed make people worse at mental math, yet mental math can be replaced with calculators for most people with no great loss. Chatbots are indeed making people worse at mental... well, everything. Thinking in general. I do not believe that thinking can be replaced with AI for most people with no great loss.
You absolutely need to spend money in PoE to buy stash tabs. It's basically mandatory if you play regularly. The difference from most dark patterns is that the spending has a very low cap. Once you've spent $50 or so on stash tabs you are set forever and never need to spend again. So it's not so different from buying a $50 game, except that you get to try it out for free first.
$50 is exaggerating it nowadays. With async trade you could buy a single merchant tab to gain access to trade (stuff sells pretty quick with async trade!), and maybe a currency and scarab tab for the bare minimum convenience. Around $20 and you've got yourself a meaty beast of a game.
It doesn't feel off to me because that's the exact experience I've had as well. So it's unsurprising to me that many other people share that experience. I'm sure there is a bunch of paid promotion going on for all kinds of stuff on HN (especially what gets onto the front page), but I don't think this is one of those cases.
Oh cool, can you share concrete examples of times Codex outperformed Claude Code? In my experience both tools need to be carefully massaged with context to fulfill complex tasks.
I don't really see how examples are useful because you're not going to understand the context. My prompt may be something like "We recently added a new transcription backend api (see recent git commits), integrate it into the service worker. Before implementing, create a detailed plan, ask clarifying questions, and ask for approval before writing code"
Nobody has to give you examples. People can express opinions. If you disagree, that’s fine but requesting entire prompt and response sets is quite demanding. Who are you to be that demanding?
Let's call it the skeptical public? We've been listening to a group of people rave about how revolutionary these tools are, how they're able to perform senior level developer work, how good their code is, and how they're able to work autonomously through the use of sub-agents (i.e. vibe coding), without ever providing evidence that would support any of those grandiose claims.
But then I use these tools myself[1] and I speak to real developers who have used them, and our evaluations center on lukewarm takes: good at straightforward, junior-level tasks, or good for prototyping, or good for initially generating tests, or good for answering certain types of questions, or good for one-off scripts - but approximately none of them would trust these LLMs to implement a more complex feature the way a mid-level or senior developer would, without very extensive guidance and hand-holding that takes longer than just doing it ourselves.
Given the overwhelming absence of evidence, the most charitable conclusion I can come to is that the vast majority of people making these claims have simply gone from being 0.2X developers to being 0.3X developers who happen to generate 5X more code per unit of time.
Context engineering is a critical part of being able to use the tool. And it's ok to not understand how to use a new tool. The different models combined with different stacks require different ways of grappling with the technology. And it all changes! It sucks that you've tried it for your stack (Elixir, whatever that is) in your way and it was disappointing.
To me, the tool inherently makes sense and vibes with my own personality. It allows me to write code that I would otherwise procrastinate on. It allows me to turn ideas into reality, so much faster.
Maybe you're just hyper-focused on metrics? Productivity, especially when dealing with code, is hard to quantify. This is a new paradigm, so any comparison ends up being apples to oranges. Does this help?
So your take is that every real software developer I know is simply bad at using this magical tool that performs at the level of a mid-to-senior software engineer in the hands of a few chosen ones? But the chosen ones never build anything in public where it can be observed, evaluated, and critiqued. How unfortunate is that?
The people I talked to use a wide variety of environments and their experience is similar across the board, whether they're working in Nodejs, React, Vue, Ruby, PHP, Java, Elixir, or Python.
> Productivity, especially when dealing with code, is hard to quantify.
Indeed, that's why I think most people claiming these obscene benefits are really bad at evaluating their own performance and/or started from a really low baseline.
I always think back to a study I read a while ago where people without ADHD were given stimulant medication and reported massive improvements in productivity but objective measurements showed that their real-world performance was equal to, or slightly lower than their baseline.
I think it's very relevant to the psychology behind this AI worship. Some people are being elevated from a low baseline whilst others are imagining the benefits.
People do build in public with vibe coding, absolutely. This tells me that you have not done your research and have just gone off general guesses, or off pessimism/frustration from not knowing how to use the tool. The easiest way to find this on GitHub is to look for repos where Claude is a contributor - Claude will tag itself in the PR or in its pushes. Another easy way I've seen come up is the whole "BuildInPublic" tag in the Threads app, which has been inundated with vibe coding. While these might not be in your algorithm, they do exist. You'll be able to see that, while there is a lot of crud, there are also products being made that are actually versatile, complex, and completely vibe-coded. Most people are not making up these stories. It's very real.
Of course people vibe-code in public - I was clear that I wanted to see evidence of these amazing productivity improvements. If people are building something decent but it takes them 3 or 4 times as long as it would take me, I don't care. That's great for them but it's worthless to me because it's not evidence of a productivity increase.
> there are also products being made that are actually versatile, complex, and completely vibe-coded.
Which ones? I'm looking for repositories that are at least partially video-documented to see the author's process in action.
I'm not saying it is, but if ANYTHING was the exact combination of prerequisites to be considered paid promotion on HN, this is the type of comment it would be.
So, let’s see if I get this straight. A highly identifiable person whose company sells a security product is the ideal shill? That doesn’t make any sense whatsoever. On the other hand, someone with a different opinion makes complete sense.
LeBron James endorses Kia. Multi-billion-dollar companies can afford, and benefit from, highly identifiable people, so I don't really think that argument makes it any less likely to be an endorsement.
Interesting, my experience has been the opposite. I've been running Codex and Sonnet 4.5 side by side for the past few weeks, and Codex gives me better results 90% of the time, pretty much across all tasks. Where Claude really shines is speed: it's much faster than Codex. So if I know exactly what I want, or if it's a simpler task, I feel comfortable giving it to Claude, because I don't want to wait for Codex to work through it. The Claude CLI is also a much better user experience than the Codex CLI. But Codex gets complex things right more consistently.
My experience is similar. I do most of the work with Claude, as I like the small-task, fast-iteration pair-coding experience. When I need to investigate some issue I let Codex handle it and check back in 10 minutes when it's ready. But Codex is way too slow for the pair-programming style of work.
Also, most of the time Codex opts to use Python to edit files. Those edits are unreviewable, so it's even less interactive; you just have to let it finish and check the outcome.
I wish this didn't have AI in it. I've been looking for a Jupyter alternative that is pure python and can be modified from a regular text editor. Jupytext works okay, but I miss the advanced Jupyter features. But I really don't want to deal with yet another AI assistant, especially not a custom one when I'm already using Claude/etc from the CLI and I want those agents to help me edit the notebooks.
Take out all the AI stuff and I'd give it a try. I use AI coding agents as my daily driver, but I really don't need this AI enshittification in every tool/library I'm using.
Reading the article, I don't think it has AI. They've just made the tools in a way that AI assistants can also use them, and so fix linting errors without anyone needing to fine-tune the LLM on the syntax.
That's actually pretty slick. I've been wondering how we could avoid blocking innovation in programming languages because of the death cycle of "no training data on language -> LLM can't learn language -> Assistant can't code language -> nobody uses language -> no training data on language".
Yeah, reading the docs it seems you are right. The landing page mentions AI-native at the very top and all over the place, so I got the wrong impression that it's somehow tightly coupled to an AI integration. But looks like it's optional.
> But I really don't want to deal with yet another AI assistant, especially not a custom one when I'm already using Claude/etc from the CLI and I want those agents to help me edit the notebooks.
So, funny story: you can use exactly the same CLI tools in your notebook. Zed built out the ACP spec [1], which lets Claude Code go anywhere that implements it (as of Oct 2nd: Emacs, Vim, Zed, and marimo [2]).
I hate how much I lean into VSCode, but the Python interactive mode gets you a really good live coding environment. Instead of Jupyter cells, you have a regular .py file with chunks of code prefixed with `# %%`. VSCode gives you a similar experience to a notebook, with the same controls (Run Above Cells, Restart and Run All, etc.). So something like:
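(A throwaway sketch - the numpy bits are just placeholders for whatever you're actually doing.)

    # %%
    # each "# %%" comment starts a new cell in the interactive window
    import numpy as np

    data = np.random.default_rng(0).normal(size=1_000)

    # %%
    # re-run just this cell; `data` stays loaded in the session
    print(data.mean(), data.std())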
Since it is a regular .py file all of your existing tooling will work with it. The one thing you lose vs a Jupyter notebook is saved output. I mostly use these .py files, but have a few .ipynb notebook files for when I want to commit the output from some important task.
That sounds more like an organizational problem. If you are an employee that doesn't care about maintainability of code, e.g. a freelancer working on a project you will never touch again after your contract is over, your incentive has always been to write crappy code as quickly as possible. Previously that took the form of copying cheap templates, copying and pasting code from StackOverflow as-is without adjustments, not caring about style, using tools to autogenerate bindings, and so on. I remember a long time ago I took over a web project that a freelancer had worked on, and when I opened it I saw one large file of mixed Python and HTML. He had literally just copied and pasted whole HTML pages into the render statements in the server code.
The same is true for many people submitting PRs to OSS. They don't care about making real contributions, they just want to put something on their resume.
AI is probably making it more common, but it really isn't a new issue, and is not directly related to LLMs.
> freelancer working on a project you will never touch again after your contract is over, your incentive has always been to write crappy code as quickly as possible
I don't agree with this at all. As a freelancer your incentive is to extend the contract, or to be remembered as the best contractor when the client needs help again. You should be the expert who improves the codebase and development practices, someone the internal team can learn from.
>If you are an employee that doesn't care about maintainability of code, e.g. a freelancer working on a project you will never touch again after your contract is over, your incentive has always been to write crappy code as quickly as possible.
Yes, this is it. The idea that LLMs somehow write deceptive code that magically looks right but isn't is just silly. Why would that be the case? If someone is good at writing code (hard to define, of course, but take a "measure" like long-term maintainability) yet fails to catch bad code in review, that is simply a gap in their skill. Reviewing code can be trained just as writing code can be. A good first step might be to ask oneself: "how would I have approached this?"
> Beyond this, if you’re working on novel code, LLMs are absolutely horrible at doing anything. A lot of assumptions are made, non-existent libraries are used, and agents are just great at using tokens to generate no tangible result whatsoever.
Not my experience. I've used LLMs to write highly specific scientific/niche code and they did great, but obviously I had to feed them the right context (compiled from various websites and books converted to markdown in my case) to understand the problem well enough. That adds additional work on my part, but the net productivity is still very much positive because it's a one-time setup cost.
Telling LLMs which files they should look at was indeed necessary 1-2 years ago with early models, but I have not done that for the last half year or so, and I'm working on codebases with millions of lines of code. I've also never had modern LLMs use nonexistent libraries. Sometimes they try to use outdated libraries, but that fails very quickly once they try to compile, and they quickly catch the error and follow up with a web search (I use a custom web search provider) to find the most appropriate library.
I'm convinced that anybody who says that LLMs don't work for them just doesn't have a good mental model of HOW LLMs work, and thus can't use them effectively. Or their experience is just outdated.
That being said, the original issue that they don't always follow instructions from CLAUDE/AGENT.md files is quite true and can be somewhat annoying.
> Not my experience. I've used LLMs to write highly specific scientific/niche code and they did great, but obviously I had to feed them the right context (compiled from various websites and books converted to markdown in my case) to understand the problem well enough. That adds additional work on my part, but the net productivity is still very much positive because it's a one-time setup cost.
I've been genuinely surprised how well GPT5 does with rust! I've done some hairy stuff with Tokio/Arena/SIMD that I thought I would have to hand hold it through, and it got it.
Yeah, it has been really good in my experience. I've done some niche WASM stuff with custom memory layouts and parallelism and it did great there too, probably better than I could've done without spending several hours reading up on stuff.
It's pretty good at Rust, but it doesn't understand locking. When I tried it, it just put a lock on everything and then didn't take care to release the locks as soon as possible. This severely limited the scalability of the system it produced.
But I guess it passed the tests it wrote, so win? Though it didn't seem to understand why the test it wrote (where the client used TLS and the server didn't) wouldn't pass, and it required a lot of hand-holding along the way.
I've experienced similar things, but my conclusion has usually been that the model is not receiving enough context in such cases. I don't know your specific example, but in general it may not be incorrect to put an Arc/lock on many things at once (or to use Arc instead of Rc, etc.) if your future plans are to parallelize several parts of your codebase. The model just doesn't know what your future plans are, and it errs on the side of "overengineering" solutions for all kinds of future possibilities. I found that this is a bias these models tend to have: many times their code is overengineered for features I will never need, and I have to tell them to simplify - but that's expected. How would the model know what I do and don't need in the future without me giving it all the right context?
The same thing is true for tests. I found their tests to be massively overengineered, but that's easily fixed by telling them to adopt the testing style from the rest of the codebase.
Rust has been an outlier in my experience as well. I have a pet theory that it's because Rust code that's been pushed to GitHub generally compiles. And if it compiles, it generally works.
I often use that time to spec out a future task - either by going through GitHub issues, doing some research and adding details, or by spinning up another Codex/Claude session to create a detailed design document for a future task and iterating on that. So one agent is coding while another is helping me spec out future work. When the coding agent is done I can immediately start on the next task with a proper spec, reducing the margin for error.
Reading HN I seem to be in the minority but AI has made programming a lot more fun for me. I've been an engineer for nearly 25 years and 95% of the work is rather mindless boilerplate. I know exactly what I need to do next, it just takes time and iteration.
The "you think about the problem and draw diagrams" part of you describe probably makes up less than 5% of a typical engineering workflow, depending on what you work on. I work in a scientific field where it's probably more than for someone working in web dev, but even here it's very little, and usually only at the beginning of a project. Afterwards it's all about iteration. And using AI doesn't change that part at all, you still need to design the high level solution for an LLM to produce anything remotely useful.
I never encountered the problem of not understanding details of the AI's implementation that people here seem to describe. I still review all the code and need to ask the LLM to make small adjustments if I'm not happy with it, especially around not-so-elegant abstractions.
Tasks that I actively avoided before because they seemed like a hassle, like large refactorings, I no longer avoid, because I can ask an AI to do most of the work. I feel so much more productive, and work is more satisfying because I get to knock out all these chores I had resistance to before.
Brainstorming with an AI about potential solutions to a hard problem is also more fun for me, and more productive, than doing research the old ways. So instead of drawing diagrams I now just have conversations.
I can't say for certain whether using LLMs has made me much more productive (overall it likely has but for certain tasks it hasn't), but it definitely has made work more fun for me.
Another side effect has been that I'm learning new things more frequently when using AI. When I brainstorm solutions with an AI or ask for an implementation, it sometimes uses libraries and abstractions I have not seen before, especially around very low level code that I'm not super familiar with. Previously I was much more likely to use or do things the one way I know.
I said more in another comment. But after 20+ years in the industry as of 2018, and before that 10 years as a hobbyist, coding had become a grind. I started liking solving business problems, talking to customers, mentoring, etc., and even the high-level architecture of the code.
AI has made a world of difference. I don’t use agents, I build the system up using ChatGPT as a junior developer with hand holding.