
I skimmed over it, and didn’t find any discussion of:

  - Pull requests
  - Merge requests
  - Code review
I feel like I’m taking crazy pills. Are SWEs supposed to move away from code review, one of the core activities of the profession? Code review is as fundamental to SWE as double-entry bookkeeping is to accounting.

Yes, we know that functional code can get generated at incredible speeds. Yes, we know that apps and what not can be bootstrapped from nothing by “agentic coding”.

We need to read this code, right? How can I deliver code to my company without security and reliability guarantees that, at their core, come from me knowing what I’m delivering line-by-line?





Either really comprehensive tests (that you read) or read it. Usually I find you can skim most of it, but in core sections like billing you really have to review it. The models still make mistakes.

You can't skim over AI code.

For even mid-level tasks it will make bad assumptions, like sorting orders or timezone conversions.

Basic stuff really.
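To make the timezone one concrete: a made-up Python snippet of the sort that looks fine on a skim but quietly depends on whatever timezone the host happens to be in (the epoch value is arbitrary, this isn't from any real PR):

  from datetime import datetime, timezone

  epoch_seconds = 1_700_000_000

  # Easy to skim past: fromtimestamp() without tz uses the machine's local
  # timezone, so the same input renders differently on every server.
  local_dt = datetime.fromtimestamp(epoch_seconds)

  # What was probably intended: an explicit, timezone-aware UTC value.
  utc_dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

  print(local_dt.isoformat())  # varies by host, e.g. 2023-11-14T17:13:20
  print(utc_dt.isoformat())    # 2023-11-14T22:13:20+00:00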

You've probably got a load of ticking time bomb bugs if you've just been skimming it.


> got a load of ticking time bomb bugs

Lots and lots of tests!


You read it. You now have an infinite army of overconfident slightly drunken new college grads to throw at any problem.

Sometimes you’re gonna want to slowly back away from them and write things yourself. Sometimes you can farm out work to them.

Code review their work as you would anyone else’s, in fact more so.

My rule of thumb has been that it takes one senior engineer for every 4 new grads to mentor them and code review their work. Or, put another way, bringing on a new grad gets you +1 of output at the cost of -0.25 of a senior.

Also, there are some tasks you just can’t give new college grads.

Same dynamic seems to be shaping up here. Except the AI juniors are cheap and work 24*7 and (currently) have no hope of growing into seniors.


> Same dynamic seems to be shaping up here. Except the AI juniors are cheap and work 24*7 and (currently) have no hope of growing into seniors.

Each individual trained model... sure. But otoh you can look at it as a very wide junior with "infinite (only limited by your budget)" willpower. Sure, three years ago they were GPT-3.5, basically useless. And now they're Opus 4.6. I wonder what the next few years will bring.


I've only recently started trying out using LLMs to help me write code (as in, within the last two weeks), and the workflow that makes the most sense to me is to not let the LLM anywhere near PRs/MRs/CRs, or even version control at all. I've found it useful to give it a fairly constrained task (something that might be a 100-200 line modification of my current code), literally watch the output of Claude Code's "thinking" as it goes so I can interrupt it if it's going down the wrong path or if it gives me a better idea, wait for it to present the code, and then read through all of it to make sure it's what I want. After making whatever small changes I might want, I commit, and then move on to the next thing.

So far, this has pretty much all been for personal side projects outside of work, so there is no code review, but approaching it from the standpoint that the goal is to produce the same code and version control history I would want if I created it by hand, and just using the LLM as a way of automating the typing, I've been pretty surprised that it's already been a net gain in efficiency for a lot of things I've been working on. Ideally, the code I'm generating shouldn't be distinguishable from what I'm already writing, because I would change it if I saw that it was. At that point, either it's high-quality enough to be merged, or it's not and should be rejected, and that's already how things work in the first place.

If someone makes an MR that their coworkers find sloppy and annoying to review, there needs to be pushback, and how it was generated should be irrelevant if everyone is on the same page about where the bar for quality is and is acting in good faith. (If you're working in an environment where there's no bandwidth to care about quality or people are acting in bad faith, LLM code will probably not be much of an improvement, but you're also probably going to have a bad time regardless, and unfortunately I don't think there's a magic bullet for fixing that.)

Give it a read; he briefly mentions how he uses it for PR triage and resolving GH issues.

He doesn't go into detail, but there is a bit:

> Issue and PR triage/review. Agents are good at using gh (GitHub CLI), so I manually scripted a quick way to spin up a bunch in parallel to triage issues. I would NOT allow agents to respond, I just wanted reports the next day to try to guide me towards high value or low effort tasks.

> More specifically, I would start each day by taking the results of my prior night's triage agents, filter them manually to find the issues that an agent will almost certainly solve well, and then keep them going in the background (one at a time, not in parallel).

Those are short excerpts; the whole article is worth reading. Very grounded and balanced.
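Not from the article, but to give the quoted workflow some shape: a rough Python sketch of an overnight, report-only triage fan-out, assuming the gh CLI and a headless agent invocation like `claude -p`. The repo name, prompt, agent count, and output paths are my own placeholders, not his setup.

  # Hypothetical sketch: one read-only triage agent per open issue, run overnight.
  # Assumes `gh` and `claude` are on PATH; everything else is illustrative.
  import json
  import subprocess
  from concurrent.futures import ThreadPoolExecutor
  from pathlib import Path

  REPO = "owner/repo"  # placeholder
  REPORTS = Path("triage-reports")
  REPORTS.mkdir(exist_ok=True)

  def open_issues(limit=20):
      # `gh issue list --json` returns structured issue data we can iterate over.
      out = subprocess.run(
          ["gh", "issue", "list", "--repo", REPO, "--state", "open",
           "--limit", str(limit), "--json", "number,title"],
          capture_output=True, text=True, check=True).stdout
      return json.loads(out)

  def triage(issue):
      # Report-only: the prompt forbids responding to or changing anything.
      prompt = (f"Read issue #{issue['number']} in {REPO} with the gh CLI. "
                "Do not comment on or modify anything. Write a short report: "
                "what the issue is, likely root cause, rough effort, and whether "
                "an agent could plausibly fix it unattended.")
      result = subprocess.run(["claude", "-p", prompt],
                              capture_output=True, text=True)
      (REPORTS / f"issue-{issue['number']}.md").write_text(result.stdout)

  if __name__ == "__main__":
      # Fan out a few agents at a time; a human filters the reports next morning.
      with ThreadPoolExecutor(max_workers=4) as pool:
          list(pool.map(triage, open_issues()))

The human filtering step he describes (picking which reports are worth acting on) stays manual.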


Okay I think this somewhat answers my question. Is this individual a solo developer? “Triaging GitHub issues” sounds a bit like open source solo developer.

Guess I’m just desperate for an article about how organizations are actually speeding up development using agentic AI. Like very practical articles about how existing development processes have been adjusted to facilitate agentic AI.

I remain unconvinced that agentic AI scales beyond solo development, where the individual is liable for the output of the agents. More precisely, I can use agentic AI to write my code, but at the end of the day when I submit it to my org it’s my responsibility to understand it, and guarantee (according to my personal expertise) its security and reliability.

Conversely, I would fire (read: reprimand) someone so fast if I found out they submitted code that created a vulnerability that they would have reasonably caught if they weren’t being reckless with code submission speed, LLM or not.

AI will not revolutionize SWE until it revolutionizes our processes. It will definitely speed us up (I have definitely become faster), but faster != revolution.


> Guess I’m just desperate for an article about how organizations are actually speeding up development using agentic AI. Like very practical articles about how existing development processes have been adjusted to facilitate agentic AI.

They probably aren't, really. At least in orgs I worked at, writing the code wasn't usually the bottleneck. It was, in retrospect, 'context' engineering: waiting for the decision to get made, making some change and finding it breaks some assumption that was being made elsewhere but wasn't in the ticket, waiting for other stakeholders to insert their piece of the context, waiting for $VENDOR to reply about why their service is/isn't doing X anymore, discovering that $VENDOR_A's stage environment (that your stage environment is testing against for the integration) does $Z when $VENDOR_B_C_D don't, etc.

The ecosystem as a whole has to shift for this to work.


The author of the blog made his name and fortune founding HashiCorp, makers of Vagrant and Terraform among other things. Having done all that in his twenties, he retired as CTO and reappeared after a short hiatus with a new open-source terminal, Ghostty.

I had a bit of an adjustment of my beliefs since writing these comments. My current take:

  - AI is revolutionizing how individuals work
  - It is not clear yet how AI can revolutionize how organizations work (even SWE)

If you had that article, would you read it fully before firing off questions?

Can't believe you don't know who the author is my man.

Different folks are interested in different niches. I don't know this author either. I would know many names from other subfields, though.

I once went to a meetup where the host introduced the speaker with "he needs no introduction". Well to this day I've no idea who the speaker was. Familiarity really shouldn't be assumed beyond a very, very small handful of people.


Generally I don’t pay attention to names unless it’s someone like Torvalds, Stroustrup, or Guido. Maybe this guy needs another decade of notoriety or something.

So, only 3 old dudes. Is that it? What's wrong with looking up to new, up-and-coming developers?

The author is the founder of HashiCorp. He created Vault and Terraform, among others.

Curious, do you think his name should be as well known as Torvalds, Stroustrup, and Guido, who combined have ~120 years of serious contribution to the way that we write software, and continue to influence?

Because that’s the implication that I’m getting from downvotes + this reply.

Sure, Terraform is huge, no doubt, but it’s no Linux, C++, or Python, yet. Correct me if I’m wrong, but I assume that since they’re no longer involved with HashiCorp, they’re no longer contributing to Terraform?


We're talking about _this_ post? He specifically said he only runs one agent, so sure, he probably reviews the code or, as he stated, finds means of auto-verifying what the agent does (giving the agent a way to self-verify as part of its loop).

The primary point behind code reviews is to let the author know that someone else will look at their code. They are a psychological tool, and that, AFAIK, doesn't work well with AI models. If the code is important enough that you want to review it, then you should probably be using a different, more interactive flow.

Mitchell talks about this in a roundabout way... in the "Reproduce your own work" section he obviously reviewed that code, as that was the point. In the "End-of-day agents" section he talks about what he found them good for (so far). He previously wrote about how he preferred an interactive style, and this article aligns with that, showing his progress in understanding how code agents can be useful.


So read the code.

Cool, code review continues to be one of the biggest bottlenecks in our org, with or without agentic AI pumping out 1k LOC per hour.

For me, AI is at its best for code research and review.

Since some team members started using AI without care, I created a bunch of agents/skills/commands and custom scripts for Claude Code. For each PR, it collects changes via git log/diff, reads the PR data, and spins up a bunch of specialized agents to check code style, architecture, security, performance, and bugs. Each agent is armed with the necessary requirement documents, including security compliance files. False positives are rare, but it still misses some problems. No PR with AI-generated code skips it. If the AI did not find any problems, I do a manual review.
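Not my actual setup verbatim, but a minimal Python sketch of that kind of per-PR fan-out, assuming the gh CLI and a headless `claude -p` call; the reviewer names, reference docs, and prompts below are invented for illustration.

  # Hypothetical per-PR review fan-out; docs, prompts, and names are illustrative.
  import subprocess
  import sys

  # Specialist "agents": a focused prompt plus the reference doc each must consult.
  REVIEWERS = {
      "style":        ("docs/style-guide.md",     "Flag style-guide violations."),
      "architecture": ("docs/architecture.md",    "Flag architecture/layering violations."),
      "security":     ("docs/security-policy.md", "Flag security and compliance issues."),
      "performance":  ("docs/perf-notes.md",      "Flag likely performance regressions."),
  }

  def pr_diff(pr_number: str) -> str:
      # `gh pr diff <number>` prints the PR's full diff (run inside the repo).
      return subprocess.run(["gh", "pr", "diff", pr_number],
                            capture_output=True, text=True, check=True).stdout

  def review(pr_number: str) -> None:
      diff = pr_diff(pr_number)
      for name, (doc, instruction) in REVIEWERS.items():
          prompt = (f"You are a {name} reviewer. Read {doc} first, then review this diff. "
                    f"{instruction} Report only concrete problems with file and line "
                    f"references; say 'no findings' otherwise.\n\n{diff}")
          result = subprocess.run(["claude", "-p", prompt],
                                  capture_output=True, text=True)
          print(f"--- {name} ---\n{result.stdout}")

  if __name__ == "__main__":
      review(sys.argv[1])  # e.g. `python review_pr.py 1234`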


Ok? You still have to read the code.

That's just not what has been happening in large enterprise projects, internal or external, since long before AI.

Famous example - but by no means do I want to single out that company and product: https://news.ycombinator.com/item?id=18442941

From my own experience (I kept this post bookmarked because I too worked on that project in the late 1990s): you cannot review those changes anyway. It is handled as described; you keep tweaking stuff until the tests pass. There is fundamentally no way to understand the code. Maybe it's different in some very core parts, but most of it is just far too messy. I once tried merely disentangling a few types, because there were a lot of duplicate types for the most simple things, such as 32-bit integers, and it is like trying to pick one noodle out of a huge bowl of spaghetti: everything is glued and knotted together, so you always end up lifting out the entire bowl's contents. No AI necessary, that is just how such projects look after many generations of temporary programmers (because all sane people will leave as soon as they can, e.g. once they've switched from an H1B to a Green Card) under ticket-closing pressure.

I don't know why, since the beginning of these discussions, some commenters seem to work off the wrong assumption that our actual methods thus far lead to great code. Very often they don't; they lead to a huge mess that just keeps getting bigger over time.

And that is not because people are stupid; it's because top management has rationally determined that the best balance for overall profits does not require perfect code. If the project gets too messy to do much with, the customers will already have been hooked and can't change easily, and when they do, some new product will have already replaced the two-decades-old mature one. Those customers still on the old one will pay a premium for future bug fixes, and the rest will jump to the new trend. I don't think AI can make what's described above any worse, or at least not much worse.


If your team members hand off unreviewable blobs of code and you can't keep up, your problem is team management, not technology.

Yup, you didn't even read anything. Vibe commenting is worse than vibe coding.

You're missing the point. The point is that reading the code is more time consuming than writing it, and has always been thus. Having a machine that can generate code 100x faster, but which you have to read carefully to make sure it hasn't gone off the rails, is not an asset. It is a liability.

Tell that to Mitchell Hashimoto.

> The point is that reading the code is more time consuming than writing it, and has always been thus.

Huh?

First, that is definitely not true. If it were, dev teams would spend the majority of their time on code review, but they don't.

And second, even if it were true, you have to read it for code review even if it was written by a person anyways, if we're talking about the context of a team.


[flagged]


So you have a hobby.

I have a profession. Therefore I evaluate new tools. Agentic coding I've introduced into my auxiliary tool forgings (one-off bash scripts) and personal projects, and I'm only now comfortable introducing it into my professional work. But I still evaluate every line.


"auxiliary tool forgings" You aren't a serious person.

I may not be a serious person, but I am a serious professional.

I love for companies to pay me money that I can in turn exchange for food, clothes and shelter.

And not working with anyone else.

AI-written code is often much easier to read/review than some of my coworkers'.


Can't say the same for my colleagues' AI-written code. It's overly verbose and always does more than what's required.

So then type the code as well and read it after. Why are you mad?

I think this is the crux of it: when used as an enhancement to solo productivity, there's a pretty strict upper bound on productivity gains, given that it takes experienced engineers to review code that goes out at scale.

That being said, software quality seems to be decreasing, or maybe it's just cause I use a lot of software in a somewhat locked down state with adblockers and the rest.

Although, that wouldn't explain just how badly they've murdered the once lovely iTunes (now Apple Music) user interface. (And why does CMD-C not pick up anything 15% of the time I use it lately...)

Anyways, digressions aside... the complexity in software development is generally on the organizational side. You have actual users, and then you have people who talk to those users and try to see what they like and don't like in order to distill that into product requirements, which then have to be architected and coordinated (both huge time sinks) across several teams.

Even if you cut out 100% of the development time, you'd still be left with 80% of the timeline.

Over time, though... you'll probably see people doing what I do all day, which is move around among many repositories (although I've yet to use the AI much; I got my Cursor license recently and am gonna spin up some POCs that I want to see soon), enabled by their use of AI to quickly grasp what's happening in a repo and the appropriate places to make changes.

Enabling developers to complete features from tip to tail across deep, many-pronged service architectures could bring project time down drastically and cut project management and cross-team coordination costs tremendously.

Similarly, in big companies, the hand is often barely aware of the foot at best, and exploring that space is a serious challenge. Often folks know exactly one step away and rely on well-established async communication channels which also only know one step further. Principal engineers seem to know a great deal about finite spaces and are often in the dark even small hops away from things like the internal tooling for the systems they're maintaining (and often aren't particularly great at coming into new spaces and thinking with the same perspective... no, we don't need individual microservices for every 12-requests-a-month admin API group we want to set up).

Once systems can take a feature proposal and lay out concrete plans that each little kingdom can give a thumbs up or thumbs down to (with further modifications), you can again bring exploration, coordination, and architecture time down.

Sadly, it seems like user experience design is an often terribly neglected part of our profession. I love the memes about an engineer building the perfect interface, like a water pitcher, only for the person to hold it weirdly in order to pour out of the fill hole or something. Lemme guess how many users you actually talked to (often zero), and how many layers of distillation occurred before you received a micro-picture feature request that ends up being built with input from engineers with no macro understanding of a user's actual needs or day-to-day.

And who are often much more interested in perfecting some little algorithm than thinking about enabling others.

So my money is on money flowing to...

  - People who can actually verify system integrity and can fight fires and bugs (but a lot of bug fixing will eventually become prompting?)
  - Multi-talented individuals who can, say, interact with users well enough to understand their needs, as well as do a decent job verifying system architecture and security

It's outside of coding where I haven't seen much... I guess people use it to more quickly scaffold up expense reports, or generate mocks. So, lots of white collar stuff. But... it's not like the experience of shopping at the supermarket has changed, or going to the movies, or much of anything else.



