Why should he put effort into measuring a tool that the author has not? The point is that there are so many of these tools that an objective measure, one their creators could use to compare them against each other, would be better.
So a better question to ask is: do you have any ideas for an objective way to measure the performance of agentic coding tools? Then we could truly determine what improves performance and what doesn't.
I would hope that, internally, OpenAI and Anthropic use something similar to the harness/test cases they use for training their full models to determine whether changes to Claude Code result in better performance.
Well, if I were Microsoft and training Copilot, I would log all the <restore checkpoint> user actions and grade the agents on that. At scale across all users, "resets per agent command" should be useful. But then again, publishing the true numbers might be embarrassing...
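To make the "resets per agent command" idea concrete, here is a minimal sketch of how it could be computed from a usage log. The `Event` record, the event kinds, and the agent names are all hypothetical; nothing here reflects how any vendor actually logs or grades its agents.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One logged user action (hypothetical schema)."""
    agent: str  # which agent or agent version handled the session
    kind: str   # "command" or "restore_checkpoint"


def resets_per_command(events):
    """Return {agent: checkpoint restores / commands issued}."""
    commands = Counter()
    resets = Counter()
    for event in events:
        if event.kind == "command":
            commands[event.agent] += 1
        elif event.kind == "restore_checkpoint":
            resets[event.agent] += 1
    # Agents with no commands are skipped to avoid dividing by zero.
    return {agent: resets[agent] / n for agent, n in commands.items() if n > 0}


log = (
    [Event("agent_a", "command")] * 4
    + [Event("agent_a", "restore_checkpoint")]
    + [Event("agent_b", "command")] * 2
    + [Event("agent_b", "restore_checkpoint")]
)
ratios = resets_per_command(log)
```

A lower ratio would suggest users undo that agent's work less often, though it says nothing about why they reset, so it is only a rough proxy.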
I think the implicit take is that if your company hits AGI your equity package will do something like 10x-100x even if the company is already big. The only other way to do that is to join a startup early enough to ride its growth wave.
Another way to say it is that people think it’s much more likely for each decent LLM startup to grow really strongly for its first several years and then plateau than for their current established player to hit hypergrowth because of AGI.
A catch here is that individual workers may have priorities which are altered by the strong natural preference for assuring financial independence. Even if you were a hot AI researcher who felt (and this is just a hypothetical) that your company was the clear industry leader and had, say, a 75% chance of soon achieving something AGI-adjacent and enabling massive productivity gains, you might still (and quite reasonably) prefer to leave if that was what it took to make absolutely sure of getting your private-income screw-you money (and/or private-investor seed capital). Again, this is just a hypothetical: I have no special insight, and FWIW my gut instinct is that the job-hoppers are in fact mostly quite cynical about the near-term prospects for "AGI".
Additionally, if you've already got vested stock in Company A from your time working there, jumping ship to Company B (with higher pay and a stock package) is actually a diversification. You can win whichever ship pulls in first.
The 'no one jumps ship if AGI is close' assumption is really weak, and seemingly completely unsupported in TFA...
You're right, but the narrative out of these companies directly refutes this position. They're explicitly saying that 1. AGI changes everything, 2. It's just around the corner, 3. They're completely dedicated to achieving it; nothing is more important.
Don't conflate labor's perspective with capital's stated position... The companies aren't leaving the companies, the workers are leaving the companies.
Have you tried a smart watch? The Duo 2FA app lets you add an arbitrary TFA code-based authenticator using the same QR code Google Authenticator supports, and generate those codes from their Apple watchOS [0] or Android Wear OS apps. I have used it successfully for years; in fact, it's a huge reason I got an Apple Watch. You'll have to configure your watch with a "work" focus mode that turns off all notifications, and not install any fancy apps on the watch (do those still exist?), but it can free you from your phone.
Along the same lines, the Meta Wayfarer[2] smart glasses let you take slice-of-life photos and videos without needing to whip out your phone. You lose a ton of quality but stay in the moment more. The AI features are getting better, so eventually you'll be able to use them for basic information lookup.
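For anyone curious why any compatible app can generate these codes: the QR code Google Authenticator reads is typically an `otpauth://` URI carrying a shared secret, and the codes are usually standard TOTP (RFC 6238). A minimal sketch, assuming the common defaults (SHA-1, 30-second step, 6 digits); the example URI and account name are made up, and I'm assuming rather than claiming that Duo does exactly this internally.

```python
import base64
import hashlib
import hmac
import struct
import time
from urllib.parse import parse_qs, urlparse


def secret_from_otpauth(uri):
    """Pull the base32 secret out of an otpauth:// URI (what the QR code encodes)."""
    return parse_qs(urlparse(uri).query)["secret"][0]


def totp(secret_b32, at=None, step=30, digits=6):
    """Compute a TOTP code per RFC 6238 using HMAC-SHA1."""
    # Base32 secrets are often shown without padding; restore it before decoding.
    padded = secret_b32.upper() + "=" * (-len(secret_b32) % 8)
    key = base64.b32decode(padded)
    now = time.time() if at is None else at
    counter = int(now // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks the offset.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


# Hypothetical URI; the secret is the RFC 6238 test key, base32-encoded.
uri = "otpauth://totp/Example:alice?secret=GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ&issuer=Example"
code = totp(secret_from_otpauth(uri))
```

Since the code depends only on the secret and the clock, a watch app that has scanned the same QR code can produce the same codes as the phone.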
What I find most frustrating are the bills written as prose-diffs themselves: "In some entirely different piece of law, Foo shall be inserted after Bar, with an overall effect and purpose which will not be described here."
One of my friends is a public health lobbyist and she is used to having to explain to stakeholders that THIS formatting mark in the PDF means they're adding text but THAT formatting mark means they're deleting text. It's not immediately obvious and every state has its own way of presenting information. I'd argue that DC does it best, but I haven't looked at every legislature.