That is a good approach: bottom up, manage complexity. But the general picture is that you set the direction and hold the model responsible, while it does the actual work. Think of your work as the negative of the AI's work: it writes the code, you make sure that code is tested. The better the test harness you build, the better the AI performs. The real task is to constrain the AI into a narrow channel of valid work, a concrete sketch of which follows below.
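As a rough illustration of that "narrow channel" (everything here is hypothetical: the module path, the slugify() contract, the specific tests): the harness is just a suite the agent's code has to satisfy before you even open the diff, e.g. with pytest.

```python
# tests/test_slugify.py
# A minimal sketch of a harness that agent-written code must pass.
# mypkg.text.slugify is a hypothetical stand-in for whatever you asked the
# agent to implement; the contract below is what constrains its output.

import pytest

from mypkg.text import slugify  # agent-written code under test (hypothetical)


@pytest.mark.parametrize(
    "raw, expected",
    [
        ("Hello World", "hello-world"),
        ("  trailing  spaces  ", "trailing-spaces"),
        ("What's up?", "whats-up"),
    ],
)
def test_known_examples(raw, expected):
    # Pin down concrete input/output pairs so the agent cannot drift.
    assert slugify(raw) == expected


def test_idempotent():
    # Re-slugging an already clean slug must be a no-op; this catches the
    # double-separator and trimming bugs that tend to slip through review.
    once = slugify("Mixed--Case  Input!")
    assert slugify(once) == once


def test_output_charset():
    # State the contract, not the implementation: only lowercase letters,
    # digits, and hyphens in the output.
    out = slugify("symbols #@! and spaces")
    assert all(c.islower() or c.isdigit() or c == "-" for c in out)
```

The point is less the specific assertions than that the acceptance criteria live in code you wrote, not in the agent's head.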
In teams of high performers who have built a lot of mutual trust, code reviews are mostly a formality and a stopgap against the big, obvious accidental blunders. "LGTM!"
I do not know or trust the agents that are putting out all this code, and the code review process is very different.
Watching the Copilot code review plugin complain about Agent code on top of it all has been quite an experience.
Didn’t we just see big pretraining gains from Google and likely Anthropic?
I like Dario's view on this: we've seen this story before with deep learning, and then we progressively got better regularization, initialization, and activations.
I'm sure this will follow suit; the graph of improvement is still linear, up and to the right.
Take it with a grain of salt; this is one man's opinion, even if he is a very smart man.
People have been screaming about an AI winter since 2010 and it never happened. It certainly won't happen now that we are close to AGI, which is a necessity for national defense.
I prefer Dario's perspective here, which is that we've seen this story before in deep learning. We hit walls and then found ways around them with better activation functions, regularization, and initialization.
This field is always a progression of hitting roadblocks and finding ways around them. The chart of improvement is still linear, up and to the right, and those gains are the accumulation of many small improvements.