> Eg how do you build representative evals and measure forward progress?
This assumes that those companies do evaluations. In my experience, having seen a huge number of internal AI projects at my company (FAANG), not even 5% have any sort of eval in place.
Yeah, I believe that lots of startups don’t have evals either, but as soon as you get paying customers you’re gonna need something to prevent accidentally regressing as you tune your scaffolding, swap in newer models, etc.
This is a big chasm that I could well believe a lot of founders fail to cross.
It’s really easy to build an impressive-looking tech demo, much harder to get and retain paying customers and continuously improve.
But! Plenty of companies are actually doing this hard work.
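For anyone wondering what "something to prevent accidentally regressing" might look like at its absolute smallest, here's a rough sketch. Everything in it is made up for illustration: `EvalCase`, `pass_rate`, `regressed`, and the two stand-in "systems" are hypothetical names, and exact-match scoring is an assumption that real products usually outgrow fast. The point is only the shape: a fixed case set, a score, and a comparison against the last known-good score.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One fixed input with the output we expect (hypothetical format)."""
    prompt: str
    expected: str


def pass_rate(system: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases where the system's output matches exactly."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(1 for c in cases if system(c.prompt).strip() == c.expected)
    return passed / len(cases)


def regressed(baseline: float, candidate: float, tolerance: float = 0.0) -> bool:
    """True if the candidate scores below the baseline by more than tolerance."""
    return candidate < baseline - tolerance


# Stand-ins for real model calls; in practice these would hit your stack.
CASES = [
    EvalCase("2+2", "4"),
    EvalCase("capital of France", "Paris"),
    EvalCase("3*3", "9"),
]


def old_system(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris", "3*3": "9"}.get(prompt, "")


def new_system(prompt: str) -> str:
    # Simulates a prompt or model swap that silently breaks one case.
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "")


if __name__ == "__main__":
    baseline = pass_rate(old_system, CASES)
    candidate = pass_rate(new_system, CASES)
    if regressed(baseline, candidate):
        print(f"regression: {baseline:.2f} -> {candidate:.2f}")
```

Run that on every scaffolding change or model swap and fail the build when `regressed` returns true. The hard part is not this code, it's making the case set representative of what paying customers actually do.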
The money is easy to come by because wealthy investors, while they don't want to pay any more in taxes, are desperate to find possible returns in an economy that sucks outside of ballooning healthcare and the AI bubble... not because they need the money but because NUMBER MUST GO UP.
And more so than even most VC markets, raising for an "AI" company is more about who you know than what results you can show.
If anyone is actually showing significant results, where's the actual output of the AI-driven software boom (beyond just LLMs making coders more efficient by being a better Google)? I don't see any real signs of it. All I see is people doing aftermarket modifications on the shovels; I've yet to see any of the end users of these shovels coming down from the hills with sacks of real gold.
I'm with you. I don't think anyone appreciates the effort that goes into a good, measurable, repeatable eval / improvement process unless they've been through it in anger themselves.
This dismisses a lot of actual hard work. The scaffolding required to get SOTA performance is non-trivial!
Eg how do you build representative evals and measure forward progress?
Also, tool calling, caching, etc. are beyond what folks normally call “prompt engineering”.
If you think it’s trivial, though, go build a startup and raise a seed round; the money is easy to come by if you can show results.