Passing tests doesn’t mean you have a working codebase.
Benchmarks that rely on a fixed test suite create a real optimization problem: agents (and even humans) learn to satisfy the tests rather than preserve the deeper properties that make a system maintainable. AI agents also tend to write test cases that are easy for them to satisfy rather than ones that reflect the business logic.
We see this firsthand at Prismor with auto-generated security fixes. Even with the best LLMs, validating fixes is the real bottleneck: our pipeline struggles to exceed 70% on an internal golden dataset (which is itself somewhat biased).
Many patches technically fix the vulnerability but introduce semantic regressions or architectural drift. Passing tests is a weak signal; proving that a fix is truly safe to merge is much harder.
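To make the "passes the tests, still wrong" point concrete, here's a minimal, hypothetical Python sketch. It is not a Prismor fix; the function names and the path-traversal scenario are invented for illustration. A naive patch that rejects any name containing `..` passes a traversal test, but it also rejects legitimate names like `notes..v2.txt` and still lets absolute paths through. A check on the resolved path avoids both, though it's still only a sketch (it ignores things like race conditions).

```python
import os.path

# Hypothetical example: serving user-requested files from a base directory.


def resolve_original(base: str, name: str) -> str:
    # Vulnerable: "../etc/passwd" escapes the base directory.
    return os.path.join(base, name)


def resolve_patched(base: str, name: str) -> str:
    # Naive "fix": reject any name containing "..".
    # A test for "../etc/passwd" now passes, but legitimate names such as
    # "notes..v2.txt" are rejected (semantic regression), and absolute paths
    # like "/etc/passwd" still escape because os.path.join discards `base`.
    if ".." in name:
        raise ValueError("invalid file name")
    return os.path.join(base, name)


def resolve_safer(base: str, name: str) -> str:
    # Resolve the final path (including symlinks) and check that it is
    # still inside the resolved base directory.
    base_real = os.path.realpath(base)
    target = os.path.realpath(os.path.join(base_real, name))
    if os.path.commonpath([base_real, target]) != base_real:
        raise ValueError("path escapes base directory")
    return target


if __name__ == "__main__":
    for fn in (resolve_patched, resolve_safer):
        for name in ("../etc/passwd", "/etc/passwd", "notes..v2.txt"):
            try:
                print(fn.__name__, name, "->", fn("/srv/files", name))
            except ValueError as exc:
                print(fn.__name__, name, "-> rejected:", exc)
```

A test suite that only checks `../etc/passwd` would pass for both versions, which is exactly why passing tests says little about whether the patch is safe to merge.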
We recently ran a deep security audit using Prismor, scanning some of the most popular AI agent frameworks end to end. It included full Software Composition Analysis, SBOM reviews, and vulnerability mapping across thousands of packages and transitive dependencies. Here's what we found.
Not dystopian, just practical for certain use cases. Humans still build and control everything. This is more about enabling efficient machine-to-machine interaction where needed.
I think it's because major manufacturing moved to Asia, which drastically cut labor and production costs. Almost 99% of TVs are flat panels and require the same uniform manufacturing process.
I remember back in 2018 we used FFmpeg to split clips into frames, hit each frame with GoogLeNet gradient ascent on intermediate layers, then blended in the previous frame for crude smoothing.
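The "blend in the previous frame" step is just a per-pixel linear mix. Here's a minimal sketch in plain Python, assuming frames have already been extracted (e.g. by FFmpeg) into grayscale 2D lists and already run through the gradient-ascent step, which is omitted here. `alpha` is a hypothetical knob for how much of the previous output carries over.

```python
from typing import List

# Hypothetical representation: a grayscale frame as rows of pixel intensities.
Frame = List[List[float]]


def blend(prev: Frame, curr: Frame, alpha: float = 0.5) -> Frame:
    """Per-pixel linear mix: alpha * prev + (1 - alpha) * curr."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [
        [alpha * p + (1.0 - alpha) * c for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def smooth_sequence(frames: List[Frame], alpha: float = 0.5) -> List[Frame]:
    """Blend each (already processed) frame with the previous output frame,
    which amounts to an exponential moving average over time."""
    out: List[Frame] = []
    for frame in frames:
        out.append(frame if not out else blend(out[-1], frame, alpha))
    return out


if __name__ == "__main__":
    frames = [[[0.0]], [[1.0]], [[1.0]]]  # three 1x1 "frames"
    print(smooth_sequence(frames))  # [[[0.0]], [[0.5]], [[0.75]]]
```

A higher `alpha` gives smoother output but more ghosting, which is why this only counts as crude smoothing compared to proper frame interpolation.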
As far as I know, SOTA for frame interpolation today is probably RIFE (https://github.com/hzwer/ECCV2022-RIFE), which is fast as hell and still gives really good results. But it's already 4 years old now; does anyone know if there's anything better than RIFE for this sort of thing today?
The brutal part is how "rotate secrets and move on" has become the default hygiene advice, when the real pattern is that npm keeps being the soft underbelly of modern stacks.
It should be mandatory for build processes to run a tool like Prismor to scan for these.
Encoder: learns which stimulation patterns tend to improve reward
Biological neurons: adapt to the stimulation and generate spike responses that reinforce certain patterns
Decoder: interprets those spike patterns and converts them into joystick movements
Right?
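A toy sketch of that loop as I read it, in Python. Everything here is hypothetical: the stimulation patterns, the two-channel spike model standing in for the culture (which, unlike real neurons, does not adapt), and the reward for steering "right". The encoder is a simple epsilon-greedy bandit, which is one way to "learn which stimulation patterns tend to improve reward", not necessarily what the actual system uses.

```python
import random

# Hypothetical stimulation patterns the encoder can choose from.
PATTERNS = ["A", "B", "C"]


class Encoder:
    """Epsilon-greedy bandit: keeps a running mean reward per pattern."""

    def __init__(self, patterns, epsilon=0.1, seed=0):
        self.patterns = list(patterns)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {p: 0 for p in self.patterns}
        self.values = {p: 0.0 for p in self.patterns}

    def choose(self):
        # Explore with probability epsilon, otherwise pick the best-so-far pattern.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.patterns)
        return max(self.patterns, key=lambda p: self.values[p])

    def update(self, pattern, reward):
        # Incremental mean: v += (r - v) / n
        self.counts[pattern] += 1
        self.values[pattern] += (reward - self.values[pattern]) / self.counts[pattern]


def neurons(pattern, rng):
    """Stand-in for the culture: noisy spike counts on (left, right) channels.
    Static on purpose; real neurons would also adapt over time."""
    base_left, base_right = {"A": (5, 1), "B": (1, 5), "C": (3, 3)}[pattern]
    return base_left + rng.randint(0, 1), base_right + rng.randint(0, 1)


def decode(spikes):
    """Map spike counts to a joystick direction."""
    left, right = spikes
    if left > right:
        return "left"
    if right > left:
        return "right"
    return "none"


def run(steps=500, target="right", seed=0):
    encoder = Encoder(PATTERNS, seed=seed)
    env_rng = random.Random(seed + 1)
    for _ in range(steps):
        pattern = encoder.choose()                # encoder picks a stimulation
        move = decode(neurons(pattern, env_rng))  # culture responds, decoder maps
        encoder.update(pattern, 1.0 if move == target else 0.0)  # reward closes the loop
    return encoder


if __name__ == "__main__":
    enc = run()
    print({p: round(v, 2) for p, v in enc.values.items()})
```

In this toy, only the encoder learns, so it converges on pattern "B"; in the real setup the neurons adapting as well is what makes it interesting.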