> write a failing test for a single behavior,

Outside of work I've been running a pure vibe-coding experiment where I don't look at the code at all, ever. I'm using this approach of telling it a specific scenario has to work in a certain way (the software relates to financial and tax planning).

The AI bot is very creative at creating a mess even with such tight guardrails. Many days into it I discovered that it had implemented four completely separate tax computation routines. All of them buggy in different ways. All of them addressed specific scenarios I had specified as part of the spec. But it never occurred to the bot to have a single centralized tax function! It is very good at satisfying specific scenarios I give, but absolutely terrible at any kind of system-wide planning.
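To make the "single centralized tax function" point concrete, here is a minimal sketch of the shape I'd have expected: one routine that owns the bracket math, with each scenario delegating to it instead of reimplementing it. Everything in it is an assumption for illustration: the bracket values are made up (not real tax rates), and the names `compute_tax`, `tax_on_salary`, and `tax_on_retirement_withdrawal` are hypothetical, not taken from the actual project.

```python
from decimal import Decimal

# Hypothetical brackets for illustration only: (upper bound, rate).
# None as the upper bound means "no limit".
BRACKETS = [
    (Decimal("10000"), Decimal("0.10")),
    (Decimal("40000"), Decimal("0.20")),
    (None, Decimal("0.30")),
]


def compute_tax(taxable_income: Decimal) -> Decimal:
    """Single source of truth: progressive tax over BRACKETS."""
    if taxable_income <= 0:
        return Decimal("0")
    tax = Decimal("0")
    lower = Decimal("0")
    for upper, rate in BRACKETS:
        if upper is None or taxable_income <= upper:
            # Income ends inside this bracket: tax the remainder and stop.
            tax += (taxable_income - lower) * rate
            break
        # Income exceeds this bracket: tax the full bracket width.
        tax += (upper - lower) * rate
        lower = upper
    return tax


def tax_on_salary(salary: Decimal, deductions: Decimal) -> Decimal:
    """Scenario 1 (hypothetical): salary minus deductions."""
    return compute_tax(salary - deductions)


def tax_on_retirement_withdrawal(other_income: Decimal, withdrawal: Decimal) -> Decimal:
    """Scenario 2 (hypothetical): extra tax caused by a withdrawal
    stacked on top of other income."""
    return compute_tax(other_income + withdrawal) - compute_tax(other_income)
```

With this layout a bracket bug gets fixed once, in `compute_tax`, and every scenario test exercises the same code path rather than four divergent ones.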

(I'm using Cursor for this experiment.)