Pulling ahead? Depends on the use case, I guess. Three turns into a very basic Gemini-CLI session and Gemini 3 Pro has already messed up a simple `Edit` tool call.
And it's awfully slow. In 27 minutes it did 17 tool calls, and only managed to modify 2 files. Meanwhile Claude-Code flies through the same task in 5 minutes.
Yeah - agree, Anthropic is much better for coding. I'm more thinking about the 'average chat user' (the larger potential userbase), most of whom are on ChatGPT.
Knowing Google's MO, it's most likely not the model but their harness that's the issue. God, they are so bad at UI and agentic coding harnesses...
It has always been like this. It's a super simple system: artists are identified only by their name, so there are a ton of artist pages out there that actually have to represent multiple artists with the same name. It's kinda silly, but oh well.
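To make the problem concrete, here's a toy sketch (a hypothetical schema, nothing to do with the site's real code) of what keying everything on the artist name does:

```python
# Toy sketch: if the artist name is the only key, distinct artists
# collapse into a single page. (Hypothetical data model, for illustration.)
pages: dict[str, list[str]] = {}

def add_track(artist_name: str, track: str) -> None:
    # No artist ID, so every artist sharing this name lands on the same page.
    pages.setdefault(artist_name, []).append(track)

add_track("Nirvana", "Smells Like Teen Spirit")  # the Seattle band
add_track("Nirvana", "Rainbow Chaser")           # the 60s UK band, same key
print(pages["Nirvana"])  # both bands' tracks end up mixed on one page
```

Without a separate artist ID there's nothing to disambiguate on, so every "Nirvana" ends up on the same page.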
It's kind of funny how not a lot of people realize this.
On the one hand this is a feature: you can "multishot prompt" an LLM into giving the response you want. Instead of writing a meticulous system prompt where you explain in words what the system has to do, you can simply pre-fill a few user/assistant pairs, and it'll match the pattern much more easily!
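For example, a minimal sketch of that pattern, assuming an OpenAI-style chat API (the model name and the toy task are just placeholders):

```python
# "Multishot" prompting sketch: hand-written user/assistant pairs stand in
# for a long system prompt, and the model infers the format from them.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "user", "content": "tokyo"},
    {"role": "assistant", "content": '{"city": "Tokyo", "country": "Japan"}'},
    {"role": "user", "content": "sao paulo"},
    {"role": "assistant", "content": '{"city": "São Paulo", "country": "Brazil"}'},
    # The real query; the model tends to continue the established pattern.
    {"role": "user", "content": "vancouver"},
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)  # expected: {"city": "Vancouver", "country": "Canada"}
```

The pre-filled assistant turns act as in-context examples, so the model picks up the exact output format without any wordy instructions.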
I always thought Gemini Pro was very good at this. When I wanted a model to "do by example", I mostly used Gemini Pro.
And that is ALSO Gemini's weakness! As soon as something goes wrong in Gemini-CLI, it'll repeat the same mistake over and over again.
> despite things being, well, entirely underwhelming
People use LLMs for all kinds of things, but for coding it is absolutely not underwhelming. Can you treat it as a real independent developer, one that doesn't need supervision? No. Can it save you hours and hours of work? Yes.
I switched to Ly a few weeks ago, after SDDM kept crashing. And I switched to SDDM because GDM did the same thing earlier. Both got caught in some kind of crash loop that made it impossible for me to switch to another virtual console.