Same. When I try to get it to do a simple loop (e.g. take screenshot, click next, ...

robots0only · 2025-08-26T19:58:54 1756238334

Claude is extremely poor at vision compared to Gemini and ChatGPT. I think Anthropic severely overfit their evals to coding/text use cases. Maybe naively adding browser use would work, but I am a bit skeptical.

bdangubic · 2025-08-26T20:07:45 1756238865

I have a completely different experience. Pasting a screenshot into CC is my de facto go-to, and more often than not it leads to CC understanding what needs to be done, etc.

user453 · 2025-08-26T21:05:06 1756242306

Is it overfitting if it makes them the best at those tasks?

CSMastermind · 2025-08-26T20:19:09 1756239549

This has been exactly my experience using all the browser based tools I've tried.

ChatGPT's agents get the furthest but even then they only make it like 10 iterations or something.

rzzzt · 2025-08-26T20:39:25 1756240765

I have better success with asking for a short script that does the million iterations than asking the thing to make the changes itself (edit: in IDEs, not in the browser).

seunosewa · 2025-08-27T11:17:15 1756293435

If you need precision, that's the way to go, and it's usually cheaper and faster too.
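
To illustrate the "ask for a script" approach with the screenshot/click-next loop mentioned upthread, here is a minimal sketch. The names `run_loop`, `take_screenshot`, and `click_next` are hypothetical; the two callables stand in for whatever browser automation tool you actually use, and nothing here is tied to a specific library.

```python
from typing import Callable


def run_loop(
    take_screenshot: Callable[[int], None],
    click_next: Callable[[], bool],
    max_iterations: int,
) -> int:
    """Run a fixed screenshot -> click-next loop without a model in the loop.

    take_screenshot(i) captures page i (hypothetical callback).
    click_next() advances and returns False when there is no next page
    (hypothetical callback).
    Returns the number of pages captured.
    """
    completed = 0
    for i in range(max_iterations):
        take_screenshot(i)
        completed += 1
        if not click_next():
            break  # no further pages, stop early
    return completed
```

The model writes this once; the loop itself then runs deterministically, so it cannot lose track after a handful of iterations.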

felarof · 2025-08-27T00:05:47 1756253147

I'm wondering if they are using vanilla Claude or a version of Claude fine-tuned specifically for browser use.

RL fine-tuning LLMs can have pretty amazing results. We did GRPO training of Qwen3:4B to do the task of a small action model at BrowserOS (https://www.browseros.com/) and it was much better than running vanilla Claude or GPT.
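
For anyone unfamiliar with GRPO: its core idea is to score each sampled output relative to the other outputs sampled for the same prompt, rather than against a learned value function. A minimal sketch of just that group-relative advantage step follows. The function name, the `eps` term, and the use of population standard deviation are assumptions for illustration; this is not the actual training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    eps (an assumption here) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, with rewards `[1.0, 0.0, 0.0, 1.0]` for four sampled action sequences, the successful ones get a positive advantage and the failed ones a negative one of equal magnitude.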

tripplyons · 2025-08-26T19:46:55 1756237615

Hopefully one of those "tricks" involves training a model on examples of browser use.