Looks great, but looking at the benchmark, I can’t help but think about how crazy good dots-ocr is as a model. Too bad they’re not as open as the DeepSeek team, because it’s so good and I’d love to know how it was trained.
Did we read the same graph? DeepSeek Gundam 200 dpi appeared to get similar perf as dots-ocr, but with fewer tokens needed. The x-axis is inverted, descending with distance from the origin.
You should be more concerned that Chinese labs can train models that are just as good for 10X less because Americans treat the USD’s status as the global reserve currency as the ultimate bitter lesson. Who needs better math and engineering when you can print money to buy more GPUs???
The author literally said GPT is spending 10x more for equivalent results. That really means ChatGPT had that level of capability at that cost a year or two ago. Smaller domain models are better at focused tasks in their area, but they can’t be generalized.
Yeah, it’s hilarious to be having this conversation about MLEs while attributing the bad outcomes to anything other than poorly designed reward functions, i.e. management. If an engineer burned millions on failed training runs because they did a shit job of creating a policy that maximized for the desired outcome, they’d get canned, but that’s just a Tuesday for your average MBA with VC backing.
Gemini just eclipsed ChatGPT to be #1 on the Apple App Store for these kinds of apps. The 2.5 Pro series is also good/SOTA at coding, but unfortunately poorly trained for the agentic workflows that have become predominant.
Can you generate 8-bit AVR assembly code to multiply two 24-bit posit numbers?
You get some pretty funny results from the models that have no idea what a posit is. It's usually pretty easy to tell whether they know what they're supposed to be doing. I haven't had a success yet (haven't tried for a while though). Some of them have come pretty close, but usually it's trying to squeeze more than 8 bits of data into a register that brings them down.
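For anyone curious what the models keep fumbling, here's a minimal C sketch (not AVR assembly, and not a full posit multiplier) of the part they trip over: a 24-bit significand has to live in three 8-bit bytes, and the multiply has to be built from 8×8→16 partial products with explicit carry propagation, the same shape AVR's MUL/ADC sequence would take. The function name `mul24x24` and the test values are mine; posit decoding (sign/regime/exponent/fraction) and rounding are deliberately omitted.

```c
#include <stdint.h>
#include <stdio.h>

/* Multiply two 24-bit unsigned values, each held as three little-endian
 * bytes, into a 48-bit product held as six bytes. Only 8x8->16 partial
 * products are used, mirroring what an 8-bit AVR's MUL instruction gives
 * you (16-bit result in r1:r0). This is just the significand multiply;
 * it is not a posit implementation. */
static void mul24x24(const uint8_t a[3], const uint8_t b[3], uint8_t out[6])
{
    uint32_t acc[6] = {0};

    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            uint16_t p = (uint16_t)a[i] * (uint16_t)b[j]; /* 8x8 -> 16 */
            acc[i + j]     += p & 0xFF;  /* low byte of partial product  */
            acc[i + j + 1] += p >> 8;    /* high byte of partial product */
        }
    }

    /* Propagate carries byte by byte, as the assembly would with ADC. */
    uint32_t carry = 0;
    for (int k = 0; k < 6; k++) {
        uint32_t v = acc[k] + carry;
        out[k] = (uint8_t)(v & 0xFF);
        carry  = v >> 8;
    }
}

int main(void)
{
    /* 0x123456 * 0xABCDEF, checked against a native 64-bit multiply. */
    const uint8_t a[3] = {0x56, 0x34, 0x12};
    const uint8_t b[3] = {0xEF, 0xCD, 0xAB};
    uint8_t out[6];

    mul24x24(a, b, out);

    uint64_t got = 0;
    for (int k = 5; k >= 0; k--)
        got = (got << 8) | out[k];

    uint64_t want = (uint64_t)0x123456 * 0xABCDEF;
    printf("got  %012llx\nwant %012llx\n",
           (unsigned long long)got, (unsigned long long)want);
    return got == want ? 0 : 1;
}
```

Even this is only the easy half: the regime/exponent extraction and the final rounding are where a 24-bit posit gets genuinely fiddly on an 8-bit target.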
Yeah, so it’d be interesting to see whether, provided the correct context (your understanding of its error pattern), it can accomplish this.
One thing you learn quickly about working with LLMs is that they have these kinds of baked-in biases, some of which are very fixed and tied to their very limited ability to engage in novel reasoning (cc François Chollet), while others are far more loosely held/correctable. If it sticks with the errant pattern even when provided the proper context, it probably isn’t something an off-the-shelf model can handle.