For anyone else who was initially confused by this, useful context is that Snowboard Kids 2 is an N64 game.
I also wasn't familiar with this terminology:
> You hand it a function; it tries to match it, and you move on.
In decompilation "matching" means you found a function block in the machine code, wrote some C, then confirmed that the C produces the exact same binary machine code once it is compiled.
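To make that concrete, here is a minimal sketch of the check, assuming you have a vintage MIPS compiler and binutils on hand. Every path, flag, and file name below is a made-up placeholder, not anything from the actual project:

```python
import subprocess
from pathlib import Path

# Placeholders: the era-appropriate compiler, its flags, and a file holding the
# original function's bytes as extracted from the ROM.
COMPILER = "tools/ido/cc"
CFLAGS = ["-O2", "-mips2", "-c"]
EXPECTED = Path("expected/func_800A1C40.bin")

def extract_text(obj: str) -> bytes:
    """Dump the raw .text bytes of the freshly compiled object file."""
    subprocess.run(["mips-linux-gnu-objcopy", "-O", "binary",
                    "--only-section=.text", obj, "candidate.bin"], check=True)
    return Path("candidate.bin").read_bytes()

def matches(c_file: str) -> bool:
    """True only if the candidate C compiles to exactly the original machine code."""
    subprocess.run([COMPILER, *CFLAGS, c_file, "-o", "candidate.o"], check=True)
    return extract_text("candidate.o") == EXPECTED.read_bytes()

print("MATCH" if matches("src/func_800A1C40.c") else "no match yet")
```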
We've been using LLMs for security research (finding vulnerabilities in ML frameworks) and the pattern is similar - it's surprisingly good at the systematic parts (pattern recognition, code flow analysis) when you give it specific constraints and clear success criteria.
The interesting part: the model consistently underestimates its own speed. We built a complete bug bounty submission pipeline - target research, vulnerability scanning, POC development - in hours when it estimated days. The '10 attempts' heuristic resonates - there's definitely a point where iteration stops being productive.
For decompilation specifically, the 1M context window helps enormously. We can feed entire codebases and ask 'trace this user input to potential sinks' which would be tedious manually. Not perfect, but genuinely useful when combined with human validation.
The key seems to be: narrow scope + clear validation criteria + iterative refinement. Same as this decompilation work.
> In decompilation "matching" means you found a function block in the machine code, wrote some C, then confirmed that the C produces the exact same binary machine code once it is compiled.
They had access to the same C compiler used by Nintendo in 1999? And the register allocation on a MIPS CPU is repeatable enough to get an exact match? That's impressive.
The groundwork for this kind of "matching" process is: sourcing odd versions of the obscure tooling that was used to build the target software 20 years ago, and playing with the flag combinations to find out which was used.
It helps that compilers back then were far less complex than those of today, and so was the code itself. But it's still not a perfect process.
There are cases of "flaky" code - for example, code that depends on the code around it. So you change one function, and that causes 5 other functions to no longer match, and 2 functions to go from not matching to matching instead.
Figuring out and resolving those strange dependencies is not at all trivial, so a lot of decompilation efforts end up settling for something like "100% functional, 99%+ matching".
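For the flag-hunting side of that groundwork, the search itself is mostly brute force over a small space. A rough sketch, where the compiler list and flag sets are invented and `builds_match` stands in for the compile-and-compare check from the earlier sketch, parameterised by compiler and flags:

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

# Invented search space; in practice the game's release window and the flags
# used by similar titles narrow this considerably before any brute forcing.
COMPILERS = ["tools/ido5.3/cc", "tools/ido7.1/cc"]
OPT_LEVELS = ["-O1", "-O2", "-O3"]
EXTRAS = [[], ["-mips2"], ["-mips2", "-g0"]]

def find_toolchain(c_file: str,
                   builds_match: Callable[[str, List[str], str], bool]
                   ) -> Optional[Tuple[str, List[str]]]:
    """Try compiler/flag combinations until one reproduces the original bytes."""
    for compiler, opt, extras in product(COMPILERS, OPT_LEVELS, EXTRAS):
        flags = [opt, *extras]
        if builds_match(compiler, flags, c_file):
            return compiler, flags   # plausible; confirm against more functions
    return None
```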
> This is mostly just guesswork and trying different variations of compiler versions and configuration options. But it isn’t as bad as it sounds since the time period limits which compilers were plausibly used. The compiler arguments used in other, similar, games also provide a useful reference.
I'd like to see this given a bit more structure, honestly. What occurs to me is constraining the grammar during LLM inference to ensure valid C89 (or close to it, as far as that can be checked without compiling), then perhaps experimentally switching to a permuter if/once the decompiled function reaches a certain accuracy threshold.
Eventually some or many of these attempts would, of course, fail, and require programmer intervention, but I suspect we might be surprised how far it could go.
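As a sketch of that hand-off idea, assuming you have some way to score how close a candidate is. None of the callables below refer to a real tool's API, including the permuter; they are stand-ins supplied by the caller:

```python
from typing import Callable, Optional

def decompile_function(
    asm: str,
    generate: Callable[[str, Optional[str]], str],   # LLM call: asm + best-so-far -> candidate C
    score: Callable[[str, str], float],              # fraction of instructions matching
    permute: Callable[[str, str], Optional[str]],    # hand-off to a permuter
    threshold: float = 0.95,                         # assumed "close enough to permute" cutoff
    max_attempts: int = 10,                          # assumed cap before a human takes over
) -> Optional[str]:
    best_c: Optional[str] = None
    best_score = 0.0
    for _ in range(max_attempts):
        candidate = generate(asm, best_c)
        s = score(candidate, asm)
        if s == 1.0:
            return candidate                 # already byte-exact
        if s > best_score:
            best_c, best_score = candidate, s
        if best_score >= threshold:
            return permute(best_c, asm)      # let the permuter grind out the last few percent
    return None                              # below threshold: needs programmer intervention
```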
I don't expect constraining the grammar to do all that much for modern LLMs - they're pretty good at constraining themselves. Having it absorb the 1% of failures that's caused by grammar issues is not worth the engineering effort.
The modern approach is: feed the errors back to the LLM and have it fix them.
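Something like this, for instance, where `ask_model` stands in for whatever LLM call you use and the compiler invocation is a placeholder:

```python
import subprocess
from typing import Callable, Optional

def fix_until_it_compiles(source: str,
                          ask_model: Callable[[str], str],
                          cc: str = "tools/ido/cc",
                          max_rounds: int = 5) -> Optional[str]:
    """Compile the candidate; if the compiler complains, feed the diagnostics
    back to the model and try again."""
    for _ in range(max_rounds):
        with open("candidate.c", "w") as f:
            f.write(source)
        result = subprocess.run([cc, "-O2", "-c", "candidate.c", "-o", "candidate.o"],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return source       # it compiles; checking for a byte-exact match comes next
        source = ask_model(
            "This C fails to compile. Fix it without changing its behaviour.\n\n"
            f"Compiler output:\n{result.stderr}\n\nSource:\n{source}"
        )
    return None                 # still broken after a few rounds; hand it to a human
```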
> In decompilation "matching" means you found a function block in the machine code, wrote some C, then confirmed that the C produces the exact same binary machine code once it is compiled.
The author's previous post explains this all in a bunch more detail: https://blog.chrislewis.au/using-coding-agents-to-decompile-...