For anyone else who was initially confused by this, useful context is that Snowboard Kids 2 is an N64 game.
I also wasn't familiar with this terminology:
> You hand it a function; it tries to match it, and you move on.
In decompilation "matching" means you found a function block in the machine code, wrote some C, then confirmed that the C produces the exact same binary machine code once it is compiled.
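To make that concrete, here is a minimal sketch of the check, assuming you have a vintage MIPS compiler and binutils on hand. Every path, flag, and file name below is a made-up placeholder, not anything from the actual project:

```python
import subprocess
from pathlib import Path

# Placeholders: the era-appropriate compiler, its flags, and a file holding the
# original function's bytes as extracted from the ROM.
COMPILER = "tools/ido/cc"
CFLAGS = ["-O2", "-mips2", "-c"]
EXPECTED = Path("expected/func_800A1C40.bin")

def extract_text(obj: str) -> bytes:
    """Dump the raw .text bytes of the freshly compiled object file."""
    subprocess.run(["mips-linux-gnu-objcopy", "-O", "binary",
                    "--only-section=.text", obj, "candidate.bin"], check=True)
    return Path("candidate.bin").read_bytes()

def matches(c_file: str) -> bool:
    """True only if the candidate C compiles to exactly the original machine code."""
    subprocess.run([COMPILER, *CFLAGS, c_file, "-o", "candidate.o"], check=True)
    return extract_text("candidate.o") == EXPECTED.read_bytes()

print("MATCH" if matches("src/func_800A1C40.c") else "no match yet")
```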
We've been using LLMs for security research (finding vulnerabilities in ML frameworks) and the pattern is similar - it's surprisingly good at the systematic parts (pattern recognition, code flow analysis) when you give it specific constraints and clear success criteria.
The interesting part: the model consistently underestimates its own speed. We built a complete bug bounty submission pipeline - target research, vulnerability scanning, POC development - in hours when it estimated days. The '10 attempts' heuristic resonates - there's definitely a point where iteration stops being productive.
For decompilation specifically, the 1M context window helps enormously. We can feed entire codebases and ask 'trace this user input to potential sinks' which would be tedious manually. Not perfect, but genuinely useful when combined with human validation.
The key seems to be: narrow scope + clear validation criteria + iterative refinement. Same as this decompilation work.
> In decompilation "matching" means you found a function block in the machine code, wrote some C, then confirmed that the C produces the exact same binary machine code once it is compiled.
They had access to the same C compiler used by Nintendo in 1999? And the register allocation on a MIPS CPU is repeatable enough to get an exact match? That's impressive.
The groundwork for this kind of "matching" process is: sourcing odd versions of the obscure tooling that was used to build the target software 20 years ago, and playing with the flag combinations to find out which was used.
It helps that compilers back then were far less complex than those of today, and so was the code itself. But it's still not a perfect process.
There are cases of "flaky" code - for example, code that depends on the code around it. So you change one function, and that causes 5 other functions to no longer match, and 2 functions to go from not matching to matching instead.
Figuring out and resolving those strange dependencies is not at all trivial, so a lot of decompilation efforts end up settling for something like "100% functional, 99%+ matching".
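For the flag-hunting side of that groundwork, the search itself is mostly brute force over a small space. A rough sketch, where the compiler list and flag sets are invented and `builds_match` stands in for the compile-and-compare check from the earlier sketch, parameterised by compiler and flags:

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

# Invented search space; in practice the game's release window and the flags
# used by similar titles narrow this considerably before any brute forcing.
COMPILERS = ["tools/ido5.3/cc", "tools/ido7.1/cc"]
OPT_LEVELS = ["-O1", "-O2", "-O3"]
EXTRAS = [[], ["-mips2"], ["-mips2", "-g0"]]

def find_toolchain(c_file: str,
                   builds_match: Callable[[str, List[str], str], bool]
                   ) -> Optional[Tuple[str, List[str]]]:
    """Try compiler/flag combinations until one reproduces the original bytes."""
    for compiler, opt, extras in product(COMPILERS, OPT_LEVELS, EXTRAS):
        flags = [opt, *extras]
        if builds_match(compiler, flags, c_file):
            return compiler, flags   # plausible; confirm against more functions
    return None
```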
> This is mostly just guesswork and trying different variations of compiler versions and configuration options. But it isn’t as bad as it sounds since the time period limits which compilers were plausibly used. The compiler arguments used in other, similar, games also provide a useful reference.
I'd like to see this given a bit more structure, honestly. What occurs to me is constraining the grammar during LLM inference to ensure valid C89 (or close to it, as far as that can be checked without compiling), then perhaps experimentally switching to a permuter if/once the decompiled function reaches a certain accuracy threshold.
Eventually some or many of these attempts would, of course, fail, and require programmer intervention, but I suspect we might be surprised how far it could go.
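As a sketch of that hand-off idea, assuming you have some way to score how close a candidate is. None of the callables below refer to a real tool's API, including the permuter; they are stand-ins supplied by the caller:

```python
from typing import Callable, Optional

def decompile_function(
    asm: str,
    generate: Callable[[str, Optional[str]], str],   # LLM call: asm + best-so-far -> candidate C
    score: Callable[[str, str], float],              # fraction of instructions matching
    permute: Callable[[str, str], Optional[str]],    # hand-off to a permuter
    threshold: float = 0.95,                         # assumed "close enough to permute" cutoff
    max_attempts: int = 10,                          # assumed cap before a human takes over
) -> Optional[str]:
    best_c: Optional[str] = None
    best_score = 0.0
    for _ in range(max_attempts):
        candidate = generate(asm, best_c)
        s = score(candidate, asm)
        if s == 1.0:
            return candidate                 # already byte-exact
        if s > best_score:
            best_c, best_score = candidate, s
        if best_score >= threshold:
            return permute(best_c, asm)      # let the permuter grind out the last few percent
    return None                              # below threshold: needs programmer intervention
```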
I don't expect constraining the grammar to do all that much for modern LLMs - they're pretty good at constraining themselves. Having it absorb the 1% of failures that's caused by grammar issues is not worth the engineering effort.
The modern approach is: feed the errors back to the LLM and have it fix them.
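Something like this, for instance, where `ask_model` stands in for whatever LLM call you use and the compiler invocation is a placeholder:

```python
import subprocess
from typing import Callable, Optional

def fix_until_it_compiles(source: str,
                          ask_model: Callable[[str], str],
                          cc: str = "tools/ido/cc",
                          max_rounds: int = 5) -> Optional[str]:
    """Compile the candidate; if the compiler complains, feed the diagnostics
    back to the model and try again."""
    for _ in range(max_rounds):
        with open("candidate.c", "w") as f:
            f.write(source)
        result = subprocess.run([cc, "-O2", "-c", "candidate.c", "-o", "candidate.o"],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return source       # it compiles; checking for a byte-exact match comes next
        source = ask_model(
            "This C fails to compile. Fix it without changing its behaviour.\n\n"
            f"Compiler output:\n{result.stderr}\n\nSource:\n{source}"
        )
    return None                 # still broken after a few rounds; hand it to a human
```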
> In decompilation "matching" means you found a function block in the machine code, wrote some C, then confirmed that the C produces the exact same binary machine code once it is compiled.
The author's previous post explains this all in a bunch more detail: https://blog.chrislewis.au/using-coding-agents-to-decompile-...