I spent some time last weekend on a small side project which involves JIT encoding ARM64 instructions to run them on Apple Silicon.
I’ve written assembly before, but encoding was always kind of black magic.
How surprised was I to learn how simple instruction encoding is on arm64! Arguably simpler than encoding wasm to bytecode, which I played with a while ago.
If you want to play with this, then based on my very, very limited experience so far, I’d suggest starting with arm: fixed-length 4-byte instructions, a nice register naming scheme, and straightforward encoding of arguments make it very friendly.
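To give a feel for what "fixed-length 4-byte instructions" means in practice, here is a minimal Python sketch that packs three AArch64 instructions by OR-ing fields into a 32-bit word. The helper names (`movz_x`, `add_x_imm`, `ret`, `assemble`) are made up for illustration and are not from the project described above; the sketch only builds the bytes and leaves out mapping executable memory and running them.

```python
import struct

def movz_x(rd, imm16, hw=0):
    """MOVZ Xd, #imm16, LSL #(16*hw). Register 31 here means XZR."""
    assert 0 <= rd < 32 and 0 <= imm16 < (1 << 16) and 0 <= hw < 4
    return 0xD2800000 | (hw << 21) | (imm16 << 5) | rd

def add_x_imm(rd, rn, imm12, shift12=False):
    """ADD Xd, Xn, #imm12 (optionally LSL #12). Register 31 here means SP."""
    assert 0 <= rd < 32 and 0 <= rn < 32 and 0 <= imm12 < (1 << 12)
    return 0x91000000 | (int(shift12) << 22) | (imm12 << 10) | (rn << 5) | rd

def ret(rn=30):
    """RET Xn, defaulting to the link register X30."""
    assert 0 <= rn < 32
    return 0xD65F0000 | (rn << 5)

def assemble(words):
    """Instructions are stored as little-endian 32-bit words."""
    return b"".join(struct.pack("<I", w) for w in words)

# mov x0, #42 ; add x0, x0, #1 ; ret
code = assemble([movz_x(0, 42), add_x_imm(0, 0, 1), ret()])
print(code.hex())  # 400580d200040091c0035fd6
```

Note that even in this tiny example register number 31 already means two different things (XZR for MOVZ, SP for ADD immediate), so an encoder has to know per instruction which one applies.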
I agree: AArch64 is a nice instruction set to learn. (Source: I taught ARMv7, AArch64, x86-64 to first-year students in the past.)
> how simple instruction encoding is on arm64
Having written encoders, decoders, and compilers for AArch64 and x86-64, I disagree. While AArch64 is, in my opinion, very well designed (also better than RISC-V), it's certainly not simple. Here are some of my favorite complexities:
- Many instructions have (sometimes very) different encodings. While x86 has a more complex encoding structure, most encodings follow the same structure and are therefore remarkably similar.
- A huge number of instruction operand types: memory + register, memory + unsigned scaled offset, memory + signed offset, optionally with pre/post-increment, but every instruction supports a different subset; vector, vector element, vector table, vector table element; sometimes general-purpose register 31 encodes the stack pointer, sometimes the zero register; various immediate encodings; ...
- Logical immediate encoding. Clever, but also very complex. (To be sure that I implemented the decoding correctly, I brute-force test all inputs...)
- Register constraints: MUL (by element) with 16-bit integers has a register constraint on the lowest 16 registers. CASP requires an even-numbered register. LD64B requires an even-numbered register less than 24 (it writes Xt..Xt+7).
- Many more instructions: AArch64 SIMD (even excluding SVE) has more instructions than x86 including up to AVX-512. SVE/SME takes this to another level.
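To illustrate the logical immediate point from the list above, here is a rough Python sketch of the decoding, written from my reading of the `DecodeBitMasks` pseudocode in the Arm ARM. The function names are my own, and this is not the code of the commenter's decoder. The 13 bits `N:immr:imms` select an element size, a run of ones, and a rotation, and the element is then replicated across the register.

```python
def decode_logical_imm(n, immr, imms, datasize=64):
    """Decode N:immr:imms into a bitmask, or return None if reserved."""
    if datasize == 32 and n == 1:
        return None  # N=1 selects a 64-bit element, invalid for 32-bit ops
    # Position of the highest set bit of N:NOT(imms) selects the element size.
    combined = (n << 6) | (~imms & 0x3F)
    length = combined.bit_length() - 1
    if length < 1:
        return None
    levels = (1 << length) - 1
    if (imms & levels) == levels:
        return None  # an all-ones element is reserved
    s = imms & levels          # run of s + 1 ones
    r = immr & levels          # rotate right by r within the element
    esize = 1 << length
    emask = (1 << esize) - 1
    welem = (1 << (s + 1)) - 1
    elem = ((welem >> r) | (welem << (esize - r))) & emask
    # Replicate the element until it fills the register width.
    result = 0
    for shift in range(0, datasize, esize):
        result |= elem << shift
    return result

def all_logical_imms(datasize=64):
    """Brute-force all 2**13 field combinations and collect the valid masks."""
    values = set()
    for n in range(2):
        for immr in range(64):
            for imms in range(64):
                v = decode_logical_imm(n, immr, imms, datasize)
                if v is not None:
                    values.add(v)
    return values

print(hex(decode_logical_imm(0, 0, 0b111100)))  # 0x5555555555555555
print(len(all_logical_imms(64)))                # 5334
```

Because the input space is only 13 bits, exhaustively checking a decoder like this (for example against a second implementation or a known-good disassembler) is cheap, which is presumably why brute-force testing is practical here.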
Actually, nowadays Arm describes the ISA as a load-store architecture. The RISC vs. CISC debate is, in my opinion, pretty pointless nowadays and I'd prefer if we'd just stop using these words to describe ISAs.