There are not really any newer instruction sets: we are locked into the von Neumann architecture and, until we move away from it, we will continue to move data between memory and CPU registers (or register to register, etc.), and to add, shift and test the condition flags of arithmetic operations – the same instructions across pretty much any CPU architecture relevant today.
So we have:
CISC – which is still used outside the x86 bubble;
RISC – which is widely used;
Hybrid RISC/CISC designs – excluding x86, that would be the IBM z/Architecture (i.e. mainframes);
EPIC/VLIW – which has been largely unsuccessful outside DSPs and a few niches.
They all deal with registers, data movement and condition testing, though, and one can't say that an ISA 123 that effectively does the same thing as an ISA 456 is older or newer. SIMD instructions have been the latest addition, and they also follow the same well-known mental and compute models.
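To make the "same instructions everywhere" point concrete, here is a trivial C/C++ loop; the mnemonics named in the comment are illustrative only, since the exact spellings differ between x86-64, AArch64, RISC-V and so on:

    // Sums an array. On any mainstream ISA this lowers to the same handful of
    // operations: load an element, add it to a register, bump the index,
    // compare against n, branch back. With optimisation (e.g. -O3) compilers
    // typically emit the SIMD variants of the same operations -- wider loads
    // and wider adds -- so the mental model stays identical; only the register
    // width grows.
    int sum(const int *a, int n) {
        int s = 0;
        for (int i = 0; i < n; ++i) {
            s += a[i];
        }
        return s;
    }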
Radically different designs, such as the Intel iAPX 432 and Smalltalk or Java CPUs, have not received any meaningful acceptance, and it seems that the idea of a CPU architecture tied to a higher-level compute model has been eschewed in perpetuity. Java CPUs were the last massively hyped-up attempt to change that, and that was 30 years ago.
What other viable alternatives outside the von Neumann architecture are available to us? I am not sure.
Modern GPU instructions are often VLIW and the compiler has to do a lot of work to schedule them. For example, Nvidia's Volta (from 2017) uses 128 bits to encode each instruction. According to [1], the 128 bits in a word are used as follows:
• at least 91 bits are used to encode the instruction
• at least 23 bits are used to encode control information associated with multiple instructions
• the remaining 14 bits appear to be unused
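As a quick sanity check of that bit budget, a small sketch; only the three totals come from [1], and the container layout below is an assumption for illustration, not a claim about where the fields actually sit in the word:

    #include <cstdint>

    // Bit budget of a 128-bit Volta instruction word, per [1].
    constexpr int kInstructionBits = 91;  // "at least": opcode, operands, etc.
    constexpr int kControlBits     = 23;  // "at least": scheduling/control info
    constexpr int kUnusedBits      = 14;  // appeared unused in the microbenchmarks

    static_assert(kInstructionBits + kControlBits + kUnusedBits == 128,
                  "the three budgets account for the full 128-bit word");

    // An opaque 128-bit word; the real field positions are not public.
    struct VoltaWord {
        std::uint64_t lo;  // bits  0..63
        std::uint64_t hi;  // bits 64..127
    };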
AMD GPUs are similar, I believe. VLIW is good for instruction density. VLIW was unsuccessful in CPUs like Itanium because the compiler was expected to handle (unpredictable) memory access latency. This is not possible, even today, for largely sequential workloads. But GPUs typically run highly parallel workloads (e.g. MatMul), and the dynamic scheduler can just 'swap out' threads that wait for memory loads. Your GPU will also perform terribly on highly sequential workloads.
[1] Z. Jia, M. Maggioni, B. Staiger, D. P. Scarpazza, Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking. https://arxiv.org/abs/1804.06826
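To illustrate the 'swap out threads' point, a minimal CUDA sketch; the kernel, sizes and launch parameters are purely illustrative and not tied to Volta. The kernel is memory-bound, but because far more warps are resident than can execute at once, the hardware scheduler keeps issuing from ready warps while others wait on their loads:

    #include <cuda_runtime.h>

    // Memory-bound kernel: each thread does one gather load and one store.
    // While a warp waits on its load, the scheduler issues instructions from
    // other resident warps, hiding the memory latency.
    __global__ void gather_scale(const float *in, const int *idx,
                                 float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            out[i] = 2.0f * in[idx[i]];  // latency of in[idx[i]] is unpredictable
        }
    }

    int main(void) {
        const int n = 1 << 24;  // ~16M elements: far more threads than ALUs
        float *in, *out; int *idx;
        cudaMallocManaged(&in,  n * sizeof(float));
        cudaMallocManaged(&out, n * sizeof(float));
        cudaMallocManaged(&idx, n * sizeof(int));
        for (int i = 0; i < n; ++i) { in[i] = 1.0f; idx[i] = (i * 2654435761u) % n; }

        int block = 256;
        int grid  = (n + block - 1) / block;  // tens of thousands of blocks
        gather_scale<<<grid, block>>>(in, idx, out, n);
        cudaDeviceSynchronize();

        cudaFree(in); cudaFree(out); cudaFree(idx);
        return 0;
    }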
Personally, I have a soft spot for VLIW/EPIC architectures, and I really wish they had been more successful in mainstream computing.
I didn't consider GPUs precisely for the reason you mentioned – their unsuitability for sequential workloads, which is what most applications that end users run look like, even though nearly every modern computing contraption in existence has a GPU today.
One most assuredly radical departure from the von Neumann architecture that I completely forgot about is the dataflow CPU architecture, which is vastly different from what we have been using for the last 60+ years. Even though there have been no productionised general-purpose dataflow CPUs, the idea has been successfully implemented for niche applications, mostly in networking. So, circling back to the original point raised, dataflow CPU instructions would certainly qualify as a new design.
The reason that VLIW/EPIC architectures have not been successful for mainstream workloads is the combination of:
• the "memory wall",
• the static unpredictability of memory access (see the pointer-chasing sketch below), and
• the lack of sufficient parallelism for masking latency.
Together, those make dynamically scheduling instructions just much more efficient.
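A small illustration of the second point (the list type here is hypothetical): every load depends on the previous one, so a static VLIW/EPIC scheduler has no independent work to pack into the bundle slots and no way of knowing at compile time whether a given load hits the cache or stalls for hundreds of cycles on DRAM. An out-of-order core at least discovers that dynamically, and a GPU could only hide it if it had thousands of such traversals to interleave.

    struct Node {
        long         value;
        struct Node *next;  // the address of the next load is only known
    };                      // after the current load completes

    // Serial pointer chase: each iteration's load depends on the previous one,
    // so there is nothing for a static scheduler to overlap with the miss.
    long sum_list(const struct Node *n) {
        long s = 0;
        while (n) {
            s += n->value;
            n = n->next;
        }
        return s;
    }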
Dataflow has been tried many, many times for general-purpose workloads.
And every time it has failed for general-purpose workloads.
In the early 2020s I was part of an expensive team doing a blank-slate dataflow architecture for a large semi company: the project got cancelled because the performance figures were weak relative to the complexity of the micro-architecture, which was high (hence expensive verification and high area). As one of my colleagues on that team put it: "Everybody wants to work on dataflow until he works on dataflow." Regarding the history of dataflow architectures, [1] is from 1975, so half a century old this year.
Nope, not until now. It seems to be a much more modern take on the idea of an object-oriented CPU architecture.
Yet there is something about object-oriented ISAs that has made CPU designers eschew them consistently. Ranging from the Intel iAPX-432, to the Japanese Smalltalk Katana CPU, to jHISC, to another, unrelated, Katana CPU by the University of Texas and the University of Illinois, none of them has ever yielded a mainstream OO CPU. Perhaps modern computing is not very object-oriented after all.