More

renehsz · 2026-01-27T05:12:12 1769490732

> maybe today you can still build a win10 binary with a win11 toolchain, but you cannot build a win98 binary with it for sure.

In my experience, that's not quite accurate. I'm working on a GUI program that targets Windows NT 4.0, built using a Win11 toolchain. With a few tweaks here and there, it works flawlessly. Microsoft goes to great lengths to keep system DLLs and the CRT forward- and backward-compatible. It's even possible to get libc++ working: https://building.enlyze.com/posts/targeting-25-years-of-wind...

AshamedCaptain · 2026-01-27T20:53:03 1769547183

What does "a Win11 toolchain" mean here? In the article you link, the guy is filling missing functions, rewriting the runtime, and overall doing even more work than what I need to do to build binaries on a Linux system from 2026 that would work on a Linux from the 90s : a simple chroot. Even building gcc is a walk in the park compared to reimplementing OS threading functions...

renehsz · 2025-12-17T04:42:11 1765946531

Strongly agree with this article. It highlights really well why overcommit is so harmful.

Memory overcommit means that once you run out of physical memory, the OOM killer will forcefully terminate your processes with no way to handle the error. This is fundamentally incompatible with the goal of writing robust and stable software which should handle out-of-memory situations gracefully.

But it feels like a lost cause these days...

So much software breaks once you turn off overcommit, even in situations where you're nowhere close to running out of physical memory.

What's not helping the situation is the fact that the kernel has no good page allocation API that differentiates between reserving and committing memory. Large virtual memory buffers that aren't fully committed can be very useful in certain situations. But it should be something a program has to ask for, not the default behavior.

charcircuit · 2025-12-20T04:02:14 1766203334

>terminate your processes with no way to handle the error. This is fundamentally incompatible with the goal of writing robust and stable software

Having an assumption that your process will never crash is not safe. There will always be freak things like CPUs taking the wrong branch or bits randomly flipping. Parting of design a robust system is being tolerant to things like this.

Another point also mentioned is this thread is that by the time you run out of memory the system already is going to be in a bad state and now you probably don't have enough memory to even get out of it. Memory should have been freed already by telling programs to lighten up on their memory usage or by killing them and reclaiming the resources.

PunchyHamster · 2025-12-20T02:51:14 1766199074

It's not harmful. It's necessary for modern systems that are not "an ECU in a car"

> Memory overcommit means that once you run out of physical memory, the OOM killer will forcefully terminate your processes with no way to handle the error. This is fundamentally incompatible with the goal of writing robust and stable software which should handle out-of-memory situations gracefully.

The big software is not written that way. In fact, writing software that way means you will have to sacrifice performance, memory usage, or both because you either * need to allocate exactly what you always need and free it when it gets smaller (if you want to keep memory footprint similar)m and that will add latency * over-allocate, and waste RAM

And you'd end up with MORE memory related issues, not less. Writing app where every allocation can fail is just nightmarish waste of time for 99% of the apps that are not "onboard computer of a space ship/plane"

201984 · 2025-12-19T20:50:10 1766177410

> What's not helping the situation is the fact that the kernel has no good page allocation API that differentiates between reserving and committing memory.

mmap with PROT_NONE is such a reservation and doesn't count towards the commit limit. A later mmap with MAP_FIXED and PROT_READ | PROT_WRITE can commit parts of the reserved region, and mmap calls with PROT_NONE and MAP_FIXED will decommit.

hparadiz · 2025-12-19T20:36:45 1766176605

That's a normal failure state that happens occasionally. Out of memory errors come up all the time when writing robust async job queues. There are a lot of other reasons a failure could happen but running out of memory is just one of them. Sure I can force the system to use swap but that would degrade performance for everything else so it's better to let it die and log the result and check your dead letter queue after.

barchar · 2025-12-20T02:49:47 1766198987

Even besides the aforementioned fork problems not having overcommit doesn't mean you can handle oom correctly by just handling errors from malloc!

renehsz · 2025-12-15T04:52:52 1765774372

There's still plenty of mandatory reading. It's not unusual for high schoolers to have to read at least two books per semester. Here's the problem though: It's just too easy to... you know... not do it. Teachers have no way of reliably telling the difference between those students who complete their reading assignments honestly and those who make due with summaries and AI assistance. Don't ask me how I know ;-)

renehsz · 2025-10-12T11:44:23 1760269463

The Plan 9 operating system.

It's the closest thing to a Unix successor we ever got, taking the "everything is a file" philosophy to another level and allowing to easily share those files over the network to build distributed systems. Accessing any remote resources is easy and robust on Plan9, meanwhile on other systems we need to install specialized software with bad interoperability for each individual use case.

Plan9 also had some innovative UI features, such as mouse chording to edit text, nested window managers, the Plumber to run user-configurable commands on known text patterns system-wide, etc.

Its distributed nature should have meant it's perfect for today's world with mobile, desktop, cloud, and IoT devices all connected to each other. Instead, we're stuck with operating systems that were never designed for that.

There are still active forks of Plan9 such as 9front, but the original from Bell Labs is dead. The reasons it died are likely:

- Legal challenges (Plan9 license, pointless lawsuits, etc.) meant it wssn't adopted by major players in the industry.

- Plan9 was a distributed OS during a time when having a local computer became popular and affordable, while using a terminal to access a centrally managed computer fell out of fashion (though the latter sort of came back in a worse fashion with cloud computing).

- Bad marketing and posing itself as merely a research OS meant they couldn't capitalize on the .com boom.

- AT&T lost its near endless source of telephone revenue. Bell Labs was sold multiple times over the coming years, a lot of the Unix/Plan9 guys went to other companies like Google.

teddyh · 2025-10-12T14:33:52 1760279632

> The reasons it died are likely:

The reason Plan 9 died a swift death was that, unlike Unix – which hardware manufacturers could license for a song and adapt to their own hardware (and be guaranteed compatibility with lots of Unix software) – Bell Labs tried to sell Plan 9, as commercial software, for $350 a box.

(As I have written many times in the past: <https://news.ycombinator.com/item?id=22412539>, <https://news.ycombinator.com/item?id=33937087>, and <https://news.ycombinator.com/item?id=43641480>)

EdiX · 2025-10-12T14:56:06 1760280966

Version 1 was never licensed to anyone. Version 2 was only licensed to universities for an undiscolsed price. Version 3 was sold as a book, I think this is the version you are referring to. However note that this version contained a license that only allowed non commercial uses of the source code. It also came with no support, no community and no planned updates (the project was shelved half a year later in favor of inferno)

More than the price tag the problem is that plan 9 wasn't really released until 2004.

Shugyousha · 2025-10-12T18:46:50 1760294810

Strictly speaking, it's not dead. The code is now open source and all the rights are with the Plan 9 foundation: https://p9f.org/

It's just unlikely that it will get as big of a following as Linux has.

pjmlp · 2025-10-13T07:39:06 1760341146

Had UNIX also been something like other OSes price points, instead of a song as you say, it would never even taken off, it was more about the openess and being crazy cheap than the alternatives, than anything else.

pjmlp · 2025-10-13T07:36:23 1760340983

The team moved on to work on Inferno, which Plan 9 afficionados tend to forget about, which was also a much better idea as UNIX evolution, Plan 9 combined with a managed userspace, which also didn't went down well.

mycall · 2025-10-12T14:31:33 1760279493

Plan 9 Filesystem Protocol lives on inside WSL2.

ajross · 2025-10-12T16:16:38 1760285798

9P is used everywhere in the VM ecosystem. It's clean and simple and well supported by almost all guests.

cratermoon · 2025-10-12T15:45:38 1760283938

And kubernetes

hbogert · 2025-10-14T08:09:04 1760429344

you mean literally with code, or in spirit?

tjchear · 2025-10-12T14:08:24 1760278104

What’s stopping other Unix-like systems from adopting the everything is a file philosophy?

IshKebab · 2025-10-12T17:22:16 1760289736

Probably the fact that it's a pretty terrible idea. It means you take a normal properly typed API and smush it down into some poorly specified text format that you now have to write probably-broken parsers for. I often find bugs in programs that interact with `/proc` on Linux because they don't expect some output (e.g. spaces in paths, or optional entries).

The only reasons people think it's a good idea in the first place is a) every programming language can read files so it sort of gives you an API that works with any language (but a really bad one), and b) it's easy to poke around in from the command line.

Essentially it's a hacky cop-out for a proper language-neutral API system. In fairness it's not like Linux actually came up with a better alternative. I think the closest is probably DBus which isn't exactly the same.

Maybe something like FIDL is a proper solution but I have only read a little about it: https://fuchsia.dev/fuchsia-src/get-started/learn/fidl/fidl

vacuity · 2025-10-12T20:38:21 1760301501

I think you have to standardize a basic object system and then allow people to build opt-in interfaces on top, because any single-level abstraction will quickly be pulled in countless directions for as many users.

immibis · 2025-10-13T05:00:14 1760331614

Like OLE Automation!

c0balt · 2025-10-12T14:16:26 1760278586

Probably that not everything can be cleanly abstracted as a file.

One might want to, e. G., have fine control over a how a network connection is handled. You can abstract that as a file but it becomes increasingly complicated and can make API design painful.

Someone · 2025-10-12T16:08:41 1760285321

> Probably that not everything can be cleanly abstracted as a file.

I would say almost nothing can be cleanly abstracted as a file. That’s why we got ioctl (https://en.wikipedia.org/wiki/Ioctl), which is a bad API (calls mean “do something with this file descriptor” with only conventions introducing some consistency)

naasking · 2025-10-12T19:51:24 1760298684

Everything can be abstracted as a file, it just may not be most efficient interface.

vacuity · 2025-10-12T20:36:28 1760301388

If everything can be represented as a Foo or as a Bar, then this actually clears up the discussion, allowing the relative merits of each representation to be discussed. If something is a universal paradigm, all the better to compare it to alternatives, because one will likely be settled on (and then mottled with hacks over time; organic abstraction sprawl FTW).

mike_hearn · 2025-10-12T20:02:22 1760299342

The fact that everything is not a file. No OS actually implements that idea including Plan9. For example, directories are not files. Plan9 re-uses a few of the APIs for them, but you can't use write() on a directory, you can only read them.

Pretending everything is a file was never a good idea and is based on an untrue understanding of computing. The everything-is-an-object phase the industry went through was much closer to reality.

Consider how you represent a GUI window as a file. A file is just a flat byte array at heart, so:

1. What's the data format inside the file? Is it a raw bitmap? Series of rendering instructions? How do you communicate that to the window server, or vice-versa? What about ancillary data like window border styles?

2. Is the file a real file on a real filesystem, or is it an entry in a virtual file system? If the latter then you often lose a lot of the basic features that makes "everything is a file" attractive, like the ability to move files around or arrange them in a user controlled directory hierarchy. VFS like procfs are pretty limited. You can't even add your own entries like adding symlinks to procfs directories.

3. How do you receive callbacks about your window? At this point you start to conclude that you can't use one file to represent a useful object like a window, you'd need at least a data and a control file where the latter is some sort of socket speaking some sort of RPC protocol. But now you have an atomicity problem.

4. What exactly is the benefit again? You won't be able to use the shell to do much with these window files.

And so on. For this reason Plan9's GUI API looked similar to that of any other OS: a C library that wrapped the underlying file "protocol". Developers didn't interact with the system using the file metaphor, because it didn't deliver value.

All the post-UNIX operating system designs ignored this idea because it was just a bad one. Microsoft invested heavily in COM and NeXT invested in the idea of typed, IDL-defined Mach ports.

pjmlp · 2025-10-13T07:47:57 1760341677

Unfortunely Microsoft didn't invest heavily enough on COM tooling, it sucks in 2025 as much as in the 1990's.

mike_hearn · 2025-10-13T08:31:42 1760344302

Sure, why would they? COM was rendered irrelevant by the move to the web. Microsoft lost out on the app serving side, and when they dropped the ball on ActiveX by not having proper UI design or sandboxing they lost out on the client too. Probably the primary use case outside of legacy OPC is IT departments writing PowerShell scripts or Office plugins (though those are JS based now too).

COM has been legacy tech for decades now. Even Microsoft's own security teams publish blog posts enthusiastically explaining how they found this strange ancient tech from some Windows archaeological dig site, lol. Maybe one day I'll be able to mint money by doing maintenance consulting for some old DCOM based systems, the sort of thing where knowing what an OXID resolver is can help and AI can't do it well because there's not enough example code on GitHub.

pjmlp · 2025-10-13T08:34:36 1760344476

Because since Windows Vista all new APIs are COM based, Win32 C API is basically stuck on Windows XP view of the universe, with minor exceptions here and there.

Anyone that has to deal with Windows programming quickly discovers that COM is not the legacy people talk about on the Internet.

mike_hearn · 2025-10-13T11:51:57 1760356317

Sure I mean, obviously the Windows API is COM based and has been for a long time. My point is, why seriously invest in the Windows API at all? A lot of APIs are only really being used by the Chrome team at this point anyway, so the quality of the API hardly matters.

pjmlp · 2025-10-13T12:10:17 1760357417

Game development for one, and there are still plenty of native applications on Windows to chose from, like most stuff in graphics, video editing, DAW, life sciences and control automation, thankfully we don't need Chrome in box for everything.

Your remark kind of proves the point Web is now ChromeOS Platform, as you could have mentioned browser instead.

WD-42 · 2025-10-12T14:24:01 1760279041

They have to an extent. The /proc file system on Linux is directly inspired by plan 9 IIRC. Other things like network sockets never got that far and are more related to their BSD kin.

cratermoon · 2025-10-12T15:48:27 1760284107

There's also /dev/tcp in Linux

    exec 5<>/dev/tcp/www.google.com/80
    echo -e "GET / HTTP/1.1\r\nhost: www.google.com\r\nConnection: close\r\n\r\n" >&5
    cat <&5

Zambyte · 2025-10-12T16:50:51 1760287851

/dev/tcp does not exist in Linux.

    ls /dev/tcp

It is an abstraction in GNU Bash.

cratermoon · 2025-10-12T20:36:00 1760301360

Pardon, you're correct.

WD-42 · 2025-10-12T20:49:39 1760302179

It’s still really cool. I had no idea that existed

pjmlp · 2025-10-13T07:41:17 1760341277

Not at all, /proc comes from Solaris.

WD-42 · 2025-10-14T00:08:38 1760400518

Looking into it we are both wrong. Plan 9 implemented /proc after 8th edition Unix. Solaris and Linux both implemented it at the same time in 1992.

PeterisP · 2025-10-20T10:52:47 1760957567

Abstractions are inherently a tradeoff, and too much abstraction hurts you when the assumptions break.

For a major example, treating a network resource like a file is neat and elegant and simple while the network works well, however, once you have unreliable or slow or intermittent connectivity, the abstraction breaks and you have to handle the fact that it's not really like a local file, and your elegant abstraction has to be mangled with all kinds of things so that your apps are able to do that.

IshKebab · 2025-10-12T17:13:33 1760289213

And they fixed symlinks.

AfterHIA · 2025-10-13T00:01:16 1760313676

UNIX is for dorks. We needed a Smalltalk style, "everything is an object and you can talk to all objects" but thankfully we got Java and, "object oriented" C++. The Alto operating system was leaps and bounds ahead of the Mac and Windows 3.1 system and it took Steve Jobs a decade to realize, "oh shit we could have just made everything an object." Then we get WebObjects and the lousy IPod and everything is fascist history.

#next #never #forget #thieves

pjmlp · 2025-10-13T07:46:25 1760341585

I had a UNIX zealot phase back in the 1990's, until the university library opened my eyes to Xerox PARC world, tucked away at the back there were all the manuals and books about Smalltalk from Xerox, eventually I also did some assigments with Smalltalk/V, and found a way to learn about Interlisp and Mesa/Cedar as well.

My graduation project was porting a visualisation framework from Objective-C/NeXTSTEP to Windows.

At the time, my X setup was a mix of AfterStep or windowmaker, depending on the system I was at.

mike_hearn · 2025-10-13T08:37:49 1760344669

OK, go use Windows then, it's the dream architecture you always wanted ;)

https://learn.microsoft.com/en-us/windows/win32/com/com-tech...

AfterHIA · 2025-10-14T14:58:39 1760453919

Dichotomizing the world as either UNIX based or Windows is pretty myopic. I want the computer architecture Douglas Engelbart dreamed of. I want a realization of the ideas of Seymour Papert and Brenda Laurel.

We're still after 50 years using Xerox Alto clones fundamentally. What would a, "modern Alto" play like? What if we spent X,XXX,XXX's of dollars to create a, "in to the future time machine" like they did at PARC? What if the project had the high ethics of Bush and Engelbart as an operational paradigm?

...and yes Lisp is the best programming language. Suck it.

renehsz · 2025-10-08T22:09:58 1759961398

> Such laws cannot be enforced.

Tech companies can certainly be forced to build surveillance into their chat applications and operating systems. This doesn't have to be about backdooring crypto.

> Enforcement can only be arbitrary.

Sure, but it would be forced upon the vast majority of the population. Tech-savvy people will find ways to circumvent it, so will criminals, but that doesn't make mass surveillance of all others any less scary.

jopsen · 2025-10-10T20:34:28 1760128468

The argument was that ChatControl is a threat to the rule of law.

Because enforcement would be so arbitrary.

renehsz · 2025-10-04T19:07:49 1759604869

> It also assumes that the OS doesn't lie to the application when allocations fail.

Gotta do the good ol'

  echo 2 >/proc/sys/vm/overcommit_memory

and maybe adjust overcommit_ratio as well to make sure the memory you allocated is actually available.

pjmlp · 2025-10-04T19:30:11 1759606211

OS specific hack and unrelated to C.

tredre3 · 2025-10-04T20:20:04 1759609204

Your comment was also OS-specific because Windows doesn't lie to applications about failed allocations.

pjmlp · 2025-10-05T08:27:28 1759652848

Not at all, rather there is no guarantee that the C abstract machine described on ISO C, actually returns NULL on memory allocation failures as some C advocates without ISO C legalese expertise seem to advocate.

renehsz · 2025-10-04T18:41:36 1759603296

Yeah, hosting on or at least tunneling through a commercial IP address is definitely required in order not to be flagged as spam. Personally, I chose the latter option of hosting my MTA at home but tunneling its traffic through a VPS in a datacenter. It's been working pretty well ever since, although I'm not sure it's worth the effort versus just using a cheap hosted provider.

the_gipsy · 2025-10-04T23:28:56 1759620536

Is it some kind of free VPS?

layer8 · 2025-10-05T01:26:08 1759627568

VPS are generally paid, but it's only $5 or less a month.

renehsz · 2025-09-17T19:55:38 1758138938

Unfortunately, Memory64 comes with a significant performance penalty because the wasm runtime has to check bounds (which wasn't necessary on 32-bit as the runtime would simply allocate the full 4GB of address space every time).

But if you really need more than 4GB of memory, then sure, go ahead and use it.

Findecanor · 2025-09-17T22:40:13 1758148813

Actually, runtimes often allocate 8GB of address space because WASM has a [base32 + index32] address mode where the effective address could overflow into the 33rd bit.

On x86-64, the start of the linear memory is typically put into one of the two remaining segment registers: GS or FS. Then the code can simply use an address mode such as "GS:[RAX + RCX]" without any additional instructions for addition or bounds-checking.

jsheard · 2025-09-17T20:01:27 1758139287

The comedy option would be to use the new multi-memory feature to juggle a bunch of 32bit memories instead of a 64bit one, at the cost of your sanity.

baq · 2025-09-17T20:14:46 1758140086

didn't we call it 'segmented memory' back in DOS days...?

munificent · 2025-09-17T20:47:04 1758142024

We call it "pointer compression" now. :)

mananaysiempre · 2025-09-17T21:46:40 1758145600

Seriously though, I’ve been wondering for a while whether I could build a GCC for x86-64 that would have 32-bit (low 4G) pointers (and no REX prefixes) by default and full 64-bit ones with __far or something. (In this episode of Everything Old Is New Again: the Very Large Memory API[1] from Windows NT for Alpha.)

[1] https://devblogs.microsoft.com/oldnewthing/20070801-00/?p=25...

o11c · 2025-09-17T22:40:22 1758148822

A moderate fraction of the work is already done using:

https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html

Unfortunately the obvious `__attribute__((mode(...)))` errors out if anything but the standard pointer-size mode (usually SI or DI) is passed.

Or you may be able to do it based on x32, since your far pointers are likely rare enough that you can do them manually. Especially in C++. I'm pretty sure you can just call "foreign" syscalls if you do it carefully.

dajtxx · 2025-09-18T00:52:23 1758156743

6502 zero page instruction vibes.

magicalhippo · 2025-09-17T21:37:47 1758145067

It was glorious I tell you.

Especially how you could increase the segment value by one or the offset by 16 and you would address the same memory location. Think of the possibilities!

And if you wanted more than 1MB you could just switch memory banks[1] to get access to a different part of memory. Later there was a newfangled alternative[2] where you called some interrupt to swap things around but it wasn't as cool. Though it did allow access to more memory so there was that.

Then virtual mode came along and it's all been downhill from there.

[1]: https://en.wikipedia.org/wiki/Expanded_memory

[2]: https://hackaday.com/2025/05/15/remembering-more-memory-xms-...

mananaysiempre · 2025-09-18T00:05:37 1758153937

> Think of the possibilities!

Schulman’s Unauthorized Windows 95 describes a particularly unhinged one: in the hypervisor of Windows/386 (and subsequently 386 Enhanced Mode in Windows 3.0 and 3.1, as well as the only available mode in 3.11, 95, 98, and Me), a driver could dynamically register upcalls for real-mode guests (within reason), all without either exerting control over the guest’s memory map or forcing the guest to do anything except a simple CALL to access it. The secret was that all the far addresses returned by the registration API referred to the exact same byte in memory, a protected-mode-only instruction whose attempted execution would trap into the hypervisor, and the trap handler would determine which upcall was meant by which of the redundant encodings was used.

And if that’s not unhinged enough for you: the boot code tried to locate the chosen instruction inside the firmware ROM, because that will have to be mapped into the guest memory map anyway. It did have a fallback if that did not work out, but it usually succeeded. This time, the secret (the knowledge of which will not make you happier, this is your final warning) is that the instruction chosen was ARPL, and the encoding of ARPL r/m16, AX starts with 63 hex, also known as the ASCII code of the lowercase letter C. The absolute madmen put the upcall entry point inside the BIOS copyright string.

(Incidentally, the ARPL instruction, “adjust requested privilege level”, is very specific to the 286’s weird don’t-call-it-capability-based segmented architecture... But it’s has a certain cunning to it, like CPU-enforced __user tagging of unprivileged addresses at runtime.)

DaiPlusPlus · 2025-09-18T08:26:44 1758184004

> The absolute madmen put the upcall entry point inside the BIOS copyright string.

Isn’t that an arbitrary string, though? Presumably AMI and Insyde have different copyright messages, so then what?

mananaysiempre · 2025-09-18T12:41:02 1758199262

To clarify: when I said that “the boot code tried to locate the chosen instruction inside the firmware ROM”, I literally meant that it looked through the entirety of the ROM BIOS memory range for a byte, any byte, with value 63 hex. There’s even a separate (I’d say prematurely factored out) routine for that, Locate_Byte_In_ROM. It just so happens that the byte in question is usually found inside the copyright string (what with the instruction being invalid and most of the rest of the exposed ROM presumably being valid code), but the code does not assume that.

If the search doesn’t succeed or if you’ve set SystemROMBreakPoint=off in the [386Enh] section of SYSTEM.INI[1] or run WIN /D:S, then the trap instruction will instead be placed in a hypervisor-provided area of RAM that’s shared among all guests, accepting the risk that a misbehaving guest will stomp over it and break everything (don’t know where it fits in the memory map).

As to the chances of failing, well, I suspect the original target was the c in “(c)”, but for example Schulman shows his system having the trap address point at “chnologies Ltd.”, presumably preceded by “Phoenix Te”. AMI and Award were both “Inc.”, so that would also work. Insyde wasn’t a thing yet; don’t know what happened on Compaq or IBM machines. One way or another, looks like a c could be found somewhere often enough that the Microsoft programmers were satisfied with the approach.

[1] https://jeffpar.github.io/kbarchive/kb/071/Q71264/

_nalply · 2025-09-18T12:29:33 1758198573

I thought so, but "Copyright" is always the same? Haha, that's dangerously clever or cleverly dangerous.

marcosdumay · 2025-09-17T23:02:56 1758150176

And turned out we have the transistors to avoid it, but it's a really good optimization for CPUs nowadays.

At least most people design non-overlaping segments. And I'm not sure wasm would gain anything from it, being a virtual machine instead of real.

malkia · 2025-09-17T22:18:03 1758147483

wait.... UNREAL MODE!

andrewl-hn · 2025-09-18T10:38:37 1758191917

Somewhat related. At some point around 15 years ago I needed to work with large images in Java, and at least at the time the language used 32-bit integers for array sizes and indices. My image data was about 30 gigs in size, and despite having enough RAM and running a 64-bit OS and JVM I couldn't fit image data into s ingle array.

This multi-memory setup reminds me of my array juggling I had to do back then. While intellectually challenging it was not fun at all.

the_duke · 2025-09-18T11:32:37 1758195157

The problem with multi-memory (and why it hasn't seen much usage, despite having been supported in many runtimes for years) is that basically no language supports distinct memory spaces. You have to rewrite everything to use WASM intrinsics to work on a specific memory.

benji-york · 2025-09-18T14:19:18 1758205158

Stray thought: the way Zig uses first-class allocators might make it interesting for doing things with multiple memories.

evmar · 2025-09-17T21:43:40 1758145420

It looks like memories have to be declared up front, and the memcpy instruction takes the memories to copy between as numeric literals. So I guess you can't use it to allocate dynamic buffers. But maybe you could decide memory 0 = heap and memory 1 = pixel data or something like that?

afiori · 2025-09-17T21:14:09 1758143649

Honestly you could allocate a new memory for every page :-)

TrueDuality · 2025-09-17T21:19:24 1758143964

The irony for me is that it's already slow because of the lack of native 64-bit math. I don't care about the memory space available nearly as much.

sehugg · 2025-09-17T21:52:10 1758145930

Eh? I'm pretty sure it's had 64-bit math for awhile -- i64.add, etc.

jesse__ · 2025-09-17T22:44:46 1758149086

They might have meant lack of true 64bit pointers ..? IIRC the chrome wasm runtime used tagged pointers. That comes with an access cost of having to mask off the top bits. I always assumed that was the reason for the 32bit specification in v1

fulafel · 2025-09-18T04:00:04 1758168004

Bounds checking in other PLT is often reproted to result in pretty low overheads. Will be interesting to see some details about how this turns out.

zarzavat · 2025-09-17T22:58:55 1758149935

I still don't understand why it's slower to mask to 33 or 34 bit rather than 32. It's all running on 64-bit in the end isn't it? What's so special about 32?

nagisa · 2025-09-17T23:17:22 1758151042

That's because with 32-bit addresses the runtime did not need to do any masking at all. It could allocate a 4GiB area of virtual memory, set up page permissions as appropriate and all memory accesses would be hardware checked without any additional work. Well that, and a special SIGSEGV/SIGBUS handler to generate a trap to the embedder.

With 64-bit addresses, and the requirements for how invalid memory accesses should work, this is no longer possible. AND-masking does not really allow for producing the necessary traps for invalid accesses. So every one now needs some conditional before to validate that this access is in-bounds. The addresses cannot be trivially offset either as they can wrap-around (and/or accidentally hit some other mapping.)

kannanvijayan · 2025-09-18T00:07:30 1758154050

I don't feel this is going to be as big of a problem as one might think in practice.

The biggest contributor to pointer arithmetic is offset reads into pointers: what gets generated for struct field accesses.

The other class of cases are when you're actually doing more general pointer arithmetic - usually scanning across a buffer. These are cases that typically get loop unrolled to some degree by the compiler to improve pipeline efficiency on the CPU.

In the first case, you can avoid the masking entirely by using an unmapped barrier region after the mapped region. So you can guarantee that if pointer `P` is valid, then `P + d` for small d is either valid, or falls into the barrier region.

In the second case, the barrier region approach lets you lift the mask check to the top of the unrolled segment. There's still a cost, but it's spread out over multiple iterations of a loop.

As a last step: if you can prove that you're stepping monotonically through some address space using small increments, then you can guarantee that even if theoretically the "end" of the iteration might step into invalid space, that the incremental stepping is guaranteed to hit the unmapped barrier region before that occurs.

It's a bit more engineering effort on the compiler side.. and you will see some small delta of perf loss, but it would really be only in the extreme cases of hot paths where it should come into play in a meaningful way.

zarzavat · 2025-09-18T09:55:07 1758189307

> AND-masking does not really allow for producing the necessary traps for invalid accesses.

Why does it need to trap? Can't they just make it UB?

Specifying that invalid accesses always trap is going to degrade performance, that's not a 64-bit problem, that's a spec problem. Even if you define it in WASM, it's still UB in the compiler so you aren't saving anyone from UB they didn't already have. Just make the trapping guarantee a debug option only.

_nalply · 2025-09-18T12:37:05 1758199025

It's WASM. WASM runs in a sandbox and you can't have UB on the hardware level. Imagine someone exploiting the behavior of some browser when UB is triggered. Except that the programmer is not having nasal demons [1] but some poor user, like a mom of four children in Abraska running a website on her cell phone.

[1]: http://catb.org/jargon/html/N/nasal-demons.html

zarzavat · 2025-09-18T13:07:22 1758200842

The UB in this case is "you may get another value in the sandboxed memory region if you dereference an invalid pointer, rather than a guaranteed trap". You can still have UB even in a sandbox.

Seems like they got overly attached to the guaranteed trapping they got on 32-bit and wanted to keep it even though it's totally not worth the cost of bounds checking every pointer access. Save the trapping for debug mode only.

_nalply · 2025-09-18T13:18:49 1758201529

Ah, so you meant UB = unspecified behavior, not UB = undefined behavior.

Maybe. Bugs that come from spooky behavior at a distance are notoriously hard to debug, especially in production, and it's worthwile to pay for it to avoid that.

azakai · 2025-09-17T23:09:06 1758150546

The special part is the "signal handler trick" that is easy to use for 32-bit pointers. You reserve 4GB of memory - all that 32 bits can address - and mark everything above used memory as trapping. Then you can just do normal reads and writes, and the CPU hardware checks out of bounds.

With 64-bit pointers, you can't really reserve all the possible space a pointer might refer to. So you end up doing manual bounds checks.

kannanvijayan · 2025-09-18T00:14:42 1758154482

Hi Alon! It's been a while.

Can't bounds checks be avoided in the vast majority of cases?

See my reply to nagisa above (https://news.ycombinator.com/item?id=45283102). It feels like by using trailing unmapped barrier/guard regions, one should be able to elide almost all bounds checks that occur in the program with a bit of compiler cleverness, and convert them into trap handlers instead.

azakai · 2025-09-18T01:35:05 1758159305

Hi!

Yeah, certainly compiler smarts can remove many bounds checks (in particular for small deltas, as you mention), hoist them, and so forth. Maybe even most of them in theory?

Still, there are common patterns like pointer-chasing in linked list traversal where you just keep getting an unknown i64 pointer, that you just need to bounds check...

phire · 2025-09-18T00:05:16 1758153916

Because CPUs still have instructions that automatically truncate the result of all math operations to 32 bits (and sometimes 8-bit and 16-bit too, though not universally).

To operate on any other size, you need to insert extra instructions to mask addresses to the desired size before they are used.

dist1ll · 2025-09-18T03:32:13 1758166333

WASM traps on out-of-bounds accesses (including overflow). Masking addresses would hide that.

renehsz · 2025-09-11T23:37:46 1757633866

This project is very impressive, see also https://news.ycombinator.com/item?id=45182518

renehsz · 2025-09-11T23:36:41 1757633801

Wow, hats off to you! This is one of the most impressive solo projects I've seen in a while!

Making a toy C compiler isn't rocket science, but developing one that's complete and production-ready is whole nother story. AFAICT, Kefir fits into the latter category:

- C17/C23 compliance

- x86_64 codegen

- debug info gen

- SSA-based optimization passes (the most important ones)

- has a widely-compatible cc cli

- is extensively tested/fuzzed

Some advantages compared to the big three (GCC, Clang, MSVC):

- It's fairly small and simple

- That means it's understandable and predictable - no surprises in terms of what it can and cannot do (regarding optimizations in particular)

- Compilation is probably very fast (although I haven't done any benchmarking)

- It could be modified or extended fairly easily

There might be real interest in this compiler from people/companies who

- value predictability and stability very highly

- want to be in control of their entire software supply chain (including the build tools)

- simply want a faster compiler for their debug builds

Think security/high assurance people, even the suckless or handmade community might be interested.

So it's time to market this thing! Get some momentum going! It would be too sad to see this project fade away in silence. Announce it in lots of places, maybe get it on Compiler Explorer, etc. (I'm not saying that you have to do this, of course. But some people could genuinely benefit from Kefir.)

P.S. Seems like JKU has earned its reputation as one of the best CS schools in Austria ;-)

jprotopopov · 2025-09-12T17:29:28 1757698168

(I thought that the announcement has completely faded, so haven't even checked the replies).

I'll immediately reveal some issues with the project. On the compilation speed, it is unfortunately atrocious. There are multiple reasons for that:

1. Initially the compiler was much less ambitious, and was basically generating stack-based threaded code, so everything was much simpler. I have managed to make it more reasonable in terms of code generation (now it has real code generator, optimization pipeline), but there is still huge legacy in the code base. There is a whole layer of stack-based IR which is translated from the AST, and then transformed into equivalent SSA-based IR. Removing that means rewriting the whole translator part, for which I am not ready.

2. You've outlined some appealing points (standard compliance, debug info, optimization passes), but again -- this came at the expense of being over-engineered and bloated inside. Whenever, I have to implement some new feature, I hedge and over-abstract to keep it manageable/avoid hitting unanticipated problems in the future. This has served quite well in terms of development velocity and extending the scope (many parts have not seen complete refactoring since the initial implementation in 2020/2021, I just build up), but efficiency of the compiler itself suffered.

3. I have not particularly prioritized this issue. Basically, I start optimizing the compiler itself only when something gets too unreasonable (either, in terms of run time, or memory). There are all kinds of inefficiencies, O(n^2) algorithms and such simply because I knew that I would be able to swap that part out should that be necessary, but never actually did. I think the compiler efficiency has been the most de-prioritized concern for me.

Basically, if one is concerned with compilation speed, it is literally better to pick gcc, not even talking about something like tcc. Kefir is abysmal in that respect. I terms, of code base size, it is 144k (sans tests, 260k in total) which is again not exactly small. It's manageable for me, but not hacker-friendly.

With respect to marketing, I am kind of torn. I cannot work on this thing full time, unless somebody is ready to provide sufficient full-time funding for myself and also other expenses (machines for tests, etc). Otherwise, I'll just increase the workload on myself and reduce the amount of time I can spend actually working on the code, so it'll probably be net loss for the project. Either way, for now I treat it closer to an art project than a production tool.

As for compiled code performance, I have addressed it here https://lobste.rs/s/fxyvwf -- it's better than, say, tcc, but nowhere near well-established compilers. I think this is reasonable to expect, and the exact ways to improve that a bit are also clear to me, it's only question of development effort

P.S. JKU is a great school, although by the time I enrolled there the project has already been on the verge of bootstrapping itself.

EDIT: formatting

renehsz · 2025-09-13T22:21:29 1757802089

Well, sometimes things aren't as amazing as they look on the surface. And it's totally understandable if you don't want to spend the time and effort to solve the problems you mentioned.

Some people/communities might be interested in this compiler regardless. It doesn't hurt to spread the word. We need more compiler diversity, research, and development.

Also, don't sell yourself short - completing a project of this magnitude and complexity as a solo dev is something few people are able to do. If I worked in HR, I would try to hire you instantly ;-)

Hope to see more awesome software from you in the future!