Hi gang! I know there are some incorrectitudes in there around the inline part of the guide... but if you find others, I'd really appreciate emails or GitHub issues or PRs.
The goal is to have a comprehensive, free, zero ads or tracking, up-to-date guide to C that fits between cppreference and other higher-level books. Target audience is college students who suddenly have to learn C for their OS or Networking course. :)
I also haven't put any C23 stuff in there yet, but that's on deck. I have until December, right? ;)
Thanks a lot for your networking materials. 10 years ago it got me through a hard networking course in university and it showed me how good teaching materials could be. I'm not programming C professionally but your stuff made me a better programmer.
Thanks, Beej! Your guide was really influential for me when I was a young kid in the early 2000s! I was really into MUDs back then and made a few toy engines using the things I learned in your guide as a basis! Funny thing, I learned years later we lived in the same town at the time. The internet is big and small at the same time.
Beej's guide to network programming is an absolute classic, dating back to at least 2005 (according to the Internet Archive) - very exciting to see them tackle C programming generally now.
I don't remember if I said, but I got started learning this stuff from a networking program someone had written to allow you to play DOOM over the modem, IIRC. So thank you to whomever that was!
And thank you for writing the networking guide. I found it in my early teens when I was interested in learning programming and wanted to go beyond the basics to do some more interesting stuff.
It helped me write some chatroom bots for things like IRC and battle.net in the early 2000s, which fueled the hobby even more and eventually turned it into a very good career. :)
Yeah! Last week I decided I'd like to write a (very) simple HTTP server in C as a learning experiment. The network programming guide was incredibly useful and the code examples gave me a great base to learn from and extend.
Beej’s site is a gem. I remember coming across it in 2010 or so, thinking, yeah this stuff is great but what happens when this guy gets kids or a “real job” and the bit rot sets in?
Here we are, closing in on 15 years later, and it’s both still-great and up to date.
Beej is a great guy. He was my instructor at this boot camp I attended. I learned a little C from him. I can’t really do much with it, but it was fun to learn.
This I modeled off The Turbo C Bible, one of my favorite all-time reference books that still sits on my shelf. I thought its killer feature was examples for everything.
> it is connected to the bare metal in a way that present-day languages are not.
While obviously C is closer to the metal than say, Python, it's far less clear that this is true for a language like Rust† and I am increasingly dubious about the value of any of these high level languages (yes, even C) to learn "How the machine really works". I reckon it's probably more instructive to write a little bit of ARM assembly than to learn enough C to get to the same place today.
† Trivial example: Your CPU arithmetic operations probably natively behave exactly the way Rust's Wrapping<u32> Wrapping<i64> etc. types do. In C the unsigned integer types, once you figure out how big yours are, are similar, but the signed ones are not!
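To make the signed/unsigned split concrete, here is a minimal sketch in C. The function names are made up for illustration, and the signed version leans on an assumption: converting an out-of-range unsigned value back to a signed type is implementation-defined before C23, though GCC, Clang and MSVC all wrap as two's complement.

```c
#include <stdint.h>

/* Unsigned arithmetic is defined to wrap modulo 2^N, so this is portable. */
uint32_t wrapping_add_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b);
}

/* Signed overflow is undefined behaviour in C, so writing a + b directly is
 * not an option. Doing the addition in unsigned and converting back avoids
 * the UB. Assumption: the final conversion wraps as two's complement, which
 * is implementation-defined before C23 but holds on mainstream compilers. */
int32_t wrapping_add_i32(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}
```

On a typical two's complement target both functions should compile down to a single add instruction, which is roughly the behaviour Wrapping<u32> and Wrapping<i32> give you by name.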
The key difference to me is where you use “probably”. Rust and even WASM reflect a lot of the system behavior that’s become the norm among modern hardware platforms.
But those norms are still not universal and it’s useful to maintain an understanding of other viable architectures. C was born from a time of extremely high innovation and experimentation in hardware architecture and it’s baked many of those possibilities into its language. Often, those are the very footguns that can make writing commercial software in C something to be considered carefully, especially because performant and safer alternatives like Rust exist.
Yet, using C is still sometimes the right choice for commercial software and remains a unique instrument for privately expanding one’s understanding of system and hardware architecture.
The quoted claim about C doesn’t have to be wrong just to argue that Rust might often be a better choice in practice for commercial bare metal programming on common systems. It’s making a subtly different claim, and the difference can matter.
I think that I'd have said this claim about C (being closer to the metal) was wrong even 10-15 years ago when I was mostly writing C for a living. Ignore the Rust stuff entirely if you prefer.
What you can learn about in C is the C abstract machine, but the abstract machine is not any modern computer. It starts as a greatly simplified model of the PDP-11, a computer which hasn't been commercially relevant in decades.
So, on the one hand it's not much like the machines we own today, yet on the other hand it's also not like any real machine ever because all the complexities the C designers didn't express in their language are absent from the abstract machine. Sometimes this is an intentional deviation to make C simpler than a PDP-11, which is understandable but not so useful. Much worse though are places where they just didn't know.
The actual PDP-11 has specific semantics because it was a real thing, you could try stuff and find out. You could experiment. However the abstract machine is imaginary, and so where its properties were under-specified you can't just experiment; there actually is no answer. WG14 (the C language committee) has tried, since it came into existence, to straighten out some of this and choose definite answers, but it's a long way from finished.
So on the one hand, C is bad theoretically because it ignores much of what theoretical CS discovered in subsequent decades, but then on the other hand C is bad practically because its abstract machine models how somebody in the 1970s thought a 1970s computer worked, so it's wrong and it's 50 years old.
And again, I write these things as, IMNSHO an expert C programmer. I'm somebody who has found bugs in GNU libc and who wrote their own debugging tools so they could live debug problems which only happened in a production service. I'm not speaking out of ignorance, but out of understanding.
I respect what you're saying here about a "C abstract machine" that's disjoint from any particular real hardware, and that it introduces its own troubles through the abstraction, and I think I agree with all that.
But I'm not sure how to draw that out of your original comment, or even how to interpret the original comment while "ignoring the Rust stuff entirely". That comment seems to be suggesting that Rust exposes its users to the bare metal just as much as C does, so if you take that comparison out, it doesn't seem like it's saying anything at all. C is flawed, and is not even ideal as a window into bare metal (because of its own abstract machine), but Beej noted that it's more connected than any "present-day" language, and that claim does seem to hold.
Well, let's just try ripping all mention of Rust out:
""While obviously C is closer to the metal than say, Python, I am increasingly dubious about the value of any of these high level languages (yes, even C) to learn "How the machine really works". I reckon it's probably more instructive to write a little bit of ARM assembly than to learn enough C to get to the same place today.""
Now, why am I talking about "How the machine really works". Well, here's the next line from Beej:
> When you learn C, you learn about how software interfaces with computer memory at a low level
You don't. You're learning about how C software interfaces with the memory of C's abstract machine, but that's not how your machine's memory works, not even close. For example, C just magically ensures the alignment is correct all the time, but the actual memory relies on programmers to ensure they only perform aligned access. Unaligned access may blow up your whole program or make it enormously slower; which is worse?
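For anyone who hasn't run into this: objects you declare normally in C are aligned for you, and the trouble starts when you reinterpret raw bytes, e.g. casting `buf + 1` to `uint32_t *` and dereferencing it, which is undefined behaviour. A minimal sketch of the usual workaround follows; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from any byte offset. memcpy has no alignment
 * requirement, and compilers typically lower it to a plain load on targets
 * where unaligned loads are fine. */
uint32_t load_u32(const unsigned char *bytes)
{
    uint32_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* The mirror image: stores a 32-bit value at any byte offset. */
void store_u32(unsigned char *bytes, uint32_t value)
{
    memcpy(bytes, &value, sizeof value);
}
```

The value comes back in the host's byte order, so a round trip through `store_u32` and `load_u32` at the same offset returns what was stored.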
On the other hand C's Pointers are not just addresses. Even though C makes them look more like they are than many languages with a pointer type, they still aren't just addresses. If you mistake them for addresses, you'll get punished by your compiler, although just exactly when is hard to predict - this is a game of Russian Roulette. However the actual memory in your computer really does just have addresses and they're just numbers like any other numbers.
Amusingly Rust is needing to revisit some of the earlier choices it made that were “obvious” on modern hardware architectures, because we have even newer ones that don’t behave that way.
Here's a recent one: https://faultlore.com/blah/fix-rust-pointers/. The problem there was essentially that Rust assumed sizeof(size_t) == sizeof(intptr_t). If bounds checking makes it into hardware then two's complement will also get thrown out the window.
Because C has both size_t and intptr_t and a billion other magic types which can be adjusted to suit more architectures it was able to successfully "adapt" to CHERI where Rust makes such an adaptation enormously expensive (now all your word size integers are 128-bit).
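A small sketch of the two types in their intended roles, in case the distinction is unfamiliar. The function name is made up, and it assumes the integer you get from converting a pointer is its address, which is implementation-defined but true on mainstream platforms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* size_t measures object sizes and counts; uintptr_t is an integer type able
 * to hold a converted pointer. The standard never says they are the same
 * width, which is the room that lets C fit unusual pointer formats. */
bool is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    /* Assumption: the converted value is the address of p. */
    return (uintptr_t)p % alignment == 0;
}
```

Code that stuffs a pointer into a `size_t` instead compiles fine on common 64-bit targets and quietly bakes in the assumption that the two are interchangeable.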
On the other hand of course, first CHERI doesn't make much sense in Rust, you would not invent all this expensive stuff if your language was memory safe, CHERI exists because people just keep writing C and C++ software and there must be a hardware mitigation - CHERI is that mitigation. If you see a lot more practical CHERI hardware, evidently it was just too hard to give up on these terrible languages.
Secondly though, the main thrust of Aria's article isn't CHERI. It's compelling to have physical hardware that shows off the problem by making usize and isize twice as big, but in practice Aria is talking about provenance which affects your machine and your C programs.
In C provenance is just... not defined. WG14 did I believe actually write (or is in the process of writing) a Technical Report explaining some sort of PNVI-ae variant (Provenance Not Via Integers, Address Exposed) two decades after DR260 but that's not the C language standard it's just a TR, and it's not the final word on the matter.
So Aria's Strict Provenance Experiment isn't Rust trying to catch up to where C was on this, it's Rust trying to do stuff C didn't get to yet in ~50 years of development. And in both languages this is about issues that are completely unrelated to your real hardware. The point here is that addresses which your memory hardware actually has, don't have provenance, but pointers which exist in your programming language do.
CHERI was obviously designed for C and C++, but that doesn't mean it doesn't make sense to use with Rust. Capabilities are generally useful primitives, and allow you to do things like eliminate bounds checks because they are now handled by the hardware.
Provenance is kind of weird in that it is very underspecified and compilers have mostly been flying by "do whatever feels right" for many years, which has mostly worked until recently, when people decided to really sit down and hammer out proper rules for the things people expect to work. CHERI has accelerated those efforts because whatever gets picked for provenance is going to be implemented on that platform and they would like capabilities to line up with whatever they decide. In a sense, this is the opposite problem, where C decided to not really specify behavior at all and now needs to clean up its act, rather than overspecifying and being unable to adapt to new hardware. It's good that it's being cleaned up but mostly aligned with what people have already been doing.
> Capabilities are generally useful primitives, and allow you to do things like eliminate bounds checks because they are now handled by the hardware.
Mmm. What does the actual performance look like? We can surprisingly often trim the actual performance of real software to exactly the same as no bounds checks, but while being sure it's in bounds, and so any approach that's more expensive needs a justification. In C and C++ that justification is pretty obvious, in Rust it's a tougher sell.
If it helps WUFFS can emit high performance C number crunching code that is definitely bounds safe, regardless of whether it includes what appears to be any actual bounds checking. If CHERI exhibits overhead for that code compared to no CHERI - which I think it does - that's a price for CHERI that isn't giving any benefit, and you'd expect similar for a high quality Rust port to a platform with CHERI, modulo Aria's suggested way forward.
> Provenance is kind of weird in that it is very underspecified and compilers have mostly been flying by "do whatever feels right" for many years, which has mostly worked until recently
I don't agree that it "mostly worked". Until Rust the main consumers of optimising compilers which heavily rely on provenance shenanigans were C and C++ and in both languages you could just scare off the repeated bug reports resulting from such shenanigans by pointing at DR/260 in which WG14 basically says "Yes pointer provenance is a thing" but declines to specify how it could possibly work and leaves it to compilers to do whatever they want. Those bug reports do exist, they just got closed NOTABUG.
Only Rust finally starts asking what the LLVM IR actually means and there are a lot of awkward questions resulting from that, continuing to this day. I think it would have been a lot easier to keep kicking the ball into the long grass (as WG21 agreed to do yet again this year for C++ itself) if there wasn't a language with a whole lot of users asking these awkward questions.
Hard to say, production hardware isn’t out yet. Anyways, I think you misunderstand my point, CHERI is definitely designed for running C/C++ code but obviously people will want to run Rust on it. In that scenario capabilities are “free” because they exist by default (just like MMUs give you protection for no cost because they’re always on for modern OSes), so Rust can work with it.
> If it helps WUFFS can emit high performance C number crunching code that is definitely bounds safe
WUFFS is cool but it’s not really meant for general-purpose code.
> Until Rust the main consumers of optimising compilers which heavily rely on provenance shenanigans were C and C++ and in both languages you could just scare off the repeated bug reports resulting from such shenanigans by pointing at DR/260 in which WG14 basically says "Yes pointer provenance is a thing" but declines to specify how it could possibly work and leaves it to compilers to do whatever they want.
I think this lacks context. The compilers of yesteryear and even mostly the compilers of today can’t do much with provenance, because doing optimizations on it often requires global program analysis that is infeasible to perform. The examples that people knew were problematic were almost all centered around surprising compilation results in small snippets showing things like “off the end” pointers having the same bit pattern as the address of the stack object next to it, but comparing different. The response to bug reports like these was not that provenance was broken but mostly “the standard doesn’t really want you doing stuff like this anyways, so go away”.
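For readers who haven't seen one of those snippets, here is a rough sketch of the ingredients. The helper names are hypothetical, and this is only an illustration of the shape of the problem, not a reproduction of any particular bug report.

```c
#include <stdbool.h>
#include <string.h>

/* Compares the stored bytes of two pointers, ignoring whatever the compiler
 * believes about where each pointer came from. */
bool same_representation(const int *p, const int *q)
{
    return memcmp(&p, &q, sizeof p) == 0;
}

/* Forming a one-past-the-end pointer is valid; dereferencing it is not. */
const int *one_past(const int *object)
{
    return object + 1;
}
```

Inside a single array everything is boring: `one_past(&pair[0])` and `&pair[1]` have the same bytes and compare equal. The surprising reports involve two separate locals `a` and `b`, where `same_representation(one_past(&a), &b)` and `one_past(&a) == &b` need not agree with each other. Whether `b` even sits directly after `a` depends on the compiler and flags, so no particular output is claimed here.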
The actual legitimate use of provenance mostly did work, which was stuff like pointer tagging and stuffing (requiring round trips through various integer representations) for language runtimes and the like. Authors of that code typically worked really close to the hardware, so questions like “does this pointer have provenance” don’t really mean much to them; they’d expect that it compiled to a load instruction like they expected and for the most part compilers would play it conservative and assign arbitrary provenance to operations like that. The exploration of “hmm, maybe we should specify this better” is a recent phenomenon that was done mostly independently of Rust the language but obviously a lot of those people were involved because the compiler/LLVM community is comprised of Rust people too.
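A minimal sketch of the pointer tagging pattern, for context. The names and the two-bit tag width are assumptions chosen for illustration; it relies on the pointee being at least 4-byte aligned and on the pointer-to-integer conversion yielding the address, neither of which the C standard promises.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)0x3)  /* low two bits; assumes 4-byte alignment */

/* Packs a small tag into the unused low bits of an aligned pointer. */
uintptr_t tag_pointer(void *p, unsigned tag)
{
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);  /* pointer must be suitably aligned */
    assert(tag <= TAG_MASK);
    return bits | tag;
}

/* Extracts the tag stored in the low bits. */
unsigned get_tag(uintptr_t tagged)
{
    return (unsigned)(tagged & TAG_MASK);
}

/* Clears the tag and converts back to a pointer. Whether the result keeps
 * the provenance of the original pointer after this integer detour is the
 * kind of question the provenance proposals try to answer. */
void *untag_pointer(uintptr_t tagged)
{
    return (void *)(tagged & ~TAG_MASK);
}
```

In practice mainstream compilers treat the untagged result as a perfectly ordinary pointer to the original object, which is the "mostly did work" part.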
I’m no rustacean, but I agree - for vastly different reasons, though.
A type having similar algebraic properties to a set of assembly instructions isn’t what being close to the metal is about. It isn’t a thought experiment or learning experience - it is the ability to interact directly with lower level parts of the system, including the kernel and the hardware. It’s a great power that comes with great responsibility - a power and responsibility that safe rust will never, ever grant you.
Unsafe rust is essentially the same as C++ in terms of level of control. It allows volatile memory access, for example, and it allows embedding asm blocks. It basically gives you the abilities of C with additional tooling and compile time power, which is why I compare it to C++.
Safe rust would be quite far away from the metal. It provides a safety net that prohibits you from doing anything metal-like. The lower you go, the more “trust me, bro” and “hold my beer” you’ll need to incorporate, and safe rust is designed specifically to avoid bro-trusting, beer-holding situations. A kernel or a driver written in safe rust will never be possible.
But that’s where my terminology falls apart. Rust isn’t a choice of safe and unsafe, it is and will always be a dance between the two.
> A type having similar algebraic properties to a set of assembly instructions isn’t what being close to the metal is about.
See if you can guess how those "similar [actually, identical] algebraic properties" are achieved when the Rust software is compiled to machine code. Keep in mind that Rust has a reputation for going real fast...
If you guessed that the answer is it produces exactly the same machine code, you're correct.
> Safe rust would be quite far away from the metal. It provides a safety net that prohibits you from doing anything metal-like.
Nope. This is a crucial misunderstanding, it supposes that Safe Rust is something like Java or even Python and it really isn't.
It's perfectly safe to implement wrapping integers using just the actual CPU's wrapping arithmetic instructions and so that's exactly what Rust does.
Very often the machine code emitted for a nice safe, expressive Rust program is identical to the code for the trickier to get right C program that looks more "bare metal" - if it copes with all the edge cases correctly. Example: C code that properly checks whether the thing it just got wasn't NULL, versus Rust code that writes if Some(thing) = ... thus pattern matching the "it's a thing" check. If you forget to check in the C code it faults (or worse) at runtime, same error in the Rust won't type check and doesn't compile. Once you've made both of them check, the machine code when you compile them is identical: that if Some(thing) check will compile to the same conditional jump as the C not-NULL check.
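The C half of that comparison, as a small sketch. The function is made up for illustration; `strchr` is the standard library call, and it returns NULL when the character is not found.

```c
#include <stddef.h>
#include <string.h>

/* Returns the length of the text after the first '=' in entry, or -1 if
 * there is no '='. Nothing in the type system forces the NULL check. */
long value_length(const char *entry)
{
    const char *eq = strchr(entry, '=');
    if (eq == NULL) {
        return -1;  /* delete this branch and the code still compiles */
    }
    return (long)strlen(eq + 1);
}
```

Dropping the `if` leaves a program that compiles cleanly and only misbehaves on input without an `=`, which is the case the Rust version refuses to compile.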
Sometimes there's less machine code... for the Rust, because the compiler knows Rust isn't stupidly aliasing mutable variables, so it can avoid some spills that must be generated in C "just in case".
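A tiny C sketch of the aliasing point; the function names are hypothetical. In the first version the compiler has to assume `a` and `b` might be the same object, so it must re-read `*b` after the first store. `restrict` is C's opt-in way of making the promise that Rust's references make by default.

```c
/* a and b may point at the same int, so *b has to be read again after the
 * first store to *a. */
void add_twice(int *a, const int *b)
{
    *a += *b;
    *a += *b;
}

/* restrict is the programmer's promise that a and b do not alias, so the
 * compiler may load *b once. Breaking the promise is undefined behaviour. */
void add_twice_noalias(int *restrict a, const int *restrict b)
{
    *a += *b;
    *a += *b;
}
```

With distinct objects both versions turn 1 into 3 when `*b` is 1. Calling `add_twice(&z, &z)` with `z` equal to 1 gives 4, which is why the reload cannot be skipped without the promise.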
Close to the metal doesn’t mean “I can create an operation that I could do in assembly, and it compiles to that assembly”. It doesn’t necessarily mean “fast”, either. If that were the case all compiled and optimised native languages would count. We aren’t talking about integer addition here - we’re talking about interaction with hardware.
The general issue with safe rust in these circumstances is exactly what makes it safe in the first place - it wants to be firmly in the driving seat. Want to access memory? You can only do so if it is being “managed” by rust (an abuse of terms, really, as that management is done at compile time). But with device drivers, there’s a pretty good chance you’ll want to access memory at a certain fixed address, managed outside of your program, and there’s a pretty good chance that it is volatile.
Safe rust has already slammed on the brakes, exited the driver’s seat and is running in the rain just to get away from the thought of such things. Unsafe rust could easily manage it, otherwise embedded rust wouldn’t exist.
For MMIO loads and stores you want intrinsics which do an MMIO load / store, which is exactly what Rust provides. The fact this is an "address" in "memory" couldn't matter less, it's not really memory, that's just for the convenience of the CPU so that we don't need two extra instructions (and I believe ironically on some embedded CPUs we do have special instructions and we do need to use them for IO to get the desired behaviour, so Rust ends up making that case easier...)
What C does here instead is a type system hack named "volatile" types. Let's label the type so that we know it's not really memory, but then we'll allow all the same operations as for regular types even though that doesn't make much sense. Ideally programmers avoid using any of the resulting nonsense operations, in practice they mostly seem to just hope it magically works, after all the C looks like it should work...
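For reference, a sketch of how careful C code tends to wrap this up so that every device access is one explicit load or store. The helper names are hypothetical, and a real driver would take the register address from the hardware documentation rather than from an ordinary variable.

```c
#include <stdint.h>

/* One explicit volatile load. */
static inline uint32_t mmio_read32(const volatile uint32_t *reg)
{
    return *reg;
}

/* One explicit volatile store. */
static inline void mmio_write32(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}

/* Read-modify-write spelled out as a load followed by a store, instead of
 * `*reg |= mask`, which hides the two separate device accesses. */
static inline void mmio_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    mmio_write32(reg, mmio_read32(reg) | mask);
}
```

This ends up looking a lot like a pair of load/store intrinsics, just built by convention on top of the volatile qualifier rather than provided by the language.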
Yeah, I don't think C can teach you assembly. But I think it's a very useful tool for comparing the relatively high-level C code to the corresponding assembly code.
Rust, sure--in some cases, when you're using it in a C-like sense, could serve the same purpose.
Is there something like a changelog to this? I remember years ago people on freenode #C pointing out tons of mistakes in this [1] and trashing it as a reference to learn from but it seems like it's gone through a decent amount of revision at this point.
Yeah, any history before git is lost aside from what you might find on archive.org.
And I started writing it ages ago, then shelved it in a half-done state. And then it got that probably well-deserved review.
But its former self is barely recognizable since it has been heavily rewritten and expanded over the past 3 years, so I hope the iso-9899.info folks give me a regrade at some point in the future.
I just have to say thanks to beej71 for making this guide. I'm doing my Masters in CS right now (I'm predominately a web developer... js, java, etc) and only used C++ one time in my undergrad. Your guide was my go to resource when doing things like an echo server/client and implementing a barebones http protocol. Can't thank you enough.
I was actually just looking for a C book. I already "know" (I use the term loosely) C but have little insight into how to write real code, and any best practices.
I agree. The Guide doesn't give many real-world examples (there are zillions of those already out in the real world). It's about how to swing a hammer and turn a screwdriver, not how to build a house.
That said, I try to make K&R-idiomatic examples when I can. They're just small-scale.
It’s static as it will always resolve to the same function. An extern function depends on whatever you link against (or users link against if you provide a static library).
Woah! I saw this on HN a while back, but forgot to bookmark it and lost it. I recognized the name instantly, glad to find it again, and will make sure to save for a weekend where I hate myself sufficiently to dive into this massive document!
I just want to go on record as saying there will never be a *comprehensive* Beej's Guide to Rust. There will never be another comprehensive Beej's Guide to Any Language, for that matter. I've learned my lesson. :)
C is one of the most simple languages there is, and it's YOUGE. There's a good reason other books skim over the library functions. The only saving grace is that C is relatively mature and (hopefully) doesn't change much every decade. Rust is changing so quickly it'd be a near-full-time job just to keep up.
Maybe I'll write some blog posts about Rust sometime as I improve with it. I think the language is all kinds of fun.
And the free Rust books and tutorials that are out there are already awesome.
Tell me you use Hacker News without telling me you use it hahahaha.
But seriously, I'm not sure of any kind of material like this for Rust, but I believe that's because it's simply not needed since we already have "the book" [0], rustlings [1], and rust by example [2]. Honestly, learning Rust is just absurdly simple and straightforward if you have the motivation to do so, these guides are so good I don't see much of a good reason to develop different ones.
As someone who helped write one of those guides… I would encourage people to write even more. Different texts can help different people. The more the merrier!
This is exactly what I have found. You can find 10+ guides on networking and usually it's 1 chapter from this author, 1 from this author, 2 from over here... the perspectives or writing styles collage together to present me with the thing that helps the material "click". Enough exposure and you become capable :)
Just wanted to say thank you. Your content was crucial in getting me to understand network programming and was crucial for my masters thesis. Your content has always been crisp, to the point and accessible. Keep it up!