Claude Code with Sonnet was pretty good but needed a lot of back and forth to get to the right solution. Opus feels closer to a colleague, maybe not your absolute best colleague, but far from your worst.
It's very fun to try to make that limited data model do more than it was intended to.
When I did a little bit of contracting for TigerBeetle, I was working on these learning exercises and came up with some fun ingredients that could be used if you're trying to push what's possible https://github.com/tigerbeetle/tigerlings/pull/2
It's really wise to avoid indexing into arrays and vectors.
The same day Cloudflare had its unwrap fiasco, I found a bug in my code because of a slice that in certain cases went past the end of a vector. Switched it to use iterators and will definitely be more careful with slices and array indexes in the future.
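A tiny sketch of the kind of switch described, using hypothetical data rather than the actual code: a slice with a bad upper bound panics, while the iterator version just stops at the end.

```rust
fn main() {
    let data = vec![1, 2, 3, 4, 5];

    // Slice version: &data[3..6] would panic, because 6 is past the end.
    // Iterator version: skip/take simply stop when the data runs out.
    let tail: Vec<i32> = data.iter().skip(3).take(3).copied().collect();
    assert_eq!(tail, vec![4, 5]); // only two items left; no panic
    println!("{:?}", tail);
}
```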
Was it a fiasco? Really? The Rust unwrap call is the equivalent of C code like this:
int result = foo(…);
assert(result >= 0);
If that assert tripped, would you blame the assert? Of course not. Or blame C? No. If that assert tripped, it’s doing its job by telling you there’s a problem in the call to foo().
You can write buggy code in rust just like you can in any other language.
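A minimal Rust sketch of the same pattern (foo here is a stand-in, not anyone's real code):

```rust
// Hypothetical fallible function, analogous to the C foo() above.
fn foo() -> Result<i32, String> {
    Ok(42)
}

fn main() {
    // Like the C assert: if foo() returned Err, crash right here
    // and point at the problem, instead of limping on in a bad state.
    let result = foo().unwrap();
    println!("{}", result);
}
```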
I think it's because unwrap() seems so unassuming at a glance. If it were or_panic() instead I think people would intuit it more as extremely dangerous. I understand that we're not dealing with newbies here, but everyone is still human and everything we do to reduce mistakes is a good thing.
I'm sure lots of bystanders are surprised to learn what .unwrap() does. But reading the post, I didn't get the impression that anyone at cloudflare was confused by unwrap's behaviour.
If you read the postmortem, they talk in depth about what the issue really was - which from memory is that their software statically allocated room for 20 rules or something. And their database query unexpectedly returned more than 20 items. Oops!
I can see the argument for renaming unwrap to unwrap_or_panic. But no alternate spelling of .unwrap() would have saved cloudflare from their buggy database code.
Looking at that unwrap as a Result<T> handler, the arguable issue with the code was the lack of informative explanation in the unexpected case. Panicking from the ill-defined state was desired behaviour, but explicit is always better.
The argument to the contrary is that reading the error out loud showed “the config initializer failing to return a valid configuration”. A panic trace saying “config init failed” is a minor improvement.
If we’re gonna guess and point fingers, I think the configuration init should be doing its own panicking and logging when it blows up.
First, again, that’s not the issue. The bug was in their database code. Could this codebase be improved with error messages? Yes for sure. But that wouldn’t have prevented the outage.
Almost every codebase I’ve ever worked in, in every programming language, could use better human readable error messages. But they’re notoriously hard to figure out ahead of time. You can only write good error messages for error cases you’ve thought through. And most error cases only become apparent when you stub your toe on them for real. Then you wonder how you missed it in the first place.
In any case, this sort of thing has nothing to do with rust.
It's not unassuming. Rust is superior to many other languages for making this risky behaviour visually present in the code base.
You can go ahead and grep your codebase for this today, instead of waiting for an incident.
I'm a fairly new migrant from Java to C#, and when I do some kind of collection lookup, I still need to check whether the method will return a null, throw an exception, expect an out variable, or worst of all, make up some kind of default. C#'s equivalent to unwrap seems to be '!' (or maybe .Value or something?)
Whether the value is null (and under which conditions) is encoded into the nullability of the return value. Unless you work with a project which went out of its way to disable NRTs (which I've sadly seen happen).
> I think it's because unwrap() seems to unassuming at a glance. If it were or_panic() instead I think people would intuit it more as extremely dangerous.
Anyone who has learned how to program Rust knows that unwrap() will panic if the thing you are unwrapping is Err/None. It's not unassuming at all. When the only person who could be tripped up by a method name is a complete newbie to the language, I don't think it's actually a problem.
Similarly, assert() isn't immediately obvious to a beginner that it will cause a panic. Heck, the term "panic" itself is non obvious to a beginner as something that will crash the program. Yet I don't hear anyone arguing that the panic! macro needs to be changed to crash_this_program. The fact of the matter is that a certain amount of jargon is inevitable in programming (and in my view this is a good thing, because it enables more concise communication amongst practitioners). Unwrap is no different than those other bits of jargon - perhaps non obvious when you are new, but completely obvious once you have learned it.
I don't think you can know what unwrap does and assume it is safe. Plus warnings about unwrap are very common in the Rust community, I even remember articles saying to never use it.
I have always been critical of the Rust hype, but unwrap is completely fine. It's an escape hatch that has legitimate uses. Some code is fine to just fail.
It is easy to spot during code review. I have never programmed Rust professionally and even I would have asked about the unwrap in the Cloudflare code if I had reviewed it. You can even enforce never using unwrap at all through automatic tooling.
The point is Rust provides more safety guarantees than C. But unwrap is an escape hatch, one that can blow up in your face. If they had taken the Haskell route and not provided unwrap at all, this wouldn't have happened.
Haskell’s fromJust, and similar partial functions like head, are as dangerous as Rust’s unwrap. The difference is only in the failure mode. Rust panics, whereas Haskell throws a runtime exception.
You might think that the Haskell behavior is “safer” in some sense, but there’s a huge gotcha: exceptions in pure code are the mortal enemy of lazy evaluation. Lazy evaluation means that an exception can occur after the catch block that surrounded the code in question has exited, so the exception isn’t guaranteed to get caught.
Exceptions can be ok in a monad like IO, which is what they’re intended for - the monad enforces an evaluation order. But if you use a partial function like fromJust in pure code, you have to be very careful about forcing evaluation if you want to be able to catch the exception it might generate. That’s antithetical to the goal of using exceptions - now you have to write to code carefully to make sure exceptions are catchable.
The bottom line is that for reliable code, you need to avoid fromJust and friends in Haskell as much you do in Rust.
The solution in both languages is to use a linter to warn about the use of partial functions: HLint for Haskell, Clippy for Rust. If Cloudflare had done that - and paid attention to the warning! - they would have caught that unwrap error of theirs at linting time. This is basically a learning curve issue.
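A minimal sketch of the Clippy side of this: `clippy::unwrap_used` is an allow-by-default lint, and a crate-level attribute turns it into a hard error under `cargo clippy`, forcing the Err case to be handled explicitly.

```rust
// Deny Clippy's unwrap lint for the whole crate. Plain rustc ignores
// tool lints like this; `cargo clippy` rejects any .unwrap() it finds.
#![deny(clippy::unwrap_used)]

fn main() {
    let parsed: Result<i32, _> = "41".parse();
    // parsed.unwrap() would now fail the lint; handle Err explicitly:
    match parsed {
        Ok(n) => println!("{}", n + 1),
        Err(e) => eprintln!("parse failed: {e}"),
    }
}
```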
> The difference is only in the failure mode. Rust panics, whereas Haskell throws a runtime exception.
Fun facts: Rust’s default panic handler also throws a runtime exception just like C++ and other languages. Rust also has catch blocks (std::panic::catch_unwind). But it's rarely used. By convention, panicking in Rust is typically used for unrecoverable errors, when the program should probably crash. And Result is used when you expect something to be fallible - like parsing user input.
You see catch_unwind in the unit test runner. (That’s why a failing test case doesn’t stop other unit tests running). And in web servers to return 50x. You can opt out of this behaviour with panic=abort in Cargo.toml, which also makes rust binaries a bit smaller.
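A small sketch of that mechanism: catch_unwind converts a panic inside the closure into an Err, which is how a test runner or web server keeps going after one task blows up.

```rust
use std::panic;

fn main() {
    // catch_unwind is the closest thing Rust has to a catch block:
    // a panic in the closure becomes an Err instead of killing the program.
    let caught = panic::catch_unwind(|| {
        let missing: Option<i32> = None;
        missing.unwrap() // panics: called `Option::unwrap()` on a `None` value
    });
    assert!(caught.is_err());
    println!("still running after the panic");
}
```

Note that this only works with the default panic=unwind setting; with panic=abort there is nothing to catch.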
The difference is not just convention. You mentioned some similarities between Rust panics and C++ exceptions, but there are some important differences. If you tried to write Rust code that used panics and catch_unwind as a general exception mechanism, you’d soon run into those differences, and find out why Rust code isn’t written that way.
The key difference is that in the general case, panics are designed to lead to program termination, not recovery. Examples like unit tests are a special case - the fact that handling panics work for that case doesn’t mean they would work well more broadly.
The point you mentioned, about being able to configure panics to abort, is another issue. If you did that in a program which used panics as an exception handling mechanism, the program would fail on its first exception. Of course you can say “just don’t do that”, but the point is it highlights the difference in the semantics of panics vs. exceptions.
Also, panics are not typed, the way exceptions are in C++ or Java. Using them as a general exception handling mechanism would either be very limiting, or require the design of a whole infrastructure for that.
There are other issues as well, including behavior related to threads, to FFI, and to where panics can even be caught.
I forgot about fromJust. On the other hand, fromJust is shunned by practically everybody writing Haskell. `unwrap` doesn't have the same status. I also understand why: Rust wanted to be more appealing and not too restrictive, while Haskell doesn't care about attracting developers.
It's not just fromJust, there are many other partial functions, and they all have the same issue: head, tail, init, last, read, foldl1, maximum, minimum, etc.
It's an overstatement to say that these are "shunned by practically everybody". They're commonly used in scenarios where the author is confident that the failure condition can't happen due to e.g. a prior test or guard, or that failure can be reliably caught. For example, you can catch a `read` exception reliably in IO. They're also commonly used in GHCi or other interactive environments.
I disagree that the Rust perspective on unwrap is significantly different. Perhaps for beginning programmers in the language? But the same is often true of Haskell. Anyone with some experience should understand the risks of these functions, and if they don't, they'll eventually learn the hard way.
One pattern in Rust that may mislead beginners is that unwrap is often used on things like builders in example docs. The logic here is that if you're building some critical piece of infra that the rest of the program depends on, then if it fails to build the program is toast anyway, so letting it panic can make sense. These examples are also typically scenarios where builder failure is unusual. In that case, it's the author's choice whether to handle failure or just let it panic.
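The standard library itself has examples of this pattern: std::thread::Builder::spawn returns an io::Result, and doc examples routinely unwrap it, on the logic that if the program can't even spawn a thread it may as well crash. A sketch:

```rust
use std::thread;

fn main() {
    // Typical doc-example style: builder failure here is unusual, and
    // there's no sensible recovery, so just let .unwrap() panic.
    let handle = thread::Builder::new()
        .name("worker".into())
        .spawn(|| 2 + 2)
        .unwrap();
    let answer = handle.join().unwrap();
    println!("{answer}");
}
```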
Haskell is far more dangerous. It allows you to simply destructure the `Just` variant without a path for the empty case, causing a runtime error if it ever occurs.
> The point is Rust provides more safety guarantees than C. But unwrap is an escape hatch
Nope. Rust never makes any guarantees that code is panic-free. Quite the opposite. Rust crashes in more circumstances than C code does. For example, indexing past the end of an array is undefined behaviour in C. But if you try that in rust, your program will detect it and crash immediately.
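A tiny sketch of that difference: the out-of-bounds access that is UB in C is a defined, immediate panic in Rust, and the checked accessor avoids the crash entirely.

```rust
fn main() {
    let arr = [10, 20, 30];
    // arr[3] compiles, but panics at runtime with
    // "index out of bounds: the len is 3 but the index is 3" --
    // defined behaviour, unlike the C equivalent.
    // The checked accessor returns an Option instead of crashing:
    assert_eq!(arr.get(1), Some(&20));
    assert_eq!(arr.get(3), None);
    println!("bounds checked, no UB");
}
```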
More broadly, safe rust exists to prevent undefined behaviour. Most of the work goes to stopping you from making common memory related bugs, like use-after-free, misaligned reads and data races. The full list of guarantees is pretty interesting[1]. In debug mode, rust programs also crash on integer overflow and underflow. (Thanks for the correction!). But panic is well defined behaviour, so that's allowed. Surprisingly, you're also allowed to leak memory in safe rust if you want to. Why not? Leaks don't cause UB.
You can tell at a glance that unwrap doesn't violate safe rust's rules because you can call it from safe rust without an unsafe block.
I never said Rust makes guarantees that code is panic-free. I said that Rust provides more safety guarantees than C. The Result type is one of them, because you have to handle the error case explicitly - if you don't use unwrap.
Also, when I say safety guarantees, I'm not talking about safe rust. I'm talking about Rust features that prevent bugs, like the borrow checker, types like Result and many others.
Ah thanks for the clarification. That wasn’t clear to me reading your comment.
You’re right that rust forces you to explicitly decide what to do with Result::Err. But that’s exactly what we see here. .unwrap() is handling the error case explicitly. It says “if this is an error, crash the program. Otherwise give me the value”. It’s a very useful function that was used correctly here. And it functioned correctly by crashing the program.
I don’t see the problem in this code, beyond it not giving a good error message as it crashed. As the old joke goes, “Task failed successfully.”
This is the equivalent of force-unwrap in Swift, which is strongly discouraged. Swift format will reject this anti-pattern. The code running the internet probably should not force unwrap either.
Funny, it's really the same argument for why Rust people say we should abandon C. Meanwhile in C, it is also common to hand out handles instead of indices precisely because of this problem.
It's pretty similar, but writing `for item in container { item.do_it() }` is a lot less error prone than the C equivalent. The ha-ha-but-serious take is that once you get that snippet to compile, there's almost nothing you could ever do to break it without also making the compiler scream at you.
In rust, handing out indexes isn’t that common. It’s generally bad practice because your program will end up with extra, unnecessary bounds checks. Usually we program rust just the same as in C - get a reference (pointer) to an item inside the array and pass that around. The rust compiler ensures the array isn’t modified or freed while the pointer is held. (Which is helpful, but very inconvenient at times!)
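A small sketch of that style (the names here are made up): bounds-check once, then pass the reference around, with the borrow checker keeping the backing storage alive and unmodified.

```rust
fn shout(name: &str) -> String {
    name.to_uppercase()
}

fn main() {
    let names = vec![String::from("alpha"), String::from("beta")];
    // One bounds check here; after that we just pass the reference.
    // The borrow checker guarantees `names` isn't mutated or dropped
    // while `first` is held.
    let first: &String = &names[0];
    println!("{}", shout(first));
}
```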
It ranks articles by how closely related they are to your interests. You can import a set of RSS feeds or scour all 15,000+ sources.
I built it because I wanted to find the good articles among noisy feeds like HN Newest. I've also avoided RSS readers in the past because of that feeling of having thousands of unread emails.
I built Scour to help me sift through noisy sources like HN Newest. For each article in my Scour feed, I can click the Show Feeds button to find what other sources that post shows up in. I’ve found that to be quite a nice way of discovering people’s blogs that I wouldn’t have found otherwise.
You can also scour all 14,000+ sources for posts that match your interests.
If you’re looking to put one up, try https://bearblog.dev (no connection, just appreciate Herman’s work).
It’s got just the features you need, is built by a solo dev, and it’s got a very fair split between free and paid features. I used it to put up my personal site and have been very happy with the experience.
Agreed. For a Rust project, running Clippy and rustfmt is slow, but I’d be surprised to learn that pre-commit itself was a non-negligible part of that.
Very interesting! I especially appreciated the test of running models against the same benchmark from the following year and the point about the per-token discount being negated by models needing more tokens to get to the answer.
Generalization:
> Maybe Chinese models generalise to unseen tasks less well. (For instance, when tested on fresh data, 01’s Yi model fell 8pp (25%) on GSM - the biggest drop amongst all models.)
> We can get a dirty estimate of this by the “shrinkage gap”: look at how a model performs on next year’s iteration of some task, compared to this year’s. If it finished training in 2024, then it can’t have trained on the version released in 2025, so we get to see what they’re like on at least somewhat novel tasks. We’ll use two versions of the same benchmark to keep the difficulty roughly on par. Let’s try AIME:
> Almost all models get worse on this new benchmark, despite 2025 being the same difficulty as 2024 (for humans). But as I expected, Western models drop less: they lost 10% of their performance on the new data, while Chinese models dropped 21%. p = 0.09.
> Averaging across crappy models for the sake of a cultural generalisation doesn’t make sense. Luckily, rerunning the analysis with just the top models gives roughly the same result (9% gap instead of 11%).
Cost-effectiveness:
> Distinguish intelligence (max performance), intelligence per token (efficiency), and intelligence per dollar (cost-effectiveness).
> The 5x discounts I quoted are per-token, not per-success. If you had to use 6x more tokens to get the same quality, then there would be no real discount. And indeed DeepSeek and Qwen (see also anecdote here about Kimi, uncontested) are very hungry.
Extremely helpful. I've been eagerly awaiting v0.5 but have been holding off on deploying it until I had more confidence that it would work and be stable. Reading this, I'm definitely glad that I waited.