I'd say the equivalent of Erlang's supervisor trees is what is needed but once y...

ViewTrick1002 · 2025-11-26T18:21:56

Or just deploy containers with an orchestrator restarting them when failing?

It is not like an Erlang service would be able to make progress with an invalid config either.

jacquesm · 2025-11-26T18:58:16

That's fair, but even there the rollback would be a lot smoother. Besides, supervisor trees are far more fine-grained than restarting entire containers when they fail.

lenkite · 2025-11-26T18:56:03

What happens when they "keep" failing? You never get to know what is causing your nightmare.
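Erlang/OTP answers the "keeps failing" case with restart intensity: a supervisor that sees more than `intensity` restarts within `period` seconds stops restarting, terminates itself and escalates to its own supervisor. The following is a minimal Rust sketch of that idea, not an OTP equivalent, assuming each worker runs in its own thread so a panic (for example from `unwrap()`/`expect()`) only ends that thread. The names `supervise` and `panic_message` are hypothetical. The point is that the supervisor gives up after a bounded number of restarts and returns the last crash reason instead of looping silently.

```rust
use std::any::Any;
use std::thread;
use std::time::{Duration, Instant};

/// Turn a panic payload into readable text (panics carry either &str or String).
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "non-string panic payload".to_string()
    }
}

/// Hypothetical supervisor: run `work` in a child thread and restart it when it
/// panics. If it has already been restarted `max_restarts` times within `period`,
/// stop and return the last crash reason instead of restarting forever.
fn supervise<F>(work: F, max_restarts: usize, period: Duration) -> Result<(), String>
where
    F: Fn() + Send + Clone + 'static,
{
    let mut restarts: Vec<Instant> = Vec::new();
    loop {
        match thread::spawn(work.clone()).join() {
            Ok(()) => return Ok(()), // child finished normally
            Err(payload) => {
                let reason = panic_message(&*payload);
                eprintln!("child crashed: {reason}");
                let now = Instant::now();
                // Only restarts inside the sliding window count toward the limit.
                restarts.retain(|t| now.duration_since(*t) < period);
                if restarts.len() >= max_restarts {
                    return Err(format!("giving up after {} restarts: {reason}", restarts.len()));
                }
                restarts.push(now);
            }
        }
    }
}

fn main() {
    // A worker that can never succeed, e.g. because a required config field is missing.
    let always_fails = || {
        let value: Option<u32> = None; // stands in for the missing field
        value.expect("config field missing");
    };
    match supervise(always_fails, 2, Duration::from_secs(5)) {
        Ok(()) => println!("worker finished"),
        Err(e) => println!("supervisor stopped: {e}"),
    }
}
```

Because the final error carries the panic message, a crash loop caused by something like a bad config ends with a report of why, rather than an endless restart cycle.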

zbentley · 2025-11-27T03:57:31

I’m not sure that a panic (speaking generally about the majority of its uses and the spirit of the law; obviously not all code follows this) is the equivalent of an Erlang process crash in most cases. Rather, I think unwrap()/panic are usually used in ways closer to erlang:halt/1.
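The distinction can be made concrete in Rust. In this sketch, which assumes the default `panic = "unwind"` strategy, `std::panic::catch_unwind` stops a panic at an explicit boundary, which is roughly like a crash that something else observes. `std::process::exit` ends the whole process without unwinding the current stack, which is closer to `erlang:halt/1`. `load_limit` is a hypothetical function. Note that with `panic = "abort"` set in a Cargo profile, every panic behaves like a halt and `catch_unwind` cannot intercept it. The standard library documentation also advises against using `catch_unwind` as a general try/catch.

```rust
use std::panic;

// Hypothetical stand-in for code that trusts its input and calls unwrap().
fn load_limit(raw: &str) -> u32 {
    raw.trim().parse::<u32>().unwrap()
}

fn main() {
    // The panic unwinds to this boundary instead of ending the process,
    // loosely like an Erlang process crash seen by whoever watches it.
    let outcome = panic::catch_unwind(|| load_limit("not a number"));
    match outcome {
        Ok(limit) => println!("limit = {limit}"),
        Err(_) => println!("load_limit panicked; the process is still running"),
    }

    // By contrast, std::process::exit(1) here would end the process at once
    // without unwinding the stack, much closer to erlang:halt/1.
}
```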

jacquesm · 2025-11-27T13:06:15

Exactly, but that is kind of the point here. An Erlang 'halt' is something most Erlang programmers would twig is not what you want in most cases: usually you want your process to crash and the supervisor to restart it if the error is recoverable.

What happened here is systemic: the config file contained an issue severe enough to keep the system from running in the first place, and unfortunately it surfaced as a runtime error, when validation should have been kept separate from actual use. That is where I see the problem with this particular outage, and it makes this an engineering issue much more than a language issue.

Bad configuration files can and do happen, so you take that eventuality into account during systems design.
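One way to keep validation separate from use is to parse and check a new config completely before anything consumes it, and to keep the last known good config when a candidate is rejected. This is a minimal Rust sketch of that pattern. The config shape (a list of feature names), the duplicate and size checks, `MAX_FEATURES`, `parse_config` and `ConfigHolder` are all hypothetical and chosen only for illustration.

```rust
// Hypothetical limit, chosen small for illustration.
const MAX_FEATURES: usize = 4;

#[derive(Debug, Clone, PartialEq)]
struct Config {
    features: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum ConfigError {
    Empty,
    Duplicate(String),
    TooMany(usize),
}

/// Parse and validate in one place, before anything uses the result.
fn parse_config(raw: &str) -> Result<Config, ConfigError> {
    let mut features: Vec<String> = Vec::new();
    for line in raw.lines().map(str::trim).filter(|l| !l.is_empty()) {
        if features.iter().any(|f| f == line) {
            return Err(ConfigError::Duplicate(line.to_string()));
        }
        features.push(line.to_string());
    }
    if features.is_empty() {
        return Err(ConfigError::Empty);
    }
    if features.len() > MAX_FEATURES {
        return Err(ConfigError::TooMany(features.len()));
    }
    Ok(Config { features })
}

/// Holds the config currently in use; a rejected update never replaces it.
struct ConfigHolder {
    active: Config,
}

impl ConfigHolder {
    fn try_reload(&mut self, raw: &str) -> Result<(), ConfigError> {
        let candidate = parse_config(raw)?; // reject before swapping anything
        self.active = candidate;
        Ok(())
    }
}

fn main() {
    let mut holder = ConfigHolder {
        active: parse_config("a\nb").expect("initial config must be valid"),
    };
    // A bad update is reported and the previous config stays active.
    if let Err(e) = holder.try_reload("a\na") {
        eprintln!("rejected config update: {e:?}");
    }
    println!("active features: {:?}", holder.active.features);
}
```

The failure now shows up as a rejected update with a reason, while the running system keeps serving with its previous config instead of hitting a panic in the code that consumes the config.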