
Very quick rollout is crucial for this kind of service. On top of what you wrote, making rollback the default response when something catastrophically breaks should be institutionalized.
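To make that concrete, here is a minimal sketch of what "rollback by default" could look like at the deploy step. The names (apply, health_ok, the bake window) are hypothetical, not Cloudflare's actual pipeline: the previous artifact stays around, and reverting is automatic the moment the post-deploy health probe goes red, with no human in the loop.

    use std::{thread, time::Duration};

    /// A versioned artifact being pushed out, e.g. a generated config file.
    #[derive(Clone)]
    struct Artifact {
        version: u64,
        body: Vec<u8>,
    }

    /// Stand-in for distributing the artifact to the fleet.
    fn apply(artifact: &Artifact) -> Result<(), String> {
        println!("applying v{} ({} bytes)", artifact.version, artifact.body.len());
        Ok(())
    }

    /// Stand-in for real health signals: error rate, 5xx ratio, crash loops, ...
    fn health_ok() -> bool {
        true
    }

    /// Deploy with rollback as the default failure mode: unless the health
    /// probe stays green for the whole bake window, revert to last known good.
    fn deploy(new: Artifact, last_known_good: Artifact) -> Artifact {
        if apply(&new).is_err() {
            let _ = apply(&last_known_good);
            return last_known_good;
        }
        for _ in 0..30 {
            if !health_ok() {
                eprintln!("health probe failed, rolling back to v{}", last_known_good.version);
                let _ = apply(&last_known_good);
                return last_known_good;
            }
            thread::sleep(Duration::from_secs(1)); // bake window, shortened for the sketch
        }
        new // promoted only after the bake window passed cleanly
    }

    fn main() {
        let good = Artifact { version: 41, body: b"old".to_vec() };
        let candidate = Artifact { version: 42, body: b"new".to_vec() };
        let live = deploy(candidate, good);
        println!("live version: {}", live.version);
    }

The important property is that promotion is what has to be earned; rollback is what happens when nobody intervenes.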

Been there on those calls, begging the people in charge (who perhaps shouldn't have been): "eh, maybe we should attempt a rollback to the last known good state? Because it, you know... worked." But investigating further before making any change always seems to be their preferred course of action. Can't be faulted for being cautious and doing things properly, right? I kid you not - this is their instinct.

If I recall correctly, it took CF 2 hours to roll back the broken changes.

So if I were in charge of Cloudflare (4-5k employees), I'd look at both the processes and the people in charge.

It does seem insane to me that there isn't a process to catch the panic, unwind back to a reasonable place in the call stack, load the last known good configuration, and continue execution as normal. You would go from a global 2-hour outage to a warning on a dashboard that can be investigated in a timely manner, rather than blowing up half the internet.
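In Rust terms (the proxy that panicked is written in Rust, as I understand it), that containment could look roughly like the sketch below. Config, parse_config and reload are hypothetical stand-ins for illustration, not Cloudflare's actual code: wrap the config-load path in std::panic::catch_unwind, and if it blows up, keep serving on the previous good configuration and raise a warning instead of taking the process down.

    use std::panic::{self, AssertUnwindSafe};

    /// Hypothetical stand-in for whatever the proxy derives from a pushed file.
    #[derive(Clone, Debug)]
    struct Config {
        features: Vec<String>,
    }

    /// Stand-in for the parsing/validation step that can panic,
    /// e.g. an unwrap() deep inside the loader.
    fn parse_config(raw: &str) -> Config {
        if raw.contains("too_many_features") {
            panic!("feature count exceeded hard limit");
        }
        Config { features: raw.split(',').map(|s| s.to_string()).collect() }
    }

    /// Try to adopt a new config; on panic, keep the last known good one and
    /// surface a warning instead of taking the whole edge down with it.
    fn reload(raw: &str, last_known_good: &Config) -> Config {
        match panic::catch_unwind(AssertUnwindSafe(|| parse_config(raw))) {
            Ok(new_config) => new_config,
            Err(_) => {
                // In a real system: increment a metric / page someone.
                eprintln!("WARN: config reload panicked; keeping last known good");
                last_known_good.clone()
            }
        }
    }

    fn main() {
        let good = Config { features: vec!["bot_score".into()] };

        // A broken push degrades to a warning instead of an outage.
        let still_good = reload("too_many_features", &good);
        println!("serving with: {:?}", still_good.features);

        // A healthy push is adopted normally.
        let updated = reload("bot_score,new_rule", &good);
        println!("serving with: {:?}", updated.features);
    }

The default panic hook still prints the panic message even when it is caught, which is exactly the kind of signal you'd route to that dashboard.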


