It's very local here. I'm in the suburbs of Philadelphia, in one of the highest income counties in the state, two blocks from a major hospital, one block from a suburban downtown. Despite that, I've experienced one or two 4-6 hour long power outages per year the past few years. (Mostly correlated with weather.) One outage in June 2025 was 50 hours long!
Many larger homes in this area have whole-house generators (powered by utility natural gas) with automatic transfer switches. During the 50-hour outage, we "abandoned ship" and stayed with someone who also had an outage, but had a whole-house generator.
Other areas just 5-10 miles away are like what you describe: maybe one outage in the past 10 years.
> If something goes wrong, like the pipeline triggering certbot goes wrong, I won't have time to fix this. So I'd be at a two day renewal with a 4 day "debugging" window.
I think a pattern like that is reasonable for a 6-day cert:
- renew every 2 days, and have a "4 day debugging window"
- renew every 1 day, and have a "5 day debugging window"
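As a quick sketch of the arithmetic in that list (assuming renewals run on a fixed schedule and nothing else eats into the margin), the debugging window is just the cert lifetime minus the renewal interval:

    # Rough sketch: time available to fix a broken renewal pipeline before the
    # cert actually expires, assuming renewals run on a fixed schedule.
    CERT_LIFETIME_DAYS = 6  # hypothetical short-lived cert

    for renew_every_days in (1, 2, 3):
        debug_window_days = CERT_LIFETIME_DAYS - renew_every_days
        print(f"renew every {renew_every_days}d -> {debug_window_days}d to debug a failed renewal")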
100%, I've run into this too. I wrote some minimal scripts in Bash, Python, Ruby, Node.js (JavaScript), Go, and Powershell to send a request and alert if the expiration is less than 14 days from now: https://heyoncall.com/blog/barebone-scripts-to-check-ssl-cer... because anyone who's operating a TLS-secured website (which is... basically anyone with a website) should have at least that level of automated sanity check. We're talking about ~10 lines of Python!
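For reference, a minimal sketch along the lines of the Python variant (not the exact script from the post; standard library only, with a hypothetical 14-day threshold, exiting non-zero so cron or a monitor can alert on it):

    # Minimal TLS certificate expiry check: connect, read the cert's notAfter,
    # and exit non-zero if it expires within 14 days.
    import socket
    import ssl
    import sys
    import time

    host = sys.argv[1] if len(sys.argv) > 1 else "example.com"

    ctx = ssl.create_default_context()
    with socket.create_connection((host, 443), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()  # validated by the default context

    # "notAfter" looks like "Jun  1 12:00:00 2026 GMT"
    expires_at = ssl.cert_time_to_seconds(cert["notAfter"])
    days_left = int((expires_at - time.time()) // 86400)

    print(f"{host}: certificate expires in {days_left} days")
    sys.exit(1 if days_left < 14 else 0)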
Lots of people are speculating that the price spike is AI related. But it might be more mundane:
I'd bet that a good chunk of the apparently sudden demand spike could be last month's Microsoft Windows 10 end-of-support finally happening, pushing companies and individuals to replace many years' worth of older laptops and desktops all at once.
I worked in enterprise laptop repair two decades ago — I like your theory (and there's definitely meat there) but my experience was that if a system's OEM configuration wasn't enough to run modern software, we'd replace the entire system (to avoid bottlenecks elsewhere in the architecture).
I have no idea how many people this has actually affected, but it's exactly my situation. I need a new workstation with a bunch of RAM to replace my Win10 machine, so I don't really have any viable option other than paying the going rate.
There's a tradeoff, and the assumption here (which I think is solid) is that there's more benefit in avoiding a supply chain attack by using a dependency cooldown by default than in avoiding a zero-day by staying on the bleeding edge of new releases by default.
It's comparing the likelihood of an update introducing a new vulnerability to the likelihood of it fixing a vulnerability.
While the article frames this problem in terms of deliberate, intentional supply chain attacks, I'm sure the majority of bugs and vulnerabilities were never supply chain attacks: they were just ordinary bugs introduced unintentionally in the normal course of software development.
On the unintentional bug/vulnerability side, I think there's a similar argument to be made. Maybe even SemVer can help as a heuristic: a patch version increment is likely safer (less likely to introduce new bugs/regressions/vulnerabilities) than a minor version increment, so a patch version increment could have a shorter cooldown.
If I'm currently running version 2.3.4, and there's a new release 2.4.0, then (unless there's a feature or bugfix I need ASAP), I'm probably better off waiting N days, or until 2.4.1 comes out and fixes the new bugs introduced by 2.4.0!
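As a sketch of that heuristic (the cooldown lengths and the helper are made up for illustration, not any particular tool's config, and it assumes simple X.Y.Z versions):

    # Hypothetical SemVer-aware cooldown: bigger version bumps get longer
    # cooldowns, since they're more likely to introduce new bugs or regressions.
    COOLDOWN_DAYS = {"major": 14, "minor": 7, "patch": 3}  # illustrative values

    def bump_type(current: str, candidate: str) -> str:
        """Classify the upgrade from current to candidate, e.g. 2.3.4 -> 2.4.0."""
        cur = [int(x) for x in current.split(".")]
        new = [int(x) for x in candidate.split(".")]
        if new[0] > cur[0]:
            return "major"
        if new[1] > cur[1]:
            return "minor"
        return "patch"

    kind = bump_type("2.3.4", "2.4.0")
    print(kind, COOLDOWN_DAYS[kind])  # minor 7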
Yep, that's definitely the assumption. However, I think it's also worth noting that zero-days, once disclosed, do typically receive advisories. Those advisories then (at least in Dependabot) bypass any cooldown controls, since the thinking is that a known vulnerability is more important to remediate than the open-ended risk of a compromised update.
> I'm sure the majority of bugs and vulnerabilities were never supply chain attacks: they were just ordinary bugs introduced unintentionally in the normal course of software development.
Yes, absolutely! The overwhelming majority of vulnerabilities stem from normal accidental bug introduction -- what makes these kinds of dependency compromises uniquely interesting is how immediately dangerous they are versus, say, a DoS somewhere in my network stack (where I'm not even sure it affects me).
Of course. They can simply wait to exploit their vulnerability. If it is well hidden, it probably won't be noticed for a while, so they can wait until it is running on the majority of their target systems before exploiting it.
From their point of view it is a trade-off between volume of vulnerable targets, management impatience and even the time value of money. Time to market probably wins a lot of arguments that it shouldn't, but that is good news for real people.
You should also factor in that a zero-day often isn't even exposed in an exploitable way if you are using the onion model, with other layers that would have to be penetrated together. That's in contrast to a supply chain compromise, which is designed to actively make outbound connections through any means possible.
Thank you. I was scanning this thread for anyone pointing this out.
The cooldown security scheme looks like a kind of inverse "security by obscurity": nobody has spotted a backdoor yet, therefore we assume it's secure. The scheme stands and falls with the assumed timelines. Once that assumption tumbles, picking a cooldown period becomes guesswork. (Or another compliance box ticked.)
On the other hand, the assumption may well be sound; maybe ~90% of future backdoors can be mitigated by it. But who can tell? It looks like survivorship bias, because we are making decisions based only on the cases we actually found.
I'd estimate the vast majority of CVEs in third-party code are not directly or indirectly exploitable. The CVSS scoring system assumes the worst-case scenario for how the module is deployed. We still have no good way to automate adjusting the score, or even just flagging the false positives.
The big problem is the Red Queen's race nature of development in rapidly evolving software ecosystems, where everyone has to keep pushing versions forward to deal with their dependencies' changes as well as any actual development of their own. Combine that with the poor design decisions found in rapidly evolving ecosystems, where everyone assumes anything can be fixed in the next release, and you have a recipe for disaster.
Could always just use a status page that updates itself. For my side project Total Real Returns [1], if you scroll down and look at the page footer, I have a live status/uptime widget [2] (just an <img> tag, no JS) which links to an externally-hosted status page [3]. Obviously not critical for a side project, but kind of neat, and was fun to build. :)
This is unrelated to the Cloudflare incident, but thanks a lot for making that page. I keep checking it from time to time, and it's basically the main data source for my long-term investing.
Looks great! Would you have a recommendation for intro materials to help me learn the basics of electronics using CircuitLab? I have a working understanding of signal processing, but building an actual circuit without electrocuting myself or setting my Raspberry Pi on fire, and picking the right set of components for even the simplest DIY project from spec sheets, are still a mystery to me.
A favorite of mine and one of the most common ways to generate a fairly high DC voltage. The full-wave version pairs well with a center-tapped secondary on a resonant transformer.
For fun, playing with Meshtastic https://meshtastic.org/ and contributing to the open source firmware and apps. They have something cool but need lots of help. I've patched 3 memory leaks and had a few other PRs merged already.
For work, https://heyoncall.com/ as the best tool for on-call alerting, website monitoring, cron job monitoring, especially for small teams and solo founders.
I guess they both fall under the category of "how do you build reliable systems out of unreliable distributed components" :)