That is a solution, but not the cause. The cause is the lack of a culture that evaluates failure scenarios. From what I have read:
* Updates are not vetted or sanity checked.
* Updates are not slow-rolled to production.
* Updates are not signed to prevent corruption or alteration.
* Updater does not sanitize or validate inputs.
* Updater has no process for reverting to a previously known-good state after a faulty boot.
* Updater does not mark itself as Unnecessary For Boot after repeated faulty boots, which it should.
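To make a few of those points concrete, here is a rough Python sketch of an updater that checks a signature, validates the payload, rolls back to the last known-good state on a faulty boot, and eventually marks itself as not required for boot. Everything in it is hypothetical: the `Updater` class, the JSON `rules` format, and the `MAX_FAILED_BOOTS` threshold are made up for illustration, and HMAC with a shared key stands in for the asymmetric signatures a real updater would more likely use. It does not cover vetting or slow-rolling.

```python
import hashlib
import hmac
import json

MAX_FAILED_BOOTS = 3  # hypothetical threshold, not from any real product


def verify_signature(payload: bytes, signature: str, key: bytes) -> bool:
    """Check an HMAC-SHA256 tag (a stand-in for a real signature scheme)."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


def validate_update(payload: bytes) -> dict:
    """Sanity-check the update content before it is ever loaded."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("update must be a JSON object")
    rules = data.get("rules")
    if not isinstance(rules, list) or not rules:
        raise ValueError("update must contain a non-empty 'rules' list")
    if not all(isinstance(r, str) and r for r in rules):
        raise ValueError("every rule must be a non-empty string")
    return data


class Updater:
    def __init__(self, key: bytes, known_good: dict):
        self.key = key
        self.known_good = known_good   # last config that booted cleanly
        self.active = known_good       # config used on the next boot
        self.failed_boots = 0
        self.required_for_boot = True

    def apply(self, payload: bytes, signature: str) -> bool:
        """Accept an update only if it is signed and well-formed."""
        if not verify_signature(payload, signature, self.key):
            return False
        try:
            candidate = validate_update(payload)
        except ValueError:  # also covers malformed JSON
            return False
        self.active = candidate
        self.failed_boots = 0
        return True

    def record_boot(self, ok: bool) -> None:
        """Promote the active config on success, roll back on failure."""
        if ok:
            self.known_good = self.active
            self.failed_boots = 0
            return
        self.failed_boots += 1
        self.active = self.known_good  # revert to last known-good state
        if self.failed_boots >= MAX_FAILED_BOOTS:
            # Stop blocking boot; the machine comes up without this component.
            self.required_for_boot = False
```

The point of `record_boot` is that an update only becomes the known-good state after the machine has actually booted with it, so a bad update can never overwrite the fallback.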
Finally, its high adoption creates a monoculture. There should be a second version, built independently, so that one runs on the machine while the other sits in a ready state. If one faults, it is disabled and the other takes over. Good ol' NASA-style redundancy.
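As a toy illustration of that active/standby idea, assuming two independently written implementations of the same check (`impl_a`, `impl_b`, and `RedundantPair` are hypothetical names, and a raised exception stands in for "a fault"):

```python
from typing import Callable, List, Optional

Scanner = Callable[[bytes], bool]


def impl_a(event: bytes) -> bool:
    """First implementation; it faults on empty input (a deliberate bug)."""
    if not event:
        raise ValueError("unexpected empty event")
    return event.startswith(b"MAL")


def impl_b(event: bytes) -> bool:
    """Independently written implementation of the same check."""
    return event[:3] == b"MAL"


class RedundantPair:
    def __init__(self, primary: Scanner, standby: Scanner):
        self.active: Scanner = primary
        self.standby: Optional[Scanner] = standby
        self.disabled: List[Scanner] = []

    def scan(self, event: bytes) -> bool:
        try:
            return self.active(event)
        except Exception:
            if self.standby is None:
                raise  # no spare left, so surface the fault
            # Disable the faulty implementation and promote the standby.
            self.disabled.append(self.active)
            self.active, self.standby = self.standby, None
            return self.active(event)
```

This only helps if the two versions really are built independently. If they share the same parser or the same update feed, they will tend to fail on the same input.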