Wait, hang on. There are partial technical answers to a lot of these problems. First, is shit broken all the time? If so, that's the real problem. I've been in that situation before, and it's hellish: you know that with a week of 'full productivity' you could fix the root cause and shit wouldn't be broken all the time, but you are always too wasted from firefighting to get to it.
Hire some contractors or something to help you get things to the point where you have emergencies once a week or less. This is a relatively low bar, but it will set things up so that you can sleep most of the time, which is key. Even if you have to pay full rates, $150/hr or whatever, do it; at that rate you should be able to get someone good and fresh who can get you down to the 'one downtime-causing failure a week' bar fairly quickly. Have your hired gun bang on things for a week. (Really. One failure every 7 days is an extremely low bar unless you have hundreds or thousands of physical servers. I probably get a bad disk a week, but those are 'fix as soon as you are awake' events, not 'wake up now' events. I get a downtime-causing error maybe once every two months, and if you can only afford one sysadmin, chances are you are way smaller than I am.)
The important thing is to focus on bringing your failures down to reasonable rather than on making things perfect. Bring your failures down to humanly tolerable, recover, then start worrying about the failures that happen twice a year.
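If you want to check whether you are actually under the 'once a week or less' bar, here is a rough sketch in Python. It assumes you keep some list of dates on which you had a downtime-causing failure; the function name and the sample dates are made up for illustration, not taken from any real tool.

```python
from datetime import date


def worst_window(incidents: list[date], window_days: int = 7) -> int:
    """Return the largest number of incidents that fall inside any
    sliding window of `window_days` consecutive days."""
    days = sorted(incidents)
    worst = 0
    start = 0
    for end, day in enumerate(days):
        # Drop incidents that are too old to share a window with `day`.
        while (day - days[start]).days >= window_days:
            start += 1
        worst = max(worst, end - start + 1)
    return worst


# Hypothetical log of downtime-causing failures (not bad disks and the like).
outages = [date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 20)]

# The bar from the text: at most one downtime-causing failure in any 7 days.
meets_bar = worst_window(outages) <= 1
```

Only count the 'wake up now' failures here. If you throw every bad disk into the list you will never look like you are meeting the bar, and that is not the point of it.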
Next, you have a pager, right? Sleep whenever you can. The caffeine makes this harder, but it's still possible. Set it up so that your pager wakes you, but also set up your alarms so that the pager only goes off if there is a /real/ problem.
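Roughly what I mean by only paging on a /real/ problem, as a minimal Python sketch. The `Alert` type, the `causes_downtime` flag and the check names are all hypothetical; your monitoring system will have its own way of expressing this, so treat it as the shape of the rule rather than a config for any particular tool.

```python
from dataclasses import dataclass

# Two hypothetical destinations for an alert.
WAKE_UP_NOW = "page"           # goes to the pager, day or night
FIX_WHEN_AWAKE = "ticket"      # sits in a queue until you are up


@dataclass
class Alert:
    check: str              # name of the check that fired (illustrative)
    causes_downtime: bool   # are customers actually affected right now?


def route(alert: Alert) -> str:
    """Page only for failures that are causing downtime; queue the rest."""
    if alert.causes_downtime:
        return WAKE_UP_NOW
    return FIX_WHEN_AWAKE


# A failed disk in a redundant array is a morning job; a dead site is not.
disk = Alert(check="raid-disk-failed", causes_downtime=False)
site = Alert(check="site-unreachable", causes_downtime=True)

disk_route = route(disk)
site_route = route(site)
```

The hard part is not the code, it's being honest about which checks really belong in the 'wake up now' bucket. Every check you move out of it is sleep you get back.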
Next, why are you staying at the office? I've bought one of those little Verizon-brand USB cellular dongles for both my employee and myself. It's pretty great; you can be way out in the middle of nowhere, get a page, and do most of the things you could do in the office.
You need to set expectations. If you got woken up to fight a fire? You put out the fire, but then you aren't showing up the next day.
So yeah, first priority? Sleep. Depriving yourself of sleep is false productivity. Next? Make your system more reliable.