Red Black Trees
In a good organization, people measure twice and cut once. For example, an idea is born: let's build a data center, and let's do it properly. First, you figure out how much equipment is needed, how much space it requires, and how much power it takes to run and cool it. Next, you size backup batteries and fuel-powered generators to provide uninterruptible power. And so forth.
In a good organization, each of these tasks is designed, reviewed, built, tested and verified, and approved. These things need to be right. Not sort-of right, but right!
Close only counts in horseshoes, hand grenades and thermonuclear war.
Here's a tale of an organization doing almost everything right... almost.
In the late noughties, Mel was doing something that wasn't yet called DevOps at a German ISP. It was a pretty good place to work, in a spanking-new office near the French border, paid for by several million customers, a couple of blocks from one of the region's largest data centers, which housed said customers' mail and web sites. The data center had all kinds of fancy security features and, of course, a state-of-the-art UPS: fifteen minutes' worth of batteries in the basement and a couple of fat diesels to take it from there, with enough fuel to stay online, in the true spirit of the Internet, even through a small-time nuclear war. Everything was properly maintained, and drills were held frequently to ensure the whole setup would actually work in case of an attack or a lightning strike.
The computing center itself only had a few offices, for the hardware guys and the core admin team. But since you don't want administrators' root shells disconnected by a power outage either (while they're in the middle of something), the office building had been connected to the same UPS. And so as not to reduce the backup run time unnecessarily, there were differently colored outlets: red for the PCs, monitors and network hardware, and gray for coffee makers, printers and other temporarily dispensable devices that didn't need a UPS.
Now, Germany happens to be known as one of the countries with the best electric grids in the world. Its "Customer Average Interruption Duration Index" is on the order of 15 minutes a year, and in some places years can pass without so much as a second of blackout. So the drills were the only power events since the team had moved into the office, and since the office wasn't part of the data center, it wasn't even involved in the testing. The drills were frequent and thorough: all computer power was cut over to batteries, then to generators, and it was verified at the switch that all was well. Of course, during the tests, utility power was still present in the building on the non-UPS-protected circuits, so nothing in the offices ever actually shut off, which was kind of the whole point of the tests.
When it inevitably hit the fan, in the form of an exploding transformer at a major substation that plunged a quarter million people into darkness, the data center kept going just fine. The admins would probably have noticed a Nagios alert about discharging batteries first, then the generators spinning up, and so forth. The colleagues in the office building hardly noticed either, as they still had power.
However, on Mel's floor, the coffee maker was happily gurgling along in the silence that had suddenly fallen when all the PCs and monitors went dark.
It turned out that their floor had been wired with the UPS circuit on the gray outlets from the beginning, and nobody had ever bothered to test it.