Over the years I've been engaged in several AD disaster recovery scenarios where things ultimately boiled down to the same root cause: a single point of failure had been introduced into the IT environment. When that single point of failure failed catastrophically, it took the entire environment down with it.
With good backups that can actually be restored, this may not be an End of Days scenario - but as the 3rd principle of Murphy's Law dictates, chances are the backups available are unusable, unrestorable or non-existent when you actually need them (in the same sense that they will always work when you don't need them).
Now... in most virtualization scenarios the admin responsible for the virtual server is completely removed from the storage layer. This has been a conscious push by most of the virtualization providers: virtualization is intended to simplify IT environments by making the storage medium unimportant.
In itself that may be a valid selling point - but even if the admin is removed from the storage medium, the virtual hard disk of the virtual server still needs to be physically stored somewhere. Even the Cloud has mechanical moving parts...
For large hosting providers this "somewhere" is typically a centralized SAN with redundant gizmos, thingamagicks and bells and whistles.
SANs are tried and tested storage devices that were around for years before anyone conceived of using them to store virtual machine images. But today's storage capacity far outweighs today's backup and restore capability, and a decent SAN with full redundancy is relatively expensive. That makes it very tempting to build SANs large enough to hold every virtual machine you are hosting, to save money and increase the ROI from that SAN.
Consider the following hypothetical but all too likely disaster recovery scenario in today's SAN-based virtualization environments:
- You store 2000 virtual machines on the same SAN.
- The SAN fails catastrophically.
Even at best, with a perfect backup strategy in place, a bulletproof Disaster Recovery plan and a small army of trained ninjas that spring from the shadows and start Disaster Recovery procedures the very instant the failure has been detected and quantified, you're still looking at a lengthy recovery process.
If you're missing any one of these... you're looking at an even longer process.
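To put a rough number on "lengthy", here is a back-of-the-envelope sketch in Python. Everything in it is an assumption made purely for illustration: the function name `estimate_restore_hours`, the 50 GB average VM size, the 200 MB/s sustained restore throughput per stream and the 4 parallel restore streams. None of these figures come from a real environment, and the model ignores the time needed to replace hardware, validate backups and restart services.

```python
def estimate_restore_hours(vm_count, avg_vm_size_gb, throughput_mb_s, parallel_streams=1):
    """Best-case hours needed just to copy VM images back from backup.

    Hypothetical helper: it only models raw data transfer and assumes
    the backup system sustains the given throughput on every stream.
    """
    if throughput_mb_s <= 0 or parallel_streams <= 0:
        raise ValueError("throughput and stream count must be positive")
    total_mb = vm_count * avg_vm_size_gb * 1024          # GB -> MB
    effective_mb_s = throughput_mb_s * parallel_streams  # aggregate restore rate
    return total_mb / effective_mb_s / 3600              # seconds -> hours


# Assumed figures: 2000 VMs at 50 GB each, 4 streams at 200 MB/s.
single_san = estimate_restore_hours(2000, 50, 200, parallel_streams=4)

# Same VMs spread over 4 SANs, so one failed SAN only takes out 500 of them.
one_of_four = estimate_restore_hours(500, 50, 200, parallel_streams=4)

print(f"Everything on one SAN: {single_san:.1f} hours")   # 35.6 hours
print(f"One SAN out of four:   {one_of_four:.1f} hours")  # 8.9 hours
```

Even with these fairly generous assumptions, the single-SAN case means about a day and a half of pure copy time before the last VM is back, and that is with the ninjas already on site.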
Moral: Every Cloud has a Silver lining - even Private Clouds 🙂