Over the past several weeks, my teammates have contributed to a very valuable series of blog articles entitled “Disaster Recovery Planning for IT Pros”. They’ve covered topics such as how to get started with planning, how Server Virtualization applies to Disaster Recovery, and how to test your recovery plans. They’ve also discussed the technologies that can help (the tools in your DR tool belt), such as Hyper-V Replica, Windows Azure, and the newer Windows Azure Hyper-V Recovery Manager.
“What is an Offline Backup?”
Before I dive in, I want to be clear about what I’m covering. For the purpose of this discussion, I’m not talking about tapes or off-site storage of backed-up data. That’s more commonly called an archive. Regular storage and archival for recovery from past history is an important (and big) topic in and of itself; perhaps the topic of another blog series for another day. For this article, I’m talking about a copy of some important digital asset, saved in a way that lets it be safely and fully recovered as a complete unit in case the original location can no longer house that asset. (Yeah... a disaster.) That digital asset could be a server OS installation, a directory, a database, a virtual machine, a desktop image, file storage, or an application; really, whatever you consider valuable and worth the effort (and cost) to protect in a way that can be quickly restored if the worst should happen.
“Do I really need an offline copy these days?”
That’s a fair question. With all of the excellent technologies in modern operating systems such as Windows Server 2012 R2 (many of them now built in and included), it could be argued that you don’t really need to back up some items. A virtual machine will start running on another cluster node if its hosting node fails. The storage supporting that machine could sit on an always-available file server cluster (Scale-Out File Server), with redundant paths to the storage and redundant disk arrays whose disks and controllers, if they fail, can be easily replaced while the data remains available. (And I haven’t even touched on application availability within a virtual machine or the benefits of virtualization guest clustering.)
But even with all of this great technology, not all data or files or applications are equally important, and not all are worth the same amount of investment to ensure their availability and – important to our DR topic – their recovery in case of a really bad thing (disaster) happening.
The case for the offline backup will be determined by these factors:
- The importance of that data (its RPO and RTO)
- The technologies you’re willing to invest in to support continuous availability
“How important is your data?”
As part of the planning process (which Jennelle introduced early in our series), you’ll take an inventory of all of your digital assets and then prioritize them, from MOST critical (i.e., my business can’t function or survive without it) to LEAST critical (no big deal / can rebuild / etc.). Now, assuming that at some point your datacenter is turned into a “steaming pile”, you’ll draw a line through your list. Items above the line are critical to your business overcoming the disaster. Items below the line... not worth the investment in time or effort. (Note: that line will shift up or down as you work through this, as you figure out the actual costs associated with your plan, and, importantly, as you regularly re-evaluate your disaster preparedness.)
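The “rank it, then draw a line” step can be sketched in a few lines of Python. This is only an illustration: the `Asset` type, the 1-to-5 criticality scale, the `cutoff` parameter and the example asset names are all hypothetical, not part of any Microsoft tool.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    criticality: int  # hypothetical scale: 1 = most critical, 5 = least critical


def split_at_line(assets, cutoff):
    """Sort assets from most to least critical and split them at the cutoff.

    Assets with criticality <= cutoff fall "above the line" (protect them);
    the rest fall "below the line" (not worth the DR investment for now).
    """
    ranked = sorted(assets, key=lambda a: a.criticality)
    above = [a for a in ranked if a.criticality <= cutoff]
    below = [a for a in ranked if a.criticality > cutoff]
    return above, below


# Hypothetical inventory.
inventory = [
    Asset("test-lab VMs", 5),
    Asset("order database", 1),
    Asset("file shares", 3),
    Asset("ERP application server", 1),
    Asset("intranet wiki", 4),
]

above, below = split_at_line(inventory, cutoff=3)
print("Above the line:", [a.name for a in above])
print("Below the line:", [a.name for a in below])
```

Shifting the line as costs become clearer is just a matter of re-running the split with a different `cutoff`.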
For each of your inventoried and prioritized digital assets you’re also going to be defining a couple of objectives – the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO).
The RPO is the answer to the question: “How much data can I afford to lose if I need to recover from an offline copy of that data?” In its simplest terms, it decides how frequently I make a new offline copy of that asset. An example in Hyper-V Replica would be the setting that determines how frequently a new set of changes is replicated to the virtual machine’s replica copy. If I’m replicating changes every 5 minutes, then I could lose up to 5 minutes of changes should the worst happen, so my RPO is 5 minutes. Is that good enough? Maybe. It depends on what that virtual machine is actually doing for you.
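To make the “interval equals worst-case loss” reasoning concrete, here is a minimal Python sketch that picks the least frequent (and so least demanding) replication interval that still satisfies a given RPO. It assumes a simplified model where a disaster just before the next cycle loses one full interval of changes, ignoring transfer time. The candidate list uses the 30-second, 5-minute and 15-minute frequencies Hyper-V Replica offers in Windows Server 2012 R2; the function names are hypothetical.

```python
from datetime import timedelta

# Replication frequencies offered by Hyper-V Replica in Windows Server 2012 R2.
REPLICA_FREQUENCIES = [
    timedelta(seconds=30),
    timedelta(minutes=5),
    timedelta(minutes=15),
]


def worst_case_loss(interval: timedelta) -> timedelta:
    """Simplified model: a disaster just before the next cycle loses up to one
    full interval of changes (replication transfer time is ignored)."""
    return interval


def longest_interval_meeting_rpo(rpo, candidates=REPLICA_FREQUENCIES):
    """Return the least frequent interval whose worst-case loss fits within the RPO,
    or None if even the most frequent option is not good enough."""
    fitting = [c for c in candidates if worst_case_loss(c) <= rpo]
    return max(fitting) if fitting else None


print(longest_interval_meeting_rpo(timedelta(minutes=10)))  # 0:05:00
print(longest_interval_meeting_rpo(timedelta(seconds=10)))  # None
```

A result of `None` means replication at these frequencies can’t meet that RPO on its own, which is a signal to look at other technologies for that asset.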
The RTO describes how long you’re willing to wait to bring that digital asset back online. If I’m still doing all of my backups to tape and shipping those tapes off-site, recovering at a new location is going to take a lot longer than it would if, say, I used another site and/or Windows Azure to host my stored backups. Can I afford to wait a day or two? An hour? A few minutes? How critical that asset is to your business continuity will help you set a desired RTO for it.
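One rough way to compare the tape scenario against an online copy is to add up the time to get the backup in hand, the time to restore the data, and any rebuild time, then check that total against the RTO. The Python sketch below assumes that simple additive model; every number in it (retrieval hours, data size, restore throughput, rebuild time) is hypothetical and only there to show the comparison.

```python
def estimated_recovery_hours(media_retrieval_hours, data_gb,
                             restore_gb_per_hour, rebuild_hours=0.0):
    """Rough RTO estimate: time to get the backup media or copy in hand,
    plus time to restore the data, plus time to rebuild or reconfigure."""
    return media_retrieval_hours + data_gb / restore_gb_per_hour + rebuild_hours


def meets_rto(estimate_hours, rto_hours):
    """True if the estimated recovery time fits within the target RTO."""
    return estimate_hours <= rto_hours


# Hypothetical numbers for a 500 GB asset.
tape = estimated_recovery_hours(media_retrieval_hours=24, data_gb=500,
                                restore_gb_per_hour=100, rebuild_hours=4)
online_copy = estimated_recovery_hours(media_retrieval_hours=0, data_gb=500,
                                       restore_gb_per_hour=250, rebuild_hours=1)

print(tape, online_copy)  # 33.0 3.0
print(meets_rto(tape, 8), meets_rto(online_copy, 8))  # False True
```

With an 8-hour RTO, the tape-based approach misses by a wide margin in this example, mostly because of the time spent just getting the tapes back.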
“How much are you willing to spend?”
Again, there are expensive options and cheaper ones for meeting the RPO and RTO, and you’ll ultimately base how much you’re willing to invest on the relative importance of each digital asset in bringing your company back from disaster quickly (or as quickly as is reasonable).
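The trade-off can be framed as “pick the cheapest option that still meets this asset’s RPO and RTO.” The Python sketch below shows that selection; the `Option` type, the three protection options, and their RPO, RTO and cost figures are hypothetical placeholders, not real pricing or product guidance.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    rpo_minutes: float   # worst-case data loss this option delivers
    rto_hours: float     # expected time to bring the asset back
    annual_cost: float   # hypothetical cost figure


def cheapest_option(options, target_rpo_minutes, target_rto_hours):
    """Return the lowest-cost option meeting both objectives, or None if none does."""
    fitting = [o for o in options
               if o.rpo_minutes <= target_rpo_minutes
               and o.rto_hours <= target_rto_hours]
    return min(fitting, key=lambda o: o.annual_cost) if fitting else None


# Hypothetical options and figures.
options = [
    Option("nightly tape, shipped off-site", 24 * 60, 48, 2_000),
    Option("nightly backup to cloud storage", 24 * 60, 8, 3_500),
    Option("replication to a second site", 5, 1, 20_000),
]

for rpo, rto in [(24 * 60, 24), (15, 4), (1, 1)]:
    choice = cheapest_option(options, rpo, rto)
    print(rpo, rto, choice.name if choice else None)
```

When nothing fits, you either invest in a better option or revisit where that asset sits relative to the line on your priority list.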
“And once I’ve implemented everything, I’m done?”
Of course not. You’ll regularly test your recovery. And less frequently, but still critically, you’ll re-evaluate your priority list and the methods for meeting your objectives. IT Pros, of all people, know how quickly technology evolves. What seemed like a good, solid plan and a decent set of tools last year may not fit as well today, now that newer/better/faster/cheaper options are available. And that’s not even considering the shifting nature of your own environment: the servers, the applications, the growth of data... all need to be reconsidered on a regular basis.
There is definitely a case for offline backup. What you back up, and how, is defined by you, based on priority and adjusted by cost. And making those decisions and implementing your plan isn’t the end of the process. You must revisit, re-inventory, re-prioritize, adjust, and test your plans on a regular schedule.
This post is part of a 15-part series on Disaster Recovery and Business Continuity planning by the US-based Microsoft IT Evangelists. For the full list of articles in this series, see the intro post located here: http://mythoughtsonit.com/2014/02/intro-to-series-disaster-recovery-planning-for-i-t-pros/