Written by Manny Sandhu, Microsoft Premier Field Engineer
Nobody wants an outage, but outages happen. When one occurs, how ready are you to deal with the repercussions?
In most cases, it isn’t the technical skill of personnel that determines the success of the recovery (or the duration of the outage). How confident are you that you or your staff can recover? Little things add up, and in a disaster, minutes can seem like hours. Do you know where the latest backup is located? Do you know the prerequisites to recover? How long does it take to restore? Do you need to involve other teams for recovery?
What constitutes a disaster, and what are the costs?
A very important question to ask is: how much does an outage cost? The true cost of a disaster depends on several factors:
- Revenue lost due to the outage.
- Productivity lost due to the outage.
- Soft costs such as loss of customer confidence.
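To make these factors concrete, here is a rough sketch of how they might be combined into a single estimate. The formula, the field names, and every number in the usage example are hypothetical assumptions for illustration only; each business will need its own figures, and soft costs in particular are hard to quantify.

```python
from dataclasses import dataclass


@dataclass
class OutageCostInputs:
    """Hypothetical inputs for a back-of-the-envelope outage cost estimate."""
    hours_down: float          # duration of the outage in hours
    revenue_per_hour: float    # average revenue normally earned per hour
    affected_staff: int        # number of employees unable to work normally
    hourly_staff_cost: float   # average loaded cost per employee per hour
    productivity_loss: float   # fraction of work lost while down (0.0 to 1.0)
    soft_costs: float = 0.0    # assumed figure for things like lost customer confidence


def estimate_outage_cost(inputs: OutageCostInputs) -> dict:
    """Combine the three cost factors into a simple itemized estimate."""
    revenue_lost = inputs.hours_down * inputs.revenue_per_hour
    productivity_lost = (
        inputs.hours_down
        * inputs.affected_staff
        * inputs.hourly_staff_cost
        * inputs.productivity_loss
    )
    return {
        "revenue_lost": revenue_lost,
        "productivity_lost": productivity_lost,
        "soft_costs": inputs.soft_costs,
        "total": revenue_lost + productivity_lost + inputs.soft_costs,
    }


# Illustrative numbers only: a 4-hour outage affecting 50 staff.
example = OutageCostInputs(
    hours_down=4,
    revenue_per_hour=10_000,
    affected_staff=50,
    hourly_staff_cost=40,
    productivity_loss=0.5,
    soft_costs=5_000,
)
estimate = estimate_outage_cost(example)
print(estimate)
```

Even a crude estimate like this can help when deciding how much to invest in reducing recovery time.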
What qualifies as a disaster? I’ve visited many different companies over the years and asked the question ‘what is a disaster?’ I always receive different answers. The definition of a disaster varies from company to company; however, some common causes of disasters are:
- Human Error
How to minimize the risk and impact associated with disasters
In my experience, 95% of disasters fall under the Human Error category, while the majority of the remaining occurrences fall into the hardware/software category. Some very simple steps can be taken to minimize the risk:
- Proper testing and deployment of patches
- Minimizing the number of users with administrative/elevated rights
In the event of a disaster, communication to the business is vital. Communications should include:
- A clear description of the issue
- The scope of users/systems affected
- An estimated time to recovery. The estimated time to recover should be as accurate as possible so other contingency plans can be implemented.
A disaster recovery plan for each technology is a must for businesses of any size. A technology-specific disaster recovery plan should be part of a larger disaster recovery and business continuity plan. This plan should be thorough enough for any member of the IT staff to follow, tested periodically to keep the information fresh, and updated after every major product release.