We tout the virtues of three nines, four nines, and five nines… I even heard someone use six nines recently! We want our data centers to achieve the best possible uptime, but what does that really mean? We need to be able to help our customers understand not only what five nines means, but what it means to them. Keep in mind that one nine could be good enough for most businesses as long as all of the downtime occurs after hours <smirk>. If it were only that easy…
Back to the question: what does five nines mean? Five nines means that of the 2,592,000 seconds in a 30-day month, your server is available for 2,591,974.08 of them. To make it easier to read and remember, 99.999% uptime means that your server is down for no more than 25.92 seconds in a month.
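The arithmetic is easy to check for any number of nines. Here is a minimal Python sketch, assuming a 30-day month as in the figures above; the function name `downtime_budget_seconds` is just an illustrative choice.

```python
SECONDS_PER_30_DAY_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds


def downtime_budget_seconds(nines: int,
                            period_seconds: float = SECONDS_PER_30_DAY_MONTH) -> float:
    """Return the downtime allowed in a period for an availability of `nines` nines.

    Assumption: "N nines" means an unavailability of 10**-N,
    so five nines is 99.999% uptime and 0.001% downtime.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10 ** -nines  # e.g. 5 -> 0.00001
    return period_seconds * unavailability


if __name__ == "__main__":
    for n in range(1, 7):
        budget = downtime_budget_seconds(n)
        print(f"{n} nines: {budget:,.2f} seconds of downtime per 30-day month")
```

Dividing the result by 60 gives minutes, which is the unit most SLAs use.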
I checked out a few sources for a definition. Here’s what Wikipedia has to say:
While we can measure the uptime of a server, the services are not necessarily tied to a particular server anymore, are they? The real discussion needs to be around the availability of the services our customers need. How do we measure uptime when we consider all of the servers and other hardware needed to deliver the service, especially when we consider things like DAG in Exchange, Mirroring in SQL, or Clustering and Virtualization in Windows Server? As we all know, we can take servers down in the middle of the day, if they are the right servers, without impacting the business.
Again, that’s theory. What’s the reality? Office 365 offers a three-nines (99.9%) SLA, measured on a monthly basis. You can review the service descriptions here; the SLAs provide the actual details of how uptime is measured, and they are available here. This means that Office 365 will be available to your customers for 43,156.8 minutes of a 43,200-minute (30-day) month. Again, the easier-to-read version is that Office 365 will meet its SLA if it is unavailable for no more than 43.2 minutes within a month.
While I like the table Wikipedia provides, I want to make sure you understand how Office 365 measures the SLA. Office 365 (as of January 2012) uses the following formula:
I ran the numbers and the Office 365 formula produces the same results as the Wikipedia chart, but please be sure to leverage the formula from the SLA if there is any question about the Service Level Commitment around Office 365.
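If you want to run the numbers yourself, here is a minimal Python sketch. It uses the generic availability ratio (total minutes minus downtime, divided by total minutes), which is my assumption for illustration; the SLA may define the total and the downtime differently, so treat the SLA text as the authority. The function name `uptime_percentage` is made up for this example.

```python
def uptime_percentage(total_minutes: float, downtime_minutes: float) -> float:
    """Return uptime as a percentage of the measurement period.

    Assumption: a plain availability ratio, not the exact wording of any SLA.
    """
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    return (total_minutes - downtime_minutes) / total_minutes * 100


if __name__ == "__main__":
    # A 30-day month has 43,200 minutes; 43.2 minutes is the three-nines budget.
    print(f"{uptime_percentage(43_200, 43.2):.3f}%")
    # One full hour of downtime in the same month falls below three nines.
    print(f"{uptime_percentage(43_200, 60):.3f}%")
```

Because the result is a floating-point number, compare it against a target with a small tolerance rather than an exact equality check.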
If you are already subscribed to Office 365, then you should already know about the Service Health page in your Admin portal. It provides the current status of the Office 365 service. Here is a screenshot of the Current Status:
I mentioned earlier that I had found a few ways to measure the uptime of your Windows Servers. Below are three ways to identify either the uptime, or the last reboot time of your Windows Servers.
If you want a tool that’s built into Windows Server, this tool will tell you when the server was last rebooted:
Here is a great PowerShell script that queries the Event Log and computes the uptime of a server:
If you just want a simple tool, check out:
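Whichever tool you use, the arithmetic behind an uptime figure is the same: add up the time the server was down during the period and compare it to the length of the period. What follows is a minimal Python sketch of that calculation, not the PowerShell script mentioned above. It assumes you have already collected each outage as a pair of timestamps (when the server went down, when it came back up) and that the outages do not overlap; the function name and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def uptime_percent(period_start: datetime, period_end: datetime,
                   outages: list[tuple[datetime, datetime]]) -> float:
    """Return uptime over [period_start, period_end] as a percentage.

    `outages` holds (went_down, came_up) pairs, assumed not to overlap.
    Outages that straddle the period boundaries are clipped to the period.
    """
    period = period_end - period_start
    if period <= timedelta(0):
        raise ValueError("period_end must be after period_start")

    downtime = timedelta(0)
    for went_down, came_up in outages:
        start = max(went_down, period_start)
        end = min(came_up, period_end)
        if end > start:  # ignore outages entirely outside the period
            downtime += end - start

    return (period - downtime) / period * 100


if __name__ == "__main__":
    # Hypothetical data: one 30-minute reboot window during January 2012.
    january = (datetime(2012, 1, 1), datetime(2012, 2, 1))
    reboots = [(datetime(2012, 1, 15, 2, 0), datetime(2012, 1, 15, 2, 30))]
    print(f"{uptime_percent(*january, reboots):.3f}% uptime")
```

Keep in mind this measures a single server; as discussed earlier, the availability of the service your customers use is what the SLA conversation should be about.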
Until next time,