Hey, Scripting Guy! How Do I Calculate Server Uptime?

ScriptingGuy1

Hey, Scripting Guy! Question

Hey, Scripting Guy! I loved your idea about tracking what time my server came up and subtracting it from the current time. I bow at your feet! But guess what? My pointy-headed boss says that does not mean jack squat (or words to that effect). He said to really prove server uptime, I need to track the amount of time the server was down during the entire month, subtract it from the available time in the month, and then calculate the percentage. Help me, Scripting Guy, you are my only hope.

– HB

Hey, Scripting Guy! Answer

Hi HB,

Don’t sweat it, HB. I have got you covered.

We can use the Windows System Event log to obtain the information to satisfy your pointy-headed boss. We simply need to look for startup entries and shutdown entries, obtain the time stamp from each event, and calculate the amount of time between the two events. Add up those gaps, subtract the total from the time available in the month, et voilà, we have your uptime. The GetUpTimeFromSystemLog.ps1 script is seen here.

[timespan]$downTime = New-TimeSpan -start 0 -end 0
[timespan]$totalDownTime = New-TimeSpan -start 0 -end 0
Get-EventLog -LogName system | 
Where-Object `
{ ($_.eventid -eq 6005 -OR $_.eventID -eq 6006) -AND $_.timegenerated -gt (get-date).adddays(-30) } | 
Sort-Object -Property timegenerated |
Foreach-Object `
{
  if ($_.EventID -eq 6006)
     { 
       $down = $_.TimeGenerated
     } #end if eventID
  Else
     { 
      $up = $_.TimeGenerated 
     } #end else
   if($down -AND $up)
     {
      if($down -ge $up) 
         { 
           Write-Host -foregroundColor Red "*** Invalid data. Ignoring $($up)"
           $up = $down 
          } #end if down is greater than up
       [timespan]$CurrentDownTime = new-TimeSpan -start $down -end $up
       [timeSpan]$TotalDownTime = $currentDownTime + $TotalDownTime
       $down = $up = $null
     } #end if down and up
} #end foreach

"Total down time on $env:computername is:"
$TotaldownTime

$minutesInMonth = (24*60)*30
$minutesInDay = 24*60
$downTimeMinutes = $TotaldownTime.TotalMinutes
$percentUpTime = "{0:n2}" -f (100 - ($downTimeMinutes/$minutesInMonth)*100)

"This computes to $percentUptime percent uptime on $env:computerName for the current 30 day period"

First a little background: Every time your computer starts up, it will need to start the EventLog service. When this happens an event gets written to the Windows System Event log. The event source is listed as EventLog because that is the name of the service that is supplying the event. The number we are interested in for startup tracking is EventID 6005. If we had any doubts that this is the correct event, we can read the message in the big gray box that says, “The Event log service was started.”

When your server is shut down, the EventLog service also writes an entry into the system log. This one will have Event ID 6006. The message associated with this event says the EventLog service was stopped.

Starting the event log is one of the first things that Windows does; on the other hand, stopping the event log is one of the last things that Windows does. This approach will allow us to obtain a reasonably accurate picture of uptime for the entire month. We simply need to go through the system event log and pick out each of the 6006 and 6005 Event IDs that occurred during the period of time in question.

We could use WMI to do this. If we did, we could write the script in either VBScript or in Windows PowerShell. For example, the Windows PowerShell version would begin using this code:

Get-WmiObject -Class win32_NTLogEvent -Filter `
"LogFile = 'system' AND eventCode = 6006" | 
Format-Table -property timeWritten, timeGenerated, recordNumber, eventCode

There are a couple of problems with this approach, not the least of which is that the timewritten and the timeGenerated properties are returned from WMI in UTC format. (For a more complete discussion of UTC formatted time, see How Long Has My Server Been Up?) An easier way to work with the Windows Event Logs from within Windows PowerShell is to use the Get-EventLog cmdlet. The initial query would look like this:
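If you do go the WMI route, the time stamps come back as strings such as 20090115083000.000000-480 rather than as date time objects. Here is a minimal sketch of converting one by hand. The function name ConvertFrom-DmtfString and the sample value are made up for illustration, and the sketch only reads the first 14 digits (date and time down to the second); it ignores the microseconds and the trailing offset.

```powershell
# Hypothetical helper: turns the leading yyyyMMddHHmmss portion of a
# WMI-style time stamp string into a DateTime. Microseconds and the
# trailing offset are deliberately ignored in this sketch.
function ConvertFrom-DmtfString {
    param([string]$Dmtf)
    $stamp = $Dmtf.Substring(0, 14)
    [datetime]::ParseExact(
        $stamp,
        'yyyyMMddHHmmss',
        [System.Globalization.CultureInfo]::InvariantCulture)
}

$sample = '20090115083000.000000-480'   # made-up sample value
$converted = ConvertFrom-DmtfString -Dmtf $sample
$converted
```

On Windows PowerShell, the .NET class System.Management.ManagementDateTimeConverter has a ToDateTime method that performs the full conversion, which is usually the better choice in a real script.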

Get-EventLog -LogName system | 
Where-Object `
{ ($_.eventid -eq 6005 -OR $_.eventID -eq 6006) -AND $_.timegenerated -gt (get-date).adddays(-30) } | 
Sort-Object -Property timegenerated

That is a single command. It begins innocently enough by saying in effect “get everything from the system log.” The complicated part is the where clause. For the where clause, we use the Where-Object cmdlet, whose script block always begins with a set of brackets (braces, squiggly brackets, tuborgs, whatever you want to call them). Next, we use the $_ automatic variable to refer to an individual event log entry as it comes across the pipeline. We examine the eventID property and see if it is equal to either 6005 or 6006. If this is groovy, we move on and look at the timeGenerated property. We want to see events from the last 30 days. We do this by using the Get-Date cmdlet to create a date time object, and add -30 days to it. When we have found the last 30 days of event entries, we sort them on the timeGenerated property by using the Sort-Object cmdlet. This gives us a collection of events.
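One thing worth knowing about a where clause like this: Windows PowerShell gives -or and -and the same precedence and evaluates them left to right, so a test written as A -or B -and C is treated as (A -or B) -and C. The sketch below checks that against a few made-up objects that stand in for event log entries (the $entries collection and its values are invented for illustration); both filters should keep the same single entry.

```powershell
# Made-up stand-ins for event log entries.
$now = Get-Date
$cutoff = $now.AddDays(-30)
$entries = @(
    [pscustomobject]@{ EventID = 6005; TimeGenerated = $now.AddDays(-2) }   # recent startup
    [pscustomobject]@{ EventID = 6006; TimeGenerated = $now.AddDays(-45) }  # old shutdown
    [pscustomobject]@{ EventID = 6005; TimeGenerated = $now.AddDays(-45) }  # old startup
    [pscustomobject]@{ EventID = 7036; TimeGenerated = $now.AddDays(-1) }   # unrelated event
)

# No parentheses: evaluated left to right.
$implicit = @($entries | Where-Object {
    $_.EventID -eq 6005 -or $_.EventID -eq 6006 -and $_.TimeGenerated -gt $cutoff })

# Parentheses spell out the same grouping.
$explicit = @($entries | Where-Object {
    ($_.EventID -eq 6005 -or $_.EventID -eq 6006) -and $_.TimeGenerated -gt $cutoff })

"Implicit grouping kept $($implicit.Count); explicit grouping kept $($explicit.Count)."
```

If you come from a language where "and" binds tighter than "or," the parentheses are a cheap way to make the intent obvious to the next person who reads the script.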

Now we need to troll through the collection of events and figure out the amount of downtime. To do this we will use the ForEach-Object cmdlet. If the eventID property is equal to 6006, we assign the timeGenerated value to the $down variable. However, if the eventID property is equal to 6005, we assign the value of the timeGenerated property to the $up variable. This is seen here:

Foreach-Object `
{
  if ($_.EventID -eq 6006)
     { 
       $down = $_.TimeGenerated
     } #end if eventID
  Else
     { 
      $up = $_.TimeGenerated 
     } #end else

So far this has been relatively easy. As soon as we have two values, one for downtime and one for uptime, we need to make sure the date values are valid. If the server never crashes and if the server does not come up and down very often, there would be no issues. However, it is possible that the server could crash, in which case there would be two 6005 events and no 6006 event between them. When that happens, the $up value we are holding is older than the $down value, and subtracting them would give a negative number. To handle this situation, we cheat just a little and set the value of $up to the value of $down, which makes that pair count as zero downtime. This will keep us from generating a negative number, but it will skew the downtime a bit in our favor. You will need to decide if this approach is okay or not for you. Here is that bit of code:

if($down -ge $up) 
         { 
           Write-Host -foregroundColor Red "*** Invalid data. Ignoring $($up)"
           $up = $down 
          } #end if down is greater than up
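If zeroing out the pair is not okay for you, here is one possible variation; it is a sketch, not the method the script uses. Instead of holding on to a startup time that has no shutdown in front of it (the first event in the 30-day window, or a restart after a crash), it throws away only that stale startup and waits for the next shutdown and startup pair. The function name Measure-DownTime, the $LogEntries parameter, and the sample events are all invented for illustration.

```powershell
# Hypothetical helper: sums the gaps between each 6006 (shutdown) and the
# next 6005 (startup). A 6005 with no pending 6006 is skipped.
function Measure-DownTime {
    param([object[]]$LogEntries)   # objects with EventID and TimeGenerated

    $total = [timespan]::Zero
    $down = $null
    foreach ($entry in ($LogEntries | Sort-Object -Property TimeGenerated)) {
        if ($entry.EventID -eq 6006) {
            $down = $entry.TimeGenerated      # remember the latest clean shutdown
        }
        elseif ($entry.EventID -eq 6005 -and $null -ne $down) {
            $total += New-TimeSpan -Start $down -End $entry.TimeGenerated
            $down = $null                     # this pair has been counted
        }
    }
    $total
}

# Made-up sample data: a startup at the window edge, a clean restart,
# a crash restart (no 6006), and another clean restart.
$base = [datetime]'2009-01-01'
$sampleEntries = @(
    [pscustomobject]@{ EventID = 6005; TimeGenerated = $base }
    [pscustomobject]@{ EventID = 6006; TimeGenerated = $base.AddDays(5) }
    [pscustomobject]@{ EventID = 6005; TimeGenerated = $base.AddDays(5).AddMinutes(10) }
    [pscustomobject]@{ EventID = 6005; TimeGenerated = $base.AddDays(9) }
    [pscustomobject]@{ EventID = 6006; TimeGenerated = $base.AddDays(12) }
    [pscustomobject]@{ EventID = 6005; TimeGenerated = $base.AddDays(12).AddMinutes(20) }
)
$sampleDownTime = Measure-DownTime -LogEntries $sampleEntries
$sampleDownTime.TotalMinutes
```

Note that the crash itself still counts as zero downtime here, because the log holds no shutdown time to measure from. The difference is that the clean restarts on either side of it are still measured.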

The next thing we need to do is to subtract the two date time values from each other. To do this, we will need to use time span objects. These are easy to get because we can use the New-TimeSpan cmdlet to obtain them. But why do we need a time span object in the first place? So we can add and subtract the time values, that's why. To ensure our values are actually timespan values, we use the [timespan] type constraint. Lastly we get rid of the values we have for $down and $up by assigning $null to them. On the next pass through the ForEach-Object script block, we will pick up the next values in our collection and assign them to the appropriate variables. This code is shown here:

[timespan]$CurrentDownTime = new-TimeSpan -start $down -end $up
[timeSpan]$TotalDownTime = $currentDownTime + $TotalDownTime
$down = $up = $null
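To see the time span arithmetic on its own, here is a small sketch with made-up shutdown and startup times. The variable names $sampleDown, $sampleUp, and $runningTotal are invented for this example, and the running total starts from [timespan]::Zero simply to keep the sketch short.

```powershell
# Made-up shutdown and startup times, 15 minutes apart.
$sampleDown = [datetime]'2009-01-05 22:00:00'
$sampleUp   = [datetime]'2009-01-05 22:15:00'

# Start the running total at zero, then add one outage to it.
[timespan]$runningTotal = [timespan]::Zero
[timespan]$current = New-TimeSpan -Start $sampleDown -End $sampleUp
$runningTotal = $current + $runningTotal

# Subtracting two date time values directly gives the same time span.
$direct = $sampleUp - $sampleDown

$runningTotal.TotalMinutes
```

The TotalMinutes property reports the whole span in minutes, which is the number the percentage calculation needs; the Minutes property would report only the minutes component.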

The last thing we need to do is generate our report. We pick up the computer name from the environment drive, and we calculate the percentage of uptime based upon the number of minutes in a 30-day month. To clean up the numeric display a bit, we use the “{0:n2}” -f syntax, which is .NET Framework numeric speak for “show me only two digits after the decimal point.” This code is shown here:

"Total down time on $env:computername is:"
$TotaldownTime

$minutesInMonth = (24*60)*30
$minutesInDay = 24*60
$downTimeMinutes = $TotaldownTime.TotalMinutes
$percentUpTime = "{0:n2}" -f (100 - ($downTimeMinutes/$minutesInMonth)*100)

"This computes to $percentUptime percent uptime on $env:computerName for the current 30 day period"
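As a worked example of the arithmetic, suppose the server was down for a total of 90 minutes. A 30-day period has 24 * 60 * 30 = 43,200 minutes, so the uptime is 100 - (90 / 43200) * 100, or roughly 99.79 percent. The sketch below wraps the same calculation in a function so that the number of days can vary; the function name Get-PercentUpTime and its parameters are made up for illustration, and it rounds with [math]::Round instead of formatting a string.

```powershell
# Hypothetical helper: percentage of uptime for a period of a given length.
function Get-PercentUpTime {
    param(
        [timespan]$DownTime,
        [int]$DaysInPeriod = 30   # the script assumes a 30-day month
    )
    $minutesInPeriod = 24 * 60 * $DaysInPeriod
    [math]::Round(100 - ($DownTime.TotalMinutes / $minutesInPeriod) * 100, 2)
}

$ninetyMinutes = New-TimeSpan -Minutes 90

# Same 30-day assumption as the script.
$thirtyDayResult = Get-PercentUpTime -DownTime $ninetyMinutes

# Using the real length of a calendar month (January 2009 has 31 days).
$januaryResult = Get-PercentUpTime -DownTime $ninetyMinutes -DaysInPeriod ([datetime]::DaysInMonth(2009, 1))

$thirtyDayResult
$januaryResult
```

Whether you report against a rolling 30 days or against the calendar month is a choice to settle with the boss before the numbers go in front of him.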

In the end, HB, the script is a bit complicated; believe it or not, there is still some more work to do on it. For instance, right now the script runs only on a local computer. Also, if the server never went down during the month, we are not displaying the value of 100 percent uptime for the month. Lastly, we are not storing the uptime data anywhere either. We will do this in the upcoming TechNet Magazine article. In the meantime, tell your boss to take off that hat.

Ed Wilson and Craig Liebendorfer, Scripting Guys
