Event Log rule to look for multiple reboots – a script WriteAction example


 

I had a customer looking for an example of how SCOM can monitor a server for multiple reboots in a period of time.

I previously wrote about the typical scenario of looking for repeated events in a defined time period here:  http://blogs.technet.com/b/kevinholman/archive/2014/12/18/creating-a-repeated-event-detection-rule.aspx

However, this won’t work across reboots. The consolidator condition detection that counts events over time keeps its state in memory on the agent. If the agent service or the server is restarted, the count is lost because the workflow must reinitialize.

One way to handle this is with a script write action. A reboot is typically detected via a 6009 event in the System log. (Dirty shutdowns can be detected via the 6008 event, and you should already be monitoring for those.) In this example, however, we don’t want an alert on every normal reboot. We only want to know when a server is rebooted multiple times within a specific time period.

We can accomplish this via two rules.

The first rule uses an Event datasource, but instead of alerting, it executes a script WriteAction in response to the event. The script is a simple VBScript that searches the System log over a specified time window and counts the matching events. The second rule, shown further down, alerts on the event that the script writes.

Here is the rule:

<Rule ID="Custom.Example.EventLogCheck.Event6009.Rule" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>Custom</Category>
  <DataSources>
    <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.EventProvider">
      <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
      <LogName>System</LogName>
      <Expression>
        <SimpleExpression>
          <ValueExpression>
            <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
          </ValueExpression>
          <Operator>Equal</Operator>
          <ValueExpression>
            <Value Type="UnsignedInteger">6009</Value>
          </ValueExpression>
        </SimpleExpression>
      </Expression>
    </DataSource>
  </DataSources>
  <WriteActions>
    <WriteAction ID="ScriptWriteAction" TypeID="Custom.Example.EventLogCheck.WA" />
  </WriteActions>
</Rule>

 

The script is very simple. To reuse it, just change the event ID, count, and time window at the top. You might also need to customize the events created by LogScriptEvent to suit your needs and provide a good message for the alert.

The LogScriptEvent call for a detection of 3 events looks like this:

Call oAPI.LogScriptEvent("CheckEventLog.vbs",1001,1,": CRITICAL : Event " & EventId & " has been detected " & Count & " or more times in the past " & Minutes & " minutes")

This will log a critical event with ID 1001 in the Operations Manager event log on the agent, with an event description resembling this:

[image: event 1001 in the Operations Manager event log]

 

Here is the script:

 

'==========================================================================
'
' NAME: CheckEventLog.vbs
'
' COMMENT: This is a write action script to inspect the event log for previous events
'
' Change the values for EventId, Count, and Minutes for your write action example (minutes is expressed as a negative number offset)
'
'==========================================================================
Option Explicit
SetLocale("en-us")

Dim EventId, Count, Minutes
EventId = 6009
Count = 3
Minutes = -20

Dim oAPI
Set oAPI = CreateObject("MOM.ScriptAPI")

Dim strComputer
'The script will always be run on the machine that generated the original event
strComputer = "."

Dim strTime
strTime = Time

'Check to see if this event has been logged x occurrences in n minutes
Dim dtmStartDate, iCount, colEvents, objWMIService, objEvent
Const CONVERT_TO_LOCAL_TIME = True
Set dtmStartDate = CreateObject("WbemScripting.SWbemDateTime")
dtmStartDate.SetVarDate dateadd("n", Minutes, now)' CONVERT_TO_LOCAL_TIME
iCount = 0

Set objWMIService = GetObject("winmgmts:" _
    & "{impersonationLevel=impersonate,(Security)}!\\" _
    & strComputer & "\root\cimv2")
Set colEvents = objWMIService.ExecQuery _
    ("Select * from Win32_NTLogEvent Where Logfile = 'SYSTEM' and " _
    & "TimeWritten > '" & dtmStartDate & "' and EventCode = " & EventId & "")

For Each objEvent In colEvents
    iCount = iCount+1
Next

If iCount >= Count Then
    Call oAPI.LogScriptEvent("CheckEventLog.vbs",1001,1,": CRITICAL : Event " & EventId & " has been detected " & Count & " or more times in the past " & Minutes & " minutes")
    WScript.Quit
End If

Call oAPI.LogScriptEvent("CheckEventLog.vbs",1002,0,": INFO : Event " & EventId & " was detected, but has not been detected " & Count & " or more times in the past " & Minutes & " minutes")
Wscript.Quit

 

We just need to wrap this up into a write action:

 

 

<WriteActionModuleType ID="Custom.Example.EventLogCheck.WA" Accessibility="Public" Batching="false">
  <Configuration />
  <ModuleImplementation Isolation="Any">
    <Composite>
      <MemberModules>
        <WriteAction ID="ScriptWrite" TypeID="Windows!Microsoft.Windows.ScriptWriteAction">
          <ScriptName>CheckEventLog.vbs</ScriptName>
          <Arguments />
          <ScriptBody><![CDATA[
'==========================================================================
'
' NAME: CheckEventLog.vbs
'
' COMMENT: This is a write action script to inspect the event log for previous events
'
' Change the values for EventId, Count, and Minutes for your write action example (minutes is expressed as a negative number offset)
'
'==========================================================================
Option Explicit
SetLocale("en-us")

Dim EventId, Count, Minutes
EventId = 6009
Count = 3
Minutes = -20

Dim oAPI
Set oAPI = CreateObject("MOM.ScriptAPI")

Dim strComputer
'The script will always be run on the machine that generated the original event
strComputer = "."

Dim strTime
strTime = Time

'Check to see if this event has been logged x occurrences in n minutes
Dim dtmStartDate, iCount, colEvents, objWMIService, objEvent
Const CONVERT_TO_LOCAL_TIME = True
Set dtmStartDate = CreateObject("WbemScripting.SWbemDateTime")
dtmStartDate.SetVarDate dateadd("n", Minutes, now)' CONVERT_TO_LOCAL_TIME
iCount = 0

Set objWMIService = GetObject("winmgmts:" _
    & "{impersonationLevel=impersonate,(Security)}!\\" _
    & strComputer & "\root\cimv2")
Set colEvents = objWMIService.ExecQuery _
    ("Select * from Win32_NTLogEvent Where Logfile = 'SYSTEM' and " _
    & "TimeWritten > '" & dtmStartDate & "' and EventCode = " & EventId & "")

For Each objEvent In colEvents
    iCount = iCount+1
Next

If iCount >= Count Then
    Call oAPI.LogScriptEvent("CheckEventLog.vbs",1001,1,": CRITICAL : Event " & EventId & " has been detected " & Count & " or more times in the past " & Minutes & " minutes")
    WScript.Quit
End If

Call oAPI.LogScriptEvent("CheckEventLog.vbs",1002,0,": INFO : Event " & EventId & " was detected, but has not been detected " & Count & " or more times in the past " & Minutes & " minutes")
Wscript.Quit
]]></ScriptBody>
          <TimeoutSeconds>60</TimeoutSeconds>
        </WriteAction>
      </MemberModules>
      <Composition>
        <Node ID="ScriptWrite" />
      </Composition>
    </Composite>
  </ModuleImplementation>
  <InputType>System!System.BaseData</InputType>
</WriteActionModuleType>
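
If you would rather not edit the script body for every variation, one option is to pass the values in as script arguments. The fragment below is only a sketch under assumptions, not part of the attached management pack: the module ID Custom.Example.EventLogCheck.Parameterized.WA and the configuration element names EventId, Count, and Minutes are hypothetical names chosen for illustration, and it assumes the management pack root declares the usual xsd namespace prefix. Here Minutes is passed as a positive number and negated inside the script.

```xml
<!-- Hypothetical parameterized variant of the write action; IDs and element names are illustrative -->
<WriteActionModuleType ID="Custom.Example.EventLogCheck.Parameterized.WA" Accessibility="Public" Batching="false">
  <Configuration>
    <xsd:element minOccurs="1" name="EventId" type="xsd:integer" />
    <xsd:element minOccurs="1" name="Count" type="xsd:integer" />
    <xsd:element minOccurs="1" name="Minutes" type="xsd:integer" />
  </Configuration>
  <OverrideableParameters>
    <OverrideableParameter ID="Count" Selector="$Config/Count$" ParameterType="int" />
    <OverrideableParameter ID="Minutes" Selector="$Config/Minutes$" ParameterType="int" />
  </OverrideableParameters>
  <ModuleImplementation Isolation="Any">
    <Composite>
      <MemberModules>
        <WriteAction ID="ScriptWrite" TypeID="Windows!Microsoft.Windows.ScriptWriteAction">
          <ScriptName>CheckEventLog.vbs</ScriptName>
          <!-- The three configuration values become positional script arguments -->
          <Arguments>$Config/EventId$ $Config/Count$ $Config/Minutes$</Arguments>
          <ScriptBody><![CDATA[
Option Explicit
SetLocale("en-us")

' Read EventId, Count, and Minutes (a positive number) from the arguments
Dim oArgs, EventId, Count, Minutes
Set oArgs = WScript.Arguments
If oArgs.Count < 3 Then WScript.Quit
EventId = CLng(oArgs(0))
Count = CLng(oArgs(1))
Minutes = CLng(oArgs(2))

Dim oAPI
Set oAPI = CreateObject("MOM.ScriptAPI")

' Start of the time window: Minutes before now
Dim dtmStartDate, iCount, colEvents, objWMIService, objEvent
Set dtmStartDate = CreateObject("WbemScripting.SWbemDateTime")
dtmStartDate.SetVarDate DateAdd("n", -Minutes, Now)
iCount = 0

Set objWMIService = GetObject("winmgmts:" _
    & "{impersonationLevel=impersonate,(Security)}!\\.\root\cimv2")
Set colEvents = objWMIService.ExecQuery _
    ("Select * from Win32_NTLogEvent Where Logfile = 'SYSTEM' and " _
    & "TimeWritten > '" & dtmStartDate.Value & "' and EventCode = " & EventId)

For Each objEvent In colEvents
    iCount = iCount + 1
Next

If iCount >= Count Then
    Call oAPI.LogScriptEvent("CheckEventLog.vbs",1001,1,": CRITICAL : Event " & EventId & " has been detected " & Count & " or more times in the past " & Minutes & " minutes")
Else
    Call oAPI.LogScriptEvent("CheckEventLog.vbs",1002,0,": INFO : Event " & EventId & " was detected, but has not been detected " & Count & " or more times in the past " & Minutes & " minutes")
End If
]]></ScriptBody>
          <TimeoutSeconds>60</TimeoutSeconds>
        </WriteAction>
      </MemberModules>
      <Composition>
        <Node ID="ScriptWrite" />
      </Composition>
    </Composite>
  </ModuleImplementation>
  <InputType>System!System.BaseData</InputType>
</WriteActionModuleType>

<!-- In the event rule, the write action would then supply the values -->
<WriteAction ID="ScriptWriteAction" TypeID="Custom.Example.EventLogCheck.Parameterized.WA">
  <EventId>6009</EventId>
  <Count>3</Count>
  <Minutes>20</Minutes>
</WriteAction>
```

With this shape, the same module could be reused by several rules with different event IDs, and Count and Minutes could be adjusted with overrides instead of editing the XML. Treat it as a starting point and test it in a lab before relying on it.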

Lastly, we create a simple alert-generating rule that watches the Operations Manager event log and alerts on event ID 1001 with source “Health Service Script” where EventDescription contains “CRITICAL”.

 

<Rule ID="Custom.Example.EventLogCheck.MultipleReboots.Rule" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>Alert</Category>
  <DataSources>
    <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.EventProvider">
      <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
      <LogName>Operations Manager</LogName>
      <Expression>
        <And>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="UnsignedInteger">1001</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="String">PublisherName</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="String">Health Service Script</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
          <Expression>
            <RegExExpression>
              <ValueExpression>
                <XPathQuery Type="String">EventDescription</XPathQuery>
              </ValueExpression>
              <Operator>ContainsSubstring</Operator>
              <Pattern>CRITICAL</Pattern>
            </RegExExpression>
          </Expression>
        </And>
      </Expression>
    </DataSource>
  </DataSources>
  <WriteActions>
    <WriteAction ID="Alert" TypeID="Health!System.Health.GenerateAlert">
      <Priority>1</Priority>
      <Severity>1</Severity>
      <AlertName />
      <AlertDescription />
      <AlertOwner />
      <AlertMessageId>$MPElement[Name="Custom.Example.EventLogCheck.MultipleReboots.Rule.AlertMessage"]$</AlertMessageId>
      <AlertParameters>
        <AlertParameter1>$Data/EventDescription$</AlertParameter1>
      </AlertParameters>
      <Suppression />
      <Custom1 />
      <Custom2 />
      <Custom3 />
      <Custom4 />
      <Custom5 />
      <Custom6 />
      <Custom7 />
      <Custom8 />
      <Custom9 />
      <Custom10 />
    </WriteAction>
  </WriteActions>
</Rule>
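
The AlertMessageId in the rule above points at a string resource that has to exist elsewhere in the management pack. The attached example presumably defines it already; if you are building your own pack from these fragments, a minimal sketch of the matching sections might look like the following. The alert name text is an assumption chosen for illustration, and {0} is filled from AlertParameter1, which is the event description.

```xml
<!-- Sketch of the string resource and display string behind the AlertMessageId -->
<Presentation>
  <StringResources>
    <StringResource ID="Custom.Example.EventLogCheck.MultipleReboots.Rule.AlertMessage" />
  </StringResources>
</Presentation>
<LanguagePacks>
  <LanguagePack ID="ENU" IsDefault="true">
    <DisplayStrings>
      <DisplayString ElementID="Custom.Example.EventLogCheck.MultipleReboots.Rule.AlertMessage">
        <!-- Illustrative alert name; change to suit -->
        <Name>Multiple reboots detected</Name>
        <!-- {0} is replaced with AlertParameter1 from the rule -->
        <Description>{0}</Description>
      </DisplayString>
    </DisplayStrings>
  </LanguagePack>
</LanguagePacks>
```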

After 3 reboots in 20 minutes, we get this:

[image: the resulting alert]

 

I will attach my example management pack below:

Custom.Example.EventLogCheck.xml.zip


Comments (3)

  1. AWESOMESAUCE KEVIN! …. something that just came through the pipes for me to work out for a customer as well. thank you for the hard part. 🙂

  2. magesh says:

    Hello Kevin, thanks for the awesome tutorial, it works well. But I need to change the count and time. How can I edit the script? Sorry, I couldn’t find the script in the rule.

  3. Kevin Holman says:

    Just edit it using XML.
