How to author an Alerting Event rule, correlating on a missing event


 

I had a customer request recently: they wanted to generate an alert when a “Bad” event occurs, but ONLY if it was NOT followed by a “Healthy” event within 5 minutes.

One of the scenarios for this was a Redundant Power Supply temporarily losing input power.  It was common for their power supplies to log events that one side had lost AC power, but then it would return within seconds.  They only wanted to be alerted if it was a sustained power loss.

We have a monitor example of this in the UI, called the Correlated Missing Event Detection monitor type.  The problem with this monitor is that sometimes we don’t want to affect health state, and getting a reliable reset mechanism can be troublesome.

I will show how to write a rule with these properties.

Most rules are simple – they contain a Datasource (Microsoft.Windows.EventProvider) and a WriteAction (GenerateAlert).  Simply match the expression for the event, and the write action fires.
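For reference, a simple rule of that shape might look like the sketch below. The rule ID, the target class, event ID 100, and the alert message ID are hypothetical placeholders. The Windows! and Health! aliases assume the management pack references Microsoft.Windows.Library and System.Health.Library under those names, and the $Target/Host/...$ path assumes the target class is hosted by Windows Computer.

```xml
<!-- Sketch only: Demo.Simple.Event.Rule, Demo.Target.Class, event 100,
     and the alert message ID are hypothetical placeholders. -->
<Rule ID="Demo.Simple.Event.Rule" Enabled="true" Target="Demo.Target.Class" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>Alert</Category>
  <DataSources>
    <!-- Single datasource: match one event in the Application log -->
    <DataSource ID="EventDS" TypeID="Windows!Microsoft.Windows.EventProvider">
      <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
      <LogName>Application</LogName>
      <Expression>
        <SimpleExpression>
          <ValueExpression>
            <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
          </ValueExpression>
          <Operator>Equal</Operator>
          <ValueExpression>
            <Value Type="UnsignedInteger">100</Value>
          </ValueExpression>
        </SimpleExpression>
      </Expression>
    </DataSource>
  </DataSources>
  <WriteActions>
    <!-- Write action fires for every event that matches the expression above -->
    <WriteAction ID="GenerateAlert" TypeID="Health!System.Health.GenerateAlert">
      <Priority>1</Priority>
      <Severity>2</Severity>
      <AlertMessageId>$MPElement[Name="Demo.Simple.Event.Rule.AlertMessage"]$</AlertMessageId>
    </WriteAction>
  </WriteActions>
</Rule>
```

There is no condition detection here: every matching event flows straight from the datasource to the write action.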

This rule will be unique, because it will contain TWO datasources, and an additional component: a Condition Detection.

I’ll start with the datasources.  First, the Good, Healthy, or “clearing” event:

  <DataSource ID="GoodEventDS" TypeID="Windows!Microsoft.Windows.EventProvider">
    <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
    <LogName>Application</LogName>
    <Expression>
      <And>
        <Expression>
          <SimpleExpression>
            <ValueExpression>
              <XPathQuery Type="String">PublisherName</XPathQuery>
            </ValueExpression>
            <Operator>Equal</Operator>
            <ValueExpression>
              <Value Type="String">EventCreate</Value>
            </ValueExpression>
          </SimpleExpression>
        </Expression>
        <Expression>
          <SimpleExpression>
            <ValueExpression>
              <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
            </ValueExpression>
            <Operator>Equal</Operator>
            <ValueExpression>
              <Value Type="UnsignedInteger">102</Value>
            </ValueExpression>
          </SimpleExpression>
        </Expression>
      </And>
    </Expression>
  </DataSource>

Next, the Bad, Unhealthy, or “trigger” event:

  <DataSource ID="BadEventDS" TypeID="Windows!Microsoft.Windows.EventProvider">
    <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
    <LogName>Application</LogName>
    <Expression>
      <And>
        <Expression>
          <SimpleExpression>
            <ValueExpression>
              <XPathQuery Type="String">PublisherName</XPathQuery>
            </ValueExpression>
            <Operator>Equal</Operator>
            <ValueExpression>
              <Value Type="String">EventCreate</Value>
            </ValueExpression>
          </SimpleExpression>
        </Expression>
        <Expression>
          <SimpleExpression>
            <ValueExpression>
              <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
            </ValueExpression>
            <Operator>Equal</Operator>
            <ValueExpression>
              <Value Type="UnsignedInteger">101</Value>
            </ValueExpression>
          </SimpleExpression>
        </Expression>
      </And>
    </Expression>
  </DataSource>

Next up: the condition detection.  We actually have some purpose-built condition detections such as System.CorrelatorAutoMissingCondition, which is defined at https://msdn.microsoft.com/en-us/library/ff521631.aspx   However, I could never get these to work with a rule.  It is odd, because it works great with a monitor.  Instead, I chose to peel back the onion and just use the System.CorrelatorCondition module, defined at https://msdn.microsoft.com/en-us/library/ff458713.aspx.  With this module, I will just write my own expression for the missing event.

Here is the XML:

  <ConditionDetection ID="Correlator" TypeID="System!System.CorrelatorCondition">
    <Correlator>
      <CorrelationExpression />
      <Count>1</Count>
      <Interval>30</Interval>
      <CorrelationOrder>InSequence</CorrelationOrder>
      <CorrelationItemPolicy>ResetWindow</CorrelationItemPolicy>
    </Correlator>
    <Expression>
      <Or>
        <Expression>
          <And>
            <Expression>
              <SimpleExpression>
                <ValueExpression>
                  <XPathQuery Type="UnsignedInteger">Item0Count</XPathQuery>
                </ValueExpression>
                <Operator>Equal</Operator>
                <ValueExpression>
                  <Value Type="UnsignedInteger">1</Value>
                </ValueExpression>
              </SimpleExpression>
            </Expression>
            <Expression>
              <SimpleExpression>
                <ValueExpression>
                  <XPathQuery Type="UnsignedInteger">Item1Count</XPathQuery>
                </ValueExpression>
                <Operator>Equal</Operator>
                <ValueExpression>
                  <Value Type="UnsignedInteger">0</Value>
                </ValueExpression>
              </SimpleExpression>
            </Expression>
          </And>
        </Expression>
        <Expression>
          <And>
            <Expression>
              <SimpleExpression>
                <ValueExpression>
                  <Value Type="String">$Config/Correlator/CorrelationOrder$</Value>
                </ValueExpression>
                <Operator>Equal</Operator>
                <ValueExpression>
                  <Value Type="String">AnyOrder</Value>
                </ValueExpression>
              </SimpleExpression>
            </Expression>
            <Expression>
              <SimpleExpression>
                <ValueExpression>
                  <XPathQuery Type="UnsignedInteger">Item0Count</XPathQuery>
                </ValueExpression>
                <Operator>GreaterEqual</Operator>
                <ValueExpression>
                  <Value Type="UnsignedInteger">1</Value>
                </ValueExpression>
              </SimpleExpression>
            </Expression>
            <Expression>
              <SimpleExpression>
                <ValueExpression>
                  <XPathQuery Type="UnsignedInteger">Item1Count</XPathQuery>
                </ValueExpression>
                <Operator>Less</Operator>
                <ValueExpression>
                  <Value Type="UnsignedInteger">$Config/Correlator/Count$</Value>
                </ValueExpression>
              </SimpleExpression>
            </Expression>
          </And>
        </Expression>
      </Or>
    </Expression>
  </ConditionDetection>

It is rather long – but most of that is the complicated expression.

The correlator part is quite simple:

  <ConditionDetection ID="Correlator" TypeID="System!System.CorrelatorCondition">
    <Correlator>
      <CorrelationExpression />
      <Count>1</Count>
      <Interval>30</Interval>
      <CorrelationOrder>InSequence</CorrelationOrder>
      <CorrelationItemPolicy>ResetWindow</CorrelationItemPolicy>
    </Correlator>

Count is the number of “good” events required to not generate an alert.

Interval is the time window to allow the “good” events to show up after a bad event is observed.

CorrelationOrder specifies whether or not the items are to be correlated in a set sequence or are to be evaluated regardless of order.

CorrelationItemPolicy specifies how the module handles multiple incoming primary data items within a single time interval.

These above are all defined here:  https://msdn.microsoft.com/en-us/library/ff458712.aspx
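As an illustration of tuning these settings, here is a sketch of the same correlator block adjusted for the customer's original 5-minute requirement. It assumes Interval is expressed in seconds, which is consistent with the sample's value of 30 producing a 30-second window; the other values are unchanged from the sample.

```xml
<!-- Variant sketch: same correlator settings, but with a 5-minute window.
     Assumes Interval is in seconds (30 in the sample = 30 seconds). -->
<Correlator>
  <CorrelationExpression />
  <!-- One "good" event inside the window is enough to suppress the alert -->
  <Count>1</Count>
  <!-- 5 minutes x 60 seconds = 300 -->
  <Interval>300</Interval>
  <CorrelationOrder>InSequence</CorrelationOrder>
  <CorrelationItemPolicy>ResetWindow</CorrelationItemPolicy>
</Correlator>
```

A short interval such as 30 is convenient while testing, since you do not have to wait long to see whether the alert fires.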

 

The expression part is likely the most difficult.  The ordering of Item0Count and Item1Count was perplexing.  What I found in a rule is that the first Datasource (event) becomes Item1Count, while the second Datasource (event) becomes Item0Count.  In this rule the good event datasource is listed first, so the good event is Item1Count and the bad event is Item0Count.  So be aware: order matters here.

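Therefore, my expression states to “match” (generate alert) when Item0Count (bad event) = 1 and Item1Count (good event) = 0 (or missing) in the time frame.   OR, when CorrelationOrder is set to AnyOrder, Item0Count is greater than or equal to 1 in the time period, and Item1Count is less than the configured “Count” value I talked about above.  Because this sample sets CorrelationOrder to InSequence, only the first half of the expression can match as configured.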

So remember:  Our rule will have these components in order:

<DS Healthy Event>
<DS Bad Event>
<Correlator Condition Detection>
<Write Action to Generate Alert>
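Put together, the rule might be laid out like the skeleton below. This is a sketch of the structure only: the rule ID, target class, and alert message ID are hypothetical placeholders and may differ from the attached management pack, the Health! alias assumes a reference to System.Health.Library, and the module bodies are elided as comments because they are shown in full above.

```xml
<!-- Skeleton sketch: IDs and the target class are hypothetical placeholders. -->
<Rule ID="Demo.Alert.Correlated.Missing.Event.Rule" Enabled="true" Target="Demo.Target.Class" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>Alert</Category>
  <DataSources>
    <!-- Listed first: the healthy event, which surfaces as Item1Count -->
    <DataSource ID="GoodEventDS" TypeID="Windows!Microsoft.Windows.EventProvider">
      <!-- ComputerName, LogName, and the event 102 expression go here -->
    </DataSource>
    <!-- Listed second: the bad event, which surfaces as Item0Count -->
    <DataSource ID="BadEventDS" TypeID="Windows!Microsoft.Windows.EventProvider">
      <!-- ComputerName, LogName, and the event 101 expression go here -->
    </DataSource>
  </DataSources>
  <ConditionDetection ID="Correlator" TypeID="System!System.CorrelatorCondition">
    <!-- Correlator settings and the Item0Count / Item1Count expression go here -->
  </ConditionDetection>
  <WriteActions>
    <WriteAction ID="GenerateAlert" TypeID="Health!System.Health.GenerateAlert">
      <Priority>1</Priority>
      <Severity>2</Severity>
      <AlertMessageId>$MPElement[Name="Demo.Alert.Correlated.Missing.Event.Rule.AlertMessage"]$</AlertMessageId>
    </WriteAction>
  </WriteActions>
</Rule>
```

Note that both datasources sit inside a single DataSources element, and the condition detection sits between them and the write action.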

I’ll attach my complete management pack below.  This sample is designed to fire an alert when a Bad event ID 101 is observed, but a Good event 102 is not fired within 30 seconds of the bad event.

 

***Note: you may notice the alert takes slightly longer than 30 seconds to fire.  This is because the correlator condition detection has two optional properties, Latency and DrainWait, which add a small amount of time before alerting.

Demo.Alert.Correlated.Missing.Event.xml.zip


Comments (3)

  1. Jesty says:

    Good article Kevin!! A new learning for us.

  2. Mark Ringo says:

    Very very helpful post. Thanks for the information.

    Can we define "healthy" and "bad" based on the contents of the text included in the log event?

  3. Iwan says:

    Hi Kevin,

    I want to monitor a backup application for missing events.

    I create alert rules for these events: (this works fine)

    Event ID 5000 – Successful Backup event
    Event ID 5002 – Failed Backup event
    Event ID 5003 – Successful Restore event
    Event ID 5004 – Failed Restore event
    Event ID 5005 – Successful Offsite Copy event
    Event ID 5007 – Failed Offsite Copy event

    When the backup times out, there is no event.
    I want to get an alert when there is no event.

    All the events are created on one Hyper-V server.

    Like this:

    Guest VM Name: SVR-FILE01 Backup Result: Successful Backup – Backed 3.91 GB (compressed to 1.47 GB). (Duration: 4h 45m) Backup operation started at: Yesterday at 20:58

    I tried to create a missing event monitor, but there are multiple events with the same ID in the backup window.
    It only works when I configure a single server.

    Here is an example from the monitor with multiple servers in it.

    ( ( ( Event ID Equals 5000 ) AND ( EventDescription Contains Guest VM Name: SVR-APP06 ) ) AND ( ( Event ID Equals 5000 ) AND ( EventDescription Contains Guest VM Name: SVR-APP07 ) ) AND ( Event ID Equals 5000 ) AND ( EventDescription Contains Guest VM Name: SVR-APP03 ) )

    Do you have a solution for how SCOM can create an alert from a missing event in our backup window?

    Greetings

    Iwan
