Creating a Repeated Event Detection *Rule*


 

One of the built-in monitor types in the SCOM console is a repeated event detection monitor.  This is a cool way of creating an alert when multiple similar events are recorded within a specified time frame.  It is helpful for applications that might log one or two events without that being evidence of an actual issue, but where a flooded log, or a large number of events in a short time frame, is evidence of a problem.

The issue I have always had with the SCOM console is that it only provides us with a repeated event detection MONITOR, and not a rule.  The problem with using a monitor is that it assumes we want to drive health state around this.  Often we don’t; we just want simple alerting on the condition.  The other problem with a monitor is that we need a way to “reset” it back to healthy, and our options leave a lot to be desired.  We could use “Manual” reset, which is nearly worthless.  Manual reset monitors should almost NEVER be used: they create labor-intensive problems, because someone has to open the console and reset the monitor, and until that happens we stop monitoring for the condition.  Another alternative is Event reset, where a different event triggers the “healthy” condition.  This would be OK IF the application truly logged an event showing that the previous condition had cleared up.  Most of the time, this is not the case.  Lastly, we could use a timer reset.  I end up using these often, simply because there is no other choice.  It is still a poor solution, because the “health state” I am driving is completely meaningless; I am only resetting it with a timer so I can get additional alerts in the future.

This leaves us needing a simple rule type with repeated event detection.  It is actually quite simple to create; we just cannot create it using the SCOM UI.

For this example, I will show how to author this using the SCOM 2007 R2 Authoring Console, because that is still the simplest tool to use for this type of authoring.  http://www.microsoft.com/en-us/download/details.aspx?id=18222

Open the Authoring console and create a new empty MP:

image

Give the MP a display name and hit Create.

Choose the Health Model pane, and select Rules.   Choose New, Custom Rule:

image

Give the rule a proper ID:

image

On the General Tab – provide a DisplayName for the rule, and a good target class.  Never target Windows Computer – I like to use Windows Server Operating System as a good generic class:

image

On the Modules tab – this is where the magic happens baby!

In a typical alert generating event rule, we have a datasource (the event log and expression) and a Write Action (the alert).  In this example – we will add a condition detection, that must be met before moving on to the write action.  The Condition Detection will be the repeat criteria.

In the Data Sources – select Create.  Choose the Microsoft.Windows.EventProvider, which is a simple composite datasource that combines the Microsoft.Windows.BaseEventProvider with a Condition Detection that provides an expression for the event criteria.  http://msdn.microsoft.com/en-us/library/ee809339.aspx

image

Provide a name for that Datasource (DS) and click OK.  Edit the Datasource we just created, then click Configure:

image

Here we will find the familiar UI for providing a log and event ID, source, etc.  This is the “Expression” I referenced above:

image

image

Hit OK on everything to get back to the Modules tab.

Click Create on the Condition Detection.  We want the System.ConsolidatorCondition   http://msdn.microsoft.com/en-us/library/ee809324.aspx

image

Provide a name for the Condition Detection (CD) and click OK.

Now edit the CD we just created, and click the Configure button.  For this example, we can choose to trigger on count (sliding), which will allow us to alert any time our event happens “x” times in any window of “y” seconds.  Set the compare count to 10, and the interval to 60 seconds for this example:
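To make the "x times in any window of y seconds" idea concrete, here is a small Python sketch of sliding-window counting. It is a toy model for illustration only, not how the consolidator module is implemented; the class name `SlidingCounter` and its restart-after-output behavior are my own assumptions.

```python
from collections import deque


class SlidingCounter:
    """Toy model of 'count events in any interval-second window' (hypothetical)."""

    def __init__(self, count=10, interval=60.0):
        self.count = count
        self.interval = interval
        self.times = deque()

    def add(self, timestamp):
        """Record one event time in seconds; return True when the threshold is met."""
        self.times.append(timestamp)
        # Drop events that have slid out of the window ending at this event.
        while timestamp - self.times[0] >= self.interval:
            self.times.popleft()
        if len(self.times) >= self.count:
            # Assumption: start counting from scratch after an output.
            self.times.clear()
            return True
        return False


# Ten events, five seconds apart: the tenth lands 45 seconds after the first.
fast = SlidingCounter(count=10, interval=60)
burst = [fast.add(t) for t in range(0, 50, 5)]

# Ten events, ten seconds apart: never ten inside one 60-second window.
slow = SlidingCounter(count=10, interval=60)
trickle = [slow.add(t) for t in range(0, 100, 10)]

print(burst[-1], any(trickle))
```

In this model the burst trips the threshold on its tenth event, while the slower trickle never has more than six events inside any 60-second window.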

image

Click OK twice to get back to the Modules page.

Now, create a write action.  We will use the System.Health.GenerateAlert

image

Provide a name for the Write Action (WA) and click OK.  Then Edit, and Configure the write action.

image

Save, import the MP, and test.  Voila:
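One way to test is to write ten matching events in quick succession on a monitored server. The sketch below builds the command lines for the Windows `eventcreate` utility, using the event ID 1000 and source TEST from this example. The helper names are hypothetical and the choice of `eventcreate` is an assumption for illustration; any tool that writes matching events to the Application log should do.

```python
import subprocess
import sys


def eventcreate_command(event_id=1000, source="TEST", log="APPLICATION"):
    """Build one eventcreate command line that matches the rule's criteria."""
    return [
        "eventcreate",
        "/L", log,
        "/T", "ERROR",
        "/SO", source,
        "/ID", str(event_id),
        "/D", "Test event for the repeated event rule",
    ]


def fire_events(times=10, dry_run=True):
    """Return the commands; run them only on Windows when dry_run is False."""
    commands = [eventcreate_command() for _ in range(times)]
    if not dry_run and sys.platform == "win32":
        for command in commands:
            # Needs an elevated prompt, since it writes to the event log.
            subprocess.run(command, check=True)
    return commands


commands = fire_events(times=10)
print(len(commands), " ".join(commands[0]))
```

Call `fire_events(dry_run=False)` from an elevated prompt on the agent to actually write the events. Ten of them inside 60 seconds should satisfy the count and interval configured above.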

image

 

Here is the MP XML:

<ManagementPack ContentReadable="true" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <Manifest>
    <Identity>
      <ID>example.eventrules</ID>
      <Version>1.0.0.0</Version>
    </Identity>
    <Name>example.eventrules</Name>
    <References>
      <Reference Alias="SC">
        <ID>Microsoft.SystemCenter.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Windows">
        <ID>Microsoft.Windows.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Health">
        <ID>System.Health.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="System">
        <ID>System.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
    </References>
  </Manifest>
  <Monitoring>
    <Rules>
      <Rule ID="example.eventrules.repeatevent1000" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
        <Category>Custom</Category>
        <DataSources>
          <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.EventProvider">
            <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
            <LogName>Application</LogName>
            <Expression>
              <And>
                <Expression>
                  <SimpleExpression>
                    <ValueExpression>
                      <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
                    </ValueExpression>
                    <Operator>Equal</Operator>
                    <ValueExpression>
                      <Value Type="UnsignedInteger">1000</Value>
                    </ValueExpression>
                  </SimpleExpression>
                </Expression>
                <Expression>
                  <SimpleExpression>
                    <ValueExpression>
                      <XPathQuery Type="String">PublisherName</XPathQuery>
                    </ValueExpression>
                    <Operator>Equal</Operator>
                    <ValueExpression>
                      <Value Type="String">TEST</Value>
                    </ValueExpression>
                  </SimpleExpression>
                </Expression>
              </And>
            </Expression>
          </DataSource>
        </DataSources>
        <ConditionDetection ID="CD" TypeID="System!System.ConsolidatorCondition">
          <Consolidator>
            <ConsolidationProperties />
            <TimeControl>
              <WithinTimeSchedule>
                <Interval>60</Interval>
              </WithinTimeSchedule>
            </TimeControl>
            <CountingCondition>
              <Count>10</Count>
              <CountMode>OnNewItemTestOutputRestart_OnTimerSlideByOne</CountMode>
            </CountingCondition>
          </Consolidator>
        </ConditionDetection>
        <WriteActions>
          <WriteAction ID="WA" TypeID="Health!System.Health.GenerateAlert">
            <Priority>1</Priority>
            <Severity>1</Severity>
            <AlertMessageId>$MPElement[Name="AlertMessageID0e8694e125494edab211685387e39a1b"]$</AlertMessageId>
            <AlertParameters>
              <AlertParameter1>$Data/Count$</AlertParameter1>
              <AlertParameter2>$Data/TimeWindowStart$</AlertParameter2>
              <AlertParameter3>$Data/TimeWindowEnd$</AlertParameter3>
              <AlertParameter4>$Data/Context/DataItem/EventDescription$</AlertParameter4>
            </AlertParameters>
          </WriteAction>
        </WriteActions>
      </Rule>
    </Rules>
  </Monitoring>
  <Presentation>
    <StringResources>
      <StringResource ID="AlertMessageID0e8694e125494edab211685387e39a1b" />
    </StringResources>
  </Presentation>
  <LanguagePacks>
    <LanguagePack ID="ENU" IsDefault="true">
      <DisplayStrings>
        <DisplayString ElementID="AlertMessageID0e8694e125494edab211685387e39a1b">
          <Name>Event 1000 has occurred multiple times</Name>
          <Description>The event 1000 has occurred {0} times between {1} and {2} Event Description: {3} </Description>
        </DisplayString>
        <DisplayString ElementID="example.eventrules">
          <Name>Example EventRules</Name>
        </DisplayString>
        <DisplayString ElementID="example.eventrules.repeatevent1000">
          <Name>Repeated Event 1000 Rule</Name>
          <Description />
        </DisplayString>
      </DisplayStrings>
    </LanguagePack>
  </LanguagePacks>
</ManagementPack>

As you can see by looking at the XML, this is just like any other typical alert-generating, event-based rule.  We simply add a condition detection for the consolidator module and pass specific criteria to that module: the time window interval (Interval), the count (Count), and the count mode (CountMode).
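Because the only structural addition is the condition detection, the same change can be scripted against an existing rule. Here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The rule ID `example.eventrules.plainevent1000` and the trimmed-down rule fragment are hypothetical stand-ins, and the sketch assumes the management pack already references System.Library under the alias `System`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down event rule with no condition detection yet.
RULE_XML = """
<Rule ID="example.eventrules.plainevent1000" Enabled="true"
      Target="Windows!Microsoft.Windows.Server.OperatingSystem">
  <Category>Custom</Category>
  <DataSources>
    <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.EventProvider" />
  </DataSources>
  <WriteActions>
    <WriteAction ID="WA" TypeID="Health!System.Health.GenerateAlert" />
  </WriteActions>
</Rule>
"""


def add_consolidator(rule, count=10, interval=60):
    """Insert a System.ConsolidatorCondition between DataSources and WriteActions."""
    if rule.find("ConditionDetection") is not None:
        raise ValueError("rule already has a condition detection")
    cd = ET.Element(
        "ConditionDetection",
        {"ID": "CD", "TypeID": "System!System.ConsolidatorCondition"},
    )
    consolidator = ET.SubElement(cd, "Consolidator")
    ET.SubElement(consolidator, "ConsolidationProperties")
    time_control = ET.SubElement(consolidator, "TimeControl")
    within = ET.SubElement(time_control, "WithinTimeSchedule")
    ET.SubElement(within, "Interval").text = str(interval)
    counting = ET.SubElement(consolidator, "CountingCondition")
    ET.SubElement(counting, "Count").text = str(count)
    ET.SubElement(counting, "CountMode").text = (
        "OnNewItemTestOutputRestart_OnTimerSlideByOne"
    )
    # Place the new element directly after <DataSources>, as in the MP above.
    position = list(rule).index(rule.find("DataSources")) + 1
    rule.insert(position, cd)
    return rule


rule = add_consolidator(ET.fromstring(RULE_XML))
print([child.tag for child in rule])
```

After a change like this, the alert parameters in the write action would likely need adjusting as well. In the MP above, the count is read from `$Data/Count$` and the original event description from `$Data/Context/DataItem/EventDescription$`.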


Comments (13)

  1. Kevin Holman says:

    MP author doesn’t have this capability, but they could add this as a feature. MP author adds the ability to add a condition detection that is a scheduler filter, which is different from the consolidator module. The scheduler filter adds the ability to only make the workflow active during certain time periods, like business hours, not during weekends, etc.

    1. Bruce Morey says:

      Kevin,
      I’m attempting to follow your steps, any reason you can think of why I can’t create a rule in Health Model Pane? I’m an admin and running in admin mode. Only action available is refresh.

      1. Kevin Holman says:

        did you create a new empty Management Pack first?

        1. Bruce Morey says:

          Sorry I’m taking so long to respond, but yes that was my problem. Here is a dumb question though: I did all this, saved and imported it. Now, won’t I need to create a new rule that references this new management pack?

          1. Bruce Morey says:

            Cancel that last response, I answered my own question. Thanks for this great article Kevin.

  2. I had similar issues that you have with the reset conditions of repeated event monitors. I made a monitor type that uses the missing event Condition Detection as the reset condition.

    http://blogs.technet.com/b/omx/archive/2013/01/21/repeated-event-monitors-with-a-missing-event-reset.aspx

  3. Mike Hanlon says:

    Another great post Kevin. I wonder if one could create the same type of alert rule with the Silect MP Author tool? It has an option to schedule an event log rule by minutes.

  4. anitha says:

    Good one
    I need help in troubleshooting a rule which is already configured.

    A rule for rightfax servers for event ID 3314 was configured for the Windows computer group. Though it is overridden, it is affecting the entire MG, and all the agents including RMS and MS are in warning.
    Due to this, the SDK service keeps stopping, and here is the error.

    I have already disabled this particular rule which is affecting things, but no use.
    Help will be appreciated.

    The Windows Event Log Provider was unable to open the Application event log on computer "server name" for reading. The provider will retry opening the log every 30 seconds. Most recent error details: The RPC server is unavailable. One or more workflows were affected by this. Workflow name: MSExchangeMonitoringCorrelationConnectivityToRMS Instance name: Correlation Engine – sdwpcfs712a (Correlation Engine) – C-SDW Instance ID: {A9528C38-6A1E-9A5F-1B23-C8FE49941B59} Management group:

  5. Sean Tompkins says:

    Very nice – I always look for this under rules, then remember it’s only under monitors. It’s nice that the only addition is the condition detection – should make for easy XML editing if someone wanted to do it that way, or create a script to add the counter to an existing rule.

  6. Dudu Sakharovich says:

    I have a problem monitoring SQL job failures. The default monitor which comes with the SQL MP doesn’t count job failures before firing the alert.
    I need to create a rule/monitor that can generate alert after X sql job failures with the option to choose the X counter depending on the job.
    But I also need a way to clear the counter by using a different event ID than the SQL job failure event ID (Event 208 on application log).

    So basically I need to create a configurable monitor/rule for SQL job failures which can be customized for X failures for each job and can also clear the X counter with another event ID.

    For example I have a job called "test" which is configured to generate alert after 3 consecutive job failures.
    But if the job failed twice and then succeeded and then failed one more time the alert shouldn’t be generated.

    I’ll be happy to receive some help with that.
    Thx in advance.

  7. evan says:

    I didn’t see any reason why this shouldn’t work for creating a rule for SCOM 2012 R2, so I went ahead and did that, updated the version numbers in the XML, and imported the MP. The MP is showing up in Administration –> Management Packs, but it is not in the MP list when selecting the Scope in Authoring –> Management Pack Objects –> Rules, and the rule is not showing up when I select all MPs and search for it.

    Is this normal, or should it be showing up?

    Thanks,

    -Evan

  8. Anonymous says:

    I had a customer looking for an example of how SCOM can monitor a server for multiple reboots in a period

  9. Scott says:

    I hope this is still being followed..

    I’m trying to use this methodology in 2012 R2 to create a rule that generates a simple alert if more than 5 event id 4625’s are detected in the security log in under a minute, and it doesn’t seem to work. Any ideas? I started with an empty management pack, and pointed at the security log rather than the application log, and selected event id 4625 instead of the below, and that’s the only real difference, but no alerts are generated when the condition is violated. Later I would want to scope the source of course to the same server.
