Adjusting Alert Storm thresholds on SCOM Agents


 

One feature that has existed since SCOM 2007 R2 is the ability to adjust alert storm thresholds on an agent-by-agent basis. By default, when a single workflow on an agent generates 50 alerts within a 60-second window, the agent automatically suspends that workflow for 10 minutes to control the alert storm.
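To make the threshold behavior concrete, here is a minimal conceptual sketch in Python. It is not the agent's actual implementation: the class name `AlertStormGuard`, its method, and the choice to let the alert that crosses the threshold through are all assumptions made for illustration. Only the three defaults (50 alerts, 60 seconds, 600 seconds) come from the description above.

```python
from collections import deque


class AlertStormGuard:
    """Conceptual model of a per-workflow alert storm threshold.

    Hypothetical illustration only; the real agent logic is not documented here.
    """

    def __init__(self, alert_count=50, count_interval=60, suspend_interval=600):
        self.alert_count = alert_count            # alerts needed to trigger a suspension
        self.count_interval = count_interval      # counting window, in seconds
        self.suspend_interval = suspend_interval  # suspension length, in seconds
        self._times = deque()                     # timestamps of recent alerts
        self.suspended_until = None

    def on_alert(self, now):
        """Record an alert at time `now` (seconds). Return True if it is let through."""
        if self.suspended_until is not None and now < self.suspended_until:
            return False  # workflow is suspended, alert is dropped
        self.suspended_until = None

        # Forget alerts that have fallen out of the counting window.
        while self._times and now - self._times[0] >= self.count_interval:
            self._times.popleft()

        self._times.append(now)
        if len(self._times) >= self.alert_count:
            # Assumption: the alert that crosses the threshold is still raised,
            # and the suspension applies to the alerts that follow it.
            self.suspended_until = now + self.suspend_interval
            self._times.clear()
        return True
```

With `AlertStormGuard(alert_count=3, count_interval=5, suspend_interval=10)`, three alerts at seconds 0, 1, and 2 trigger a suspension, and any further alerts are dropped until second 12.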

All of this activity occurs on the agent itself. An alert is generated to let you know an alert storm is occurring, but that alert is raised only in response to an event in the SCOM agent's event log:

Log Name:      Operations Manager
Event ID:      5399
Computer:      SERVER.opsmgr.net
Description:
A rule has generated 3 alerts in the last 5 seconds.  Usually, when a rule generates this many alerts, it is because the rule definition is misconfigured.  Please examine the rule for errors. In order to avoid excessive load, this rule will be temporarily suspended until 2018-10-30T15:23:42.2572524-05:00.
Rule: SCOM.Management.TestEvent100.Rule
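If you collect these events for reporting, the description text can be parsed for the alert count, the window, the suspension end time, and the rule name. The following is a minimal Python sketch that assumes the description has exactly the wording shown above. The function name `parse_event_5399` is hypothetical, and the regular expressions would need adjusting if the wording differs in your environment.

```python
import re
from datetime import datetime

# Patterns based on the sample event text above (an assumption about the wording).
_COUNTS = re.compile(r"generated (?P<count>\d+) alerts in the last (?P<seconds>\d+) seconds")
_UNTIL = re.compile(
    r"suspended until (?P<stamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"
    r"(?:\.(?P<frac>\d+))?(?P<tz>[+-]\d{2}:\d{2})?"
)
_RULE = re.compile(r"^Rule:\s*(?P<rule>\S+)", re.MULTILINE)


def parse_event_5399(description):
    """Extract the threshold details from an event 5399 description, or return None."""
    counts = _COUNTS.search(description)
    until = _UNTIL.search(description)
    if not counts or not until:
        return None

    # The event uses 7 fractional digits; datetime holds at most 6, so truncate.
    frac = (until.group("frac") or "0")[:6].ljust(6, "0")
    stamp = f"{until.group('stamp')}.{frac}{until.group('tz') or ''}"

    rule = _RULE.search(description)
    return {
        "alert_count": int(counts.group("count")),
        "interval_seconds": int(counts.group("seconds")),
        "suspended_until": datetime.fromisoformat(stamp),
        "rule": rule.group("rule") if rule else None,
    }
```

Passing the sample description and rule line above to this function yields an alert count of 3, an interval of 5 seconds, and the rule `SCOM.Management.TestEvent100.Rule`.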


You can configure this on a per-agent basis in the registry, under the following key:

HKLM\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\<MGNAME>\

Create three REG_DWORD values:

Alert Count – the number of alerts from a single workflow that triggers the alert storm event

Alert Count Interval – the time window, in SECONDS, over which the alerts are counted

Alert Suspend Interval – the number of SECONDS the workflow stays temporarily disabled
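One way to prepare these values for several agents is to generate a .reg file and import it on each machine. The Python sketch below builds that text for the three values listed above. The function name `build_reg_file` and the example management group name `MYMG` are hypothetical, the defaults simply mirror the 50 / 60 / 600 defaults described earlier, and you should review the generated file before importing it anywhere.

```python
BASE_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HealthService"
    r"\Parameters\Management Groups"
)


def build_reg_file(management_group, alert_count=50, count_interval=60, suspend_interval=600):
    """Return .reg file text that sets the three alert storm REG_DWORD values."""
    values = {
        "Alert Count": alert_count,
        "Alert Count Interval": count_interval,      # seconds
        "Alert Suspend Interval": suspend_interval,  # seconds
    }
    for name, value in values.items():
        # A REG_DWORD is an unsigned 32-bit integer.
        if not isinstance(value, int) or not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name} must be an integer between 0 and 4294967295")

    key = BASE_KEY + "\\" + management_group
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    # .reg files store DWORD values as eight hexadecimal digits.
    lines += [f'"{name}"=dword:{value:08x}' for name, value in values.items()]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Example: 100 alerts in 30 seconds, suspend for 5 minutes.
    print(build_reg_file("MYMG", alert_count=100, count_interval=30, suspend_interval=300))
```

Generating text rather than writing to the registry directly keeps the sketch runnable on any machine and leaves the actual change as a separate, reviewable step.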

 


You will need to restart the Microsoft Monitoring Agent service (HealthService) on the agent for these changes to take effect. You could also consider rolling these settings out at scale using a SCOM workflow, task, script, or GPO.

