Troubleshooting alert storms in OpsMgr 2007

A large and sudden increase in the number of alerts is called an alert storm. An alert storm can be a symptom of massive changes of some kind within your management group, such as the catastrophic failure of networks. An alert storm can also be a symptom of configuration issues within Microsoft System Center Operations Manager 2007.

Installing new or updated management packs can give rise to an alert storm. Monitors in a management pack begin working as soon as the management pack has been imported. Use best practices in importing management packs to minimize alert storms.

Finding Alert Storms

For general, real-time monitoring of alerts, use the Active Alerts view. Make sure that Scope is not active, because an active scope can hide alerts.

Check for large numbers of alerts when your network undergoes changes. Monitor closely when you install a new management pack.

Operations Manager 2007 offers reports that can be useful in identifying alert storms. From an Operations console with access to a reporting server, look at the Microsoft Generic Report Library. The reports Most Common Alerts and Most Common Events help identify high-volume alerts.
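If you have exported alert data and want a quick look outside the console, the idea behind the Most Common Alerts report can be approximated in a few lines. The following is a minimal sketch, not an Operations Manager API: the record fields (name, raised), the function names, the bucket size, and the threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def most_common_alerts(alerts, top=3):
    """Return (alert name, count) pairs, most frequent first."""
    return Counter(a["name"] for a in alerts).most_common(top)


def storm_buckets(alerts, bucket_minutes=5, threshold=3):
    """Return the start times of fixed-size time buckets whose alert
    count reaches the threshold. bucket_minutes should divide 60."""
    counts = Counter()
    for a in alerts:
        t = a["raised"]
        # Round the timestamp down to the start of its bucket.
        start = t.replace(minute=t.minute - t.minute % bucket_minutes,
                          second=0, microsecond=0)
        counts[start] += 1
    return sorted(start for start, n in counts.items() if n >= threshold)


# Hypothetical exported data: four alerts within seconds, one much later.
base = datetime(2009, 1, 1, 12, 0)
sample = (
    [{"name": "Disk free space low", "raised": base + timedelta(seconds=i)}
     for i in range(4)]
    + [{"name": "Service stopped", "raised": base + timedelta(minutes=30)}]
)

print(most_common_alerts(sample, top=1))
print(storm_buckets(sample))
```

A threshold this low is only for the sample data. In practice you would pick a value that reflects the normal alert volume of your management group.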

Modifying Monitors and Rules

If you are getting a large number of alerts that do not point to issues in your managed systems, you need to modify the monitors or rules that create those alerts.

View active alert details in the Monitoring pane. Alert Details specifies the monitor or rule for an alert.

To override a monitor

1. Log on to the computer with an account that is a member of the Operations Manager Advanced Operator role for the Operations Manager 2007 management group.

2. In the Operations console, click the Authoring button.

3. In the Authoring pane, expand Management Pack Objects and then click Monitors.

4. In the Monitors pane, expand an object type completely and then click a monitor.

5. On the Operations console toolbar, click Overrides and then point to Override the Monitor. You can choose to override this monitor for objects of a specific type or for all objects within a group. After you choose which group or object type to override, the Override Properties dialog box opens, enabling you to view the default settings contained in this monitor. You can then choose whether to override each individual setting contained in the monitor.

Note: If the Overrides button is not available, make sure you have selected a monitor and not a container object in the Monitors pane.

6. Click to place a check mark in the Override column next to each setting that you want to override.

7. Either select a management pack from the Select destination management pack list or create a new unsealed management pack by clicking New.

Note: By default, when you create a management pack object, disable a rule or monitor, or create an override, Operations Manager saves the setting to the Default Management Pack. As a best practice, you should create a separate management pack for each sealed management pack you want to customize, rather than saving your customized settings to the Default Management Pack. For more information, see Default Management Pack.

8. When you complete your changes, click OK.

Note: The procedure for overriding rules is the same as for monitors. See how your overrides affect the number of alerts and continue to fine-tune the monitors as necessary.

For more details, see https://technet.microsoft.com/en-us/library/bb309455.aspx

About Suppressed Alerts

Rules offer the option of suppressing duplicate alerts. A suppressed alert is not displayed in the Operations console.

Operations Manager 2007 suppresses only duplicate alerts as defined by the alert suppression criteria. Fields stated in the suppression criteria must be identical for the alert to be considered a duplicate and suppressed. An alert must be created by the same rule and be unresolved to be considered a duplicate.
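The duplicate test described above can be illustrated with a small model. This is only a sketch of the stated criteria (same rule, still unresolved, identical suppression fields) and is not how Operations Manager is implemented. The class names, the field names, and the repeat_count attribute are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    rule: str
    fields: dict
    resolved: bool = False
    repeat_count: int = 0  # number of duplicates folded into this alert


class AlertStore:
    """Toy model of rule-based alert suppression."""

    def __init__(self, suppression_fields):
        self.suppression_fields = tuple(suppression_fields)
        self.alerts = []

    def _key(self, rule, fields):
        # A duplicate must come from the same rule and match on every
        # field named in the suppression criteria.
        return (rule, tuple(fields.get(f) for f in self.suppression_fields))

    def raise_alert(self, rule, fields):
        key = self._key(rule, fields)
        for alert in self.alerts:
            if not alert.resolved and self._key(alert.rule, alert.fields) == key:
                alert.repeat_count += 1  # suppressed: no new alert is shown
                return alert
        alert = Alert(rule, dict(fields))
        self.alerts.append(alert)
        return alert


store = AlertStore(["computer", "domain"])
event = {"computer": "SRV1", "domain": "CONTOSO"}
first = store.raise_alert("RuleA", event)
again = store.raise_alert("RuleA", event)   # duplicate of `first`
other = store.raise_alert("RuleB", event)   # different rule, so a new alert
```

In this model, resolving the first alert and raising the same event again produces a new alert, because only unresolved alerts can absorb duplicates.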

Alert Suppression Policy

Certain rule subtypes can raise an alert as a response to a successful criteria match. By default, an alert is created for each instance of the criteria match. This might not be useful to operations personnel if each new alert instance is displayed in the Operations console as a new issue. OpsMgr allows you to configure the rule so that duplicate instances of the alert are suppressed, or hidden, within an existing, but unresolved, alert. When the representative alert is resolved, so are all suppressed alerts within it.

Alert suppression is rule-based, and a rule can suppress only alerts that it generated. If two different alert-generating rules match the same event, two unique alerts appear in the Operations console.

Important

The operations personnel using your Management Pack might want to configure your rules to suppress on additional parameters in the events. The event collection rules might not collect these parameters by default, and therefore suppressing on these parameters would not be possible. In building your event collection rules, make sure that you configure them to collect these additional parameters. Security Log events are a good example of events that end users will want to suppress.

Alert suppression policy is highly configurable, but the default settings (Computer and Domain checked) satisfy most Management Pack needs. Custom alert suppression policy might be used in the following cases:

  • A rule executes a script that runs a number of health checks on an application. As a result, the script might generate two or more different alerts directly from the script code. The author suppresses on "Alert Name".

  • A rule is configured to generate an alert, based on an event that might expose different health states and problems. In this case, the difference in the health states can be distinguished by the text in the description. The author can suppress by "Description" such that unique alerts are generated when the descriptions vary.

Note: Suppressing on description can be problematic if the description consists of non-static text. For example, suppressing on a description that contains dates and times would not work as expected.
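To see why non-static descriptions defeat suppression, consider the following minimal sketch. It assumes that all events come from the same rule and that no alert is resolved in between. The function name and the event fields are hypothetical and serve only to illustrate the point.

```python
def count_visible_alerts(events, suppress_on):
    """Count the alerts that stay visible when events from one rule
    are suppressed on the given fields."""
    seen = set()
    for event in events:
        # Events with identical values in every suppression field
        # collapse into a single alert.
        seen.add(tuple(event[f] for f in suppress_on))
    return len(seen)


# Three occurrences of the same problem, each with a timestamp
# embedded in the description text.
events = [
    {"computer": "SRV1", "name": "Backup failed",
     "description": "Backup failed at 2009-01-01 01:00"},
    {"computer": "SRV1", "name": "Backup failed",
     "description": "Backup failed at 2009-01-01 02:00"},
    {"computer": "SRV1", "name": "Backup failed",
     "description": "Backup failed at 2009-01-01 03:00"},
]

by_description = count_visible_alerts(events, ["computer", "description"])
by_name = count_visible_alerts(events, ["computer", "name"])
```

Here by_description counts every occurrence separately because each description differs by its timestamp, while by_name collapses the three events into one alert.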

J.C. Hornbeck | System Center Knowledge Engineer