SCOM alert management scenarios

The scenario table was completely updated (bug fixes and clarification) on 10/24/2016.

Almost a year ago I wrote a short blog post about SCOM alert handling/management and how I do it.
After I wrote that initial post back in 2015, my valued colleague Nathan Gau published an excellent three-part series about several process-related aspects of SCOM alert management. If you do not know these articles yet: they are definitely a must-read!

This week I had an interesting discussion with a customer about SCOM alert management/handling scenarios based on Nathan’s and my posts. During this discussion I found some inconsistencies and errors in the alert handling scenarios I had described. Therefore, I have updated this document and am putting it up for discussion and as a reference.

When we talk about SCOM alert management and handling, we have to talk about good/recommended and bad practices, how alerts get closed, and what happens to the alert source in the case of monitor alerts. The following table tries to describe these topics for most (all?) possible alert scenarios:

[Image: SCOM-AlertScenariosv2 (alert scenarios table)]

You can download this table as a PDF here.

To sum this table up:

  • Do not close alerts manually if you are not fully aware of the consequences or side effects (e.g. monitors still being in an unhealthy state)!
  • Remember that the stored procedure behind the built-in alert auto closure in SCOM uses the LastModified property of an alert as its timestamp and filters on ResolutionState <> 255, regardless of what the SCOM console tells you! See also Kevin’s excellent post on this topic.
  • Create your own automated solution, whether it is a SCOM workflow, an Orchestrator runbook or something else, that enforces the alert closure policy of your organization and helps to mitigate bad manual behavior (e.g. by checking for each closed monitor alert whether the causing monitor is still unhealthy)!
    I will talk about my way of automating SCOM alert closure and health state reset in a later post.
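
To make the last two points a bit more tangible, here is a minimal Python sketch of the logic only. It does not talk to SCOM at all: the `Alert` class, its field names, the 30-day cutoff and the custom resolution state 15 are assumptions made up for illustration. The only things taken from the text above are the LastModified timestamp, the ResolutionState <> 255 filter (255 being the closed state) and the idea of flagging closed monitor alerts whose monitor is still unhealthy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

CLOSED = 255  # resolution state of a closed alert


@dataclass
class Alert:
    """Hypothetical, simplified stand-in for a SCOM alert."""
    name: str
    resolution_state: int
    last_modified: datetime
    is_monitor_alert: bool
    # Assumed field: current health of the causing monitor (None for rule alerts).
    monitor_healthy: Optional[bool] = None


def auto_closure_candidates(alerts, now, max_age):
    """Mimic the filter described above: every alert that is not closed
    (ResolutionState <> 255) and whose LastModified is older than the cutoff."""
    cutoff = now - max_age
    return [a for a in alerts
            if a.resolution_state != CLOSED and a.last_modified < cutoff]


def closed_with_unhealthy_monitor(alerts):
    """Find monitor alerts that were closed while the monitor is still unhealthy."""
    return [a for a in alerts
            if a.is_monitor_alert
            and a.resolution_state == CLOSED
            and a.monitor_healthy is False]


# Made-up sample data
now = datetime(2016, 10, 24, 12, 0)
alerts = [
    # Old alert in state New: picked up by the age-based filter.
    Alert("Disk space low", 0, now - timedelta(days=40), True, False),
    # Closed by hand although the monitor is still unhealthy.
    Alert("Service stopped", CLOSED, now - timedelta(days=1), True, False),
    # Recent rule alert: neither function selects it.
    Alert("Event logged", 0, now - timedelta(days=2), False),
    # Old alert in a custom (hypothetical) state 15: also picked up,
    # because the filter is "not 255" rather than "equals New".
    Alert("Custom state alert", 15, now - timedelta(days=40), False),
]

for a in auto_closure_candidates(alerts, now, timedelta(days=30)):
    print("auto closure candidate:", a.name)
for a in closed_with_unhealthy_monitor(alerts):
    print("closed, but monitor still unhealthy:", a.name)
```

In a real environment the second check would be the starting point for whatever your closure policy demands, for example reopening the alert or resetting the monitor; what the right reaction is depends on your organization.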

I hope this helps some of you to better understand the alert closure and health state reset process used in SCOM.