Prioritizing OpsMgr Alerts with a Distributed Applications Availability Monitor

Getting overwhelmed with alerts in your OpsMgr environment? Not sure which alert to tackle first?

Ideally, we’d like our OpsMgr environment to be fine-tuned so that the environment…

  1. only generates alerts for something that will be actionable; and
  2. only collects data that has value to our organizations.

That sounds nice and is relatively easy to achieve when you are starting an OpsMgr implementation, but what if you’ve inherited an environment where fine-tuning didn’t really go as planned?

Well, one option to regain control of your environment, without having to rebuild everything or take out every management pack, is to use Distributed Applications to surface what’s critical to your organization.

What is DA?

Distributed Applications (DA, not to be confused with Direct Access, an awesome piece of technology in its own right) is a feature in OpsMgr that lets us group any discovered objects into a logical arrangement and roll up the health of that group into one availability monitor, which in turn generates a single alert. This last part is key and often overlooked when building a DA.
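To make the rollup idea concrete, here is a minimal Python sketch of a "worst state wins" rollup. It is only a model of the concept, not OpsMgr code: the Health enum, both function names, the group name "Contoso Tier 1 DBs" and the database names other than test_cloned are made up for illustration, and the worst-of policy is an assumption about how the rollup is configured.

```python
from enum import IntEnum


class Health(IntEnum):
    """Hypothetical health states, ordered from best to worst."""
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def rollup_worst_of(member_states):
    """Return the worst state among the members (an empty group is healthy)."""
    return max(member_states.values(), default=Health.HEALTHY)


def alerts_for_group(group_name, member_states):
    """Emit at most one alert for the whole group, based on the rolled-up state."""
    state = rollup_worst_of(member_states)
    if state == Health.HEALTHY:
        return []
    return [f"{group_name}: availability is {state.name}"]


# Two of the three member databases are down (illustrative data only).
tier1 = {
    "Sales": Health.HEALTHY,
    "Billing": Health.CRITICAL,
    "test_cloned": Health.CRITICAL,
}

alerts = alerts_for_group("Contoso Tier 1 DBs", tier1)
print(alerts)
```

Even with two members in a critical state, the group produces one alert rather than two, which is the behaviour we want from the DA.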

A Scenario – Database Tiers

Let’s say we’re looking after SQL database health and availability in our organization using OpsMgr. In a large organization, OpsMgr may discover and monitor hundreds or even thousands of SQL DBs, each of which can go offline, run out of space, fail to back up, experience disk IO contention, etc. How can we prioritize these alerts?

In the case of databases, some good tricks exist out there to reduce non-actionable alerts (like disabling the discovery of certain DBs altogether), but maybe we still have a large number of DBs where things can go wrong.

The SQL DB view is a good place to start.

image

We can then reduce this administrative task further and embed some simple logic into our monitoring.

Creating our Simple DA – All Tier 1 DBs

  • Start the OpsMgr Operations Console.
  • Under Authoring, right-click on Distributed Applications and Create a new distributed application…

image

 

  • When the Distributed Application Designer wizard comes up, fill in the information. Remember to be accurate and descriptive to facilitate reuse and troubleshooting.

image

  • Choose the Blank (Advanced) template so that we can build the DA as we want. Click OK.
  • On the next screen, we look for all the DBs that have been discovered.
  • Click on Advanced Search.

image

  • In the Advanced Search window, under Search for: select Database and click Search (leaving the search field blank returns every object of that type). All the discovered objects of type Database are returned.
  • Select the databases that you want to include in your DA and click Add.

image

 

  • On the Advanced Search Result pane, right-click on one of the databases and select Add to New Component Group (since we haven’t created one yet).
  • Name your component and click OK:

image

  • Select the rest of the databases from the search result, right-click, and add them to the component group we just created.
  • Save the distributed application, then click Configure Health Rollup – Availability at the bottom:

image

  • After clicking on Configure Health Rollup – Availability, you can change the various parameters such as alert severity, priority, etc.
  • The parameter we are interested in here is Generates Alert, which is disabled by default. Change the Override Value to True and click OK.

image

Now let’s test to see what happens. When we take one of our databases, “test_cloned”, offline, an alert is generated:

image

So how are we better off than before?

The idea is to help us prioritize our remediation work while we develop a longer term strategy and tune our environment. So here’s what this DA alert gives us:

  1. One single alert for any of the tier 1 DBs instead of an alert for each single one. From a notification subscription perspective, we get one alert and get on it. All DBs in that DA are critical.
  2. Similarly, for our tier 2 DBs, we could then create a tier 2 database DA (e.g., Contoso Tier 2 DBs). If both alerts are raised within the same timeframe, we know which one to devote resources to first – and that’s the whole point!
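The triage logic this gives us can be sketched in a few lines of Python. This is an illustration under assumptions, not something OpsMgr runs: the Alert class, the TIER_RANK table, the prioritize function and the sample alerts (including the "Contoso Tier 1 DBs" name and the timestamps) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Alert:
    """Hypothetical, simplified alert: where it came from and when it was raised."""
    source: str
    raised_at: datetime


# Lower rank means "work on this first". The DA names are illustrative.
TIER_RANK = {"Contoso Tier 1 DBs": 1, "Contoso Tier 2 DBs": 2}
UNRANKED = len(TIER_RANK) + 1  # anything not covered by a tier DA sorts last


def prioritize(alerts):
    """Order alerts by tier rank, then oldest first within the same rank."""
    return sorted(
        alerts,
        key=lambda a: (TIER_RANK.get(a.source, UNRANKED), a.raised_at),
    )


# Made-up alerts raised within a few minutes of each other.
open_alerts = [
    Alert("Contoso Tier 2 DBs", datetime(2024, 1, 1, 9, 0)),
    Alert("SQL Agent job failed", datetime(2024, 1, 1, 8, 55)),
    Alert("Contoso Tier 1 DBs", datetime(2024, 1, 1, 9, 2)),
]

work_queue = prioritize(open_alerts)
for alert in work_queue:
    print(alert.source)
```

The tier 1 alert comes out first even though it was raised last, and the alert that belongs to no tier DA stays in the queue rather than being dropped.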

I’d like to stress that this tactic shouldn’t be used as a free pass to ignore the other alerts. It’s just a way to have the critical alerts rise to the top. If you can only work on a limited number of alerts in one day, this technique allows you to address the key items first.

What do you think? Would a DA allow us to better prioritize our alerts?

For more information on Distributed Applications, see: https://technet.microsoft.com/en-us/library/hh457612.aspx