Decentralizing Monitoring Configuration

SCOM provides rich override capabilities from the console so authorized users can override configuration such as thresholds, enable / disable or anything marked as overrideable. This allows customers to tune management packs as appropriate before or after deployment. There are times when people want to give some control of this away to the application or server owner and allow them to tune some settings at the monitored server. Care should be taken with this approach and I want to highlight some points:

  • By allowing monitoring to be changed at the source you potentially remove a trail of what went on. If an application owner is screaming at Ops for missing an alert, there is little trace of the fact that the threshold was set by that same owner or the server administrator at the source.
  • Part of the benefit of SCOM in the enterprise is that it can centralize the administration of monitoring. By following this you are potentially removing one of those benefits.
  • Behavior in SCOM can be very different between monitored servers. It can be hard to rationalize what monitoring is in effect as overrides affect some and not all monitored servers
  • Using configuration at the source can lead to authors writing discoveries that are overly aggressive so that changes are picked up quickly. This can lead to frequent instance space churn if properties are changing a lot.

There are times when this is what the customer wants so I wanted to illustrate one possible approach for this leveraging class properties and discovery from the registry. There are some mitigations you can follow for the above as you go through this:

  • Ensure only people authorized to make changes can actually make the changes. Since you are using the registry in this example you need to be able to edit the registry which usually calls for administrative rights
  • Consider a discovery mechanism such as an event to trigger a new discovery rather than using an aggressive polling frequency
  • Limit the usage of this to key scenarios not all of your override cases

This is not something you can readily apply to sealed management packs. It is really designed for home grown MPs for your custom monitoring needs as it involves defining new classes and discoveries.

Also note that this is designed for monitoring configuration overrides and not element-level overrides such as enable / disable, which won’t work. It is specifically tailored towards performance thresholds, which is the example I use below. However, any monitoring that takes an overrideable parameter could use this in theory.

So if you still want to do this, there are a few variations you can use and I will walk you through these and some of the pitfalls to watch for. 

The Basic MP

This is a very crude and simple example to show the concepts. I am going to use a single class for each sample and use registry discovery to discover an instance when a registry key exists. I am also going to populate a property on the instance with a value I get from the registry. This is going to be used as a threshold in monitoring. The actual monitoring is done using a simple two state performance threshold monitor. For my example I am going to use the process count performance counter since this is very easy to change just by starting and stopping processes.

In the MP you will see that I use an abstract class as my base class and then I have defined 4 concrete classes from that. This was just to save me time so I could just define the monitor once (targeted to the base class) and a single set of views. Each approach uses a different class so you can see all the examples in the MP. The classes used are:

  1. Application X – used to show the non protected simplest approach
  2. ApplicationX1 – a minor tweak to the first case to protect against a missing value
  3. Application Y – uses two discoveries to provide a default value when the value is not set
  4. Application Z – enhances the two discovery approach to provide an override to the default value
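
As a rough sketch, the class layout described above could look something like the following. This is an illustration rather than a copy of the sample MP: the Windows alias, the choice of Microsoft.Windows.LocalApplication as the base, and the concrete class IDs are assumptions. Only the ApplicationBase ID and the ProcessThreshold property name come from the monitor configuration used in this post.

```xml
<!-- Sketch only: the base class choice and the concrete class IDs are assumptions. -->
<ClassTypes>
  <!-- Abstract base: carries the threshold property and is the single target for the monitor and views -->
  <ClassType ID="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationBase" Accessibility="Public" Abstract="true" Base="Windows!Microsoft.Windows.LocalApplication" Hosted="true" Singleton="false">
    <Property ID="ProcessThreshold" Type="int" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
  </ClassType>
  <!-- One concrete class per discovery approach -->
  <ClassType ID="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationX" Accessibility="Public" Abstract="false" Base="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationBase" Hosted="true" Singleton="false" />
  <ClassType ID="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationX1" Accessibility="Public" Abstract="false" Base="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationBase" Hosted="true" Singleton="false" />
  <ClassType ID="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationY" Accessibility="Public" Abstract="false" Base="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationBase" Hosted="true" Singleton="false" />
  <ClassType ID="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationZ" Accessibility="Public" Abstract="false" Base="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationBase" Hosted="true" Singleton="false" />
</ClassTypes>
```

Because ProcessThreshold is declared as an int on the base class, every concrete class inherits it, and any discovery that submits a non-integer value for it will be rejected.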

Disclaimer – as with all my sample MPs these are samples and should not be used in production. Discovery intervals are aggressive and also the MP is not complete with lots of display strings missing etc. Take this and incorporate into your MPs but use this in a test environment.

The Monitor

All the classes will have a monitor instantiated against them. As I mentioned this is a simple two state monitor. The configuration is shown below

<Configuration>
<ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
<CounterName>Processes</CounterName>
<ObjectName>System</ObjectName>
<InstanceName/>
<AllInstances>false</AllInstances>
<Frequency>60</Frequency>
<Threshold>$Target/Property[Type="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationBase"]/ProcessThreshold$</Threshold>
</Configuration>

The key section is the Threshold element. This tells the monitor to use the discovered ProcessThreshold value on the instance that it is instantiated for rather than a value defined in the monitor configuration.
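
To put that configuration in context, here is a minimal sketch of how the surrounding unit monitor might be declared. It assumes the standard two state threshold monitor type from the performance library; the monitor ID, the Health and Performance aliases, and the parent monitor are my assumptions, and the alert settings are left out for brevity.

```xml
<!-- Sketch only: the monitor ID, the aliases and the parent monitor are assumptions. -->
<UnitMonitor ID="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationBase.ProcessCountMonitor" Accessibility="Public" Enabled="true" Target="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationBase" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" TypeID="Performance!System.Performance.ThresholdMonitorType" ConfirmDelivery="false">
  <Category>PerformanceHealth</Category>
  <OperationalStates>
    <!-- Counter at or below the discovered threshold: healthy -->
    <OperationalState ID="UnderThreshold" MonitorTypeStateID="UnderThreshold" HealthState="Success" />
    <!-- Counter above the discovered threshold: red -->
    <OperationalState ID="OverThreshold" MonitorTypeStateID="OverThreshold" HealthState="Error" />
  </OperationalStates>
  <Configuration>
    <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
    <CounterName>Processes</CounterName>
    <ObjectName>System</ObjectName>
    <InstanceName />
    <AllInstances>false</AllInstances>
    <Frequency>60</Frequency>
    <!-- The threshold is read from the discovered instance property, not hardcoded -->
    <Threshold>$Target/Property[Type="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationBase"]/ProcessThreshold$</Threshold>
  </Configuration>
</UnitMonitor>
```

Targeting the abstract base class means the one monitor applies to instances of all four concrete classes.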

Once you have one of the discoveries running below and an instance created you can test this out by ensuring your process count is higher than the threshold you set. Once this happens the state will go red and you will get an alert. Once it drops below the value the state returns to healthy and the alert is closed.

The rest of this post is about how we discovered this property.

Non Protected

This is the simplest case and is shown with the ApplicationX class and associated discovery. I use a standard registry discovery that checks the existence of a key and also collects a value. If the key exists an instance is created and the value is written as a property and you will see an instance similar to this:

image

You can easily try this out by creating the registry key:

HKLM\Software\AuthorMPs\ApplicationX

You should also create a DWORD value under that key called ProcessThreshold. You should put an integer into this value that will be collected and stored on the discovered instance. Once the instance is discovered you can go and test the monitor I describe above.
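
For illustration, a non protected registry discovery along these lines might look roughly like the sketch below. It assumes the filtered registry discovery provider from the Windows library targeted at Windows Computer. The discovery ID, the AppExists attribute name, and the 300 second frequency are assumptions and are not taken from the sample MP.

```xml
<!-- Sketch only: the discovery ID, the AppExists attribute and the frequency are assumptions. -->
<Discovery ID="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationX.Discovery" Enabled="true" Target="Windows!Microsoft.Windows.Computer" ConfirmDelivery="false" Remotable="true" Priority="Normal">
  <Category>Discovery</Category>
  <DiscoveryTypes>
    <DiscoveryClass TypeID="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationX">
      <Property TypeID="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationBase" PropertyID="ProcessThreshold" />
    </DiscoveryClass>
  </DiscoveryTypes>
  <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.FilteredRegistryDiscoveryProvider">
    <ComputerName>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
    <RegistryAttributeDefinitions>
      <!-- PathType 0 = key, AttributeType 0 = boolean: true when the key exists -->
      <RegistryAttributeDefinition>
        <AttributeName>AppExists</AttributeName>
        <Path>SOFTWARE\AuthorMPs\ApplicationX</Path>
        <PathType>0</PathType>
        <AttributeType>0</AttributeType>
      </RegistryAttributeDefinition>
      <!-- PathType 1 = value, AttributeType 2 = integer: the DWORD itself -->
      <RegistryAttributeDefinition>
        <AttributeName>ProcessThreshold</AttributeName>
        <Path>SOFTWARE\AuthorMPs\ApplicationX\ProcessThreshold</Path>
        <PathType>1</PathType>
        <AttributeType>2</AttributeType>
      </RegistryAttributeDefinition>
    </RegistryAttributeDefinitions>
    <Frequency>300</Frequency>
    <ClassId>$MPElement[Name="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationX"]$</ClassId>
    <InstanceSettings>
      <Settings>
        <!-- Key property of the hosting computer -->
        <Setting>
          <Name>$MPElement[Name="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Name>
          <Value>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Value>
        </Setting>
        <!-- Whatever was read from the registry is written straight to the property -->
        <Setting>
          <Name>$MPElement[Name="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationBase"]/ProcessThreshold$</Name>
          <Value>$Data/Values/ProcessThreshold$</Value>
        </Setting>
      </Settings>
    </InstanceSettings>
    <!-- Only the key is tested here; the value is not checked at all -->
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="Boolean">Values/AppExists</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="Boolean">true</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
  </DataSource>
</Discovery>
```

Note that the expression only tests for the key, so the mapping is applied whether or not the value was actually found.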

So that’s it right – we are done. Well this might be OK but there is an important case to consider. If the registry value was an optional setting that the administrator had to go and create, it is very likely this is not set everywhere. This causes a problem for our discovery and you can easily test this. Go and delete the registry value ProcessThreshold but keep the key. On the next discovery you will get a lovely event 10801 in the Operations Manager log indicating discovery data could not be inserted. There is a lot of detail in this event but the main issue is called out as this:

Invalid monitoring class property value specified in the discovery data item.The value needs to adhere to the type.
MonitoringClassPropertyName: AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationBase.ProcessThreshold
MonitoringClassPropertyValue:

You will also see the discovery data item trying to be inserted and notice that the property value is set to empty which is not a valid integer. 

Whether this is a problem depends on whether you can rely on the registry value always being there. If you can’t you need to do a bit more work. One option is to simply make the class property a string instead of an integer, which will accept an empty value. However, you are possibly pushing the problem downstream: if someone put a non-numeric string in here your monitoring would fail. Your monitoring is also going to fail on empty anyhow, since the performance monitor will actually compare to a threshold of zero and always be in a bad state. So this quick fix just avoids the issue rather than solving it.

Whether you get an instance in your discovery will depend on the order of setting the registry value. If the ProcessThreshold was never set, you will never get an instance since the discovery will fail. If it was set and then removed later, the instance will remain but will never be undiscovered since all future discoveries are failing and will not generate a snapshot. If you removed the registry key itself the instance would be removed.

Protected with Consequences

There is another fairly quick fix to the problem but it comes with consequences again which may or may not be an issue for you.

You can update the discovery to check that both the registry key and registry value exist. You can do this easily with the following addition to the expression in the discovery:

<SimpleExpression>
<ValueExpression>
<XPathQuery Type="Integer">Values/ProcessThreshold</XPathQuery>
</ValueExpression>
<Operator>Greater</Operator>
<ValueExpression>
<Value Type="Integer">0</Value>
</ValueExpression>
</SimpleExpression>

The ApplicationX1 class shows this in action. It is exactly the same discovery as the ApplicationX class with this one change. You will not get the discovery error above because we never try to write the value or instantiate the class unless the value is present and set to greater than 0.
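
Put together, the complete filter for this discovery might look something like the sketch below, with the original key check and the new value check joined by an And. The AppExists attribute name for the key check is an assumption; the second half is the addition shown above.

```xml
<!-- Sketch only: AppExists is an assumed name for the boolean key-exists attribute. -->
<Expression>
  <And>
    <!-- Existing check: the registry key is present -->
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="Boolean">Values/AppExists</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="Boolean">true</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
    <!-- New check: the value is present and greater than 0 -->
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="Integer">Values/ProcessThreshold</XPathQuery>
        </ValueExpression>
        <Operator>Greater</Operator>
        <ValueExpression>
          <Value Type="Integer">0</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
  </And>
</Expression>
```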

The downside to this approach is that you must have this threshold set before your class is discovered. If this value is being added just for a small subset of the monitoring you want, then you just messed up your other monitoring by not discovering unless the admin went and set this. So think about this and see if that is an issue for your scenario.

So we have two solutions so far and neither one is comprehensive. The next options look at setting a default value when there is nothing set in the registry.

Double Discovery

There is no easy way (at least that I came up with) to have a single discovery that has conditional logic where you could pull a value from the registry and if it was not there apply a default value instead. You could easily do it with a script but I like to avoid scripts when possible. Registry discoveries are lightweight and less error prone than scripts plus it is a challenge to sort this out. A small additional OR gate module would easily allow this and I have asked for this to be added by the SCOM team before but with no success (it was there for a short period but then withdrawn before shipping!).

So instead we are going to use two discoveries. These are registry discoveries and lightweight, so adding one more is a minor overhead and these should not be running that frequently anyhow. In the sample MP ApplicationY shows this double discovery method.

The first discovery is the same as ApplicationX1 above. It will only discover if the key and value are present. The second discovery (with the suffix DefaultValueNoOverride in the MP) is the interesting one. This is very similar to the first discovery with some subtle changes. First I collect the ProcessThreshold value as a boolean instead of an integer by changing the attribute type parameter:

<RegistryAttributeDefinition>
<AttributeName>ProcessThreshold</AttributeName>
<Path>SOFTWARE\AuthorMPs\ApplicationY\ProcessThreshold</Path>
<PathType>1</PathType>
<AttributeType>0</AttributeType>
</RegistryAttributeDefinition>

Next in the expression filter I match only when this is false i.e. the value does not exist. Finally in the mapping configuration I set the ProcessThreshold to a value I specify in the MP e.g. 80 as shown below.

<Setting>
<Name>$MPElement[Name="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationBase"]/ProcessThreshold$</Name>
<Value>80</Value>
</Setting>
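
The expression filter for this second discovery is only described in words above, so here is a sketch of what it might look like. It assumes the key check is kept from the first discovery under an assumed attribute name of AppExists, and that the boolean ProcessThreshold attribute is compared to false.

```xml
<!-- Sketch only: AppExists is an assumed name for the boolean key-exists attribute. -->
<Expression>
  <And>
    <!-- The registry key must still be present -->
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="Boolean">Values/AppExists</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="Boolean">true</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
    <!-- ProcessThreshold is collected as a boolean, so false means the value does not exist -->
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="Boolean">Values/ProcessThreshold</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="Boolean">false</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
  </And>
</Expression>
```

Because the two discoveries test opposite conditions on the same value, at most one of them should submit an instance on any given run.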

So the logic is:

  • Registry key not present – both discoveries produce empty discovery data
  • Registry key present but value not present – first discovery produces empty discovery data, second discovery creates the instance with the default value
  • Registry key present and value present – first discovery creates the instance with the registry value, second discovery produces empty discovery data

You can try these combinations out using the MP. Note to discover Application Y use this key:

HKLM\Software\AuthorMPs\ApplicationY

So that must be it? Well there is one problem with this solution. I had to hardcode this default value in the MP buried in the configuration for the second discovery. This makes it hard to change and it can’t be changed via an override. Also if this MP was sealed you are stuck with this default. So we can take this further into the night…

Double Discovery with Override

This is my last attempt I promise. I want to allow the second discovery I used above to take an override that specifies the default value if the registry value is not found. Unfortunately the only way to do this is to define my own composite module type to mimic the standard registry discovery module type that ships in the Windows library MP.

ApplicationZ in the MP shows this approach. The first discovery is the same as the last example except I use the following key:

HKLM\Software\AuthorMPs\ApplicationZ

The second discovery uses a custom module type called AppZDiscoveryModule that takes two parameters – frequency and default value.

The AppZDiscoveryModule uses the standard registry discovery but critically makes both of its configuration parameters overrideable. The main piece to note here is the mapping where I use the configuration parameter to set my default:

<Setting>
<Name>$MPElement[Name="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationBase"]/ProcessThreshold$</Name>
<Value>$Config/DefaultValue$</Value>
</Setting>
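
For illustration, a composite module type of this shape might be sketched as below. It wraps the standard filtered registry discovery provider and exposes Frequency and DefaultValue as overrideable parameters. The full module ID, the member module ID, the AppExists attribute, and the hardwired $Target and class references inside the module are all assumptions about how a two parameter module could be put together, not a copy of the sample MP.

```xml
<!-- Sketch only: IDs, the AppExists attribute and the hardwired $Target references are assumptions. -->
<DataSourceModuleType ID="AuthorMPs.Demo.PropertyBasedMonitoring.AppZDiscoveryModule" Accessibility="Internal" Batching="false">
  <Configuration>
    <xsd:element minOccurs="1" name="Frequency" type="xsd:integer" />
    <xsd:element minOccurs="1" name="DefaultValue" type="xsd:integer" />
  </Configuration>
  <!-- Declaring both parameters here is what makes them available to overrides -->
  <OverrideableParameters>
    <OverrideableParameter ID="Frequency" Selector="$Config/Frequency$" ParameterType="int" />
    <OverrideableParameter ID="DefaultValue" Selector="$Config/DefaultValue$" ParameterType="int" />
  </OverrideableParameters>
  <ModuleImplementation Isolation="Any">
    <Composite>
      <MemberModules>
        <DataSource ID="RegistryDS" TypeID="Windows!Microsoft.Windows.FilteredRegistryDiscoveryProvider">
          <ComputerName>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
          <RegistryAttributeDefinitions>
            <RegistryAttributeDefinition>
              <AttributeName>AppExists</AttributeName>
              <Path>SOFTWARE\AuthorMPs\ApplicationZ</Path>
              <PathType>0</PathType>
              <AttributeType>0</AttributeType>
            </RegistryAttributeDefinition>
            <RegistryAttributeDefinition>
              <AttributeName>ProcessThreshold</AttributeName>
              <Path>SOFTWARE\AuthorMPs\ApplicationZ\ProcessThreshold</Path>
              <PathType>1</PathType>
              <AttributeType>0</AttributeType>
            </RegistryAttributeDefinition>
          </RegistryAttributeDefinitions>
          <Frequency>$Config/Frequency$</Frequency>
          <ClassId>$MPElement[Name="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationZ"]$</ClassId>
          <InstanceSettings>
            <Settings>
              <Setting>
                <Name>$MPElement[Name="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Name>
                <Value>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Value>
              </Setting>
              <!-- The default now comes from module configuration instead of a hardcoded number -->
              <Setting>
                <Name>$MPElement[Name="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationBase"]/ProcessThreshold$</Name>
                <Value>$Config/DefaultValue$</Value>
              </Setting>
            </Settings>
          </InstanceSettings>
          <!-- Match only when the key exists and the value does not -->
          <Expression>
            <And>
              <Expression>
                <SimpleExpression>
                  <ValueExpression><XPathQuery Type="Boolean">Values/AppExists</XPathQuery></ValueExpression>
                  <Operator>Equal</Operator>
                  <ValueExpression><Value Type="Boolean">true</Value></ValueExpression>
                </SimpleExpression>
              </Expression>
              <Expression>
                <SimpleExpression>
                  <ValueExpression><XPathQuery Type="Boolean">Values/ProcessThreshold</XPathQuery></ValueExpression>
                  <Operator>Equal</Operator>
                  <ValueExpression><Value Type="Boolean">false</Value></ValueExpression>
                </SimpleExpression>
              </Expression>
            </And>
          </Expression>
        </DataSource>
      </MemberModules>
      <Composition>
        <Node ID="RegistryDS" />
      </Composition>
    </Composite>
  </ModuleImplementation>
  <OutputType>System!System.Discovery.Data</OutputType>
</DataSourceModuleType>
```

A more reusable version would take the computer name, registry path and class ID as configuration too, at the cost of a longer parameter list on each discovery.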

So I can now ship the MP with a default value but this can be overridden with a standard override. You can find the override by locating the discovery in the Object Discovery view of the Authoring Space of the console. When you override it you will see that DefaultValue is overrideable as well as frequency:

image

You can try this out by creating the ApplicationZ reg key above and not setting a ProcessThreshold value. The instance should be discovered with 77 by default. Now override the default to say 88 and wait for the configuration change to go down and the next discovery to run. You should see 88 written.
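
In XML terms, the shipped default and the override from this test might look roughly like the following sketch. The discovery ID, override ID, module ID, frequency, and the Windows Computer context are assumptions; the values 77 and 88 are the ones used in the test above.

```xml
<!-- Sketch only: all IDs, the frequency and the override context are assumptions. -->

<!-- Inside the second ApplicationZ discovery: the shipped default -->
<DataSource ID="DS" TypeID="AuthorMPs.Demo.PropertyBasedMonitoring.AppZDiscoveryModule">
  <Frequency>300</Frequency>
  <DefaultValue>77</DefaultValue>
</DataSource>

<!-- Under Monitoring/Overrides: what the console writes when you change the default to 88 -->
<DiscoveryConfigurationOverride ID="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationZ.DefaultValue.Override" Context="Windows!Microsoft.Windows.Computer" Enforced="false" Discovery="AuthorMPs.Demo.PropertyBasedMonitoring.ApplicationZ.DefaultValue.Discovery" Parameter="DefaultValue" Module="DS">
  <Value>88</Value>
</DiscoveryConfigurationOverride>
```

The Parameter attribute has to match the ID of an overrideable parameter on the module type, and Module has to match the ID of the data source in the discovery.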

To check that we still honor the real value when set, go and add the registry value and ensure that this is written in preference to the default value.

Note you could definitely write this composite module type better and reuse it between the two discoveries but this was just an illustration!

Conclusion

There are a lot of concepts here! I just wanted to explore this solution a bit and show you that there are different levels of rigor you could approach this with. You may be OK to stop at the first solution if you can guarantee the value is always there. The choice is yours – have fun!

AuthorMPs.Demo.PropertyBasedMonitoring.xml