In the first part, we presented a custom script-based dependency monitor that gives you full flexibility in calculating health states for a distributed application. In this article, we describe another way to create a "custom" dependency monitor: by using a somewhat tricky native module together with a standard recovery task in Operations Manager.
As mentioned in the first part, the main reason to use a custom rollup is that the classic rollup is not sufficient. For example, the components of a high-availability service can be located in different datacenters. Let's say you have created a distributed application for this service in Operations Manager that displays the availability of the service based on the state of its components in both datacenters. With standard dependency monitors, your distributed application becomes Critical even when only one component in the secondary datacenter is broken. It makes more sense to show a Warning state, because the service is still available but is not ready for failover.
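The following minimal Python sketch makes the difference concrete by comparing the classic worst-of rollup with the rollup described above. The `Health` enum, the function names, and the exact policy are illustrative assumptions, not Operations Manager code. The assumed policy is that only the primary datacenter can make the service Critical, and any problem in the secondary datacenter yields Warning.

```python
from enum import IntEnum
from typing import Iterable


class Health(IntEnum):
    """Console health states, ordered by severity."""
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def worst_of(states: Iterable[Health]) -> Health:
    """Classic dependency rollup: the worst child state wins."""
    return max(states, default=Health.HEALTHY)


def standard_rollup(primary: Iterable[Health], secondary: Iterable[Health]) -> Health:
    """What a standard dependency monitor does across both datacenters."""
    return worst_of([*primary, *secondary])


def datacenter_aware_rollup(primary: Iterable[Health], secondary: Iterable[Health]) -> Health:
    """Hypothetical policy: only the primary datacenter can make the service Critical;
    any problem in the secondary datacenter means "available, but not ready for failover"."""
    primary_state = worst_of(primary)
    if primary_state == Health.CRITICAL:
        return Health.CRITICAL
    if primary_state == Health.WARNING or worst_of(secondary) != Health.HEALTHY:
        return Health.WARNING
    return Health.HEALTHY


if __name__ == "__main__":
    broken_secondary = ([Health.HEALTHY], [Health.CRITICAL])
    print(standard_rollup(*broken_secondary).name)          # CRITICAL
    print(datacenter_aware_rollup(*broken_secondary).name)  # WARNING
```

The rest of the article achieves this kind of result without a script. It keeps the standard dependency monitor and lets a recovery task adjust the state it produces.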
As stated, the method described in this article uses recovery tasks together with a special write action module, which is included in the library management pack attached to this article.
The module is not our invention. Operations Manager already uses it for the «Computer not Reachable» and «Health Service Heartbeat Failure» monitors: if you look at the recoveries of these monitors, you will find some interesting tasks, and some of them use this write action module. The logic is as follows: when the «Health Service Heartbeat Failure» monitor becomes Critical, a diagnostic task starts and checks whether the server responds to ICMP (ping). If the server does not respond to the ICMP request, a recovery task fires and changes the state of the «Computer not Reachable» monitor to Critical. In other words, this module can forcibly change the health state of a monitor.
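The diagnostic and recovery chain above can be modeled in a few lines of Python. This is a simplified illustration, not the actual Operations Manager implementation. `Monitor`, `set_monitor_state` (standing in for the write action module), and `handle_heartbeat_state` are hypothetical names, and `ping` is any callable you supply. States use the management pack XML names, where Error corresponds to Critical in the console.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Monitor:
    """Hypothetical stand-in for a unit monitor."""
    name: str
    state: str = "Success"  # Success, Warning or Error (Error = Critical in the console)


def set_monitor_state(monitor: Monitor, state: str) -> None:
    """Stand-in for the write action module: forcibly overwrite a monitor's state."""
    monitor.state = state


def handle_heartbeat_state(heartbeat: Monitor, unreachable: Monitor,
                           ping: Callable[[str], bool], server: str) -> None:
    """Simplified diagnostic plus recovery chain."""
    if heartbeat.state != "Error":
        return  # the diagnostic only runs when the heartbeat monitor is Critical
    if not ping(server):  # diagnostic task: ICMP check
        set_monitor_state(unreachable, "Error")  # recovery task: force Critical


if __name__ == "__main__":
    hb = Monitor("Health Service Heartbeat Failure", state="Error")
    cnr = Monitor("Computer not Reachable")
    handle_heartbeat_state(hb, cnr, ping=lambda host: False, server="srv01")
    print(cnr.state)  # Error
```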
Unfortunately, this module cannot be configured in the Operations Manager console, which is why we will show how to configure it directly in the management pack XML.
These are the steps you have to follow to create the recovery task:
1. Install the library management pack from the attached file
2. Create a new distributed application or use an existing one
3. Open Health Explorer for the distributed application or for one of its components
4. Select one of the dependency monitors that you would like to "customize"
5. Create a Run Command recovery task for this monitor
6. Export the management pack in which the recovery task has been saved
7. Add a reference to the library management pack to the exported management pack, if you have not done so before
8. Change the write action module and save the management pack
9. Import the management pack back into Operations Manager
Let's go through the process in detail.
We will skip the first step, as it is pretty straightforward.
In this article, we use an existing distributed application that consists of several components. We will reconfigure the "DB" component so that it rolls up a Warning state when its child databases are in a Warning or Critical state. As a result, the DB component will never become Critical. It can only be Healthy or Warning.
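The intended behavior of the «DB» component can be summarized as "standard worst-of rollup, then force Critical down to Warning". The Python sketch below expresses that, assuming states are named as in management pack XML (Success, Warning, Error). The function names are hypothetical.

```python
from typing import Iterable

# Severity order of the health states as they are named in management pack XML.
SEVERITY = {"Success": 0, "Warning": 1, "Error": 2}


def dependency_rollup(child_states: Iterable[str]) -> str:
    """Standard dependency monitor: the worst child state wins."""
    return max(child_states, key=SEVERITY.__getitem__, default="Success")


def critical_to_warning(state: str) -> str:
    """Net effect of the "Critical to Warning" recovery: Error is forced down to Warning."""
    return "Warning" if state == "Error" else state


def db_component_state(child_states: Iterable[str]) -> str:
    return critical_to_warning(dependency_rollup(child_states))


if __name__ == "__main__":
    print(db_component_state(["Success", "Error", "Success"]))  # Warning
    print(db_component_state(["Success", "Success"]))           # Success
```

The two functions are kept separate because this split mirrors the approach. The dependency monitor itself is left unchanged, and only the recovery step rewrites its result.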
To create the recovery task, do the following:
1. Open Health Explorer for the «DB» component.
2. Select the monitor whose state you want to change. You can create your own dependency monitor or use an existing one. In our example, we use an existing dependency monitor under «Availability».
3. Open the properties of the dependency monitor, go to the Diagnostic and Recovery tab, click Add under Configure recovery tasks, and select Recovery for critical state from the context menu.
4. Select the Run Command type, and then select the management pack in which to save the recovery task. If the distributed application was created in an unsealed management pack, you can save the recovery task only in that management pack. Click Next.
5. On the General tab, type «Critical to Warning Recovery Task» in the Recovery Name field. Select the Critical health state, and select the Run recovery automatically and Recalculate monitor state after recovery finishes check boxes.
6. On the Command Line tab, you can type a unique string in the Full path to file field. We will replace this block later, but the unique string makes it much easier to find the corresponding XML block in the management pack. Click Create, and then click OK.
7. Export the management pack in which the recovery task was created and open it for editing. Back up this management pack before you edit it!
8. If this is the first recovery task of this type in the management pack, you have to add a reference to the library management pack that we installed earlier. Add it as a <Reference> element between the <References></References> tags. Its alias must match the prefix of the write action TypeID used in step 10 (CustomTaskLibrary).
9. Find the recovery task. You can find it quickly by searching for the unique string that you typed in the Full path to file field in step 6. Alternatively, browse the management pack and look between the <Recoveries></Recoveries> tags. In XML, it looks as follows:
<Recovery ID="MomUIGenaratedRecovery740fc548a8f748a6a69d2013a6cecd05" Accessibility="Public" Enabled="true" Target="SC_ea0945f3c9874a2b94b2a5749ef3ff66_Service_ddec17b501b64a94911602f8f1d981e1" Monitor="SCIMembership_a53870b8c04740c9abf01002955730b8_Availability_HealthMonitor" ResetMonitor="true" ExecuteOnState="Error" Remotable="true" Timeout="300">
  <WriteAction ID="MomUIGenaratedModulece99d530c2b84e6f802d71b14087f301" TypeID="System!System.CommandExecuter">
    <ApplicationName>Critical to Warning 555</ApplicationName>
    <!-- other write action elements omitted -->
  </WriteAction>
</Recovery>
10. Replace the <WriteAction> element highlighted in yellow with the following:
<WriteAction ID="WA" TypeID="CustomTaskLibrary!Custom.Task.Library.Set.Monitor.Action">
In the <MonitorId> tag, change the Name parameter to the same name as in the Monitor attribute of the <Recovery> element (highlighted in green).
11. Save the changes and import the management pack back into Operations Manager.
12. Check that the task works. Note that if the component was already in the Critical state before you created this recovery task, you have to reset its state to Healthy manually, because recovery tasks run only on a state change (when the monitor enters the Critical state).
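Steps 8 to 10 are manual XML edits, and with many recovery tasks you may prefer to script them. The following Python sketch uses the standard library's `xml.etree.ElementTree` and rests on several assumptions. First, `WRITE_ACTION_TEMPLATE` is a hypothetical placeholder: only its opening tag comes from this article, so copy the real child elements from the attached library management pack. Second, the `{monitor}` placeholder receives the value of the recovery's Monitor attribute, as step 10 requires. Third, the reference ID, version, and public key token passed to `ensure_reference` must be taken from the library management pack you installed. Finally, the sketch assumes that element names in the exported file are not namespace-qualified. ElementTree drops XML comments when parsing, so keep the backup from step 7.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder: only the opening tag is taken from this article.
# Copy the real child elements from the attached library management pack.
WRITE_ACTION_TEMPLATE = (
    '<WriteAction ID="WA" '
    'TypeID="CustomTaskLibrary!Custom.Task.Library.Set.Monitor.Action">'
    "<MonitorId>{monitor}</MonitorId>"
    "</WriteAction>"
)


def ensure_reference(root: ET.Element, alias: str, mp_id: str,
                     version: str, token: str) -> bool:
    """Add a <Reference> under Manifest/References unless the alias already exists."""
    refs = root.find("./Manifest/References")
    if refs is None:
        raise ValueError("Manifest/References element not found")
    if any(ref.get("Alias") == alias for ref in refs.findall("Reference")):
        return False
    ref = ET.SubElement(refs, "Reference", Alias=alias)
    for tag, text in (("ID", mp_id), ("Version", version), ("PublicKeyToken", token)):
        ET.SubElement(ref, tag).text = text
    return True


def retarget_recovery(root: ET.Element, marker: str,
                      template: str = WRITE_ACTION_TEMPLATE) -> str:
    """Swap the Run Command write action of the recovery whose ApplicationName
    contains marker (the unique string from step 6); return its Monitor ID."""
    for recovery in root.iter("Recovery"):
        old = recovery.find("WriteAction")
        app = old.find("ApplicationName") if old is not None else None
        if app is None or marker not in (app.text or ""):
            continue
        monitor = recovery.get("Monitor")
        new = ET.fromstring(template.format(monitor=monitor))
        new.tail = old.tail  # preserve the surrounding indentation
        position = list(recovery).index(old)  # keep the original element order
        recovery.remove(old)
        recovery.insert(position, new)
        return monitor
    raise LookupError(f"no recovery contains marker {marker!r}")


# Trimmed-down sample shaped like the fragment in step 9 (IDs are made up).
SAMPLE = """<ManagementPack>
  <Manifest><References /></Manifest>
  <Monitoring><Recoveries>
    <Recovery ID="R1" Monitor="Sample_Availability_HealthMonitor" ExecuteOnState="Error">
      <WriteAction ID="M1" TypeID="System!System.CommandExecuter">
        <ApplicationName>Critical to Warning 555</ApplicationName>
      </WriteAction>
    </Recovery>
  </Recoveries></Monitoring>
</ManagementPack>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    # Placeholder values: use the real ID, version and token of the library MP.
    ensure_reference(root, "CustomTaskLibrary", "Library.MP.ID", "1.0.0.0", "0000000000000000")
    print(retarget_recovery(root, "Critical to Warning 555"))
    print(ET.tostring(root.find(".//Recovery/WriteAction"), encoding="unicode"))
```

To apply the sketch to a real export, parse the file with `ET.parse(path)` and call both functions on `tree.getroot()`. Then write the result back with `tree.write(path, encoding="utf-8", xml_declaration=True)` before importing it in step 11.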
This method also has some caveats:
· It cannot set the state of the aggregate monitor depending on which particular object of the distributed application changed state. That is why we cannot assign a weight to each object.
· This method requires additional testing, especially with a large number of objects whose states change rapidly. For example, we once observed our "customized" monitor stuck in the Warning state. It turned out that the underlying monitor was switching back and forth so fast that, while the recovery was changing the health state (even though that takes less than 500 ms), the child objects went back to the Healthy state. The result was a strange picture: the aggregate monitor was in the Warning state while all underlying monitors were green.
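Two of the caveats can be reproduced in a deliberately simplified Python model. First, a recovery fires only on a transition into the Error (Critical) state, so a monitor that was already Critical never triggers it (the situation described in step 12). Second, a recovery that completes after the children have recovered writes a stale Warning. `DependencyMonitor`, `recalculate`, and `finish_recovery` are hypothetical names. The real timing and recalculation behavior of Operations Manager is more involved.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DependencyMonitor:
    """Simplified model of a dependency monitor with a Critical-to-Warning recovery."""
    state: str = "Success"
    recoveries_started: int = 0
    pending: List[str] = field(default_factory=list)

    def recalculate(self, new_state: str) -> None:
        # A recovery configured for Error fires only on a transition into Error.
        if new_state == "Error" and self.state != "Error":
            self.recoveries_started += 1
            self.pending.append("Warning")  # state the recovery will write later
        self.state = new_state

    def finish_recovery(self) -> None:
        # The recovery writes its state when it completes, even if it is stale by now.
        if self.pending:
            self.state = self.pending.pop(0)


if __name__ == "__main__":
    already_critical = DependencyMonitor(state="Error")
    already_critical.recalculate("Error")
    print(already_critical.recoveries_started)  # 0

    race = DependencyMonitor()
    race.recalculate("Error")    # children go Critical, recovery starts
    race.recalculate("Success")  # children recover before the recovery finishes
    race.finish_recovery()       # stale write
    print(race.state)            # Warning
```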
In the example above, we discussed only one particular case: forcibly changing the state of a monitor from Critical to Warning. But you can experiment with this method and create other combinations, for example Critical -> Success, Success -> Critical, Warning -> Critical, and so on. Moreover, these recovery tasks can change the state of other monitors as well, not only the state of the monitor that contains the recovery task.
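You can think of such combinations as a table of rules. Each rule states which monitor hosts the recovery, which state triggers it, and which monitor receives which state. The Python sketch below is an illustrative model only. `RecoveryRule`, `apply_transition`, and the monitor names are hypothetical. The model also ignores cascading effects: a forced state is itself a state change, which could in turn trigger another recovery.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class RecoveryRule:
    source_monitor: str  # monitor that hosts the recovery task
    on_state: str        # ExecuteOnState value: Success, Warning or Error
    target_monitor: str  # monitor whose state the write action overwrites
    set_state: str       # state written by the write action


def apply_transition(rules: Iterable[RecoveryRule], monitor: str, new_state: str,
                     states: Dict[str, str]) -> Dict[str, str]:
    """Return a new state map after `monitor` moves to `new_state`."""
    updated = dict(states)
    changed = updated.get(monitor) != new_state
    updated[monitor] = new_state
    if changed:  # recoveries run only on a state change
        for rule in rules:
            if rule.source_monitor == monitor and rule.on_state == new_state:
                updated[rule.target_monitor] = rule.set_state
    return updated


RULES = [
    # Critical -> Warning on the same monitor (the case in this article).
    RecoveryRule("DB.Availability", "Error", "DB.Availability", "Warning"),
    # Warning on one monitor forces another monitor to Critical.
    RecoveryRule("Web.Availability", "Warning", "App.Availability", "Error"),
]

if __name__ == "__main__":
    start = {"DB.Availability": "Success", "Web.Availability": "Success",
             "App.Availability": "Success"}
    print(apply_transition(RULES, "DB.Availability", "Error", start))
    print(apply_transition(RULES, "Web.Availability", "Warning", start))
```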
For your convenience, the attached file contains an example management pack that can be used as the library management pack.
Any feedback and comments are appreciated.
All content provided on this blog is for informational purposes only. Any references and links to other web sites are given for the convenience of users. Microsoft cannot guarantee the accuracy or completeness of any information presented on these web sites. The references to external web sites do not imply approval of the information or solutions provided by such web sites.