Your MP Discoveries and Clustering

This is one of a few posts I will likely write on monitoring applications that run on clusters using SCOM. This area does not get much attention from MP developers, especially if their applications are not cluster aware. However, there are things you need to consider for every production-quality management pack, even if your application knows nothing about clusters. I am not going to discuss the Windows 2008 cluster management pack that is available from the management pack catalog; that MP monitors the cluster role itself, not the applications on top of the cluster. SCOM has some core clustering support out of the box, and that is all I will cover in this post.

Let’s start with the basics of a failover cluster. This post is not an introduction to clustering but I wanted to ensure you understand a little about network names and how these relate to computers in SCOM. Testing against clusters can be a pain since clustering is not something you can readily set up in your environment. However there are options here and I will follow up with a post on how to set up a cluster within Hyper-V if you have access to an iSCSI software target.

I am using a two node cluster. To form this I took two domain joined Windows Servers with shared disk between them and followed the simple wizard to create the cluster. When you create a cluster, the cluster name object (CNO) is created and you specify an IP address to use. The screen shot below shows my basic setup:

image

So I have the following setup:

  • stwilson7cl01, stwilson7cl02 are physical cluster nodes
  • stwilson7cl is the cluster name object (this is a network name resource)
  • Volume Q: is the shared disk I am using for the quorum witness disk

Now I am going to add a clustered File Server to the cluster using a second disk I have shared. This is very easy to do using the Configure a Service or Application wizard from the Failover Cluster Manager MMC. This File Server adds the following resources to the cluster:

  • stwilson7clfs network name
  • IP address
  • Volume S:

This is shown below:

image

Now let’s look at how this cluster appears in SCOM. To monitor this cluster I did the following:

  • Deployed agents to stwilson7cl01 and stwilson7cl02
  • Set agent proxying to enabled on both nodes
  • Waited for discovery to occur

If you go to the Windows Computer state view in the Operations Console you will see each network name resource in the cluster shown as a computer. In my case I have two network name objects (one for the cluster name, one for the file server name). Therefore I see the two cluster nodes (stwilson7cl01 and stwilson7cl02) and the two additional names (stwilson7cl and stwilson7clfs):

image

Notice that the virtual network names do not show an agent instance on them which is correct because they are actually monitored by the agents on the cluster nodes stwilson7cl01 and stwilson7cl02.

SCOM uses the agents on the cluster nodes to monitor the applications on each virtual name. Basically, the node that currently owns the resource group containing the network name does the monitoring. SCOM handles the failover of this monitoring as the resource group moves between nodes. The detail of this is not the purpose of this post, however.

Cluster Ignorant Discovery

OK, so why as an author do I care? My application is not clustered so I don’t need to do anything, right? Well, actually there is a problem with this. Let’s look at the basic discovery for an application that is not cluster aware. I am going to use a simple registry based discovery throughout these examples.
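
As a rough sketch of what such a discovery contains, the fragment below shows the two parts of a registry discovery configuration that matter for this post: the attribute definition that probes the registry and the filter that decides whether an instance is created. It assumes the Microsoft.Windows.FilteredRegistryDiscoveryProvider module. The registry path is hypothetical, and the other settings the module needs (computer name, frequency, class ID, instance settings) are left out.

```xml
<!-- Sketch only: the registry path below is hypothetical, not taken from the attached MP -->
<RegistryAttributeDefinitions>
  <RegistryAttributeDefinition>
    <AttributeName>ApplicationExists</AttributeName>
    <Path>SOFTWARE\AuthorMPs\AppX</Path>
    <!-- PathType 0 = registry key; AttributeType 0 = Boolean existence check -->
    <PathType>0</PathType>
    <AttributeType>0</AttributeType>
  </RegistryAttributeDefinition>
</RegistryAttributeDefinitions>
<!-- Create an instance whenever the key exists, on any kind of computer -->
<Expression>
  <SimpleExpression>
    <ValueExpression>
      <XPathQuery Type="Boolean">Values/ApplicationExists</XPathQuery>
    </ValueExpression>
    <Operator>Equal</Operator>
    <ValueExpression>
      <Value Type="Boolean">true</Value>
    </ValueExpression>
  </SimpleExpression>
</Expression>
```

Nothing in this filter looks at what kind of computer the discovery is running against.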

For the first example I am going to use a class called NonClusteredAppX in my MP (the MP is attached to this post). I have targeted a discovery at the Windows Server class so this would run against all my Windows Servers, which is a fairly normal practice. Let’s look at the discovery results using a simple state view I added to the MP:

image

Can you spot the problem? I only have two nodes in the cluster, but the application has also been discovered on the virtual computer objects because SCOM treats them as Windows Servers. To show why this matters, I added a simple event monitor to my NonClusteredAppX class that looks for an event in the event log. To see the full effect of the issue I made sure both resource groups (the cluster and the file server) were running on node 1 of the cluster. Then I ran the following from the command line on node 1 to generate the event:

EventCreate /id 101 /t information /d "Test"

Now the state view shows this:

image

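The problem is that the monitor runs for every instance of the application, and there is one instance on each computer object. Because node 1 currently owns both resource groups, its agent runs the monitor for the node itself and for both virtual computers, so three instances change state from a single event.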

If I had generated an alert from this monitor I would have three alerts for the same problem. So as you can see, this is something you might want to consider fixing. If your application is indeed not cluster aware the fix is very simple, and I cover that next.

If you are following along you can reset the state with this event:

EventCreate /id 102 /t information /d "Test"
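
The monitor's own configuration is not listed in this post. As a minimal sketch, assuming the monitor matches on the event ID alone, the filter for the unhealthy event could look like the fragment below. The use of EventDisplayNumber with the UnsignedInteger type is an assumption about how such a filter is commonly written, not a copy of the attached MP; the healthy side would use the same shape with 102.

```xml
<!-- Sketch only: matches the test event raised by the first EventCreate command -->
<Expression>
  <SimpleExpression>
    <ValueExpression>
      <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
    </ValueExpression>
    <Operator>Equal</Operator>
    <ValueExpression>
      <Value Type="UnsignedInteger">101</Value>
    </ValueExpression>
  </SimpleExpression>
</Expression>
```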

Cluster Aware Discovery

There is a basic feature of the SCOM computer discovery we can take advantage of. There is a boolean property defined on Windows Server called IsVirtualNode. This will either be empty or true (never false).

For a non clustered application that can’t be clustered the basic fix to the discovery is to use this in the filter logic of the discovery. In the MP I have added a new class called NonClusteredAppY. For the discovery of this class I have added the logic below:

<Expression>
  <And>
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="Boolean">Values/ApplicationExists</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="Boolean">true</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <Value>$Target/Property[Type='Windows!Microsoft.Windows.Server.Computer']/IsVirtualNode$</Value>
        </ValueExpression>
        <Operator>NotEqual</Operator>
        <ValueExpression>
          <Value Type="Boolean">true</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
  </And>
</Expression>

Now when this discovery runs, I only get instances created on the physical cluster nodes not the virtual computers. Therefore instead of four instances I correctly get two:

image

So this may be all you have to do! However, if your application is in fact designed to run on a cluster you need to swap this around a bit, as the next sections show.

Cluster Only Application

If you want to discover only on a virtual computer in a cluster then you basically need to reverse the logic in your expression as below:

<Expression>
  <SimpleExpression>
    <ValueExpression>
      <Value>$Target/Property[Type='Windows!Microsoft.Windows.Server.Computer']/IsVirtualNode$</Value>
    </ValueExpression>
    <Operator>Equal</Operator>
    <ValueExpression>
      <Value Type="Boolean">true</Value>
    </ValueExpression>
  </SimpleExpression>
</Expression>

My ClusteredAppA class in the attached MP shows this logic. Now this will only be discovered on virtual nodes as shown below:

image
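
The expression shown earlier in this section contains only the IsVirtualNode test. If the registry check from the non-clustered filter is kept alongside it, the combined expression would presumably look like the following sketch. It is assembled from the two fragments in this post rather than copied from the attached MP.

```xml
<!-- Sketch only: the registry check and the virtual node check combined with And -->
<Expression>
  <And>
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="Boolean">Values/ApplicationExists</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="Boolean">true</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <Value>$Target/Property[Type='Windows!Microsoft.Windows.Server.Computer']/IsVirtualNode$</Value>
        </ValueExpression>
        <!-- Equal instead of NotEqual: keep only the virtual computers -->
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="Boolean">true</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
  </And>
</Expression>
```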

Note that your discovery will likely need more logic than this, since your application is probably not running in every resource group. I cover this at the end of this post.

Mapping IsVirtualNode

There is a slight extension to the cluster only application discovery above. In my example MP, my classes are derived from Windows Computer Role. This base class has a hosting relationship to Windows Computer not Windows Server. This means that you can never get back to the fact that your instance is running on a cluster virtual computer object because if you tried to use a $Target/Host/Property expression it would resolve to the Windows Computer class that does not have this property.

Therefore, if any subsequent discovery or monitoring targeted at your class needs access to this property, you should add a property to your application class and map it during discovery. My ClusteredAppB class shows this. In the discovery I add a mapping:

<Setting>
  <Name>$MPElement[Name="AuthorMPs.Demo.ClusterDiscovery.ClusteredAppB"]/IsVirtualNode$</Name>
  <Value>true</Value>
</Setting>

Note that I set this to true because I know it is always the case: my expression has already checked for it. If my application could run on either clustered or non-clustered servers I would do this instead:

<Setting>
  <Name>$MPElement[Name="AuthorMPs.Demo.ClusterDiscovery.ClusteredAppB"]/IsVirtualNode$</Name>
  <Value>$Target/Property[Type='Windows!Microsoft.Windows.Server.Computer']/IsVirtualNode$</Value>
</Setting>
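
For either mapping to work, the class has to declare the property. A sketch of what that class definition might look like is below. The attribute values, including the bool property type and the Windows alias for the base class, are assumptions for illustration and may differ from the attached MP.

```xml
<!-- Sketch only: attribute values are assumptions, not copied from the attached MP -->
<ClassType ID="AuthorMPs.Demo.ClusterDiscovery.ClusteredAppB" Accessibility="Public" Abstract="false" Base="Windows!Microsoft.Windows.ComputerRole" Hosted="true" Singleton="false">
  <!-- Filled in by the discovery mapping so later workflows need not reach the host -->
  <Property ID="IsVirtualNode" Type="bool" Key="false" />
</ClassType>
```

A monitor or discovery targeted at this class could then read the value directly with $Target/Property[Type="AuthorMPs.Demo.ClusterDiscovery.ClusteredAppB"]/IsVirtualNode$ instead of going through the host.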

Clustered and Non Clustered

The above showed some basics of how you can control cluster discovery and avoid the common pitfall. What if your application can run on either a cluster or a non-clustered server? In this case you are likely to need more than a simple registry discovery. An example of this is SQL Server. Its discovery checks whether the computer it is running against is clustered and also makes sure that only instances matching this computer name are discovered on it. The logic is fairly complex and is script based. Most cluster aware application discoveries will be script based, since each application has its own way of representing itself as clustered.

If you are using a script based discovery you should definitely consider the seed pattern I

AuthorMPs.Demo.ClusterDiscovery.xml