OpsMgr 2007: Cluster shows as not monitored under cluster state

Every once in a while we run across an issue where the cluster state view in System Center Operations Manager 2007 shows a cluster as not monitored. If you run into this too, here are four steps, sent to me by Adrian Doyle, that should help get you started on the path to resolution:

========

1. Enable the Agent Proxy setting on all domain controllers. Enabling Agent Proxy allows each domain controller to discover its connection objects to other domain controllers. To enable the Agent Proxy setting on all domain controllers, follow these steps:

a. Open the Operations Console and click the Administration button.
b. In the Administration pane, click Agent Managed.
c. Double-click a domain controller in the list.
d. Click the Security tab.
e. Select Allow this agent to act as a proxy and discover managed objects on other computers.
f. Repeat steps c through e for each domain controller.

2. Verify that the DNSName field in the MT_Computer table of the OperationsManager database contains the correct FQDN.

3. Run the Discover Cluster task: in the Operations Console, go to Monitoring->Microsoft Windows Cluster->Cluster Service State, right-click one of the physical nodes, and run the Discover Cluster task.

4. Review the event log for errors.  You may see errors similar to the following:

Event Type: Error
Event Source: Health Service Modules
Event Category: None
Event ID: 10801
Description: Discovery data couldn't be inserted to the database. This could have happened because of one of the following reasons:

- Discovery data is stale. The discovery data is generated by an MP recently deleted.
- Database connectivity problems or database running out of space.
- Discovery data received is not valid.

The following details should help to further diagnose:

DiscoveryId: 5d7e65f8-968f-d501-320b-b26720a22b9a
HealthServiceId: 92c8ecb1-a5de-9a0e-a23e-856c7f4c2446
Invalid relationship target specified in the discovery data item.
RelationshipTargetBaseManagedEntityId: 2e00b47f-3496-1dd0-b4e2-c6c0f3882a73
RuleId: 5d7e65f8-968f-d501-320b-b26720a22b9a
Instance:
<?xml version="1.0" encoding="utf-16"?>
<RelationshipInstance TypeId="{524c7317-9a1c-a525-3d1c-acc7c2ff2200}" SourceTypeId="{e761abda-c410-9852-be05-d859560e6efa}" TargetTypeId="{f1ce5d3a-f8ab-1ae2-b5d7-94d999320673}">
<Settings />
<SourceRole>
<Settings>
<Setting>
<Name>{5C324096-D928-76DB-E9E7-E629DCC261B1}</Name>
<Value>FQDN of Physical Node</Value>
</Setting>
<Setting>
<Name>{C0D1A296-F98A-2580-3BE8-BFC1A4F22B33}</Name>
<Value>Virtual Servername</Value>
</Setting>
<Setting>
<Name>{5A481B0E-7A61-A639-CF28-B41CF019432F}</Name>
<Value>Physical Node Name</Value>
</Setting>
<Setting>
<Name>{CA7F145F-328F-CFC7-2FCD-B5A886AEB4C5}</Name>
<Value>Virtual Servername</Value>
</Setting>
</Settings>
</SourceRole>
<TargetRole>
<Settings>
<Setting>
<Name>{5C324096-D928-76DB-E9E7-E629DCC261B1}</Name>
<Value>FQDN of Physical node</Value>
</Setting>
<Setting>
<Name>{C0D1A296-F98A-2580-3BE8-BFC1A4F22B33}</Name>
<Value>Virtual Servername</Value>
</Setting>
<Setting>
<Name>{5A481B0E-7A61-A639-CF28-B41CF019432F}</Name>
<Value>Physical Node Name</Value>
</Setting>
<Setting>
<Name>{CA7F145F-328F-CFC7-2FCD-B5A886AEB4C5}</Name>
<Value>Virtual Servername</Value>
</Setting>
<Setting>
<Name>{2E0D225D-25BB-F33F-6031-FF09758505B5}</Name>
<Value> Virtual Servername - IP Address</Value>
</Setting>
</Settings>
</TargetRole>
</RelationshipInstance>.

In the data above, the last <Value> reported is the one causing the error: " Virtual Servername - IP Address" begins with a space. In this case the problem was resolved by removing that space from the Virtual Servername - IP Address entry in the Name column under the cluster group in Cluster Administrator. The change takes effect after the cluster is failed over. A trailing space in any of the Name column items can cause the same issue.
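A single stray space is easy to miss when reading a long 10801 event by eye. The Python sketch below is not part of Adrian's steps; it is one possible way to parse the Instance XML copied from the event and list every <Value> that has leading or trailing whitespace. It assumes you paste the text from <?xml ... through the closing </RelationshipInstance> tag into a string; the function name and the trimmed sample are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative copy of the Instance XML from the Event ID 10801 above.
SAMPLE = """<?xml version="1.0" encoding="utf-16"?>
<RelationshipInstance TypeId="{524c7317-9a1c-a525-3d1c-acc7c2ff2200}">
<Settings />
<SourceRole>
<Settings>
<Setting><Name>{C0D1A296-F98A-2580-3BE8-BFC1A4F22B33}</Name><Value>Virtual Servername</Value></Setting>
</Settings>
</SourceRole>
<TargetRole>
<Settings>
<Setting><Name>{C0D1A296-F98A-2580-3BE8-BFC1A4F22B33}</Name><Value>Virtual Servername</Value></Setting>
<Setting><Name>{2E0D225D-25BB-F33F-6031-FF09758505B5}</Name><Value> Virtual Servername - IP Address</Value></Setting>
</Settings>
</TargetRole>
</RelationshipInstance>."""


def find_whitespace_values(instance_xml):
    """Return (role, setting_name, value) for each <Value> with leading or trailing whitespace."""
    text = instance_xml.strip()
    # The event description ends the XML with a period; drop it so the parser accepts the text.
    text = text.rstrip(".")
    # Drop the declaration: it says utf-16, but the pasted text is already a decoded str.
    if text.startswith("<?xml"):
        text = text[text.index("?>") + 2:].lstrip()
    root = ET.fromstring(text)
    suspects = []
    for role in ("SourceRole", "TargetRole"):
        for setting in root.iterfind(role + "/Settings/Setting"):
            name = setting.findtext("Name", default="")
            value = setting.findtext("Value", default="")
            if value != value.strip():
                suspects.append((role, name, value))
    return suspects


if __name__ == "__main__":
    for role, name, value in find_whitespace_values(SAMPLE):
        print(f"{role} {name}: {value!r}")
```

Run against the full instance from the event above, it would flag only the " Virtual Servername - IP Address" setting under TargetRole, which matches the value Adrian identified.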

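For step 2, once you have exported the computer names from MT_Computer (for example with a query along the lines of SELECT NetbiosComputerName, DNSName FROM MT_Computer; those column names are an assumption, so confirm them against your OperationsManager database), a short Python sketch like the one below can flag rows whose DNSName is missing, is not the expected FQDN, or carries stray whitespace. The function name, the "contoso.com" domain suffix, and the sample rows are hypothetical.

```python
def find_bad_dns_names(rows, domain_suffix):
    """Return the (netbios_name, dns_name) rows whose DNSName is not the expected FQDN.

    rows: iterable of (NetBIOS computer name, DNSName) pairs exported from MT_Computer.
    The column names and the single-domain expectation are assumptions for this sketch.
    """
    problems = []
    for netbios_name, dns_name in rows:
        expected = f"{netbios_name}.{domain_suffix}".lower()
        if (
            dns_name is None                   # field never populated
            or dns_name != dns_name.strip()    # leading or trailing whitespace
            or dns_name.lower() != expected    # short name or wrong domain
        ):
            problems.append((netbios_name, dns_name))
    return problems


if __name__ == "__main__":
    # Hypothetical sample rows; "contoso.com" stands in for your AD domain.
    sample = [
        ("NODE1", "node1.contoso.com"),
        ("NODE2", "NODE2"),
        ("NODE3", None),
        ("NODE4", "node4.contoso.com "),
    ]
    for row in find_bad_dns_names(sample, "contoso.com"):
        print("Check DNSName for", row)
```

Names are compared case-insensitively because DNS itself is case-insensitive; only a missing value, stray whitespace, or a name that does not match NetBIOS name plus domain suffix is reported.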
========

We also have a couple of hotfixes that address issues with cluster discovery, so you'll want to take a look at those as well:

951979 - Problems occur on a management server that is running System Center Operations Manager 2007 Service Pack 1 when certain management packs are installed

951380 - Some computer properties for a cluster node may not be collected by the discovery process in System Center Operations Manager 2007 Service Pack 1

Thanks Adrian!

J.C. Hornbeck | Manageability Knowledge Engineer