Hello, this is Prabhakar Shettigar, and today I will be talking about an issue I worked on recently in which Performance Monitor (Perfmon) did not properly show the Logical Disk counters on a cluster. When we selected the cluster name in the Computer Name dialog in Perfmon, we could not view the Logical Disk counters as expected: the instances were listed as "Harddiskvolume#" rather than by drive letter. Also, if the cluster was failed over, Perfmon would no longer monitor the logical disks. For all intents and purposes, it appeared as though Perfmon had lost connectivity with the disks. Obviously, if you’re trying to get a baseline of your server performance or troubleshoot an issue, this is worrying behavior.
Here is what was happening. When the Performance Logs and Alerts service monitors a remote system, it has to connect to the Remote Registry service on that system to collect Perfmon data. If you use Perfmon to monitor the virtual node, the service connects via RPC to the physical node that currently owns the virtual name and IP address. When we fail over the resources, the connection to the Remote Registry service does not fail over with them, since that service is not cluster-aware. The RPC connection to the original physical node remains in place unless that node stops responding or its Remote Registry service is stopped.
While the connection is active, the disk counters will report the names of the underlying resources. If the resource is failed over, the drive is dismounted and may be presented as something different to the OS by the disk drivers. When this occurs, the disk counters just keep recording what is presented by the disk subsystem. If the underlying resources fail back, then they are remounted with the common name and the counters start getting valid data again.
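To make the symptom easier to spot in collected data, here is a minimal Python sketch that sorts Logical Disk instance names into drive letters and raw "HarddiskVolume#" names. It does not query Perfmon itself; the function name `classify_instances` and the sample instance list are made up for illustration.

```python
import re

# Raw volume device names such as "HarddiskVolume3" (case-insensitive).
_RAW_VOLUME = re.compile(r"^harddiskvolume\d+$", re.IGNORECASE)
# Drive-letter instances such as "C:".
_DRIVE_LETTER = re.compile(r"^[A-Za-z]:$")


def classify_instances(instances):
    """Split instance names into drive letters, raw volume names, and others.

    Hypothetical helper: it only inspects names you have already collected.
    """
    result = {"drive_letter": [], "raw_volume": [], "other": []}
    for name in instances:
        stripped = name.strip()
        if _DRIVE_LETTER.match(stripped):
            result["drive_letter"].append(stripped)
        elif _RAW_VOLUME.match(stripped):
            result["raw_volume"].append(stripped)
        else:
            # Anything else, for example the "_Total" instance.
            result["other"].append(stripped)
    return result


if __name__ == "__main__":
    # Sample names only; real instance names come from your own Perfmon logs.
    sample = ["C:", "HarddiskVolume3", "_Total"]
    print(classify_instances(sample))
```

If the "raw_volume" list is non-empty for a clustered disk that normally shows up under a drive letter, that is a hint the counters were collected through a connection that no longer points at the owning node.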
For the above reasons, we recommend that Perfmon be used to monitor the physical nodes rather than the virtual node. Other monitoring tools may be designed to be cluster-aware and know how to handle this behavior, but Perfmon is not. To be cluster-aware, the software must be able to communicate with the cluster, determine the active node, and compensate for any differences in performance counter instances. Microsoft Operations Manager (MOM), for instance, has a Windows Server Cluster Management Pack available.
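As a small illustration of monitoring the physical nodes, the following Python sketch builds Logical Disk counter paths in the standard `\\Computer\Object(Instance)\Counter` form, one set per physical node. The node names `NODE1` and `NODE2` and the helper name `build_counter_paths` are assumptions for the example; substitute your own node names and counters.

```python
def build_counter_paths(nodes, counters, obj="LogicalDisk", instance="*"):
    """Return one counter path per (physical node, counter) pair.

    Hypothetical helper: it only formats strings and does not contact any server.
    """
    paths = []
    for node in nodes:
        for counter in counters:
            # Raw string keeps the backslashes literal: \\NODE\Object(Instance)\Counter
            paths.append(r"\\{}\{}({})\{}".format(node, obj, instance, counter))
    return paths


if __name__ == "__main__":
    # Assumed physical node names; do not use the cluster's virtual name here.
    physical_nodes = ["NODE1", "NODE2"]
    disk_counters = ["% Free Space", "Avg. Disk Queue Length"]
    for path in build_counter_paths(physical_nodes, disk_counters):
        print(path)
```

The resulting list could be added to a Perfmon log by hand or saved to a text file for a command-line collector that accepts a counter file, such as typeperf with its -cf option.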
One other thing of note is that, in a few cases, monitoring the Logical Disk counters does not work properly even when using the physical node name. In this situation, the cause may be a known timing issue with the Mount Manager. The resolution is to install the latest hotfix for MountMgr.sys, in this case the one from KB article 940307.
That’s all for today – a short post, but hopefully one that is useful for administrators who need to gather Performance data from Clustered servers.
– Prabhakar Shettigar