OpsMgr 2007: Monitoringhost.exe or HealthService.exe may have a high (>10000) handle count and restart


UPDATED 08/17/2009: This issue has been resolved with Knowledge Base article 968760: https://support.microsoft.com/kb/968760.

======== 

Here’s kind of an interesting issue I thought we should probably give you a heads up on.  It’s not really a ‘problem’ per se but it is something that can cause some concern if you don’t know what’s going on.  Anyway, here you go:

========

Issue: On Windows computers acting as Management Servers or agents for System Center Operations Manager 2007, MonitoringHost.exe or HealthService.exe may have a high (>10000) handle count and restart. This issue manifests primarily on 64-bit Windows systems, but we have also had a few reports of problems on x86 systems.

You’ll usually notice this because the ‘Health Service Handle Count Threshold Exceeded’ monitor goes critical and the healthservice is restarted, or the ‘MonitoringHost Handle Count Threshold’ rule fires, generating an alert and restarting MonitoringHost.exe. Once the process is restarted, the handle count returns to normal but begins climbing again as the system works.

Using the handle.exe tool from Sysinternals to dump the affected process shows a very high ‘Event’ handle count and may also show a large number of ‘Thread’ handles. The following is sample output from an affected system.

Handle v3.42
Copyright (C) 1997-2008 Mark Russinovich
Sysinternals - www.sysinternals.com
Handle type summary:
ALPC Port : 11
Desktop : 1
Directory : 2
EtwRegistration : 36
Event : 16010
File : 452
IoCompletion : 23
Key : 48
KeyedEvent : 2
Mutant : 9
Section : 15
Semaphore : 82
Thread : 3956
Timer : 5
Token : 183
TpWorkerFactory : 2
WindowStation : 2
Total handles: 20839
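
If you want to check a saved copy of this output without reading it by eye, something like the following Python sketch could help. It only assumes the "Type : count" layout shown in the sample above; the function names and the idea of ranking the types are mine for illustration and are not part of any Sysinternals or OpsMgr tooling.

```python
import re

# Sample text copied from the handle type summary shown above.
SAMPLE = """\
Handle v3.42
Copyright (C) 1997-2008 Mark Russinovich
Sysinternals - www.sysinternals.com
Handle type summary:
ALPC Port : 11
Desktop : 1
Directory : 2
EtwRegistration : 36
Event : 16010
File : 452
IoCompletion : 23
Key : 48
KeyedEvent : 2
Mutant : 9
Section : 15
Semaphore : 82
Thread : 3956
Timer : 5
Token : 183
TpWorkerFactory : 2
WindowStation : 2
Total handles: 20839
"""


def parse_handle_summary(text):
    """Return ({handle_type: count}, total) parsed from a handle type summary."""
    counts = {}
    total = None
    for raw in text.splitlines():
        line = raw.strip()
        # The total line is handled first so it is not mistaken for a type.
        match = re.match(r"^Total handles:\s*(\d+)$", line)
        if match:
            total = int(match.group(1))
            continue
        # Type lines look like "Event : 16010"; banner lines do not match.
        match = re.match(r"^([A-Za-z ]+?)\s*:\s*(\d+)$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts, total


def top_types(counts, n=2):
    """Return the n handle types with the highest counts, largest first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    counts, total = parse_handle_summary(SAMPLE)
    print("Total handles:", total)
    for handle_type, count in top_types(counts):
        print(handle_type, count)
```

On the sample above, the two largest types are Event and Thread, which is the pattern described for this issue.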

Cause: This occurs because the garbage collector does not run until enough objects have been allocated, and in this case very few (sometimes zero) objects are being allocated. The runtime tracks threads with native data structures but relies on finalization to clean them up. Since garbage collection never runs, finalization never runs, and as a result the native handles are never released.
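
To make that chain of events a little more concrete, here is a deliberately simplified toy model in Python. It is not how the real runtime is implemented: the class, the allocation budget, and the figure of 4 handles per finished thread are all assumptions picked for illustration. The only point it shows is that when cleanup is tied to allocation volume, a process that allocates almost nothing never gets its handles cleaned up.

```python
class ToyRuntime:
    """Toy model: cleanup (finalization) only happens when a collection runs,
    and a collection only runs after enough bytes have been allocated."""

    def __init__(self, gc_budget_bytes):
        self.gc_budget_bytes = gc_budget_bytes  # hypothetical collection trigger
        self.allocated_since_gc = 0
        self.open_handles = 0  # native handles waiting on finalization

    def thread_exits(self, handles=4):
        # Assumption: each finished thread leaves a few native handles behind
        # that are only released by a finalizer.
        self.open_handles += handles

    def allocate(self, nbytes):
        self.allocated_since_gc += nbytes
        if self.allocated_since_gc >= self.gc_budget_bytes:
            self.collect()

    def collect(self):
        # Collection runs the finalizers, which release the native handles.
        self.open_handles = 0
        self.allocated_since_gc = 0


def simulate(bytes_per_thread, threads=1000, gc_budget_bytes=100_000):
    """Run `threads` thread exits, allocating `bytes_per_thread` after each one."""
    runtime = ToyRuntime(gc_budget_bytes)
    for _ in range(threads):
        runtime.thread_exits()
        runtime.allocate(bytes_per_thread)
    return runtime.open_handles


if __name__ == "__main__":
    print("busy process:", simulate(bytes_per_thread=50_000))
    print("idle process:", simulate(bytes_per_thread=0))
```

In this model the busy process keeps triggering collections and ends with no handles outstanding, while the idle process, which allocates nothing, ends with every handle still open.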

Workaround: The high handle count does not appear to cause any performance issues. It is problematic because the SCOM monitor will go critical, and because the count can theoretically continue to grow with no practical upper limit, effectively rendering the monitor useless for health state monitoring. To work around this, you can raise the thresholds for the ‘Health Service Handle Count Threshold Exceeded’ monitor and for the ‘MonitoringHost Handle Count Threshold (Management Server)’ and ‘Monitoring Host Handle Count Threshold’ rules, overriding them to a value between 50,000 and 100,000. The healthservice and/or MonitoringHost.exe will still be restarted eventually, but a higher limit allows more time between restarts.
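
As a rough way to reason about how much time a higher threshold buys, here is a small Python sketch. It assumes the handle count grows at a steady rate, which is a simplification, and the baseline and growth rate in the example are made-up numbers rather than measurements from any real system. Plug in figures from your own performance counters.

```python
def hours_until_restart(threshold, baseline, handles_per_hour):
    """Estimate the hours until the handle count reaches the threshold,
    assuming linear growth from `baseline` (a simplifying assumption)."""
    if handles_per_hour <= 0:
        raise ValueError("handles_per_hour must be positive")
    return max(threshold - baseline, 0) / handles_per_hour


if __name__ == "__main__":
    # Hypothetical figures: 2,000 handles after a restart, growing by 500/hour.
    for threshold in (10_000, 50_000, 100_000):
        hours = hours_until_restart(threshold, baseline=2_000, handles_per_hour=500)
        print(f"threshold {threshold}: about {hours:.0f} hours between restarts")
```

With those made-up figures, the interval between restarts goes from 16 hours at a threshold of 10,000 to 96 hours at 50,000 and 196 hours at 100,000.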

Hope this helps!

J.C. Hornbeck | Manageability Knowledge Engineer