High CPU on Wmiprvse.exe caused by a TLS slot leak in DNSPROV.DLL on Windows Server 2003
Certain customers have recently been experiencing an issue which I wanted to bring to your attention.
Issue with Windows Server 2003 SP2 Domain Controllers
Wmiprvse.exe consistently consumes a high percentage of CPU on Domain Controllers. One svchost.exe instance has a high handle count of around 75,000, and another svchost.exe instance, the one hosting rpcss, has 23,000 handles.
Impact: Servers need to be restarted on a scheduled basis.
On investigating this issue, I found that similar cases had been reported by other customers within the last 6 months.
Note: this does not occur in Windows Server 2008.
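To check whether a server shows the handle counts described above, the processes can be listed through WMI. The following is a minimal sketch, assuming WMI on the server is still responsive enough to answer the query. It uses the standard Win32_Process class, and the variable names are illustrative only.

```vbscript
Option Explicit

Dim oWMI, colProcs, oProc

' Connect to the standard CIM namespace on the local computer
Set oWMI = GetObject("winmgmts:{impersonationLevel=impersonate}!\\.\root\cimv2")

' List every WMI provider host and service host with its handle count
Set colProcs = oWMI.ExecQuery( _
    "SELECT Name, ProcessId, HandleCount FROM Win32_Process " & _
    "WHERE Name = 'wmiprvse.exe' OR Name = 'svchost.exe'")

For Each oProc In colProcs
    WScript.Echo oProc.Name & " (PID " & oProc.ProcessId & "): " & _
        oProc.HandleCount & " handles"
Next
```

Run it with cscript.exe so that each line goes to the console rather than to a message box.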
Cause
This has been traced to a problem with dnsprov.dll. See below for more details:
“A windows Server 2003 (R2) SP2 machine, which implements a DNS role (usually true for many DCs), might become unreliable, unstable and misbehaving because of this problem. Manual intervention is needed to restore the server to its stable state each time administrators become aware of the problem going on, which can occur about once per week per DC, in an environment that implements SCOM/SCOM 2007 R2.
A windows Server 2003 server implementing the DNS role, when it receives certain WMI queries against the DNS WMI provider, will leak a TLS slot in the WMI process that hosts the DNS WMI provider. TLS slots are a finite resource (64+1024 slots available per process) so they can be quickly exhausted if leaked. A process that has its TLS slots exhausted doesn't behave normally and can incur in any kind of problem and unexpected behaviours.
Currently observed odd behaviours caused by this specific leak are:
- 100% CPU usages in the WMI host process that incurred the exhaustion.
- Other WMI providers sharing the same WMI host process not working as expected/misbehaving
Since WMI is a system service supporting many OS functions and application, having one of its processes in an unstable state makes the entire server unreliable, as mentioned and the problem needs to be resolved manually (DC reboot or WMI subsystem restarted).
SCOM 2007 happens to have a pattern of WMI queries that triggers the problem systematically after a few days monitoring a Windows Server 2003/DNS role.”
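The slot limit quoted above gives a rough feel for how long a server can run before it hits the problem. The sketch below assumes one TLS slot is leaked per provider load/unload cycle, that each monitoring query triggers one such cycle, and that no other component in the process uses any slots. The 10-minute query interval is purely hypothetical and is not taken from any SCOM setting.

```vbscript
Option Explicit

' 64 + 1024 slots per process, as quoted above
Const TLS_SLOTS = 1088
' Hypothetical interval between monitoring queries, in minutes.
' It must exceed the provider's idle timeout for a reload to happen each time.
Const QUERY_INTERVAL_MIN = 10

Dim dblDays
' One slot leaked per query, so exhaustion takes TLS_SLOTS queries
dblDays = (TLS_SLOTS * QUERY_INTERVAL_MIN) / (60 * 24)

WScript.Echo "Slots available per process: " & TLS_SLOTS
WScript.Echo "Estimated days until exhaustion: " & Round(dblDays, 1)
```

With these assumed numbers the estimate comes out at about 7.6 days, the same order of magnitude as the weekly failures described in the quote. In practice the process already uses some slots, so the real figure would be lower.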
Workaround
Three workarounds have proved successful in several of the previously reported cases.
Considering that:
1. The TLS slot is leaked each time a load/unload cycle occurs on the WMI DNS provider dnsprov.dll
2. A WMI provider is unloaded after it has been idle for 5 minutes
3. SCOM issues DNS queries at a rate that allows the provider to unload and reload between two queries
There are 3 possible workarounds; see the details below:
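a. Execute a WMI script that uses the DNS provider to create an object and then never terminates, which prevents the provider from becoming idle and being unloaded. Note that the script below takes a different approach: as its header comment states, it changes the provider's HostingModel property, which is the same change that workaround b makes by hand.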
' This script changes the HostingModel property so that the Microsoft DNS WMI
' provider runs in an isolated wmiprvse, allowing a workaround to the TLS leak.
strComputer = "."
strInstance = "__Win32Provider.Name='MS_NT_DNS_PROVIDER'"
strNewHostingModel = "NetworkServiceHost:DNSSharedHost"
Dim oMicrosoftDNSNamespace ' IWbemServices
Dim oWMIProvider
' Connect to the root\MicrosoftDNS namespace on the target computer
Set oMicrosoftDNSNamespace = GetObject("winmgmts:" _
    & "{impersonationLevel=impersonate, (Security)}!\\" _
    & strComputer _
    & "\root\MicrosoftDNS")
Set oWMIProvider = oMicrosoftDNSNamespace.Get(strInstance)
WScript.Echo "Provider : " & oWMIProvider.Name
WScript.Echo "Current value for HostingModel: " & oWMIProvider.HostingModel
If oWMIProvider.HostingModel = strNewHostingModel Then
    WScript.Echo "No need to update DNS WMI Provider HostingModel property"
Else
    ' Update the HostingModel property
    oWMIProvider.HostingModel = strNewHostingModel
    WScript.Echo "New value for HostingModel : " & oWMIProvider.HostingModel
    ' Write the updated object back to the repository
    oWMIProvider.Put_
End If
Save the script with a .vbs extension, and test it fully before applying it to live production servers. The advantage of this approach is that it can be deployed via Group Policy across the estate.
Note: This script is provided "AS IS" with no warranties, and confers no rights.
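The listing above changes HostingModel rather than keeping the provider busy. A keep-alive script along the lines that workaround a describes might look like the following sketch. It is untested against the leak itself. It assumes that querying the MicrosoftDNS_Server class every 2 minutes is enough to stop the provider reaching its 5-minute idle timeout, and both the interval and the choice of class are assumptions made for illustration.

```vbscript
Option Explicit

' Assumption: 2 minutes, comfortably below the 5-minute idle timeout
Const POLL_INTERVAL_MS = 120000

Dim oDns, colServers, oServer, strName

' Connect once to the DNS provider's namespace on the local computer
Set oDns = GetObject("winmgmts:{impersonationLevel=impersonate}!\\.\root\MicrosoftDNS")

' Loop forever so that the provider is never idle long enough to unload
Do
    Set colServers = oDns.ExecQuery("SELECT * FROM MicrosoftDNS_Server")
    For Each oServer In colServers
        ' Reading a property makes sure the provider actually serviced the query
        strName = oServer.Name
    Next
    WScript.Sleep POLL_INTERVAL_MS
Loop
```

The script never exits, so it would need to be started at boot (for example with cscript.exe from a startup script) and restarted if it is ever stopped. Like the listing above, it should be fully tested before use on production servers.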
b. Isolate the DNS provider (dnsprov) in a private wmiprvse. This can be done with the following steps:
1. Run WBEMTEST.
2. Click Connect and input root\microsoftdns in the Namespace.
3. Click Enum Classes...
4. Select Recursive and click OK.
5. From the classes list, select __Win32Provider and double click it.
6. Click Instances.
7. Select the instance and double click it.
8. Select HostingModel from the properties list and double click it.
9. Change the value from “NetworkServiceHost” to “NetworkServiceHost:DNSProvHost”
(without double quotation marks)
10. Click Save Property.
11. Click Save Object.
12. Click Close to quit WBEMTEST.
The obvious disadvantage is that these steps are manual and impractical across a large enterprise environment.
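Whichever way the change is made, the result can be checked on several servers with a read-only script. This is a minimal sketch: the server list is a placeholder to be replaced with real Domain Controller names, and the provider instance name is the one used in the script under workaround a.

```vbscript
Option Explicit

Dim arrServers, strServer, oNamespace, oProvider

' Placeholder list; "." is the local computer. Replace with real DC names.
arrServers = Array(".")

For Each strServer In arrServers
    Set oProvider = Nothing
    On Error Resume Next
    Set oNamespace = GetObject("winmgmts:{impersonationLevel=impersonate}!\\" _
        & strServer & "\root\MicrosoftDNS")
    If Err.Number = 0 Then
        Set oProvider = oNamespace.Get("__Win32Provider.Name='MS_NT_DNS_PROVIDER'")
    End If
    If Err.Number <> 0 Then
        ' Report the failure and carry on with the next server
        WScript.Echo strServer & ": query failed (" & Err.Description & ")"
        Err.Clear
    Else
        WScript.Echo strServer & ": HostingModel = " & oProvider.HostingModel
    End If
    On Error GoTo 0
Next
```

The script only reads the property, so it can be run repeatedly without changing anything on the servers.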
c. Write a simple rule in OpsMgr that keeps the DNS provider from unloading by calling it very frequently. This appears to keep the provider loaded, and therefore stops it leaking TLS slots.
Please see the following blog, which describes this final workaround in more detail:
In most cases this will not be a problem if you patch and reboot your servers regularly. However, if you are experiencing issues, hopefully this information will help. If you are a Premier customer, I would advise raising a support case via Premier to validate the advice offered here. It also gives you a documented escalation path.
Jane