WMI leaks memory on Server 2008 R2 monitored agents


 

Here is something that a customer brought to my attention, and it is probably impacting you already.

 

They noticed that WMI on some of their Server 2008R2 monitored agents was consuming a large amount of memory – and continually increasing.  I started tracking this in SCOM by writing a rule to collect the Process\Private Bytes of all WMI processes (WmiPrvSE*) to check.

Sure enough – a handful (but, strangely, not all) of my Windows 2008 R2 monitored servers are exhibiting this behavior.  Below is a graph where you can see most processes are consuming ~20MB or less, but some are steadily increasing – consuming 400MB of RAM or more.

 

image
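
For readers who export the collected Private Bytes samples and want to flag the steadily growing processes without eyeballing a graph, here is a minimal sketch. It is not part of the SCOM rule described above. The function names, the `(hours, bytes)` sample format, and both thresholds are assumptions chosen only for illustration; tune them to your own baseline.

```python
from statistics import mean

MB = 1024 * 1024


def growth_mb_per_hour(samples):
    """Least-squares slope of private bytes over time, in MB per hour.

    samples: list of (hours_since_start, private_bytes) tuples
    (a hypothetical export format, not something SCOM produces as-is).
    """
    if len(samples) < 2:
        return 0.0
    t_mean = mean(t for t, _ in samples)
    b_mean = mean(b for _, b in samples)
    denom = sum((t - t_mean) ** 2 for t, _ in samples)
    if denom == 0:
        return 0.0
    slope = sum((t - t_mean) * (b - b_mean) for t, b in samples) / denom
    return slope / MB


def looks_leaky(samples, min_slope_mb_per_hour=1.0, min_last_mb=100.0):
    """Assumed heuristic: sustained growth AND a large most-recent value."""
    if not samples:
        return False
    last_mb = samples[-1][1] / MB
    return (growth_mb_per_hour(samples) >= min_slope_mb_per_hour
            and last_mb >= min_last_mb)
```

Requiring both a positive slope and a large latest value is a deliberate choice: a process that sits flat at ~20MB, or one that briefly spikes and settles, should not be flagged.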

 

If it goes long enough – occasionally you might also see this in your event logs:

Log Name:      Application
Source:        Application Error
Date:          3/10/2010 4:24:35 PM
Event ID:      1000
Task Category: (100)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      VS5.opsmgr.net
Description:
Faulting application name: wmiprvse.exe, version: 6.1.7600.16385, time stamp: 0x4a5bc794
Faulting module name: ole32.dll, version: 6.1.7600.16385, time stamp: 0x4a5be01a
Exception code: 0xc0000005
Fault offset: 0x0000000000039389
Faulting process id: 0x180
Faulting application start time: 0x01cabfafa91cc252
Faulting application path: C:\Windows\system32\wbem\wmiprvse.exe
Faulting module path: C:\Windows\system32\ole32.dll
Report Id: b45b5a1d-2c93-11df-ac21-001b213a78be
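
If you have the event descriptions exported as text, a short sketch like the following could pick out the crashes that involve the WMI provider host. It assumes the description keeps the "key: value" layout shown above; the function names are hypothetical and this is not how OpsMgr itself reads events.

```python
def parse_fault_event(description):
    """Split an Application Error (Event ID 1000) description into fields.

    Each line is split at its first colon, so values that contain colons
    (paths, version strings) stay intact.
    """
    fields = {}
    for line in description.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_wmi_provider_crash(description):
    """True when the faulting application is wmiprvse.exe."""
    app = parse_fault_event(description).get("Faulting application name", "")
    # The value looks like "wmiprvse.exe, version: ..., time stamp: ..."
    return app.split(",")[0].strip().lower() == "wmiprvse.exe"
```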

 

It turns out there is a hotfix for Windows 2008 R2 – which addresses a possible leak when an application queries the Win32_Service class frequently.  A monitoring tool would do this – and therefore OpsMgr can accelerate this leak in the OS.

http://support.microsoft.com/kb/981314

 

This hotfix addresses this issue – I applied it to my servers – and they are no longer leaking memory from the WMI process.

 

Capture

 

I am adding this hotfix to my recommended hotfixes link, in the OS section.

 

http://blogs.technet.com/b/kevinholman/archive/2009/01/27/which-hotfixes-should-i-apply.aspx

 

 

These are some signs that this might be impacting you in OpsMgr:

 

You might get some alerts in the console like the following:

Workflow Runtime: Failed to run a process or script

The process started at 1:22:12 AM failed to create System.Discovery.Data. Errors found in output:

C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 5\4235\AlertUpdateConnectorDiscovery.vbs(16, 1) SWbemObjectSet: No more threads can be created in the system.

Command executed: "C:\Windows\system32\cscript.exe" /nologo "AlertUpdateConnectorDiscovery.vbs" {A7504CAE-3EA5-5B1F-CDA4-A4593E4D85FD} {F8AEF188-D663-9719-3FD8-94B2AF6F0726} SQL2V1.opsmgr.net
Working Directory: C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 5\4235\

One or more workflows were affected by this.

Workflow name: AlertUpdateConnector.ConnectorDiscovery
Instance name: SQL2V1.opsmgr.net
Instance ID: {F8AEF188-D663-9719-3FD8-94B2AF6F0726}
Management group: PROD1

Or:

Workflow Runtime: Failed to run a WMI query

Object enumeration failed

Query: ‘SELECT DisplayName, Name, StartMode FROM Win32_Service WHERE Name="ClusSvc" and StartMode!="Disabled"’
HRESULT: 0x80041006
Details: Out of memory

One or more workflows were affected by this.

Workflow name: Microsoft.Windows.Cluster.Service.Discovery
Instance name: SQL2CLN1.opsmgr.net
Instance ID: {90476733-8FA9-1718-152C-932FF9AB9BC6}
Management group: PROD1

Or:

Workflow Runtime: Failed to run a WMI query

Object enumeration failed

Query: ‘SELECT NumberOfProcessors FROM Win32_ComputerSystem WHERE DomainRole >1’
HRESULT: 0x800705af
Details: The paging file is too small for this operation to complete.

One or more workflows were affected by this.

Workflow name: System.Mom.BackwardCompatibility.Computer.Server.DiscoveryRule
Instance name: SQLDB1.opsmgr.net
Instance ID: {AF7C2749-FF52-E354-EEAE-8CFCA3541607}
Management group: PROD1

The details of the script, discovery, or workflow are irrelevant.  What is relevant here is seeing the messages “No more threads can be created in the system”, “Out of memory”, and “The paging file is too small for this operation to complete”.

Those are tell-tale signs of a memory leak or memory pressure, and in this case caused by WMI.
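
As a rough illustration, the three messages can be searched for in exported alert text with a few lines of Python. This is only a sketch: the function name is hypothetical, and it does a plain case-insensitive substring match on the strings quoted above rather than anything OpsMgr-specific.

```python
# The three messages quoted in the alerts above.
SIGNATURES = (
    "No more threads can be created in the system",
    "Out of memory",
    "The paging file is too small for this operation to complete",
)


def memory_pressure_signatures(alert_text):
    """Return the signature messages found in an alert description."""
    lowered = alert_text.lower()
    return [s for s in SIGNATURES if s.lower() in lowered]
```

A match only says the alert is consistent with memory pressure; it does not prove WMI is the cause, so check the WmiPrvSE private bytes on that server as well.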

 

Sure enough – when I check this system, I can easily see there is an issue:

 

image

 

If you are running Server 2008R2 on ANY monitored system, it is highly likely that you need to apply this hotfix. 

I recommend it across the board for all Windows 2008 R2 monitored agents, until Windows Server 2008R2 SP1 releases, or something supersedes this.

Comments (40)

  1. Anonymous says:

    I will copy my response from your post on Systemcentercentral.com:

    I am tracking a similar issue with another customer. There are lots of things that can cause leaks in WMI. The only hotfix I can find is for when perfmon is used remotely, which I don't think is related.

    We do see WMIPRVSE processes using over 100 MB (as high as 500MB) when this condition exists. Bouncing the server does resolve the issue, until the WMI processes get large again.

    There are many things that might be causing this, like HP/Dell hardware agents, other software agents that access and use WMI, and it could possibly be OpsMgr related, but it seems to only happen on systems with LARGE amounts of RAM. I am still investigating and will post something when I find it – but it would be good to open a case with Microsoft from the Windows side, and have them investigate the root cause of WMI apparently leaking memory.

  2. Anonymous says:

    Hans – you are spot on.  There is another leak, however, it is not nearly as aggressive as this one.  I have been tracking it for a long time, however, we aren't getting any cases on it, and most customers don't notice it because it isn't very aggressive… in a month it doesn't generally leak more than 300-400MB of memory, depending on the server role.  I haven't completed all my testing on this so I haven't published anything about it, but I have observed the same.  It will take a customer reporting this and working with PSS to get some resolution there.

  3. Anonymous says:

    @ Patrik –

    I have not seen that behavior and I have LOTS of customers running this hotfix.  Very interesting.  Have you opened a case on this?  

    This hotfix is also included in Windows 2008 R2 SP1 – so SP1 will be the recommendation moving forward.

  4. Anonymous says:

    Also – NEVER target a GROUP – that is OpsMgr 101.  You cannot target groups with any workflow.

    You need to target a non-singleton Class – such as "Windows Server Operating System" or "Windows Server 2008 Operating System"

  5. Anonymous says:

    All versions are available – you need to hit the "Show hotfixes for all platforms and languages" link, because by default this hotfix page will only display your detected OS.

    There is an x86 version for Win7.  Server 2008R2 is 64 bit only.

  6. Anonymous says:

    This fix CANNOT be applied to a non-R2 server.  It won't work, won't install…  This is for a VERY specific issue that presents itself only on Win7/2008R2.

    There are other 2008 WMI hotfixes – but you need to do much more research as to why your system becomes unstable.

  7. Anonymous says:

    Peter – as far as I know – yes this is only an issue for Windows Server 2008 R2.

  8. Anonymous says:

    Dom – this is not a WMI event.  This is a simple perf counter.

    Process Private Bytes WmiPrvSE*

    Just use WmiPrvSE* for the instance to get them all…. in a perf collection rule.

  9. Kevin Holman says:

    @slade –

    I have not experienced this post SP1, but there are a few hotfixes that update components around WMI:
    http://support.microsoft.com/kb/2547244
    http://support.microsoft.com/kb/2618982

    Is there something specific about these servers from a role perspective? Such as – are they DNS servers, or DC’s, or file servers, or ?. Crash a server??? With a WMI leak? That’s odd because WMI will crash its own process before it has any impact on a server. Does this happen even if SCOM is disabled/uninstalled?

  10. Efim says:

    Great Job catching it! I dont have too many customers on R2 yet, but this one for the toolbox 🙂

  11. Dominique says:

    Hi Kevin,

    How did you create the rule? I need to show the same graphs to confirm whether or not I have the same issue.

    I started

    Authoring Tab > Management Pack Objects > Rules > Create a new Rule > Collection Rules > Event Based > WMI Event (associated with a custom MP) > Rule Target "Windows Server 2008 R2 Computer Group"  > WMI Namespace >

    or

    Authoring Tab > Management Pack Objects > Rules > Create a new Rule > Collection Rules > Performance Based > WMI Event (associated with a custom MP) > Rule Target "Windows Server 2008 R2 Computer Group"  > WMI Namespace >

    or

    Authoring > Management Pack Templates > Windows Service

    Which way is the best ?

    Thanks,

    Dom

  12. Dominique says:

    Hello,

    or i could do also

    Authoring Tab > Management Pack Objects > Rules > Create a new Rule > Collection Rules > NT Event: Log

    Event ID: 1000

    Event Source: Application Error

    Event Level: Error

    But this one seems too generic, doesn't it?

    Thanks,

    Dom

  13. Dominique says:

    Hi Kevin,

    Thanks let me change this and see how it goes…

    Dom

  14. Dominique says:

    Hi Kevin,

    The rule (called: xxxx – WMI Leakage  on Windows Server 2008 R2) is in place "Disabled" per default overridden for the class "Windows Server 2008 R2 Operating System" to "Enabled = True". I have a group of  35 Servers with Windows Server 2008 R2 Operating System in the Windows 2008  R2 Computer Group. The whole Group Windows Server 2008 Computer Group has 106 servers.

    Rule Target: Windows Server 2008 Operating System

    I created a Performance View on data related to "Windows Server Operating System" for the group "Windows Server 2008 R2 Computer Group" with View Performance collected by …, but when browsing for the rule I do not see it available in the list…. Under Authoring > Management Pack Objects > Rules it is available… should I change one parameter in the view… the "data related to" and/or the group?

    Note that the Rule and the View should be in the same Category; I used "Performance Collection" for both, where I had tried "Custom" category earlier!!! (:

    Now it works…

    I will wait now several days of collection to get more data.

    Thanks,

    Dom

  15. Stephen says:

    We have 28 servers upgraded to R2. After running the counter against all R2 servers for a day we can already see one candidate for the hotfix. The server's WMI service went from 15MB to 147MB of memory consumption. Thanks for the tip!

  16. Peter says:

    Is this only an issue for 2008 R2 – not 2008?

  17. Ernie says:

    Hello Kevin

    I downloaded this hotfix but there only seems to be an x86 version of the hotfix. We are using the x64 version of Windows 2008 R2.

    Do you know if there is an x64 version of this hotfix, or is it not required for x64 platforms?

    Thanks in advance

  18. Wilson W. says:

    I see this exact same issue on my Exchange 2007 servers….the only thing is that they are not running R2, -only Windows2008 with SP2.

    From what I can tell, many of the WMI hotfixes are already covered by SP2, so I am going nuts trying to figure out what is causing the WMI errors in the event logs.

    I too am getting "paging file is too small" errors.  I'm also seeing "Not enough storage is available to complete this operation ".  I checked the WMI process though and memory usage was normal, so I don't understand why I am seeing these kinds of WMI errors if the WMI process is not consuming a large amt of memory…

  19. Wilson W. says:

    Thx Kevin.  I will be contacting Microsoft support on this and will update the msg thread on SystemCenterCentral and on here to let everybody know what the findings are.

  20. Manoj S. says:

    Hello Kevin & Wilson,

    We are seeing a similar issue with our ConfigMgr Central & Primary Site servers that run on Server 2008 with SP1 and NO R2.  We have had CM on these boxes for a while, however, we started to get these WMI errors shortly after installing OpsMgr agents on these boxes (as far as I can tell.  I could be wrong but no other significant change has been made to these boxes).  We get errors similar to this listed below and when we try to connect to the CM Site database through the admin console, it fails.  Basically, the only way we get CM to work again is by rebooting.  One of the server where we are seeing this issue, I have put that box in Maintenance mode in OpsMgr, trying to see if that would help…  Any suggestions would greatly be appreciated!  Manoj

    Log Name:      Application

    Source:        Perfstat_for_Windows

    Date:          10/1/2010 2:40:14 PM

    Event ID:      7015

    Task Category: None

    Level:         Error

    Keywords:      Classic

    User:          N/A

    Computer:      SITE_SERVER_NAME

    Description:

    Exception – error during XML_WMICAT. :

    Invalid query

    memfree,Win32_PerfRawData_PerfOS_Memory,free,AvailableKBytes

  21. Monkey Kong says:

    Hi Kevin. Thanks for the good read regarding these WMI issues. I'm with some of those who have commented – I see these issues on a non-R2 server. I also read somewhere that some have applied this fix to non-R2 installations and it has stopped issues related to leaks caused by WMI. We are running an HP ML350 G6 with SBS08. A bunch of other software have started intermittently failing due to lack of resources, and only a reboot can get things back to normal. This usually only lasts a week before needing another reboot.

    If anyone can help point me in the right direction I'd be most grateful. Thanks for the time!

  22. Patrik says:

    Hi, we have this problem on some of our 2008 R2 Domain Controllers.

    The thing is that if we apply the hotfix, the DC will hang in about 2 days with all sorts of alarms and errors. It all ends with you not being able to log on to the server, and the only way out is to reboot it. And when we roll back the hotfix everything goes back to normal, that is to say a stable DC but with WMI processes that leak memory.

    So this is just a heads up for all you.

  23. Pete says:

    Just wanted to add to the people that are seeing this with non-R2 servers. I am seeing it on some server 2008 with SP2 machines.

  24. Steve says:

    Frustratingly, whilst this hotfix solves the problem of WMI leaking uncontrollably, it seems to do so by simply setting a threshold amount of memory (somewhere around 600Mb) and simply terminating and restarting WMIPrvSE when this is exceeded.   We're using WMI to monitor MSMQ, and get alerts and see a drop out when the monitoring goes to grab a WMI counter and gets "Query was syntactically invalid" as a response.  Moments later, the Working set for WMIPrvSE will drop from 550Mb to 50Mb, and everything will return to normal.  I'm not knocking the hotfix (who wants a server that needs rebooting all the time) but a better fix than this was surely possible?

  25. Brooke Philpott says:

    I blogged about this a while back when we found it. The details are here:

    brooke.blogs.sqlsentry.net/…/win32service-memory-leak.html

    It wasn't initially slated to be included until the first service pack, but after pushing hard for them to release a hotfix they did.

  26. Hans says:

    We have this on a couple of 2008 R2 Server WITH SP1. And the hotfix is not applicable. Thought they fixed it in SP1 ??

  27. fdxpilot says:

    Thanks for posting this! I just ran into a situation where WMI was crashing with this error while a third-party application was initiating a SQL backup (causing the backup to fail of course!) and this pointed me in the right direction to resolve the issue.

  28. Lats says:

    Kevin, was there any update on this? You are right with the fact that it is still there and not aggressive as the first, but it is causing some issues for us on few of our systems.

  29. Hanz says:

    Is there already a fix, or a known cause?

    Win2008R2 Sp1 in use

    It's annoying, scom receives a lot of warnings which are related to this problem.

  30. Rune Bakken says:

    Hi,

    We are still seeing this issue even with SP1 installed.

    Please advise.

    Rune

  31. zubi says:

    There is not enough memory to run this program! and I even could not run any program . Please give me a fix for this issue. OS: Windows Server 2008 R2 Standard Ed.

  32. April says:

    We are still seeing this issue post SP1, and the leak is quite aggressive. It is nice that Windows 'knows' to terminate the runaway process, but having this happen every 2-3 minutes seems a bit much.

  33. zubi says:

    is there any solution after debugging I found this message! "Auto crash rule for Leak rule for WmiPrvSE.exe(77944)"

  34. slade says:

    We’re seeing this with a large number of SP1 servers. Your article was spot on, but I can find no patches applicable for SP1. Have you made any headway with this? It will crash a server in about 30 minutes for us, but it is random and hard to replicate, but once it starts, it doesn't take long…

  35. Anonymous says:

    WMI leaks memory on Server 2008 R2 monitored agents – Kevin Holman’s System Center Blog – Site Home – TechNet Blogs

  36. Maneesh says:

    Is this related to the recently disclosed x86 flaw too?
    http://www.bit-tech.net/news/hardware/2015/08/07/x86-security-flaw/1
    Thank you

  37. andyinsdca says:

    This is still happening on a Server 08R2 running Connected Backups. This server has SP1. (agent is SCOM 2012R2 UR4)

  38. Andrew R says:

    Same alerting from SCOM on W2008 SP2. None of the hotfixes are applicable apparently. Server has SQL 2008 on but only to SP2. Hoping SP4 will resolve.

  39. Anonymous says:

    SCOM Agent accelerates WMI memory leak.