Error alerts from the DNS MP – script failures, WMI Probe failed?


Updated 11/16/09 – I think this is pretty much resolved now – read below:

 

I have seen this at several customer sites, and even in my own lab.  You might find the following alerts (below) stemming from the DNS MP.

To start, I would recommend the resolutions in my previous post:  Getting lots of Script Failed To Run alerts- WMI Probe Failed Execution- Backward Compatibility

Everything in that post helps; however, it does not resolve all of the alerts 100% of the time.  After about two weeks on a Windows 2003 DC/DNS server, the problem can recur with WMI failures and script errors.  Restarting the computer, or restarting WMI, will immediately resolve it.

This appears to be an issue with the Windows DNS WMI provider that causes this Generic Failure when trying to access and query the WMI-based DNS namespace.  It appears that there is a TLS (thread local storage) slot leak every time the DNS WMI provider unloads, and that the provider unloads after 5 minutes of not being accessed.  Those who patch their computers monthly likely won't even see this issue, or will only see it for a short time until the next patch cycle, because the reboot that typically follows patching clears the condition.

To resolve it, I have written a monitor (example and sample MP below) that queries the DNS WMI namespace every 4 minutes, which keeps the provider from unloading.  The DNS provider therefore stays loaded and never gets the chance to unload and leak a TLS slot.  This has also been shown to resolve some other issues with scripts and latency caused by the DNS WMI provider having to load back up after an unload.

 

 

The events/alerts you may see to define the error condition:

 

WMI Probe Module Failed Execution
Log Name:  Operations Manager
Source:  Health Service Modules
Event Number:  10409
Description:
Object enumeration failed
Query: ‘Select EventLogLevel from MicrosoftDNS_Server’
HRESULT: 0x80041001
Details: Generic failure
One or more workflows were affected by this.
Workflow name: Microsoft.Windows.DNSServer.2003.Monitor.ServerLoggingLevel
Instance name: dc01.opsmgr.net
Instance ID: {11056C4C-B933-98ED-3DC5-4B9AAE232B23}
Management group: PROD1

 

WMI Probe Module Failed Execution
Log Name:  Operations Manager
Source:  Health Service Modules
Event Number:  10409
Description:
Object enumeration failed
Query: ‘Select Name, Shutdown, Paused from MicrosoftDNS_Zone’
HRESULT: 0x80041001
Details: Generic failure
One or more workflows were affected by this.
Workflow name: Microsoft.Windows.DNSServer.2003.Monitor.ZoneRunning
Instance name: test.opsmgr.net (dc01.opsmgr.net)
Instance ID: {E0A3BD98-04B7-0C44-B26D-F8E6175456D1}
Management group: PROD1

 

Script or Executable Failed to run
Log Name: Operations Manager
Source: Health Service Modules
Event Number: 21406
Description:
The process started at 6:26:59 AM failed to create System.Discovery.Data. Errors found in output:
C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 10\8675\DNS2003ComponentDiscovery.vbs(123, 9) SWbemServicesEx: Generic failure
Command executed: "C:\WINDOWS\system32\cscript.exe" /nologo "DNS2003ComponentDiscovery.vbs" {C984657D-0255-F11B-2C76-1542793A684D} {11056C4C-B933-98ED-3DC5-4B9AAE232B23} dc01.opsmgr.net true true true "" false 700 1
Working Directory: C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 10\8675\
One or more workflows were affected by this.
Workflow name: Microsoft.Windows.DNSServer.2003.Discovery.Components
Instance name: dc01.opsmgr.net
Instance ID: {11056C4C-B933-98ED-3DC5-4B9AAE232B23}
Management group: PROD1 

 

Script or Executable Failed to run
Log Name: Operations Manager
Source: Health Service Modules
Event Number: 21405
Description:
The process started at 3:58:21 AM failed to create System.Discovery.Data, no errors detected in the output. The process exited with 0
Command executed: "C:\WINDOWS\system32\cscript.exe" /nologo "DNS2003Discovery.vbs" {C8655A28-E27E-C6ED-B158-8569219A71A6} {89AC2E61-9144-4B94-9028-5A25F547213E} dc01.opsmgr.net false
Working Directory: C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 10\8515\
One or more workflows were affected by this.
Workflow name: Microsoft.Windows.DNSServer.2003.ServerDiscovery
Instance name: dc01.opsmgr.net
Instance ID: {89AC2E61-9144-4B94-9028-5A25F547213E}
Management group: PROD1

 

Script or Executable Failed to run

Event Type:              Error
Event Source:           Health Service Script
Event Category:        None
Event ID:                  1152
Date:                        5/19/2009
Time:                        11:18:48 AM
User:                        N/A
Computer:                DC01
Description:
DNS2003Discovery.vbs : The Query ‘select * from MicrosoftDNS_Server’ did not return any valid instances. 
Please check to see if this is a valid WMI Query.. Generic failure

 

 

So… at this point, you have updated Cscript to 5.7 (KB955360) and applied the KB933061 hotfix to stabilize WMI.  However, after a period of time, these errors start happening again?

Since the issue is a problem caused by the Windows DNS WMI provider unloading – we need to keep it loaded.  Since I believe it unloads after 5 minutes of inactivity, we need to make sure we query WMI at least every 4 minutes.  The simplest, cheapest, and easiest way I know to do that… is to create a simple performance monitor, that queries the DNS WMI namespace for a value, every 4 minutes.  I have a complete write-up on how to create this monitor at THIS LINK.
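To make the timing argument concrete, here is a minimal sketch of the reasoning in Python.  It is a simplified model, not a measurement: it assumes a single poller, a 5-minute (300-second) idle timeout as described above, and one unload per gap that exceeds the timeout.  The 900-second interval is the one a commenter below reports for the built-in "DNS 2003 Event Logging Level Monitor"; the function name `count_unloads` is made up for illustration.

```python
def count_unloads(poll_interval, idle_timeout=300, duration=24 * 3600):
    """Count how often the provider would unload over `duration` seconds.

    Simplified model: queries arrive at t = 0, poll_interval,
    2 * poll_interval, ... and the provider unloads once for every gap
    between consecutive queries that exceeds idle_timeout.
    """
    if poll_interval <= 0:
        raise ValueError("poll_interval must be positive")
    gaps = duration // poll_interval  # complete gaps between queries
    return gaps if poll_interval > idle_timeout else 0


# Polling every 15 minutes: the provider unloads after every query.
print(count_unloads(900))  # 96 unloads per day in this model

# Polling every 4 minutes: the idle timeout is never reached.
print(count_unloads(240))  # 0
```

Under this model, any interval comfortably below the idle timeout gives zero unloads, which is why 4 minutes was chosen rather than something closer to 5.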

 

I will start by creating a new Management pack – “Custom – DNS Addendum MP”

Next – I will create a new monitor, Unit Monitor, WMI Performance Counters, Static Thresholds, Single Threshold, Simple Threshold.

image

Give the monitor a name.  I used “Custom – DNS Monitor Query to keep namespace loaded”

For the monitor target – since this is a problem only on Windows Server 2003, I chose “DNS 2003 Server”.  We do not need to do this on Server 2008.

For the Parent monitor, I chose Performance:

image

Next, we need to fill in the namespace, query, and frequency.  I input “root\MicrosoftDNS” for the namespace, and “Select EventLogLevel from MicrosoftDNS_Server” for the query.  Since I want it to run every 4 minutes, the frequency is 240 seconds:

image
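Before creating the monitor, it can help to confirm that the same namespace and property can be read by hand.  The sketch below is one way to do that from Python by shelling out to the `wmic` command-line tool.  It assumes `wmic` is available on the machine where the script runs; the helper names and the optional `node` argument are made up for illustration, and this is not how the monitor itself runs the query.

```python
import subprocess

NAMESPACE = r"root\MicrosoftDNS"   # namespace entered in the wizard
WMI_CLASS = "MicrosoftDNS_Server"  # class named in the query
PROPERTY = "EventLogLevel"         # property selected by the query
INTERVAL_SECONDS = 4 * 60          # 4 minutes, the monitor frequency


def build_wmic_command(node=None):
    """Build a wmic command line that reads the same property as the query.

    `node` is an optional remote computer name (hypothetical helper argument).
    """
    command = ["wmic"]
    if node:
        command.append("/node:" + node)
    command += ["/namespace:\\\\" + NAMESPACE, "path", WMI_CLASS, "get", PROPERTY]
    return command


def run_query(node=None):
    """Run the query once and return (exit_code, stdout, stderr).

    Requires wmic on the PATH, so this only works on Windows.
    """
    completed = subprocess.run(
        build_wmic_command(node), capture_output=True, text=True
    )
    return completed.returncode, completed.stdout.strip(), completed.stderr.strip()
```

Calling `run_query("dc01.opsmgr.net")` from an admin workstation should print the current EventLogLevel when the provider is healthy; on an affected server I would expect the Generic failure to show up in the error output instead.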

The performance mapper section is the most confusing part; I explain it a bit deeper at THIS LINK.  The value being mapped is the property returned by the query, $Data/Property[@Name='EventLogLevel']$.  For now, just follow the graphic below:

image

Next, on the Threshold page: since this monitor is not really supposed to do anything other than query WMI on a schedule, we don’t want it to alert.  The query we are running for this example will return an integer from 0-10, so I will set the threshold to 99, a number the query could never return, so the monitor will never change state.
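The threshold logic can be sketched in a few lines of Python.  This is only a model of the idea, not the SCOM module itself: it assumes the "over threshold" side is the unhealthy one, and the function names are made up for illustration.  It also shows why an empty EventLogLevel is a problem: the value has to be converted to a number before it can be compared, which matches the "unable to convert parameter to a double value" error reported in the comments.

```python
THRESHOLD = 99  # value chosen in the post; the query should never return it


def map_to_double(raw_value):
    """Convert the raw property value to a float before comparing it.

    An empty or missing value cannot be converted and raises ValueError.
    """
    if raw_value is None or str(raw_value).strip() == "":
        raise ValueError("cannot convert empty value to a double")
    return float(raw_value)


def is_over_threshold(raw_value, threshold=THRESHOLD):
    """True when the mapped value exceeds the threshold (assumed unhealthy side)."""
    return map_to_double(raw_value) > threshold


# Every value in the 0-10 range stays on the same side of the threshold,
# so the monitor never changes state.
print(any(is_over_threshold(v) for v in range(0, 11)))  # False
```

If the property comes back empty, `map_to_double` raises instead of returning a value, so a server with no EventLogLevel set would need a different property in the query.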

Next, on the Alert Settings, do NOT generate alerts for this monitor.

Click Create.  That is it. 

For those who want to test this, I am attaching my sample management pack with only this monitor in it.  To use my MP, you will need to have SCOM R2; otherwise, you can create your own monitor as above.

Custom.DNS.Addendum.MP.zip

Comments (24)

  1. Anonymous says:

    If you are getting a generic failure – the problem is the leak in WMI.  If you bounce the server – does the problem go away for a little while?

    Those two queries are identical; changing the order doesn't matter, both are valid.

  2. Anonymous says:

    Kevin!

    Thanks a lot for your MP! It really works for me!

  3. Anonymous says:

    Thank you!  This has been driving me nuts since we implemented the DNS management pack.  All our Windows 2003 DCs with DNS were giving us these errors.  Without this fix the management pack is almost useless.  I understand Windows 2003 is old at this point but Microsoft should really fix the root cause of the issue since it makes the DNS Management pack almost useless.  At least provide a hotfix for companies using SCOM to deploy…

  4. Anonymous says:

    Here is the only problem I have with recompiling the MOF:  In my testing – recompiling the MOF is only necessary when the DNS WMI namespace is missing or corrupt.  If after bouncing the WMI service, you still cannot manually query any of the WMI objects, or cannot even connect to the WMI namespace, then I agree – recompile the MOF.

    However – in my testing – I did all three – updated WMI, recompiled the MOF, and then updated cscript.  After 1 month passed – the issue returned.  It seems to take a long amount of uptime for this random error condition to present itself.  That is why I am adding the additional WMI buffer space now.  This supposedly will address the issue for most people.  I will let you know in a month or two.  :-)

  5. Anonymous says:

    Has anyone been able to try modifying the monitor that Mark mentioned in his post (02/19/2010)? I’m stuck with the same issue here, but it only happens on our production DCs and I can’t really try this out…

    Francis

  6. Anonymous says:

    Well,

    You are right, Kevin. Failure of query 'Select Name, Shutdown, Paused from MicrosoftDNS_Zone' depends on position of stars…

    After rebooting the server, the problem goes away for a couple of weeks. But after that time it happens again. All recommended updates to WMI and the OS are installed. Does anyone have an issue like this on Windows 2008 DNS servers?

    P.S. I will try to ask MS techsupport for this problem.

  7. Anonymous says:

    Alex – this is still Server 2003 then – and has the leak in WMI.  You MUST use something like my MP to keep the provider from unloading, or you will be affected.  This is a textbook example.  You can hotfix and patch and tweak to your heart's content – you will not solve the root cause.  The root cause is that when the DNS WMI provider unloads after a period of inactivity – it leaks a TLS slot.  If you use my example MP, this will query the provider often enough to keep it from unloading, and you will work around the issue in the Windows WMI provider.

  8. Anonymous says:

    Alex – are you getting this on 2003 servers or 2008 or 2008R2?

    There is no TLS leak on the 2008 DNS WMI provider…. this specific issue should impact 2003 servers only…. unless you are hitting something else.

    Are your WMIPRVSE processes using a lot of private bytes (look in task manager)?

    Are you sure you set up a rule or used my MP to query the WMI provider every 4 minutes to keep it from unloading?  That is the fix…. for 2003 servers at least.

  9. Anonymous says:

    Mark – that is a really good idea…. as long as that monitor queries the WMI namespace.

    The only concern I would have is the "expense" of that monitor… if it uses a lot of CPU when it runs it might have a tad more impact to the server… but overall I like it!

  10. Anonymous says:

    Yep… totally agree.  I am seeing the same now.

    The current theory is that something is wrong with the WMI provider for DNS…. this isn't related to SCOM.  The issue is that this DNS WMI provider leaks a TLS slot, and when they are exhausted (takes about 2-3 weeks for me) the problem occurs…. When this happens, you can bounce WMI or reboot the computer, and the problem goes away for 2-3 weeks.

    The provider unloads after 5 minutes of inactivity.  Suppose you run a timed script that does something VERY lightweight, like running the simple WMI query and nothing else against the DNS WMI namespace, and run this script every 2 minutes.  This would keep the provider from loading/unloading as caused by the SCOM MP…. and the TLS slots will not leak because the DNS provider is not unloading.  I was also thinking of maybe writing a threshold monitor – against a WMI perf object, that won't change/alert… and this might keep the provider loaded and have even less impact.

    That is a theory, I have not had time to test and validate this.

  11. Anonymous says:

    1.  There are several reasons, but at this time – Microsoft does not have plans to fix this specific TLS slot leak from the DNS WMI namespace provider.  Since this only affects Windows 2003 in rare cases where something queries the DNS WMI namespace on a frequent cycle, but less often than every 5 minutes, the workarounds resolve the issue.  Simply increase the frequency.

    2.  No – you do not need to Migrate to 2008 to resolve this – workarounds have been provided.  However, there are many benefits to migrating to Windows 2008 R2 so that is always recommended.

    The suggested workaround is always applicable.  If you don't have MOM or SCOM or some other monitoring tool constantly querying the WMI DNS namespace provider – then the issue WILL NOT surface.  If you feel for some reason you are impacted without a monitoring tool in place – you could still create your own solution for this using task scheduler.

  12. Anonymous says:

    Kevin,

    All my DNS servers are Windows 2003 R2 and I get this error from all these servers. I have 3-6 WMIPRVSE processes with 5-20 MBs of memory. I have upgraded SCOM to the R2 version and will try your MP on this version too – on SCOM 2007 it has no effect.

  13. Anonymous says:

    Hi, everyone!

    I have the same issue – monitoring DNS in SCOM always fails! The query 'Select Name, Shutdown, Paused from MicrosoftDNS_Zone' always returns the 0x80041001 error. I have tried applying various updates and hotfixes, and creating a custom MP for DNS according to Kevin's post – nothing helps!

    At http://www.activexperts.com/…/server I found that the WMI query to the DNS server in SCOM has a bug – the query should be like this: 'Select Name, Paused, Shutdown from MicrosoftDNS_Zone'. This query returns the proper result!

    Does anyone know, how to change this query in SCOM??

  14. Anonymous says:

    1) Is this issue being picked up by MS, and will there be an update/fix for it?

    2) Do we need to migrate to another platform, e.g. 2008, to resolve this issue?

    The suggested MP workaround isn't applicable in some cases, e.g. no MOM or SCOM. I hope MS takes action on this issue.

  15. Marnix Wolf says:

    Hello Kevin.

    Many times the solutions you describe work perfectly. However, on many occasions I find that for the DNS MP the updates of WMI and Windows Scripting Host are not sufficient. The DNS class in WMI has to be recompiled as well.

    Then all errors are gone and everything runs like clockwork again.

    So on DNS servers I start three actions:

    – Updating WMI

    – Updating Windows Scripting Host

    – Recompiling DNS class in WMI

  16. Marnix Wolf says:

    Oops. So even with recompiling the MOF the issue returns…

    Good to know about adding additional WMI buffer space. If that solves this problem also on the long term it is good to know.

    Thanks again for sharing such good information with the community.

  17. number33 says:

    Hi Kevin,

    I’m afraid it does not help.

    We recompiled the DNS mof, installed KB933061, increased the buffer space, and finally rebooted the systems.

    It was a relief for some time, but the errors reappeared after some time.

    I’m afraid there is something wrong with either WMI or DNS mof or both ?

  18. Mark Carroll says:

    I found an existing monitor that appears to do the same thing.  Under the class DNS 2003 server there is a configuration monitor named "DNS 2003 Event Logging Level Monitor".  I created an override to change the Interval from 900 to 240.  Hope this helps.

  19. gkyildirim says:

    Hi,

    I am not sure whether this is the best place for my problem. We have a DNS memory leak on a DC (W2K3/SP2/x86) that occurs about every two weeks. It started after we deployed SCOM R2 agents. I wonder whether it might be related to this topic.

    Thanks,

  20. Bryan C. says:

    I implemented the monitor described in the blog post.  Seems I've traded one unwanted alert for another.  Now I get this alert:

    Generic Performance Mapper Module Failed Execution
    Module was unable to convert parameter to a double value
    Original parameter: '$Data/Property[@Name='EventLogLevel']$'
    Parameter after $Data replacement: ''
    Error: 0x80020005
    Details: Type mismatch.
    One or more workflows were affected by this.
    Workflow name: UIGeneratedMonitor77a23b731c894ea182b44113e8f657de
    Instance name: host.domain.com
    Instance ID: {9C068BE3-4F78-A2B0-F224-1F9F12C9B424}
    Management group: MgmtGroupName

    The workflow name was validated to map back to the custom DNS monitor.

    Any suggestions?

  21. Neil McLoughlin says:

    This issue has also been driving me nuts for weeks as well! Thanks for the workaround, seems to work perfectly

  22. Nicole Welch says:

    I'm getting the same error as Bryan on about 10% of my servers…  

    Event Type: Warning
    Event Source: Health Service Modules
    Event Category: None
    Event ID: 11052
    Date: 9/11/2012
    Time: 3:18:12 PM
    User: N/A
    Computer: []
    Description:
    Module was unable to convert parameter to a double value
    Original parameter: '$Data/Property[@Name='EventLogLevel']$'
    Parameter after $Data replacement: ''
    Error: 0x80020005
    Details: Type mismatch.
    One or more workflows were affected by this.
    Workflow name: UIGeneratedMonitor1dcc5008a0b240b4bc900ca8fea84206
    Instance name: []
    Instance ID: {1B2459E9-E4CC-5F8E-4C5C-1582DDC253C4}
    Management group: CCC_OpsMgr

    Any idea why this would arise?  My event logging level is set to NULL on these machines.

  23. Scott Breen says:

    Thanks Kevin. Those alerts were really starting to annoy me. I never would have found the solution without this article. Yet to see if it solves all servers, but I'm hopeful!

  24. We have had a ton of "WMI is unhealthy" alerts in the last month. To validate, I test WMI on the machine, and in almost all cases a simple query returns a quota violation. It is not limited to a specific version of Windows Server (they range from 2003 to 2012 R2).
    Restarting WMI resolves the alert. Now specific to this post, I am getting the DNS – WMI Validation Failed alerts on a domain controller but it is 2008 R2.