Intermittent Connectivity Issues and Cluster Refresh Times with Windows 2008 R2 and SCVMM 2008 R2

It’s not often I get excited about Hotfixes… ok, that may not be entirely true, but I am very excited about these two. Why? They address WMI issues.

‘Why, Jonathan, would I care about WMI? That’s one of those obscure process things that I’ve never even seen…’, you say.

‘You’re right!’ I say, over-enthusiastically. ‘But WMI is the basis for most of the actions SCVMM takes on Hosts, whether it be refreshes or migrations.’

‘Wow! That *is* exciting!’, you say… mockingly.

You get the point… Direct link to the updates at bottom. Read on for why this matters, and when it matters. (Hint: Always!)

I came across these updates when working a customer case that involved SCVMM Hosts with SCOM (OpsMgr) Agents installed as well. Bottom line is the WMI service was taking a pounding and crashing occasionally. No WMI, no communication with SCVMM, and SCOM throws alerts. The WMI service will restart itself, but in the meantime anything depending on it is out of luck.

WMI or WinRM?

You can’t discuss SCVMM without mentioning WinRM at least once… Before the hotfixes were installed, WinRM settings that are known to ease intermittent connectivity issues had already been implemented. There is nothing inherently wrong with making these changes, but install the hotfixes below first and see whether that resolves the issue. The less registry tweaking the better.

Specifically, we saw Errors and hex codes in refresh jobs. The details may state that WinRM is experiencing issues, but WinRM is only reporting the result of the last action it performed. WMI errors often bubble up through WinRM, so in some instances WMI was the real source.

WinRM Issues

Either of the messages below indicates there is likely a WinRM issue to address. In this case, follow the steps to update the registry on all Hosts. Make sure you install the updates further down as well!

  • No more threads can be created in the system (0x800700A4)

    Error (2912)
    An internal error has occurred trying to contact an agent on the server.contoso.com server.
    (No more threads can be created in the system (0x800700A4))

    Recommended Action
    Ensure the agent is installed and running. Ensure the WS-Management service is installed and running, then restart the agent.

  • The WS-Management service cannot process the request. The maximum number of concurrent operations for this user has been exceeded. Close existing operations for this user, or raise the quota for this user (0x803381A6)
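If you are curious what those hex codes actually say, here is a minimal Python sketch that splits an HRESULT into its standard fields. The function name `decode_hresult` is my own and not part of any SCVMM or WinRM tooling. It shows that 0x800700A4 is a wrapped Win32 error (facility 7, code 164, which is the "no more threads" error), while 0x803381A6 carries facility 51 (0x33), the same 0x8033 prefix that the other WS-Management codes in this post share.

```python
def decode_hresult(hresult):
    """Split a 32-bit HRESULT into (failed, facility, code)."""
    failed = bool(hresult & 0x80000000)   # bit 31: severity (1 = failure)
    facility = (hresult >> 16) & 0x7FF    # bits 16-26: facility
    code = hresult & 0xFFFF               # bits 0-15: error code
    return failed, facility, code


# The two codes from the messages above.
for value in (0x800700A4, 0x803381A6):
    failed, facility, code = decode_hresult(value)
    print(f"0x{value:08X}: failed={failed} facility={facility} code={code}")
```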

WinRM Resolution Steps

MaxConcurrentOperationsPerUser (Increase threads)

In Windows Server 2008 R2, the WinRM setting MaxConcurrentOperations was deprecated and replaced with MaxConcurrentOperationsPerUser, which we can modify. Let's raise the value of MaxConcurrentOperationsPerUser using the following command. The default is '200' and we will double that to '400'.

Open an elevated command prompt on each Host to execute this and all commands below.

    winrm set winrm/config/Service @{MaxConcurrentOperationsPerUser="400"}

Start with the above value. This can be increased over the next week or so if the issue reoccurs. Once this value has been set, you’ll need to stop/restart WinRM and the VMMAgent.

    net stop winrm
    net start winrm
    net start vmmagent

WinRM Timeout

Use the following command to increase the WinRM timeout (MaxTimeoutms) to 1,800,000 ms, which is 30 minutes.

    winrm set winrm/config @{MaxTimeoutms="1800000"}

Start with the above value, and then we can increase it over the next week or so if the issue reoccurs.  Once this value has been set, you’ll need to stop/restart WinRM and the VMMAgent.

    net stop winrm
    net start winrm
    net start vmmagent
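If you have more than a couple of Hosts to touch, the two changes above can be wrapped in a small script. This is only a sketch in Python, under a few assumptions: the helper names (`winrm_set`, `build_plan`, `run_plan`) are mine, the `winrm set` lines are written without spaces around `=` to match the first command in this post, and I am assuming a single service restart after both changes is enough. It defaults to a dry run that only prints the commands.

```python
import subprocess


def winrm_set(config_path, setting, value):
    """Build one 'winrm set' command line in the form used in this post."""
    return f'winrm set {config_path} @{{{setting}="{value}"}}'


def build_plan():
    """Both WinRM changes, followed by one restart of WinRM and the VMM agent."""
    return [
        winrm_set("winrm/config/Service", "MaxConcurrentOperationsPerUser", 400),
        winrm_set("winrm/config", "MaxTimeoutms", 1800000),
        "net stop winrm",
        "net start winrm",
        "net start vmmagent",
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in plan:
        print(command)
        if not dry_run:
            # shell=True hands the line to cmd.exe, as if typed at the prompt.
            subprocess.run(command, shell=True, check=True)


# Dry run: shows what would be executed, changes nothing.
run_plan(build_plan())
```

Run it from an elevated prompt with `dry_run=False` only after checking the printed commands. Note that `net stop` may ask for confirmation when other services depend on WinRM, so an unattended run can stall at that step.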

Reference
    Installation and Configuration for Windows Remote Management
    https://msdn.microsoft.com/en-us/library/aa384372(VS.85).aspx

WinRM or WMI Issues

Install all previous updates and the two new ones below as indicated for your environment.

Error (2916)
VMM is unable to complete the request. The connection to the agent server.contoso.com was lost.
(Unknown error (0x80338012))

Recommended Action
Ensure that the WS-Management service and the agent are installed and running and that a firewall is not blocking HTTP traffic. If the error persists, reboot server.contoso.com and then try the operation again.

Error (2927)
A Hardware Management error has occurred trying to contact server server.contoso.com.
(Unknown error (0x803381b9))

Recommended Action
Check that WinRM is installed and running on server server.contoso.com. For more information use the command "winrm helpmsg hresult".
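To keep the four codes in this post straight, here is a small triage sketch in Python. It pulls the hex code out of a pasted VMM job error and maps it to the advice given in this post. The names (`ADVICE`, `triage`) and the wording of the advice strings are my own shorthand, and the table only covers the codes listed here.

```python
import re

WINRM_TUNING = "Apply the WinRM settings, then install the hotfixes"
HOTFIXES = "Install the hotfixes"

# Codes from this post only; anything else returns None.
ADVICE = {
    0x800700A4: WINRM_TUNING,  # No more threads can be created in the system
    0x803381A6: WINRM_TUNING,  # Max concurrent operations for this user exceeded
    0x80338012: HOTFIXES,      # Error 2916: connection to the agent was lost
    0x803381B9: HOTFIXES,      # Error 2927: Hardware Management error
}


def triage(job_error_text):
    """Return the advice for the first 8-digit hex code found, or None."""
    match = re.search(r"0x[0-9A-Fa-f]{8}", job_error_text)
    if match is None:
        return None
    return ADVICE.get(int(match.group(0), 16))


print(triage("(Unknown error (0x803381b9))"))
```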

What Do the Hotfixes Address?

1. This hotfix specifically addresses the wmiprvse.exe crash shown below (logged in the Application log), along with general memory leaks.

The "Win32_Service" WMI class leaks memory in Windows Server 2008 R2 and in Windows 7
https://support.microsoft.com/kb/981314

Example Application log error:

Faulting application name: wmiprvse.exe, version: 6.1.7600.16385

Symptoms

Wmiprvse.exe crashes randomly on Windows Server 2008 R2. This continues even after increasing the memory quota for wmiprvse.exe.

Event:

    Source:        Application Error
    Event ID:      1000
    Description:
    Faulting application name: wmiprvse.exe, version: 6.1.7600.16385, time stamp: 0x4a5bc794

2. This hotfix addresses SCVMM Failover Cluster refresh times, among other things.

An application or service that queries information about a failover cluster by using the WMI provider may experience low performance or a time-out exception
https://support.microsoft.com/?id=974930
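If you have exported the Application log to text and want a quick count of how often the first issue is biting you, a sketch like the following can help. It is Python, the function name `wmiprvse_faults` is hypothetical, and it assumes the event description keeps the "Faulting application name: ..., version: ..." layout shown in the example above.

```python
import re

FAULT_PATTERN = re.compile(
    r"Faulting application name:\s*(?P<name>[^,]+),\s*version:\s*(?P<version>[\d.]+)",
    re.IGNORECASE,
)


def wmiprvse_faults(event_text):
    """Return the version string of every wmiprvse.exe fault in event_text."""
    versions = []
    for match in FAULT_PATTERN.finditer(event_text):
        if match.group("name").strip().lower() == "wmiprvse.exe":
            versions.append(match.group("version"))
    return versions


sample = (
    "Faulting application name: wmiprvse.exe, "
    "version: 6.1.7600.16385, time stamp: 0x4a5bc794"
)
print(wmiprvse_faults(sample))  # ['6.1.7600.16385']
```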

Get your updates!

  • Install on all Windows 2008 R2 systems managed by SCVMM 2008 R2

The "Win32_Service" WMI class leaks memory in Windows Server 2008 R2 and in Windows 7
https://support.microsoft.com/kb/981314

  • Install on all Failover Cluster Nodes

An application or service that queries information about a failover cluster by using the WMI provider may experience low performance or a time-out exception
https://support.microsoft.com/?id=974930
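Once the updates are rolled out, it is worth confirming they actually landed. Here is a hedged Python sketch that compares the output of `wmic qfe get HotFixID` against the two updates above. The function names are mine, and I am assuming the hotfixes report themselves as KB981314 and KB974930 (taken from the article links) and that they show up in the `wmic qfe` list at all, which is worth verifying on one Host by hand first.

```python
import re
import subprocess

# Assumed HotFixIDs, derived from the two KB article links in this post.
REQUIRED = {"KB981314", "KB974930"}


def missing_hotfixes(qfe_output, required=REQUIRED):
    """Return the required HotFixIDs that do not appear in qfe_output."""
    # Defensive: drop NUL characters in case the output was decoded oddly.
    cleaned = qfe_output.replace("\x00", "").upper()
    installed = set(re.findall(r"KB\d+", cleaned))
    return sorted(set(required) - installed)


def query_installed():
    """Run 'wmic qfe get HotFixID' on the local machine and return its text."""
    result = subprocess.run(
        ["wmic", "qfe", "get", "HotFixID"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a Host, `missing_hotfixes(query_installed())` returns an empty list when both are present. For a system that is not a Failover Cluster Node, pass `required={"KB981314"}` instead.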