Cluster-Aware Updating (CAU) interaction with Proxy Servers

Welcome back to the CORE Team blog. Cluster-Aware Updating (CAU) is an automated feature that allows you to update clustered servers with little or no loss in availability during the update process. Cluster updates are obtained using one of three methods:

  1. Connecting to the internet and downloading patches from Windows Update\Microsoft Update
  2. Connecting to an internal WSUS server and downloading approved updates
  3. Downloading hotfixes from the internet, placing those fixes on an internal file server share, and then using the CAU hotfix-plugin to patch a cluster

During an Updating Run, CAU transparently performs the following tasks:

  • Place the node being updated into maintenance mode
  • Move all cluster roles off the node being updated (Virtual Machine roles are live migrated)
  • Install updates and all dependent updates
  • Restart the node if necessary during the patching process
  • Bring the updated node out of maintenance mode
  • Restore the cluster roles to the updated node
  • Continue updating the remaining nodes in the cluster using the same steps

For more information on Cluster-Aware Updating (CAU), review the following TechNet information: https://technet.microsoft.com/en-us/library/hh831694.aspx

If a proxy server is required to gain access to the internet, then CAU must be configured to use it. CAU is a system-level process and does not use a user-mode proxy server configuration. A user-mode configuration is the kind configured in Internet Explorer, where the setting is either entered manually by the user or applied via a Group Policy Object (GPO) in Active Directory. To configure a system-level proxy server, use the netsh command line.

netsh winhttp set proxy myproxy.fabrikam.com:80 "<local>"

The above command configures a system-level proxy server using port 80 and sets the 'minimal' exceptions for local addresses. While this would appear to be sufficient, there is an unfortunate side effect of this configuration if the Failover Cluster is supporting highly available File Servers. With a configuration similar to the above, a user will not be able to add a file share to the HA File Server role. The process will appear to start normally, but it will terminate unexpectedly.


There will be no errors registered by the cluster service in the system event log. However, there will be several errors registered in the Windows Remote Management log. The first error is -

Event ID: 137
Source: Windows Remote Management
Network layer returned ERROR_WINHTTP_NAME_NOT_RESOLVED - The server name cannot be resolved. Aborting the operation.

This is followed by another error -

Event ID: 49
Source: Windows Remote Management
The WinRM protocol operation failed due to the following error: The WinRM client cannot process the request because the server name cannot be resolved.

The final error recorded -

Event ID: 142
Source: Windows Remote Management
WSMan operation Enumeration failed, error code 2150859193

Decoding the error code -

[Image: decoded error code 2150859193]
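The decimal value in Event ID 142 is easier to search for once it is converted to hexadecimal. The following is a minimal Python sketch, assuming the value follows the standard 32-bit HRESULT layout (severity bit, facility, code). The function name `decode_hresult` is illustrative and is not part of any Windows tool.

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT-style value into its basic fields."""
    value &= 0xFFFFFFFF  # normalise negative (signed) inputs to unsigned
    return {
        "hex": f"0x{value:08X}",
        "signed": value - (1 << 32) if value >> 31 else value,
        "failure": bool(value >> 31),        # bit 31: severity
        "facility": (value >> 16) & 0x7FF,   # bits 16-26
        "code": value & 0xFFFF,              # bits 0-15
    }


print(decode_hresult(2150859193)["hex"])  # prints 0x803381B9
```

The hexadecimal form can then be looked up, for example with `winrm helpmsg 0x803381B9`.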

The 'server name' in question is the name of the Client Access Point (CAP) in the File Server group where the share is being created; in my test, the CAP NetBIOS name was Test-FS. Looking in the cluster log, we see -

000010c8.00001028::2012/11/19-17:24:39.348 INFO [RES] Network Name <Test-FS>: Netbios: Slow Operation, FinishWithReply: 0

000010c8.00001028::2012/11/19-17:24:39.348 INFO [RES] Network Name: [NN] got sync reply: 0

000010c8.00001028::2012/11/19-17:24:39.348 INFO [RES] Network Name <Test-FS>: Netbios: End of Slow Operation, state: Initialized/Idle, prevWorkState: Idle

000010c8.00001028::2012/11/19-17:24:39.348 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:d524be11-4b9a-4e1e-855d-9227ea61988d:Netbios

000010c8.00000380::2012/11/19-17:24:39.348 INFO [RES] Network Name <Test-FS>: Netbios: Slow Operation, FinishWithReply: 0

000010c8.00000380::2012/11/19-17:24:39.348 INFO [RES] Network Name: [NN] got sync reply: 0

000010c8.00000380::2012/11/19-17:24:39.348 INFO [RES] Network Name <Test-FS>: Netbios: End of Slow Operation, state: Initialized/Idle, prevWorkState: Idle

000010c8.00000380::2012/11/19-17:24:44.348 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:d524be11-4b9a-4e1e-855d-9227ea61988d:Netbios

000010c8.00001028::2012/11/19-17:24:44.348 INFO [RES] Network Name <Test-FS>: Netbios: Slow Operation, FinishWithReply: 0

000010c8.00001028::2012/11/19-17:24:44.348 INFO [RES] Network Name: [NN] got sync reply: 0

000010c8.00001028::2012/11/19-17:24:44.348 INFO [RES] Network Name <Test-FS>: Netbios: End of Slow Operation, state: Initialized/Idle, prevWorkState: Idle

The solution is to modify the local address exceptions in the proxy server configuration as shown in this example -

netsh winhttp set proxy myproxy.fabrikam.com:80 "<local>;*.fabrikam.com"
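To illustrate the difference between the two bypass lists used in this post, here is a simplified Python model of how a semicolon-separated bypass list can be evaluated. It assumes that `<local>` covers only host names that contain no period and that `*` acts as a wildcard. It is a sketch for reasoning about the configuration, not WinHTTP's actual implementation. The function name `bypasses_proxy` and the fully qualified name `test-fs.fabrikam.com` are assumptions made for the example.

```python
from fnmatch import fnmatchcase


def bypasses_proxy(host: str, bypass_list: str) -> bool:
    """Return True if host matches any entry in a ';'-separated bypass list."""
    host = host.lower()
    for entry in bypass_list.split(";"):
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry == "<local>":
            # Assumption: "<local>" covers only names without a period
            if "." not in host:
                return True
        elif fnmatchcase(host, entry):
            # Wildcard entries such as "*.fabrikam.com"
            return True
    return False


# Hypothetical FQDN for the Client Access Point used in the test
for bypass in ("<local>", "<local>;*.fabrikam.com"):
    print(bypass, "->", bypasses_proxy("test-fs.fabrikam.com", bypass))
```

Under this model, a short name such as Test-FS is covered by `<local>`, while a fully qualified name in the domain is covered only once the `*.fabrikam.com` entry is present.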

We added the wildcard exception for the local domain (*.fabrikam.com). With this updated configuration in place, creation of the share (test2) completes normally, with no errors registered in the Windows Remote Management log. Looking at the cluster log -

000010c8.00000200::2012/11/19-17:29:24.346 INFO [RES] Network Name <Test-FS>: Netbios: Slow Operation, FinishWithReply: 0

000010c8.00000200::2012/11/19-17:29:24.346 INFO [RES] Network Name: [NN] got sync reply: 0

000010c8.00000200::2012/11/19-17:29:24.346 INFO [RES] Network Name <Test-FS>: Netbios: End of Slow Operation, state: Initialized/Idle, prevWorkState: Idle

00000920.0000072c::2012/11/19-17:29:24.479 INFO [NM] Received request from client address FABRIKAM-N21.

000010e8.00000f10::2012/11/19-17:29:24.481 INFO [RES] Physical Disk <Cluster Disk 1>: Path Y:\Shares\test2 can be on the disk

000010c8.00001028::2012/11/19-17:29:24.483 INFO [RES] Network Name <Test-FS>: Getting Read/Write private properties

00000920.0000072c::2012/11/19-17:29:24.486 INFO [GEM] Sending 1 messages as a batched GEM message

00000920.0000072c::2012/11/19-17:29:24.486 INFO [GUM] Node 2: Processing RequestLock 2:149

00000920.000013a0::2012/11/19-17:29:24.487 INFO [GUM] Node 2: Processing GrantLock to 2 (sent by 3 gumid: 584)

00000920.0000072c::2012/11/19-17:29:24.487 INFO [GEM] Sending 1 messages as a batched GEM message

00000920.0000072c::2012/11/19-17:29:24.489 ERR [DM] Dm::DmBaseKey::SetValue: ERROR_ACCESS_DENIED(5)' because of 'status'(test2)

00000920.0000072c::2012/11/19-17:29:24.489 INFO [GEM] Sending 1 messages as a batched GEM message

000010c8.00000600::2012/11/19-17:29:24.490 INFO [RES] File Server <File Server (\\Test-FS)>: Created share test2

00000920.00000f74::2012/11/19-17:29:24.491 INFO [GEM] Sending 1 messages as a batched GEM message

000010c8.00000200::2012/11/19-17:29:24.527 INFO [RES] Network Name <Test-FS>: Getting Read/Write private properties

00000920.00000b34::2012/11/19-17:29:24.552 INFO [GEM] Sending 1 messages as a batched GEM message

000010c8.00000600::2012/11/19-17:29:24.553 INFO [RES] File Server <File Server (\\Test-FS)>: Updated share test2
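When a cluster log is long, it can help to filter it programmatically. The following is a minimal Python sketch, assuming every line follows the layout seen in the excerpts above (process and thread IDs, timestamp, level, bracketed component, message). The regular expression and the helper names `parse_line` and `filter_log` are illustrative, not part of any cluster tooling.

```python
import re
from typing import Iterable, Iterator, Optional

# Assumed layout: pid.tid::YYYY/MM/DD-HH:MM:SS.mmm LEVEL [COMPONENT] message
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]+)\.(?P<tid>[0-9a-fA-F]+)::"
    r"(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+"
    r"\[(?P<component>[^\]]+)\]\s+(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Parse one log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def filter_log(lines: Iterable[str], level: Optional[str] = None,
               text: Optional[str] = None) -> Iterator[dict]:
    """Yield parsed entries, optionally filtered by level and message text."""
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip blank or unrecognised lines
        if level and entry["level"] != level:
            continue
        if text and text.lower() not in entry["message"].lower():
            continue
        yield entry


# Two lines taken from the excerpt above
sample = [
    r"00000920.0000072c::2012/11/19-17:29:24.489 ERR [DM] Dm::DmBaseKey::SetValue: ERROR_ACCESS_DENIED(5)' because of 'status'(test2)",
    r"000010c8.00000600::2012/11/19-17:29:24.490 INFO [RES] File Server <File Server (\\Test-FS)>: Created share test2",
]

for entry in filter_log(sample, text="test2"):
    print(entry["timestamp"], entry["level"], entry["component"], entry["message"])
```

Filtering on the share name keeps only the entries that mention it, which makes it easier to follow a single share creation through the log.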

Hope you found this information helpful.

Thanks, and come back again soon.

Chuck Timon
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support
High Availability\Virtualization Team