A tale of Event 422 on WAP servers

A tale from support. I hope this helps solve similar issues more quickly.

The Setup:
Two Active Directory Federation Services (AD FS) servers running Windows Server 2012 R2, located on the corporate network.
Two Web Application Proxy (WAP) servers located in the DMZ.

The Story:

At first, event 422 was logged only occasionally, but over the course of a couple of days it became constant.

The error was being logged on the WAP servers in the AD FS/Admin log.

Log Name:     AD FS/Admin
Source:       AD FS
Event ID:     422
Task Category: None
Level:         Error
Keywords:     AD FS
Description:
Unable to retrieve proxy configuration data from the Federation Service.

Additional Data
Trust Certificate Thumbprint:
<snip>
Status Code:
Exception details:
 System.Net.WebException: The operation has timed out
 at System.Net.HttpWebRequest.GetResponse()
 at Microsoft.IdentityServer.Management.Proxy.StsConfigurationProvider.GetStsProxyConfiguration()

 

However, the WAPs were able to establish the trust with the AD FS servers successfully:

Log Name:     AD FS/Admin
Source:       AD FS
Event ID:     391
Task Category: None
Level:         Information
Keywords:     AD FS
Description:
The federation server proxy was able to successfully establish a trust with the Federation Service.
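
To see how the two events trend over time, something like the following can be run on a WAP server. This is a minimal sketch: the three-day window is an assumption chosen only for illustration, and the log name and event IDs are the ones shown in the events above.

```powershell
# Run on a WAP server.
# Counts event 422 (error) and event 391 (trust established) per hour
# in the AD FS/Admin log.
# Assumption: a 3-day look-back window, chosen only for illustration.
$since = (Get-Date).AddDays(-3)

Get-WinEvent -FilterHashtable @{
    LogName   = 'AD FS/Admin'
    Id        = 422, 391
    StartTime = $since
} -ErrorAction SilentlyContinue |
    Group-Object -Property Id, { $_.TimeCreated.ToString('yyyy-MM-dd HH:00') } |
    Sort-Object -Property Name |
    Select-Object -Property Name, Count
```

A steadily growing count for 422 alongside continued 391 entries would match the pattern described here: the trust is fine, but the configuration fetch keeps timing out.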

Eventually, the WAPs stopped servicing authentication requests to the AD FS servers.

The Hunt:

We took a network trace while restarting the AD FS service. We found that after the WAP connected to the AD FS server, the WAP was the last to send a TCP ACK and then there was no traffic on the connection. After 100 seconds exactly, the WAP sent a TCP FIN and closed the connection.
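
The post does not say which capture tool was used. One way to take a comparable trace with only built-in tools is netsh, roughly as sketched below. The trace path, the 120-second wait, and the assumption that the service being restarted is adfssrv are all illustrative and not taken from the case.

```powershell
# Run from an elevated prompt on the server being traced.
# Assumptions: C:\Temp\wap-adfs.etl is a placeholder path, and adfssrv
# is the AD FS service that was restarted in the original case.
New-Item -ItemType Directory -Path 'C:\Temp' -Force | Out-Null
netsh trace start capture=yes tracefile=C:\Temp\wap-adfs.etl

Restart-Service -Name adfssrv

# Wait past the 100-second mark so the FIN is captured.
Start-Sleep -Seconds 120

netsh trace stop
```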

 

The customer mentioned that the Device Registration Service (DRS) took a long time to start. We investigated this angle and found that Device Registration had been initialized (Initialize-ADDeviceRegistration had been run), but was not actually being used.

When we ran Get-AdfsDeviceRegistration on the AD FS server, it took about 3 minutes to complete.

At this point, I was thinking about how the WAP closes its connection to the AD FS servers after 100 seconds, while Get-AdfsDeviceRegistration takes around 180 seconds.
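
The comparison can be made explicit by timing the cmdlet on the AD FS server. This is a minimal sketch: the 100-second threshold is simply the value observed in the network trace. It also happens to match the documented default of HttpWebRequest.Timeout (100,000 ms), the class that appears in the event 422 stack trace, although nothing in this case confirms that this is the timer involved.

```powershell
# Run on an AD FS server (requires the AD FS PowerShell module).
# Assumption: 100 seconds is the window observed in the network trace.
$proxyTimeoutSeconds = 100

# Measure-Command returns a TimeSpan for the script block.
$elapsed = Measure-Command { Get-AdfsDeviceRegistration | Out-Null }

'{0:N0} seconds elapsed' -f $elapsed.TotalSeconds
if ($elapsed.TotalSeconds -gt $proxyTimeoutSeconds) {
    'Slower than the {0}-second window seen in the trace.' -f $proxyTimeoutSeconds
}
```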

To isolate this process, we tried to update the DRS configuration via PowerShell on the WAP. Sure enough, the update failed.

PS C:\> Update-WebApplicationProxyDeviceRegistration
Update-WebApplicationProxyDeviceRegistration : Unable to retrieve Device Registration Service configuration data from the Federation Server.
At line:1 char:1
+ Update-WebApplicationProxyDeviceRegistration
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo         : NotSpecified: (:) [Update-WebAppli...iceRegistration], ConfigurationErrorsException
+ FullyQualifiedErrorId : System.Configuration.ConfigurationErrorsException,Microsoft.IdentityServer.Management.Proxy.Commands.UpdateAdfsProxyDeviceRegistration

 

We continued to troubleshoot DRS and eventually came across the following hotfix:

3020773 Time-out failures after initial deployment of Device Registration service in Windows Server 2012 R2
https://support.microsoft.com/kb/3020773/EN-US

While the symptoms in this case were quite different from those documented in the hotfix article, they were in line with “it takes a long time to find a valid key”. Taking a long time to find something could easily result in an operation timing out. The hotfix sounded promising.

The Fix:

We needed to prepare the machines for the update:

Install this rollup first.

2919355 - [Windows 8.1 Update 1] Windows RT 8.1, Windows 8.1, and Windows Server 2012 R2 update rollup: April 2014 (https://support.microsoft.com/kb/2919355)

(If you get a "not applicable" error when installing 2919355, install 2919442 first: https://support.microsoft.com/en-us/kb/2919442)

Install this rollup second.

3000850 November 2014 update rollup for Windows RT 8.1, Windows 8.1, and Windows Server 2012 R2 (https://support.microsoft.com/kb/3000850)  

Finally, install the DRS issue hotfix.

3020773 Time-out failures after initial deployment of Device Registration service in Windows Server 2012 R2
https://support.microsoft.com/kb/3020773/EN-US

After we installed the updates on the AD FS and WAP servers and rebooted all the machines, the issue was resolved.
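
To confirm that all three updates are present on a given server, a quick check along these lines can help. It is only a sketch: the KB numbers are the ones listed above, and Get-HotFix may not report every kind of update, so a "not found" result is worth double-checking in the installed updates list.

```powershell
# Run on each AD FS and WAP server after the reboot.
# KB numbers are taken from the steps listed above.
$required = 'KB2919355', 'KB3000850', 'KB3020773'

foreach ($kb in $required) {
    $fix = Get-HotFix -Id $kb -ErrorAction SilentlyContinue
    if ($fix) {
        '{0} installed on {1}' -f $kb, $fix.InstalledOn
    } else {
        '{0} not found' -f $kb
    }
}
```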