High CPU/High Memory in WSUS following Update Tuesdays

Updated 10/11/2017 – updated hotfix information.

Recently, we’ve seen an increase in the number of high CPU/High Memory usage problems with WSUS, including WSUS in a System Center Configuration Manager environment – these have mostly corresponded with Update Tuesdays.

Microsoft support has determined that the issue is driven primarily by the Windows 10 1607 updates, for example KB4022723, KB4022715, KB4025339, etc. See here for the list of Windows 10 1607 updates.

These updates have large metadata payloads for the dependent (child) packages because they roll up a large number of binaries. Windows 10, versions 1507 (Windows 10 RTM) and 1511 updates can also cause this, though to a lesser extent.  Windows 10, version 1703 is still recent enough that the metadata is not that large yet (but will continue to grow).

Symptom

The symptoms include

  • High CPU on your WSUS server – 70-100% CPU in w3wp.exe hosting WsusPool
  • High memory in the w3wp.exe process hosting the WsusPool – customers have reported memory usage approach 24GB
  • Constant recycling of the W3wp.exe hosting the WsusPool (identifiable by the PID changing)
  • Clients failing to scan with 8024401c (timeout) errors in the WindowsUpdate.log
  • Mostly 500 errors for the /ClientWebService/Client.asmx requests in the IIS logs

Cause

Microsoft support has determined that the issue is driven primarily by the Windows 10 1607 updates, for example KB4022723, KB4022715, KB4025339, etc. See here for the list of Windows 10 1607 updates.

These updates have large metadata payloads for the dependent (child) packages because they roll up a large number of binaries. Windows 10, versions 1507 (Windows 10 RTM) and 1511 updates can also cause this, though to a lesser extent. Windows 10, version 1703 is still recent enough that the metadata is not that large yet (but will continue to grow).

How to determine if the 1607 Updates are the cause

To determine if WSUS is affected by this problem, decline the Windows 10 updates (including the latest cumulative update). If both CPU and memory quickly drop back to normal, then the issue is likely the result of metadata size from the Windows 10 updates. They can be reapproved after you have determined if the updates are causing this issue, assuming you want to deploy them.

If declining the Windows 10 updates does not help, then the problem may be due to too many superseded updates in the WSUS server. Take the steps outlined in The Complete Guide to Microsoft WSUS and Configuration Manager SUP maintenance to decline the superseded updates. If, after doing this you are still having problems, read on.

This blog post may help alleviate some of these problems, but is not a magic bullet. After these changes are made, you will still see high CPU and memory until the system stabilizes as I explain further down.

WSUS Caching

WSUS has a caching mechanism whereby the first-time update metadata is requested by any client WSUS will store it in memory. Further requests for the same update revision will retrieve the update metadata from memory instead of reading it from the database. Some of the metadata in the database is compressed, so not only must it be retrieved, it must be decompressed into memory, which is an expensive operation.

You can monitor the current number of updates stored in the cache via Performance Monitor with the counter WSUS: Client Web Service/Cache size and instance spgetcorexml. Keep in mind that this counter provides the number of cached items, not the amount of memory consumed by cached metadata. w3wp.exe process memory can be used as a proxy for the amount of space consumed by the metadata cache.

The Problem

For large metadata packages and many simultaneous requests, it can take longer than ASP.NET’s default timeout of 110 seconds to retrieve all of the metadata the client needs. When the timeout is hit, ASP.NET disconnects the client and aborts the thread doing the metadata retrieval. If you look at Program Files\Update Services\LogFiles\SoftwareDistribution.log, the abort looks like this:

System.Threading.ThreadAbortException: Thread was being aborted.
   at System.Buffer.__Memcpy(Byte* dest, Byte* src, Int32 len) 
   at System.Buffer._Memcpy(Byte* dest, Byte* src, Int32 len)  
   at System.Buffer.Memcpy(Byte* dest, Byte* src, Int32 len)  
   at System.String.CtorCharPtrStartLength(Char* ptr, Int32 startIndex, Int32 length)    
   at Microsoft.UpdateServices.Internal.CabUtilities.ExpandMemoryCabToString(Byte[] src)   
   at Microsoft.UpdateServices.Internal.DataAccess.ExecuteSpGetCoreUpdateXml(Int32[] revisionIds) 
   at Microsoft.UpdateServices.Internal.DataAccessCache.GetCoreUpdateXml(Int32[] revisionIds, DataAccess da, Int64 maxXmlPerRequest)
   at Microsoft.UpdateServices.Internal.ClientImplementation.GetSyncInfo(Version clientProtocolVersion, DataAccess dataAccess, Hashtable stateTable, Hashtable deploymentTable, Boolean haveGroupsChanged, Boolean driverSyncNeeded, Boolean doChunking)
   at Microsoft.UpdateServices.Internal.ClientImplementation.SoftwareSync(DataAccess dataAccess, UnencryptedCookieData cookieData, Int32[] installedNonLeafUpdateIds, Int32[] leafUpdateIds, Boolean haveGroupsChanged, Boolean expressQuery, Guid[] filterCategoryIds, Boolean needTwoGroupOutOfScopeUpdates)
   at Microsoft.UpdateServices.Internal.ClientImplementation.SyncUpdates(Cookie cookie, SyncUpdateParameters parameters)
   at Microsoft.UpdateServices.Internal.ClientImplementation.SyncUpdates(Cookie cookie, SyncUpdateParameters parameters)

Note: What you are looking for is a ThreadAbortException with ExecuteSpGetCoreUpdateXml on the stack (ThreadAbortExceptions could happen for other reasons as well – we are concerned with this specific scenario).

When the thread abort happens, all of the metadata that has been retrieved to that point is discarded and is not cached. As a result, WSUS enters a continuous cycle where the data isn’t cached, the clients can never complete the scan and continue to rescan.

Another issue that can occur is the WSUS application pool keeps recycling because it exceeds the private memory threshold (which it is very likely to do if the limit is still the default of 1843200). This recycles the app pool, and thus the cached updates, and forces WSUS to go back through retrieving updates from the database and caching them.

Solution

A WSUS update is now available that includes improvements for update metadata processing. This update should be applied to all WSUS servers in your environment.

Windows Server 2016 (KB4039396)

Windows Server 2012 R2 (KB4041693)

Windows Server 2012 (KB4041690)

WSUS 3.0 SP2 (KB4039929)

In addition to applying the applicable update(s) noted above, it is recommended that routine maintenance of WSUS be performed. See The Complete Guide to Microsoft WSUS and Configuration Manager SUP maintenance for more info.

If you still occasionally experience thread abort exceptions, you can increase ASP.NET’s default timeout.

Increase the ASP.NET timeout

  • Make a copy of \Program Files\Update Services\WebServices\ClientWebService\Web.Config.
  • Open \Program Files\Update Services\WebServices\ClientWebService\Web.Config.
  • Find the element “<httpRunTime”. It will look like this (in an unmodified web.config):
<httpRuntime maxRequestLength="4096" />
  • Modify httpRunTime by adding an executionTimeout attribute:
<httpRuntime maxRequestLength="4096" executionTimeout="3600" />
  • Save the web.config to a different location and copy the modified one into the directory.
  • From an elevated command prompt, run IISReset to restart IIS.

Monitoring WSUS Metadata Caching

Open Windows Performance monitor and add the following counters

  • WSUS: Client Web Service | Cache Size counter for spgetcorexml instance.
  • Process | Private Memory counters.
    • If there is more than one w3wp.exe, add them all – the one with the highest memory usage is probably the WSUSPool, but you can also add Process | ID Process to determine which worker process should be monitored.

Monitor the cache size counter – it should increase and eventually reach a peak value that does not change. This indicates all metadata that clients need is cached. It can take several hours for this to stabilize, so be patient.