High Processor Utilization by wspsrv.exe process on TMG 2010

1. Introduction

 

Troubleshooting high CPU utilization can be challenging, even when you know the process that is causing such behavior, which in this case is the wspsrv.exe. In order to better understand what’s going on in scenarios like this is recommended to get perfmon snapshot, including the time that the process is working normally, until the issue happens. It is also recommended to get a user mode dump of the wspsrv.exe process; to do that read this article here. This post is based on a scenario where wspsrv.exe process was consuming lots of CPU resource during business hours. The issue did not happen out of the business hours.

 

2. Troubleshooting

 

In this case perfmon was very decisive because it was possible to see the normal behavior (wspsrv.exe using around 20% average) and then after starting business hours the CPU utilization by wspsrv.exe process grows consistently. However, having lots of perfmon counters without knowing exactly what to start looking at is not very useful, it was the user mode dump of the wspsrv.exe process that triggered what to look for. For this particular scenario we had around 1000 threads running within wspsrv.exe process and around 200 threads had the same pattern as shown below:

 

0:000> ~50 kv

*** ERROR: Symbol file could not be found. Defaulted to export symbols for mpengine.dll -

Child-SP RetAddr : Args to Child : Call Site

00000000`1ea0e520 00000000`70433461 : 00000000`889f7b60 00000000`889f7b50 00000000`889f7bb0 00000000`889f78d8 : 0x1e774914

00000000`1ea0e570 00000000`70432f68 : 00000000`889f7828 00000000`889f7828 00000000`889f7bb0 00000000`7046ba90 : mpengine+0x13461

00000000`1ea0e630 00000000`704cec70 : 00000000`889b0040 00000000`889f7bb0 00000000`00000012 00000000`889f78d8 : mpengine+0x12f68

00000000`1ea0e6e0 00000000`704cdb4b : 00000000`000000fa 00000000`00000001 00000000`00000009 00000000`00000000 : mpengine+0xaec70

00000000`1ea0e8e0 00000000`7043ddd0 : 00000000`16d285f0 00000000`16d285f0 00000000`00000000 00000000`16d285f0 : mpengine+0xadb4b

00000000`1ea0ea00 00000000`7043e447 : 00000000`16d28110 00000000`16d285f0 00000000`00000000 00000000`00000000 : mpengine+0x1ddd0

00000000`1ea0ebf0 00000000`7043e1bf : 00000001`804fbf93 00000000`00000000 00000000`09180492 00000000`00000001 : mpengine+0x1e447

00000000`1ea0ec90 00000000`704adfb6 : 00000000`16d28100 00000000`16c66f01 00000000`00000000 00000000`00000000 : mpengine+0x1e1bf

00000000`1ea0ed20 00000000`7057b1b1 : 00000000`183c0e00 00000000`7068aa80 00000000`00000000 00000000`16d28110 : mpengine+0x8dfb6

00000000`1ea0edb0 00000000`7057b5e3 : 00000000`21825080 00000000`7068aa80 00000000`00000001 00000000`183c0e00 : mpengine+0x15b1b1

00000000`1ea0ee10 00000000`7057b7ba : 00000000`21825080 00000000`1ea0ef90 00000000`21825080 00000000`21825080 : mpengine+0x15b5e3

00000000`1ea0eed0 00000000`7057b6ab : 00000000`00000000 00000000`0000800c 00000000`00000001 00000000`21825080 : mpengine+0x15b7ba

00000000`1ea0ef20 00000000`70686365 : 00000000`00000000 00000000`00000020 00000000`5d69f640 00000000`5d69f640 : mpengine+0x15b6ab

00000000`1ea0ef50 00000000`7057da92 : 00000000`00000020 00000000`00004032 00000000`21825080 000007fe`f1a029e4 : mpengine!GetSigFiles+0x5b035

00000000`1ea0f080 00000000`7057f4d6 : 00000000`569def80 00000000`00000001 00000000`1d730548 00000000`76ecccd4 : mpengine+0x15da92

00000000`1ea0f230 00000000`7057f605 : 00000000`569def80 00000000`1ea0f440 00000000`00004032 00000000`76fb3660 : mpengine+0x15f4d6

*** ERROR: Symbol file could not be found. Defaulted to export symbols for EmpScan.dll -

00000000`1ea0f260 000007fe`f1a0d2d9 : 00000000`00000000 00000000`712c32c3 00000000`0195da28 00000000`76e901f2 : mpengine!rsignal+0x105

00000000`1ea0f390 000007fe`f1a0c668 : 00000000`569decd0 00000000`00004032 00000000`1ea0f440 00000000`00000020 : EmpScan!SetProxyStatsObject+0x9289

00000000`1ea0f400 000007fe`f1a104ff : 00000000`569decd0 00000000`6c8eeb68 00000000`6c8eeb70 000007fe`f19f8bc8 : EmpScan!SetProxyStatsObject+0x8618

00000000`1ea0f550 000007fe`f1a0904f : 00000000`6c8eeb70 00000000`0385b5f0 00000000`60e43070 00000000`000003b8 : EmpScan!SetProxyStatsObject+0xc4af

EMPScan and MPEngine are part of Malware Inspection as explained in this KB. This raises a flag that the good amount of CPU utilization by wspsr.exe process was caused by those 200 threads that were using functions that belongs to Malware Inspection. Having those 200 threads were already suspicious, but it could be that server was just too busy during that time performing malware inspection. In order to move I verified if there was any thread locked in Critical Section. To do that the following command was used !cs -l and the result was:

 

-----------------------------------------

DebugInfo = 0x0000000000343980

Critical section = 0x000000000032b500 (+0x32B500)

LOCKED

LockCount = 0x0

WaiterWoken = No

OwningThread = 0x0000000000000b28

RecursionCount = 0x1

LockSemaphore = 0x0

SpinCount = 0x0000000000000000

-----------------------------------------

DebugInfo = 0x00000000324861f0

Critical section = 0x00000000648f12d0 (+0x648F12D0)

LOCKED

LockCount = 0x0

WaiterWoken = No

OwningThread = 0x0000000000000b1c

RecursionCount = 0x1

LockSemaphore = 0x0

SpinCount = 0x0000000000000fa0

-----------------------------------------

DebugInfo = 0x00000000328a9200

Critical section = 0x000000005f1b8240 (+0x5F1B8240)

LOCKED

LockCount = 0x0

WaiterWoken = No

OwningThread = 0x00000000000011d0

RecursionCount = 0x1

LockSemaphore = 0x0

SpinCount = 0x0000000000000fa0

-----------------------------------------

 

….and many more…

 

Now that we know the threads that are locked in Critical Section, we just need to get more details about those threads by using the command !cs –o <owning thread> , in this case the relevant thread was:

 

0:000> !cs -o 0x000000005f1d9460

-----------------------------------------

Critical section = 0x000000005f1d9460 (+0x5F1D9460)

DebugInfo = 0x00000000565fd4f0

LOCKED

LockCount = 0x0

WaiterWoken = No

OwningThread = 0x000000000000194c

RecursionCount = 0x5

LockSemaphore = 0x43E80

SpinCount = 0x0000000000000000

OwningThread DbgId = ~889s

OwningThread Stack =

 

00000000`78fde658 000007fe`f0c0a6c9 : 00000000`566fa638 00000000`78fde6c4 00000000`566fa638 00000000`00000000 : EmpFilter!DllUnregisterServer+0x1c6b7

00000000`78fde660 000007fe`f0c22a52 : 00000000`566fa630 00000000`31eb9120 000007fe`f0bc9198 00000000`4a74d9d0 : EmpFilter!DllUnregisterServer+0x20431

00000000`78fde7b0 000007fe`f0c15f53 : 00000000`31eb9100 00000000`566fa6fc 00000000`4a74d9d0 00000000`00000000 : EmpFilter!DllUnregisterServer+0x387ba

00000000`78fde830 000007fe`f0c0c11b : 00000000`31eb9100 00000000`566fa6fc 00000000`4a74d9d0 00000000`4a74d9d0 : EmpFilter!DllUnregisterServer+0x2bcbb

00000000`78fde8a0 000007fe`f0c172ee : 00000000`566fa630 00000000`00000000 00000000`566fa648 00000000`78fde960 : EmpFilter!DllUnregisterServer+0x21e83

00000000`78fde920 000007fe`f0c0d7d4 : 00000000`566fa638 00000000`00000000 00000000`566fa740 00000000`00001355 : EmpFilter!DllUnregisterServer+0x2d056

00000000`78fde9a0 000007fe`f0c12c67 : 00000000`566fa630 00000000`618a3ba0 00000000`00001355 00000000`4a74d6b0 : EmpFilter!DllUnregisterServer+0x2353c

00000000`78fdeb20 000007fe`f0c1afad : 00000000`566fa630 00000000`618a3ba0 00000000`00001355 00000000`00000000 : EmpFilter!DllUnregisterServer+0x289cf

00000000`78fdebd0 000007fe`f0be40ee : 00000000`6cfd8640 00000000`30f6eaf0 000007fe`f0bca490 000007fe`f0bd133f : EmpFilter!DllUnregisterServer+0x30d15

00000000`78fdec60 000007fe`f0c3270b : 00000000`1af7a7b0 00000000`56cdf680 00000000`30f6eaf0 00000000`78fded64 : EmpFilter+0x240ee

00000000`78fded40 00000000`72ebf77b : 000007fe`f0c5eb18 00000000`56cdf680 00000000`00000400 00000000`30f6eaf0 : EmpFilter!DllUnregisterServer+0x48473

00000000`78fdeda0 00000000`72ed316d : 00000000`56cdf620 00000000`618a3ba0 00000000`00001355 00000000`00001355 : W3Filter!DllInstall+0x2419f

00000000`78fdeec0 00000000`72eabe84 : 00000000`562b4480 00000000`618a3ba0 00000000`00001355 00000000`78fdefe0 : W3Filter!DllInstall+0x37b91

00000000`78fdef30 00000000`72efcacd : 00000000`562b44b8 00000000`1ad135e0 00000000`618a3ba0 00000000`00001355 : W3Filter!DllInstall+0x108a8

00000000`78fdefd0 00000000`72f031bc : 00000000`27109010 00000000`00000001 00000000`000003e5 00000000`00000000 : W3Filter!DllInstall+0x614f1

00000000`78fdf040 00000000`72f038cb : 00000000`27109010 00000000`00000000 00000000`00000000 00000000`72e5d9d0 : W3Filter!DllInstall+0x67be0

00000000`78fdf140 00000000`72f04420 : 00000000`27109010 00000000`27109030 00000000`00000000 00000000`565da0e0 : W3Filter!DllInstall+0x682ef

00000000`78fdf180 00000000`72ede14f : 00000000`27109030 00000000`00001355 00000000`00000000 00000000`00000000 : W3Filter!DllInstall+0x68e44

00000000`78fdf1c0 00000001`3fcc016e : 00000000`4a5c5e80 00000000`00000001 00000000`00000000 00000000`565da0e0 : W3Filter!DllInstall+0x42b73

00000000`78fdf4a0 00000001`3fcbf557 : 00000000`3235bd50 00000000`00001355 00000000`00000000 00000000`00000001 : wspsrv!IsChainingRequired+0x582d2

 

3. Resolution

 

The resolution for this problem is documented on this article here at TMG Team Blog.

 

4. Another Approach to Troubleshoot Issues of this Nature

 

Another approach to troubleshoot high CPU utilization is by leveraging the perfmon capabilities to identify the threads that are using more resources within the process. For example, in this scenario the perfmon was showing the following result:

 

image 

 

Here you have two key counters within thread object

· %Processor Time: specify how much processor this thread is using.

· ID Thread: represents the ID of the thread that you can use to correlate that on the dump for this process.

 

Using the recommendations from KB133385 we have:

 

1. Monitor each thread's %Processor Time and identify the instance number in Performance Monitor for the thread that takes the most processor time.

2. Find the ID Thread item that has the same instance number as the thread you identified in step 1 and select this ID Thread with the mouse. Note that its Last and Average status indicators have the same value. This value is the decimal thread ID.

3. Convert the decimal thread ID to a hexadecimal (hex) number to get the thread ID for desired component.

 

With the HEX value of the thread ID in hands you can go back to the dump file and look which thread is that and what this thread is doing.

 

Note: there is a really good step by step on Marcelo’s blog. Also, make sure to read the Chapter 13 of the Windows NT Resource Kit (old buddy that still rocking).