ISA Firewall Service Process (wspsrv.exe) high CPU utilization issue

1. Introduction

When dealing with high CPU utilization on ISA Server where wspsrv.exe is the process consuming most of the resources, the first impression is that ISA itself is the culprit. There are scenarios where this is true, such as the one that I documented last year. But there are other scenarios where it is not, including scenarios where wspsrv.exe is crashing – it is not always ISA itself that is to blame. I tried to demystify this in the post that I wrote back in 2009.

 

This post is about a scenario where ISA Server was hitting 100% CPU utilization and the wspsrv.exe process was consuming most of it (around 93%). This was happening at least twice a day, and the only way to bring the CPU back to a normal utilization was to restart the Firewall Service. Sounds like an ISA issue, right? Not really, as you will see.

 

2. Gathering Data

 

Use the same data gathering approach as the one listed in this post, with the addition of a full (complete) memory dump generated with the NotMyFault tool; see KB972110 for more information.
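A minimal sketch of what that collection can look like (tool choice and paths are illustrative; ADPlus ships with the Debugging Tools for Windows, and the registry value below enables a complete memory dump as described in KB972110):

C:\>adplus -hang -pn wspsrv.exe -o C:\Dumps
   (hang-mode user dump of the Firewall Service process, taken while the CPU is pegged)

C:\>reg add HKLM\SYSTEM\CurrentControlSet\Control\CrashControl /v CrashDumpEnabled /t REG_DWORD /d 1 /f
   (1 = complete memory dump; this requires a page file at least the size of physical RAM plus a small overhead, and the setting is read at boot, so a restart is typically needed before NotMyFault is used to force the bugcheck that writes the dump)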

 

3. Analyzing the Data

 

By reviewing the Performance Monitor data from the time the issue was happening, it was possible to see that wspsrv.exe was averaging 88% CPU utilization; however, notice that within this process no single thread was spending an unusual amount of CPU cycles, as shown below:

[Image: per-thread CPU view of wspsrv.exe showing no single thread dominating the CPU]
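For reference, the per-thread breakdown above comes from the Thread performance object; if a future data collection needs it, the relevant counters can be added with logman (a sketch; log name, sample interval, output path, and the thread instance wildcard are illustrative):

C:\>logman create counter ISA_CPU -c "\Process(wspsrv)\% Processor Time" "\Thread(wspsrv*)\% Processor Time" -si 00:00:05 -o C:\PerfLogs\ISA_CPU
C:\>logman start ISA_CPU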

In order to better understand this issue we then reviewed the manual dump of the wspsrv.exe process. From the user-mode dump alone we were unable to determine the root cause of the issue: most of the threads were waiting for something in kernel mode, as the stacks below show.
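Before any of the commands below will resolve module and function names, the manual dump has to be opened in WinDbg with a symbol path configured; a minimal setup sketch (the local cache path is illustrative):

0:000> .symfix C:\Symbols
0:000> .reload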

 

> The first step was to review the top threads that were spending time in kernel mode:

 

0:000> !runaway 7

 User Mode Time

... [hiding the values as they are not relevant here]

...

...

 Kernel Mode Time

  Thread Time

  69:1310 0 days 0:00:49.500

  53:129c 0 days 0:00:46.812

 104:113c 0 days 0:00:42.390

  45:127c 0 days 0:00:40.875

  78:688 0 days 0:00:40.734

  94:4d0 0 days 0:00:38.890

 162:18a8 0 days 0:00:38.500

 166:18c8 0 days 0:00:38.093

 108:1418 0 days 0:00:37.062
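For reference, the argument to !runaway is a bit mask: 1 selects user-mode time, 2 kernel-mode time, and 4 elapsed time since thread creation, so 7 prints all three sections (the elapsed-time section is not reproduced above). From here the stack of any busy thread can be dumped by its index, for example the top kernel-time consumer:

0:000> ~69kb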

 

> Taking a closer look at those threads, we see that some of them are performing RPC or DNS work, such as the one below:

 

0:000> ~104kb

ChildEBP RetAddr Args to Child

WARNING: Stack unwind information not available. Following frames may be wrong.

28cada54 77c7feb0 28cada90 28cada74 77c80845 ntdll!KiFastSystemCallRet

28cada60 77c80845 28cada90 76ed42d8 28cade84 rpcrt4!I_RpcSendReceive+0x23

28cada74 77ce415a 28cadabc 24b86ff4 00000000 rpcrt4!NdrSendReceive+0x28

28cade64 76ed5049 76ed42d8 76ed421c 28cade84 rpcrt4!NdrClientCall2+0x1a8

28cade7c 76ed4f69 00000000 29e2f078 00000001 dnsapi!NetInfo_Copy+0x5c2

28caded8 76ed6f5d 29e2f078 00000001 00000000 dnsapi!NetInfo_Copy+0x4e2

28cadf0c 76ee9d0c 00000003 28cae1d8 00000001 dnsapi!DnsValidateName_W+0x31f

28cadf34 62ea2cc4 28cae1d8 00000001 00000000 dnsapi!DnsQuery_A+0x20

28cadf78 62ea30a2 28cae1d8 00000001 28cadfb8 msphlpr!ProxyDnsCacheInit+0x52f

28cadf98 62ea217c 28cae1d8 28cadfb8 62ea1f0f msphlpr!ProxyDnsCacheInit+0x90d

28cae2dc 64760c2a 021d7710 28caeedc 00000001 msphlpr!ProxyGetHostByName+0x26d

28cae764 64761d11 28caeed4 00000400 28caee74 W3Filter!DllUnregisterServer+0x42ca6

28cae9d4 647638dc 28caeed4 00000400 28caee74 W3Filter!DllUnregisterServer+0x43d8d

28caf534 647295c7 00000000 00000000 28cafe14 W3Filter!DllUnregisterServer+0x45958

28caf554 647262ef 021d7008 00000000 000003e5 W3Filter!DllUnregisterServer+0xb643

28caf574 647277f8 28cafe14 00000000 00000001 W3Filter!DllUnregisterServer+0x836b

28caf590 647281c5 28cafe14 28038ce0 28038728 W3Filter!DllUnregisterServer+0x9874

28cafe1c 64728ad0 000000a7 00000000 0000000c W3Filter!DllUnregisterServer+0xa241

28cafe90 0046051a 28038728 00000001 00000000 W3Filter!DllUnregisterServer+0xab4c

28cafefc 0046032d 000000a7 00000000 00000001 wspsrv!SSL_CREDENTIALS::operator=+0x2a870

28caff20 00460bea 2685d018 004602b1 28caff50 wspsrv!SSL_CREDENTIALS::operator=+0x2a683

28caff30 00460266 000000a7 00000000 00000001 wspsrv!SSL_CREDENTIALS::operator=+0x2af40

28caff50 0044b0ff 2685d0bc 000000a7 00000000 wspsrv!SSL_CREDENTIALS::operator=+0x2a5bc

28caff7c 0044c08e 2685d0bc 000000a7 00000000 wspsrv!SSL_CREDENTIALS::operator=+0x15455

28caffb8 77e6482f 0000004f 00000000 00000000 wspsrv!SSL_CREDENTIALS::operator=+0x163e4

28caffec 00000000 0044bf26 0000004f 00000000 kernel32!GetModuleHandleA+0xdf
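Several other threads showed the same DNS/RPC pattern; a quick way to confirm that from the same dump (a sketch) is to dump every thread's stack in one pass and scan for the repeated dnsapi and msphlpr!ProxyGetHostByName frames:

0:000> ~*kb
0:000> !uniqstack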

 

It was then time to review the kernel dump file and see whether there was more information there.
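A couple of orientation commands come first (a sketch, assuming the complete memory dump is loaded under a kernel debugger): locate the Firewall Service process and switch the debugger context to it so that its user-mode addresses can be resolved:

1: kd> !process 0 0 wspsrv.exe
1: kd> .process /r /p <EPROCESS address returned above>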

 

> In this case we started by checking whether anything was locked in kernel space:

 

1: kd> !locks

**** DUMP OF ALL RESOURCE OBJECTS ****

KD: Scanning for held locks............................................................

 

Resource @ 0x8a7f3008 Shared 1 owning threads

    Contention Count = 856651

    NumberOfExclusiveWaiters = 5

     Threads: 84149020-01<*>

     Threads Waiting On Exclusive Access:

              84166a80 8961d620 84166750 89649240

              841777a0

 

 

> Notice that we have five exclusive waiters waiting on the thread that owns the resource. Let's see which thread that is:

 

1: kd> !thread 84149020

THREAD 84149020 Cid 1074.18a8 Teb: 7ff12000 Win32Thread: 00000000 READY

IRP List:

    fe5eae28: (0006,01d8) Flags: 00000884 Mdl: 00000000

    88bb1a68: (0006,01d8) Flags: 00000000 Mdl: 89128478

    fee50ba8: (0006,01d8) Flags: 00000000 Mdl: 880eb6c8

    f9d62af8: (0006,01d8) Flags: 00000000 Mdl: 887adee8

    fa284008: (0006,01d8) Flags: 00000000 Mdl: 87fb7bb8

    feb2dd80: (0006,01d8) Flags: 00000000 Mdl: 88bbdc48

    ff8c95f0: (0006,01d8) Flags: 00000000 Mdl: 88bbe488

    fac4b820: (0006,01d8) Flags: 00000000 Mdl: 8847d798

    fea1d008: (0006,01d8) Flags: 00000000 Mdl: 89252968

    ff097008: (0006,01d8) Flags: 00000000 Mdl: 8a5ee320

    fe64b008: (0006,01d8) Flags: 00000000 Mdl: 88a114a0

    feba1b30: (0006,01d8) Flags: 00000000 Mdl: 8975beb8

    ff499550: (0006,01d8) Flags: 00000000 Mdl: 87cd53a8

    ff5cf9f0: (0006,01d8) Flags: 00000000 Mdl: 8856dd18

    fc26ec38: (0006,01d8) Flags: 00000000 Mdl: 883df0a8

Not impersonating

DeviceMap e1750820

Owning Process 89f4ead8 Image: wspsrv.exe

Wait Start TickCount 4518588 Ticks: 0

Context Switch Count 3718083

UserTime 00:01:15.546

KernelTime 00:01:02.906

Win32 Start Address 0x0044bf26

Start Address 0x77e617ec

Stack Init b7b39000 Current b7b38898 Base b7b39000 Limit b7b36000 Call 0

Priority 8 BasePriority 8 PriorityDecrement 0

ChildEBP RetAddr Args to Child

b7b388b0 8083d5b1 84149020 841490c8 00000000 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])

b7b388dc 8083df9e 84149020 8a7f3008 00000000 nt!KiSwapThread+0x2e5 (FPO: [Non-Fpo])

b7b38924 8082e057 88f99718 0000001b 00000000 nt!KeWaitForSingleObject+0x346 (FPO: [Non-Fpo])

b7b3895c 80824ba8 89a0a400 00000000 8082c38e nt!ExpWaitForResource+0x30 (FPO: [Non-Fpo])

b7b3897c 8085110e 8a7f3008 00000001 b7b389d4 nt!ExAcquireResourceSharedLite+0xf5 (FPO: [Non-Fpo])

b7b3898c b9fb5c70 8a7f3008 89a0a400 00000000 nt!ExEnterCriticalRegionAndAcquireResourceShared+0x19 (FPO: [Non-Fpo])

b7b389d4 b9fb5b17 b7b38a18 b7b389ec 00000028 afd!AfdGetTransportInfo+0x1b (FPO: [Non-Fpo])

b7b389f0 b9fb5d9f b7b38a34 b7b38a18 00000000 afd!AfdAllocateEndpoint+0x1e (FPO: [Non-Fpo])

b7b38a38 b9fbf7a4 89b1e438 8a9a9f38 b7b38a5c afd!AfdCreate+0x13b (FPO: [Non-Fpo])

b7b38a48 80840153 8a9bc030 fe5eae28 fe5eae28 afd!AfdDispatch+0x79 (FPO: [Non-Fpo])

b7b38a5c 8092e87e b7b38c04 8a9bc018 00000000 nt!IofCallDriver+0x45 (FPO: [Non-Fpo])

b7b38b44 8092c3ea 8a9bc030 00000000 89062228 nt!IopParseDevice+0xa35 (FPO: [Non-Fpo])

b7b38bc4 8092d80d 00000000 b7b38c04 00000042 nt!ObpLookupObjectName+0x5b0 (FPO: [Non-Fpo])

b7b38c18 8092c6cd 00000000 00000000 dffa7c01 nt!ObOpenObjectByName+0xea (FPO: [Non-Fpo])

b7b38c94 80931d92 29b2f958 c0140000 29b2f90c nt!IopCreateFile+0x447 (FPO: [Non-Fpo])

b7b38cf0 8092fb28 29b2f958 c0140000 29b2f90c nt!IoCreateFile+0xa3 (FPO: [Non-Fpo])

b7b38d30 80833bef 29b2f958 c0140000 29b2f90c nt!NtCreateFile+0x30 (FPO: [Non-Fpo])

b7b38d30 7c82860c 29b2f958 c0140000 29b2f90c nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b7b38d64)

WARNING: Frame IP not in any known module. Following frames may be wrong.

29b2f9ec 00000000 00000000 00000000 00000000 0x7c82860c
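As a side check, the five waiter addresses reported by !locks can be examined with !thread in the same way (a sketch using the addresses from the output above); as noted in the conclusion, they all turned out to be wspsrv.exe threads as well, stuck waiting for exclusive access to the same resource:

1: kd> !thread 84166a80
1: kd> !thread 8961d620
1: kd> !thread 84166750
1: kd> !thread 89649240
1: kd> !thread 841777a0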

 

> The owning thread also has a long list of IRPs (I/O Request Packets) attached to it; let's see whether any of them is pending:

 

1: kd> !irp 88bb1a68

Irp is active with 4 stacks 2 is current (= 0x88bb1afc)

 Mdl=89128478: No System Buffer: Thread 84149020: Irp stack trace.

   cmd flg cl Device File Completion-Context

 [ 0, 0] 0 0 00000000 00000000 00000000-00000000

 

                     Args: 00000000 00000000 00000000 00000000

>[ f, 8] 0 e1 8a94ddf0 870404b0 b9e964d0-870f66c8 Success Error Cancel pending

       \Driver\Tcpip*** ERROR: Module load completed but symbols could not be loaded for idmtdi.sys

       idmtdi

                     Args: 0000127b 00010020 00000000 00000000

 [ f, 8] 0 e0 8a85d020 870404b0 b9fc1b72-833955a0 Success Error Cancel

       \Driver\IDMTDI afd!AfdRestartBufferReceiveWithUserIrp

                     Args: 0000127b 00010020 00000000 00000000

 [ e, 5] 5 0 8a9bc030 8583a190 00000000-00000000

       \Driver\AFD

                     Args: 0000127b 00000000 20000020 00000000

 

 

> Notice that a third-party Transport Driver Interface driver (idmtdi.sys) is present in this IRP's stack and the IRP is pending on it.
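Two quick follow-ups from the same kernel dump help put a name to that driver and show where it sits relative to TCP/IP (a sketch; the device object address is the \Driver\Tcpip device shown in the !irp output above):

1: kd> lmvm idmtdi
1: kd> !devstack 8a94ddf0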

 

4. Conclusion

As you could see, the user-mode dump will not always give us a definitive answer, especially when most of the threads in user space are waiting for something to happen in kernel space. In this particular case, when reading the kernel dump, we used the !locks command, which displays the locks held on resources by threads. A lock can be shared or exclusive; an exclusive lock means that no other thread can gain access to that resource. Here we saw five threads waiting for exclusive access, and those waiting threads were all part of wspsrv.exe.

What we can conclude here is that wspsrv.exe was allocating more and more threads without releasing them, because the existing threads were stuck waiting for an answer from a kernel driver. Those threads were doing I/O, mostly socket and LRPC calls that complete in kernel mode. The third-party driver was causing every AFD (socket) call to take more time and more CPU cycles, and the fact that ISA Server makes heavy use of sockets amplified the problem. The third-party kernel driver was uninstalled from the ISA Server and the issue was resolved.

Another takeaway from this particular scenario is that wspsrv.exe consuming a lot of CPU resources does not always mean it is an ISA issue. Keep that in mind.

 

Author

Gabriel Koren

Sr. Support Escalation Engineer

Microsoft CSS Forefront Edge Team

 

Technical Reviewer

Franck Heilmann

Sr. Escalation Engineer

Microsoft CSS Forefront Edge Team