Troubleshooting Server Hangs – Part Two

Several months ago, we wrote a post on Troubleshooting Server Hangs. At the end of that post, we provided some basic steps to follow when a server hangs. The last step in the list was to follow KB Article 244139 to prepare the system to capture a complete memory dump for analysis. Now that you have the memory dump, what exactly are you supposed to do with it? That is the topic of today’s post, and more specifically, dealing with server hangs caused by resource depletion. In the earlier post we discussed several kinds of resource depletion, including Paged and NonPaged pool depletion and System PTE depletion. Today we’re going to look at pool depletion and how to use the Debugging Tools to troubleshoot it.

If the server is experiencing a NonPaged pool (NPP) or Paged pool (PP) memory leak, you are likely to see the following event IDs, respectively, in the System event log:

Type: Error 
Date: <date> 
Time: <time> 
Event ID: 2019
Source: Srv 
User: N/A 
Computer: <ComputerName> 
Details: The server was unable to allocate from the system nonpaged pool because the pool was empty. 

Type: Error 
Date: <date> 
Time: <time> 
Event ID: 2020 
Source: Srv 
User: N/A 
Computer: <ComputerName> 
Details: The server was unable to allocate from the system paged pool because the pool was empty.
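When reviewing many servers, it can help to scan exported event text for these two events. The sketch below is a minimal Python example that assumes the plain-text layout shown above, one "Key: value" field per line with blank lines between records. The names SRV_POOL_EVENTS, parse_event_records and exhausted_pools are hypothetical, and Event Viewer exports in other formats (CSV, XML, .evtx) would need different parsing.

```python
import re

# Srv event IDs from the System event log, mapped to the pool each one reports as empty.
SRV_POOL_EVENTS = {"2019": "nonpaged pool", "2020": "paged pool"}


def parse_event_records(text):
    """Split exported event text into dicts of 'Key: value' fields.

    Assumes records are separated by blank lines and each field sits on
    its own line, as in the examples above.
    """
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        record = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")  # split on the first colon only
            if sep:
                record[key.strip()] = value.strip()
        if record:
            records.append(record)
    return records


def exhausted_pools(text):
    """Return the set of pools reported as empty by Srv events 2019/2020."""
    pools = set()
    for record in parse_event_records(text):
        if record.get("Source") == "Srv" and record.get("Event ID") in SRV_POOL_EVENTS:
            pools.add(SRV_POOL_EVENTS[record["Event ID"]])
    return pools


SAMPLE = """\
Type: Error
Event ID: 2019
Source: Srv
Details: The server was unable to allocate from the system nonpaged pool because the pool was empty.

Type: Error
Event ID: 2020
Source: Srv
Details: The server was unable to allocate from the system paged pool because the pool was empty.
"""

if __name__ == "__main__":
    print(sorted(exhausted_pools(SAMPLE)))
```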

Let’s load the memory dump file in the Windows debugger (WinDbg.exe). If you have never set up the Debugging Tools and configured symbols, you can find instructions on the Debugging Tools for Windows Overview page. Once the dump file is loaded, type !vm at the prompt to display the system’s virtual memory usage. The output will be similar to the following:

kd> !vm
*** Virtual Memory Usage ***
    Physical Memory:     917085 (   3668340 Kb)
    Page File: \??\C:\pagefile.sys
      Current:   4193280 Kb  Free Space:   4174504 Kb
      Minimum:   4193280 Kb  Maximum:      4193280 Kb
    Page File: \??\D:\pagefile.sys
      Current:   4193280 Kb  Free Space:   4168192 Kb
      Minimum:   4193280 Kb  Maximum:      4193280 Kb
    Available Pages:     777529 (   3110116 Kb)
    ResAvail Pages:      864727 (   3458908 Kb)
    Locked IO Pages:        237 (       948 Kb)
    Free System PTEs:     17450 (     69800 Kb)
    Free NP PTEs:           952 (      3808 Kb)
    Free Special NP:          0 (         0 Kb)
    Modified Pages:          90 (       360 Kb)
    Modified PF Pages:       81 (       324 Kb)
    NonPagedPool Usage:   30294 (    121176 Kb)
    NonPagedPool Max:     32640 (    130560 Kb)
    ********** Excessive NonPaged Pool Usage *****
    PagedPool 0 Usage:     4960 (     19840 Kb)
    PagedPool 1 Usage:      642 (      2568 Kb)
    PagedPool 2 Usage:      646 (      2584 Kb)
    PagedPool 3 Usage:      648 (      2592 Kb)
    PagedPool 4 Usage:      653 (      2612 Kb)
    PagedPool Usage:       7549 (     30196 Kb)
    PagedPool Maximum:    62464 (    249856 Kb)
    Shared Commit:         3140 (     12560 Kb)
    Special Pool:             0 (         0 Kb)
    Shared Process:        5468 (     21872 Kb)
    PagedPool Commit:      7551 (     30204 Kb)
    Driver Commit:         1766 (      7064 Kb)
    Committed pages:     124039 (    496156 Kb)
    Commit limit:       2978421 (  11913684 Kb)
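Each numeric field in !vm output is a page count followed by the same amount in Kb. On this system every field satisfies Kb = pages * 4, that is, 4 KB pages (for example, 30294 pages is 121176 Kb). Below is a minimal Python sketch for pulling those fields out of saved !vm text, assuming one field per line as WinDbg prints it; parse_vm and FIELD_RE are hypothetical names, and the 4 KB page size is an assumption taken from the ratio above.

```python
import re

# Assumption: 4 KB pages, since every field in the !vm output above has Kb == pages * 4.
PAGE_KB = 4

# Matches one "Label:  pages (  kb Kb)" field per line. Page-file lines and the
# "Excessive ... Usage" warning line do not match and are skipped.
FIELD_RE = re.compile(
    r"^[ \t]*([A-Za-z][A-Za-z0-9 ]*?):[ \t]+(\d+)[ \t]+\([ \t]*(\d+)[ \t]+Kb\)[ \t]*$",
    re.MULTILINE,
)


def parse_vm(text):
    """Return {label: (pages, kb)} for every page-count field in !vm output."""
    fields = {}
    for label, pages, kb in FIELD_RE.findall(text):
        pages, kb = int(pages), int(kb)
        if kb != pages * PAGE_KB:
            raise ValueError(f"{label}: {pages} pages != {kb} Kb at {PAGE_KB} KB/page")
        fields[label] = (pages, kb)
    return fields


SAMPLE = """\
*** Virtual Memory Usage ***
    Physical Memory:     917085 (   3668340 Kb)
    Page File: \\??\\C:\\pagefile.sys
      Current:   4193280 Kb  Free Space:   4174504 Kb
    Free System PTEs:     17450 (     69800 Kb)
    NonPagedPool Usage:   30294 (    121176 Kb)
    NonPagedPool Max:     32640 (    130560 Kb)
    ********** Excessive NonPaged Pool Usage *****
    PagedPool Usage:       7549 (     30196 Kb)
    PagedPool Maximum:    62464 (    249856 Kb)
"""

if __name__ == "__main__":
    for label, (pages, kb) in parse_vm(SAMPLE).items():
        print(f"{label:20} {pages:>8} pages {kb:>10} Kb")
```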

This command provides details about Paged and NonPaged Pool usage, free System PTEs and available physical memory. The output above shows that this system is suffering from excessive NonPaged Pool usage: the NonPaged Pool maximum is 130,560 KB (about 128 MB), and 121,176 KB (about 118 MB) of it, roughly 93%, is in use:

NonPagedPool Usage:    30294 (    121176 Kb)
NonPagedPool Max:      32640 (    130560 Kb)
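These two lines are enough to quantify how close the pool is to exhaustion. The sketch below assumes 4 KB pages (matching the pages-to-Kb ratio in the !vm output); pool_headroom and near_exhaustion are hypothetical helpers, and the 90% cutoff is a hypothetical triage threshold, not a Windows limit. For this dump it works out to about 118.3 MB of 127.5 MB in use (92.8%), with 9,384 KB left.

```python
# Assumption: 4 KB pages, matching the pages-to-Kb ratio in the !vm output.
PAGE_KB = 4


def pool_headroom(usage_pages, max_pages):
    """Summarize how close a pool is to its maximum, given !vm page counts."""
    used_kb = usage_pages * PAGE_KB
    max_kb = max_pages * PAGE_KB
    return {
        "used_mb": used_kb / 1024,
        "max_mb": max_kb / 1024,
        "free_kb": max_kb - used_kb,
        "percent_used": 100.0 * usage_pages / max_pages,
    }


def near_exhaustion(usage_pages, max_pages, threshold_percent=90.0):
    """Hypothetical triage rule: flag a pool once usage reaches threshold_percent."""
    return pool_headroom(usage_pages, max_pages)["percent_used"] >= threshold_percent


if __name__ == "__main__":
    summary = pool_headroom(30294, 32640)  # NonPagedPool Usage / Max from the dump
    print(f"{summary['used_mb']:.1f} MB of {summary['max_mb']:.1f} MB "
          f"({summary['percent_used']:.1f}%), {summary['free_kb']} Kb free")
    print("near exhaustion:", near_exhaustion(30294, 32640))
```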

Our next step is to determine what is consuming the NonPaged Pool. The debugger has a very useful command for this, !poolused, which we use to find the pool tag responsible for the consumption. The !poolused 2 command sorts the output by NonPaged Pool consumption, and !poolused 4 sorts it by Paged Pool consumption. A quick note: the output of !poolused can be very lengthy, because it lists every tag in use. To limit the display to the top 10 consumers, use the /t10 switch: !poolused /t10 2.

0: kd> !poolused 2
   Sorting by  NonPaged Pool Consumed
  Pool Used:
            NonPaged            Paged
 Tag    Allocs     Used    Allocs     Used
 R100        3  9437184        15   695744    UNKNOWN pooltag 'R100', please update pooltag.txt
 MmCm       34  3068448         0        0    Calls made to MmAllocateContiguousMemory , Binary: nt!mm
 LSwi        1  2584576         0        0    initial work context 
 TCPt       28  1456464         0        0    TCP/IP network protocol , Binary: TCP
 File     7990  1222608         0        0    File objects 
 Pool        3  1134592         0        0    Pool tables, etc. 
 Thre     1460   911040         0        0    Thread objects , Binary: nt!ps
 Devi      337   656352         0        0    Device objects 
 Even    12505   606096         0        0    Event objects 
 naFF      300   511720         0        0    UNKNOWN pooltag 'naFF', please update pooltag.txt
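The Allocs and Used columns can also be post-processed outside the debugger, for example to re-rank the same output by Paged usage or to compare a tag across several dumps. The sketch below assumes the column layout shown above (a tag, four integer columns, then a free-text description); the Used columns are byte counts, so R100’s 9,437,184 bytes is exactly 9 MB. The names parse_poolused and top_consumers are hypothetical.

```python
import re

# One row per tag: Tag, NonPaged Allocs, NonPaged Used, Paged Allocs,
# Paged Used, then a description. Header lines do not match because
# their columns are words rather than integers.
ROW_RE = re.compile(
    r"^[ \t]*(\S{1,4})[ \t]+(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*(.*)$",
    re.MULTILINE,
)


def parse_poolused(text):
    """Parse !poolused rows into dicts; the Used columns are in bytes."""
    rows = []
    for tag, np_allocs, np_used, p_allocs, p_used, desc in ROW_RE.findall(text):
        rows.append({
            "tag": tag,
            "np_allocs": int(np_allocs),
            "np_used": int(np_used),
            "p_allocs": int(p_allocs),
            "p_used": int(p_used),
            "description": desc.strip(),
        })
    return rows


def top_consumers(rows, n=10, key="np_used"):
    """Return the n rows with the largest value in column `key`."""
    return sorted(rows, key=lambda row: row[key], reverse=True)[:n]


SAMPLE = """\
0: kd> !poolused 2
   Sorting by  NonPaged Pool Consumed
  Pool Used:
            NonPaged            Paged
 Tag    Allocs     Used    Allocs     Used
 R100        3  9437184        15   695744    UNKNOWN pooltag 'R100', please update pooltag.txt
 MmCm       34  3068448         0        0    Calls made to MmAllocateContiguousMemory , Binary: nt!mm
 LSwi        1  2584576         0        0    initial work context
 TCPt       28  1456464         0        0    TCP/IP network protocol , Binary: TCP
 File     7990  1222608         0        0    File objects
"""

if __name__ == "__main__":
    for row in top_consumers(parse_poolused(SAMPLE), n=3):
        print(f"{row['tag']}  {row['np_used'] / 1024:>10.0f} KB  {row['description']}")
```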

Once the tag is identified, we can use the steps outlined in our previous post, An Introduction to Pool Tags, to identify which driver is using that tag. If the driver is out of date, we can update it. However, if we are already running the latest version of the driver, we will need to engage the software vendor directly for additional assistance.
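As a rough illustration of one common heuristic (not necessarily the exact procedure from that post): drivers usually pass the tag to ExAllocatePoolWithTag as a four-character constant, so the tag’s ASCII text often appears verbatim in the driver binary. From a command prompt in the drivers directory, findstr /m /l R100 *.sys performs this kind of search. The Python sketch below does the same byte search; find_tag_in_drivers and demo_with_fake_drivers are hypothetical names, and the demo uses throwaway files rather than real drivers.

```python
import tempfile
from pathlib import Path


def find_tag_in_drivers(tag, driver_dir):
    """Return names of *.sys files in driver_dir whose bytes contain the tag.

    Heuristic only: a match is a lead, not proof. Unrelated data can contain
    the same four bytes, and a driver that builds its tag at run time will
    not match at all.
    """
    needle = tag.encode("ascii")
    matches = []
    for path in sorted(Path(driver_dir).glob("*.sys")):
        try:
            data = path.read_bytes()
        except OSError:
            continue  # unreadable file (locked, permissions): skip it
        if needle in data:
            matches.append(path.name)
    return matches


def demo_with_fake_drivers():
    """Build throwaway files in a temp directory and search them for 'R100'."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "fake_r100.sys").write_bytes(b"\x00\x01R100\x02")
        Path(tmp, "other.sys").write_bytes(b"no tag in here")
        Path(tmp, "notes.txt").write_bytes(b"R100")  # ignored: not a .sys file
        return find_tag_in_drivers("R100", tmp)


if __name__ == "__main__":
    print(demo_with_fake_drivers())
```

On a live system you would point it at the drivers directory, for example find_tag_in_drivers("R100", r"C:\Windows\System32\drivers"), and treat any hit as a lead to confirm against the driver’s version and vendor.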

That brings us to the end of this post. In Part Three, we will discuss using Task Manager and the Debugging Tools to troubleshoot handle leaks that may be causing server hangs.

– Sakthi Ganesh
