Stop 0x000000D1 DRIVER_IRQL_NOT_LESS_OR_EQUAL (8a400000, 00000002, 00000000, f77e00a9)

I had a great call yesterday.  Great, meaning I was able to help a customer debug a memory dump himself, so he can potentially resolve a STOP error more quickly in the future.  Everyone wins in that scenario:  the end customer is back up and running quickly, the consultant can go off and make more service calls to more customers, and Microsoft wins because he does not have to call for support (we lose money on each call).

 

Here's the issue:  The server was randomly rebooting about once every 4-5 days.  No bluescreen, no memory.dmp.  It behaved as if someone were simply pulling the plug on the server!  Not good!!

 

Well, checking System Properties -> Advanced -> Settings for Startup and Recovery showed the server was set to "Small Memory Dump (64k)" under "Write debugging information" and to "Automatically restart".  A quick search on the server revealed a 64k mini-dump dated the time of the last "crash".  What this tells me is that the server IS actually performing a bugcheck (Blue Screen of Death or BSOD) and rebooting, but since it is only configured for a "Small Memory Dump", it dumps so quickly that the BSOD is never presented.  So, we set "Write debugging information" to "Kernel Memory Dump" and rebooted.  Then we had to wait for the server to crash again.
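
The same two settings are stored in the registry under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl (the CrashDumpEnabled and AutoReboot values), so they can also be checked from a script.  Here is a minimal Python sketch of that check.  The function names are my own invention, and read_crash_control() only works when run on the Windows box itself:

```python
# Sketch: interpret the Startup and Recovery settings as stored in the registry.
# Assumption: the values come from
#   HKLM\SYSTEM\CurrentControlSet\Control\CrashControl
# Function names here are hypothetical helpers, not part of any Microsoft tool.

DUMP_TYPES = {
    0: "(none)",
    1: "Complete Memory Dump",
    2: "Kernel Memory Dump",
    3: "Small Memory Dump (64k)",
}


def describe_crash_control(values):
    """Turn the raw CrashDumpEnabled / AutoReboot DWORDs into readable text."""
    dump_type = DUMP_TYPES.get(values.get("CrashDumpEnabled"), "unknown")
    auto_restart = bool(values.get("AutoReboot", 0))
    return {"dump_type": dump_type, "auto_restart": auto_restart}


def read_crash_control():
    """Read the two values from the local registry (Windows only)."""
    import winreg  # imported here so the rest of the sketch runs anywhere

    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        return {
            name: winreg.QueryValueEx(key, name)[0]
            for name in ("CrashDumpEnabled", "AutoReboot")
        }


if __name__ == "__main__":
    # The configuration this server started with: small dump + auto restart.
    print(describe_crash_control({"CrashDumpEnabled": 3, "AutoReboot": 1}))
```

On the server in this case, the starting configuration would correspond to CrashDumpEnabled = 3 and AutoReboot = 1; after our change, CrashDumpEnabled would be 2.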

 

After several days, it did, and we now had a Memory.dmp file.  Here are the steps we performed to debug the memory dump:

  1. Download and install the current "Debugging Tools for Windows 32-bit version" from https://www.microsoft.com/whdc/devtools/debugging/installx86.mspx, choosing the "typical" install.
  2. Launch the debugger via Start -> All Programs -> Debugging Tools for Windows -> WinDbg.
  3. Set the symbol file path:  File -> Symbol File Path.  From https://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx: "For example, to download symbols to c:\websymbols, you would add the following to your symbol path: SRV*c:\websymbols*https://msdl.microsoft.com/download/symbols."  I simply copied and pasted SRV*c:\websymbols*https://msdl.microsoft.com/download/symbols into the Symbol Search Path and then created a directory called "websymbols" on the root of the C drive.  You don't have to create the folder; the debugger *should* create it for you when it connects.
  4. Place a check next to "Reload" and click OK.
  5. Load the dump file:  click File -> Open Crash Dump and browse to the memory.dmp.
  6. Click Yes to "Save Information for Workspace".
  7. Sit back and wait.
  8. Take a quick look in c:\websymbols; you should see some stuff (symbols) appearing in this folder.
  9. After some time (one to five minutes, ymmv), the debugger will be done loading and you will see "0:  kd>" in the small grey window at the bottom left of the screen.
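
The symbol path from step 3 is just three pieces joined with asterisks: the literal SRV, a local cache folder, and the symbol server URL.  A small Python sketch of how it is put together and taken apart (the function names are hypothetical; only the SRV*cache*server format comes from the steps above):

```python
# Sketch: build and split the SRV*<local cache>*<symbol server> string
# used in step 3. Helper names are made up for illustration.

MS_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def build_symbol_path(cache_dir, server=MS_SYMBOL_SERVER):
    """Return a symbol path that caches downloaded symbols in cache_dir."""
    return "SRV*{}*{}".format(cache_dir, server)


def split_symbol_path(symbol_path):
    """Return (cache_dir, server) from a single SRV*cache*server entry."""
    prefix, cache_dir, server = symbol_path.split("*", 2)
    if prefix.upper() != "SRV":
        raise ValueError("not an SRV symbol path: " + symbol_path)
    return cache_dir, server


if __name__ == "__main__":
    path = build_symbol_path(r"c:\websymbols")
    print(path)
    print(split_symbol_path(path))
```

If you would rather cache symbols somewhere other than c:\websymbols, only the middle piece changes.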

Note that the debugger does not *have* to be installed on the server itself.  All you need is local access to the dump file.  You could copy the dump file to a Windows XP workstation and install the debugging tools on the workstation rather than the server.
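
Because only the dump file is needed, the same load can also be scripted on a workstation with kd.exe, the command-line kernel debugger that ships in the same Debugging Tools package.  This is a rough Python sketch, not something we ran on this call.  It assumes kd.exe is on the PATH, and the function names are hypothetical.  The -z (dump file), -y (symbol path) and -c (initial command) switches are the debugger's own:

```python
# Sketch: run the command-line kernel debugger against a copied dump file.
# Assumptions: kd.exe is on the PATH; the dump path below is a placeholder.
import subprocess


def build_kd_command(dump_path, symbol_path, debugger="kd.exe"):
    """Build the argument list: open the dump, set symbols, analyze, quit."""
    return [
        debugger,
        "-z", dump_path,         # dump file to open
        "-y", symbol_path,       # symbol search path
        "-c", "!analyze -v; q",  # commands to run once the dump is loaded
    ]


def analyze_dump(dump_path, symbol_path, debugger="kd.exe"):
    """Run the debugger and return everything it printed."""
    command = build_kd_command(dump_path, symbol_path, debugger)
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    symbols = r"SRV*c:\websymbols*https://msdl.microsoft.com/download/symbols"
    print(build_kd_command(r"C:\dumps\MEMORY.DMP", symbols))
```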

 

Here's the output after loading the dump file (I did not run a single command).  Around 80% of the calls we get (PSS SBS) regarding memory dumps are resolved by simply loading the dump in the debugger, as illustrated below.

 

Microsoft (R) Windows Debugger Version 6.6.0003.5
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\Documents and Settings\petergal\My Documents\MEMORY.DMP]
Kernel Summary Dump File: Only kernel address space is available

Symbol search path is: SRV*c:\websymbols*https://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Server 2003 Kernel Version 3790 MP (2 procs) Free x86 compatible
Product: LanManNt, suite: SmallBusiness TerminalServer SmallBusinessRestricted SingleUserTS
Built by: 3790.srv03_gdr.050225-1827
Kernel base = 0x804de000 PsLoadedModuleList = 0x8057b6a8
Debug session time: Wed Mar 22 02:59:01.750 2006 (GMT-6)
System Uptime: 1 days 9:42:01.500
Loading Kernel Symbols
.............................................................................................................
Loading User Symbols
PEB is paged out (Peb.Ldr = 7ffdf00c). Type ".hh dbgerr001" for details
Loading unloaded module list
.....
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck D1, {8a400000, 2, 0, f77e00a9}

*** ERROR: Module load completed but symbols could not be loaded for CSTDI50.sys
Probably caused by : CSTDI50.sys ( CSTDI50+10a9 )

Followup: MachineOwner
---------
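
Only two lines of the output above really matter for a first pass: the "BugCheck" line and the "Probably caused by" line.  If you collect debugger output from several servers, a short Python sketch like this one can pull them out.  The function name is made up, and the regular expressions are only known to match the output shown in this post:

```python
# Sketch: extract the bugcheck code, its arguments and the suspect module
# from the text WinDbg prints after loading a dump.
import re

BUGCHECK_RE = re.compile(r"^BugCheck ([0-9A-Fa-f]+), \{([^}]*)\}", re.MULTILINE)
CAUSE_RE = re.compile(r"^Probably caused by\s*:\s*(\S+)", re.MULTILINE)


def summarize(output):
    """Return the stop code, its four arguments and the blamed module."""
    bug = BUGCHECK_RE.search(output)
    cause = CAUSE_RE.search(output)
    return {
        "code": int(bug.group(1), 16) if bug else None,
        "args": [int(a.strip(), 16) for a in bug.group(2).split(",")] if bug else [],
        "module": cause.group(1) if cause else None,
    }


SAMPLE = """BugCheck D1, {8a400000, 2, 0, f77e00a9}

*** ERROR: Module load completed but symbols could not be loaded for CSTDI50.sys
Probably caused by : CSTDI50.sys ( CSTDI50+10a9 )
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```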

 

Notice the "Probably caused by : CSTDI50.sys" line.  OK, what the heck is that file?  Find the file -> Properties -> Version -> whose file is this anyway?  The file belongs to Colasoft.  Once we knew who the file belonged to, we determined that the software had been installed last November, and a quick scroll through Event Viewer showed the problem started around November.  A quick search on the internet for CSTDI50.sys confirmed that Colasoft has a "known issue".

 

The action is to uninstall the Colasoft software. 

 

With the steps above, you *should* be able to determine the cause of the crash!

 

To be really geeky, you can run "!analyze -v" (without quotes) in the debugger to get additional information (in this case pretty useless, as we already know the cause):

 

0: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 8a400000, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: f77e00a9, address which referenced memory

Debugging Details:
------------------

READ_ADDRESS: 8a400000

CURRENT_IRQL: 2

FAULTING_IP:
CSTDI50+10a9
f77e00a9 f3a5 rep movsd

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0xD1

LAST_CONTROL_TRANSFER: from 804e2f58 to 80543ac9

STACK_TEXT:
b87be8b0 804e2f58 0000000a 8a400000 00000002 nt!KeBugCheckEx+0x19
b87be8b0 f77e00a9 0000000a 8a400000 00000002 nt!KiTrap0E+0x224
WARNING: Stack unwind information not available. Following frames may be wrong.
b87be94c f77e02d5 888c0630 887e30c8 01ed6000 CSTDI50+0x10a9
b87be974 f77e0747 884db750 8000089c 888b8200 CSTDI50+0x12d5
b87be990 f77e0c58 00000001 00000000 88b8cf68 CSTDI50+0x1747
b87be9bc f77e0ea3 00000001 00000000 88b8cf68 CSTDI50+0x1c58
b87bea08 804f0154 00000000 88b8cf68 018b81c8 CSTDI50+0x1ea3
b87bea38 ad82394a 88843288 88b8cfd8 f77dffdb nt!IopfCompleteRequest+0xa0
b87bea44 f77dffdb 8982b200 88b8cf68 88b8cfd8 tcpip!TCPDispatchInternalDeviceControl+0x134
b87beaa0 f77e10f1 8982b200 88b8cf68 f77e3e40 CSTDI50+0xfdb
b87beafc ad6b6851 884db750 b87beb84 00000004 CSTDI50+0x20f1
b87beb74 ad6b80c7 89924e40 8000089c 8000089c afd!AfdCreateConnection+0x195
b87beba0 ad6b80f7 898bfab0 8831f0e4 8831f008 afd!AfdAddFreeConnection+0x37
b87bebb4 ad6c5997 00000000 00012083 ad6c57f0 afd!AfdReplenishListenBacklog+0x13
b87bec30 ad6c0043 8831f008 89a27a60 804f0473 afd!AfdSuperAccept+0x1cf
b87bec3c 804f0473 89927030 8831f008 883175f0 afd!AfdDispatchDeviceControl+0x4f
b87bec4c 80585208 8831f0e4 897c0e90 8831f008 nt!IofCallDriver+0x3f
b87bec60 805860e6 89927030 8831f018 897c0e90 nt!IopSynchronousServiceTail+0x6f
b87bed00 80586128 000006e4 00000000 00000000 nt!IopXxxControlFile+0x607
b87bed34 804dfd24 000006e4 00000000 00000000 nt!NtDeviceIoControlFile+0x28
b87bed34 7ffe0304 000006e4 00000000 00000000 nt!KiSystemService+0xd0
0545ff00 00000000 00000000 00000000 00000000 SharedUserData!SystemCallStub+0x4

STACK_COMMAND: .bugcheck ; kb

FOLLOWUP_IP:
CSTDI50+10a9
f77e00a9 f3a5 rep movsd

FAULTING_SOURCE_CODE: 

SYMBOL_STACK_INDEX: 2

FOLLOWUP_NAME: MachineOwner

SYMBOL_NAME: CSTDI50+10a9

MODULE_NAME: CSTDI50

IMAGE_NAME: CSTDI50.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 42538bbd

FAILURE_BUCKET_ID: 0xD1_CSTDI50+10a9

BUCKET_ID: 0xD1_CSTDI50+10a9

Followup: MachineOwner
---------
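
For the curious, the numbers in that output can be decoded by hand.  The Python sketch below is a rough illustration with hypothetical function names.  It labels the four 0xD1 arguments the same way the debugger does, works out where the driver is loaded by subtracting the +10a9 offset from the faulting address, and converts DEBUG_FLR_IMAGE_TIMESTAMP (seconds since 1970, UTC) into a date:

```python
# Sketch: decode the values shown by !analyze -v for this Stop 0xD1.
from datetime import datetime, timezone


def decode_d1(arg1, arg2, arg3, arg4):
    """Label the four DRIVER_IRQL_NOT_LESS_OR_EQUAL arguments."""
    operations = {0: "read", 1: "write"}
    return {
        "memory_referenced": hex(arg1),
        "irql": arg2,  # 2 is DISPATCH_LEVEL on x86
        "operation": operations.get(arg3, "other"),
        "address_which_referenced_memory": hex(arg4),
    }


def module_base(faulting_address, offset):
    """CSTDI50+10a9 means: load address of CSTDI50.sys plus 0x10a9."""
    return faulting_address - offset


def image_timestamp(hex_value):
    """Convert the driver's link timestamp to a UTC datetime."""
    return datetime.fromtimestamp(int(hex_value, 16), tz=timezone.utc)


if __name__ == "__main__":
    print(decode_d1(0x8A400000, 0x2, 0x0, 0xF77E00A9))
    print(hex(module_base(0xF77E00A9, 0x10A9)))
    print(image_timestamp("42538bbd").isoformat())
```

The load address works out to f77df000, which agrees with the CSTDI50+0xfdb frame at f77dffdb in the stack.  The timestamp decodes to April 2005, so the driver on this server was built several months before the November install.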