My machine keeps bluescreening, how can I look at the dump?

Backstory

Ahh, the good old blue screen of death (BSOD). We've all seen it, and we always blame Microsoft for it.  When I hang out with my friends on the outside, they're always telling me how their Macs never crash and asking why Microsoft makes such poor software.  Well, first off, Macs don't need to support a wide array of hardware and vendors, so Apple only has to make drivers for a small set of devices.  That minimizes the number of hardware problems they run into.  Surprisingly, a lot of the blue screens that people see are not due to the operating system at all, but to poorly written drivers.  The good news is that in the Vista timeframe, kernel-mode drivers must be signed to load on 64-bit editions of the operating system, which should cut down on the number of bluescreens one sees.

Well, now that I've got that off my chest, let's talk about these BSODs and how to look at the dump files, which might give us some information about the crash.  Of course, you don't really need to do this unless you are a nerd and want to poke around, and I am sure there are people who do.

System dump settings

Let's start with how the dumps are created.  If you open System in Control Panel and go to the Advanced tab, you'll see a Startup and Recovery section with a Settings button.  Click that and you can adjust what happens when your machine hits an unexpected error.  By default a small minidump is created and your system is automatically rebooted.  Personally I change this to a kernel dump, which gets us all the kernel state information and is usually enough to find out what's going on.  You can also set the system to not automatically reboot when you hit a BSOD.  So if your machine is bluescreening often and a minidump doesn't suffice, you can change it to a kernel dump.  In our example, though, we'll just analyze a minidump from my system.
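If you'd rather check these settings without clicking through the dialog, they live in the registry.  Here is a rough Python sketch, assuming the CrashDumpEnabled and AutoReboot values under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl.  The function names are my own invention, and the registry read only works on Windows.

```python
import sys

# Meanings of the CrashDumpEnabled DWORD value.
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
}

CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"


def describe_dump_type(value):
    """Translate a CrashDumpEnabled value into a readable name."""
    return DUMP_TYPES.get(value, "unknown (%d)" % value)


def read_crash_control():
    """Return (CrashDumpEnabled, AutoReboot) from the registry (Windows only)."""
    import winreg  # imported here so the file still loads on other systems

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        dump_type, _ = winreg.QueryValueEx(key, "CrashDumpEnabled")
        auto_reboot, _ = winreg.QueryValueEx(key, "AutoReboot")
    return dump_type, auto_reboot


if __name__ == "__main__" and sys.platform == "win32":
    dump_type, auto_reboot = read_crash_control()
    print("Dump type:", describe_dump_type(dump_type))
    print("Automatic reboot:", "on" if auto_reboot else "off")
```

The script only reads the values, so it is safe to run as a quick check before you go change anything in the dialog.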

Tools

Well, you're gonna need some tools to poke around in the dump file, so download the appropriate package and install it.  I like to install to c:\debuggers, but it really doesn't matter.

 x86

 x64

The debugger package comes with a bunch of tools which are fun to play with so explore and get lost.  There also is a CHM file called debuggers.chm which will give you a lot of information about the different tools and their uses.

Symbols

Without these little buggers your dump file will be a bit hard to analyze.  Think of symbols as your secret decoder ring: without them you're just going to see a bunch of junk, but with them you can understand the message.  Symbols map addresses back to the names of variables, functions, methods, etc. and display those on the screen so that us humans can understand what we're looking at.  Thankfully there is a Microsoft public symbol server which you can point at.  If you go here you can watch a video about using symbols, which you'll REALLY need to understand.
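A symbol path that points at the public server usually also names a local folder where downloaded symbols get cached, in the form srv*<cache>*<server>.  Here is a tiny Python sketch that builds one.  The c:\symbols cache folder and the function name are just my assumptions, so pick whatever folder you like.

```python
MS_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def symbol_path(cache_dir, server=MS_SYMBOL_SERVER):
    """Build a srv*<cache>*<server> symbol path element."""
    return "srv*%s*%s" % (cache_dir, server)


if __name__ == "__main__":
    # c:\symbols is an assumed cache location, not a required one.
    print("set _NT_SYMBOL_PATH=" + symbol_path(r"c:\symbols"))
```

The printed line can be pasted into a command prompt before starting the debugger, or the same string can be handed to the -y option.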

Time to play

Alright, let's open this dump file and try to figure out what is going on.  Again, I am only using a minidump so I won't really be able to get that much info, but I'm not going to go that deep into it anyway.  My comments start with //...

C:\Debuggers>cdb -z c:\WINDOWS\Minidump\Mini030506-01.dmp //Here I am using a tool called CDB, which is a user-mode debugger, but it's smart enough to figure out that this is a kernel dump. The -z means that we want to look at a dump file.

Microsoft (R) Windows Debugger Version 6.6.0007.5
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [c:\WINDOWS\Minidump\Mini030506-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available

Symbol search path is: *** Invalid *** //Since we did not use the -y option to set the symbols when we opened the dump the debugger is going to yell at us.
****************************************************************************
* Symbol loading may be unreliable without a symbol search path. *
* Use .symfix to have the debugger choose a symbol path. *
* After setting your symbol path, use .reload to refresh symbol locations. *
****************************************************************************
Executable search path is:
*********************************************************************
* Symbols can not be loaded because symbol path is not initialized. * //Here the debugger is trying to spell it out for us. We can set the symbols a number of ways; since we're already in the debugger we'll use the .sympath command... Everything from here down to the part where it shows "Bugcheck Analysis" is info about not having your symbols correct.
* *
* The Symbol Path can be set by: *
* using the _NT_SYMBOL_PATH environment variable. *
* using the -y <symbol_path> argument when starting the debugger. *
* using .sympath and .sympath+ *
*********************************************************************
Unable to load image ntoskrnl.exe, Win32 error 2
*** WARNING: Unable to verify timestamp for ntoskrnl.exe
*** ERROR: Module load completed but symbols could not be loaded for ntoskrnl.exe
Windows Server 2003 Kernel Version 3790 (Service Pack 1) UP Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Kernel base = 0xfffff800`01000000 PsLoadedModuleList = 0xfffff800`01197ac0
Debug session time: Sun Mar 5 14:42:08.062 2006 (GMT-7)
System Uptime: 0 days 0:00:27.640
*********************************************************************
* Symbols can not be loaded because symbol path is not initialized. *
* *
* The Symbol Path can be set by: *
* using the _NT_SYMBOL_PATH environment variable. *
* using the -y <symbol_path> argument when starting the debugger. *
* using .sympath and .sympath+ *
*********************************************************************
Unable to load image ntoskrnl.exe, Win32 error 2
*** WARNING: Unable to verify timestamp for ntoskrnl.exe
*** ERROR: Module load completed but symbols could not be loaded for ntoskrnl.exe
Loading Kernel Symbols
.................................................................................................
Loading User Symbols
Loading unloaded module list
...
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck C0000218, {fffffa8000a4d4a0, 0, 0, 0} //So here is our bugcheck code. We'll look at it further once we get our symbols straightened out.

***** Kernel symbols are WRONG. Please fix symbols to do analysis.

***** Kernel symbols are WRONG. Please fix symbols to do analysis.

*************************************************************************
*** ***
*** ***
*** Your debugger is not using the correct symbols ***
*** ***
*** In order for this command to work properly, your symbol path ***
*** must point to .pdb files that have full type information. ***
*** ***
*** Certain .pdb files (such as the public OS symbols) do not ***
*** contain the required information. Contact the group that ***
*** provided you with these symbols if you need this command to ***
*** work. ***
*** ***
*** Type referenced: nt!_KPRCB ***
*** ***
*************************************************************************
Unable to load image Ntfs.sys, Win32 error 2
*** WARNING: Unable to verify timestamp for Ntfs.sys
*** ERROR: Module load completed but symbols could not be loaded for Ntfs.sys
Probably caused by : ntoskrnl.exe ( nt+41910 )

Followup: MachineOwner
---------

//So at this point you'll be sitting at a KD prompt. Now it's time to fix up the symbols and analyze the dump.

kd> .sympath https://msdl.microsoft.com/download/symbols //Here I am setting the symbol path to point to Microsoft's public symbol server.
Symbol search path is: https://msdl.microsoft.com/download/symbols
kd> .reload //Now I am reloading the symbols, since I set the path to valid symbols. If for some reason it still is not coming back with valid information, you could use this command, which gives you more information about loading symbols: !sym noisy
*** WARNING: Unable to verify checksum for ntoskrnl.exe
Loading Kernel Symbols
.................................................................................................
Loading User Symbols
Loading unloaded module list
...
kd> !analyze -v //So now that we have our symbols loaded, let's use the built-in command !analyze to view what happened.
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

Unknown bugcheck code (c0000218) //Our bugcheck code again
Unknown bugcheck description
Arguments:
Arg1: fffffa8000a4d4a0
Arg2: 0000000000000000
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0xC0000218

PROCESS_NAME: System

CURRENT_IRQL: 0

LAST_CONTROL_TRANSFER: from fffff80001101e00 to fffff80001041910

STACK_TEXT: //The call stack is the chain of function calls which have led to the current location of the program counter. The top function on the call stack is the current function, the next function is the function which called the current function, and so forth. Read the debuggers.chm for more info.
fffffadf`e842a498 fffff800`01101e00 : 00000000`0000004c 00000000`c0000218 fffffadf`e842a528 fffffadf`e9ef73a0 : nt!KeBugCheckEx
fffffadf`e842a4a0 fffff800`01081c47 : fffffadf`c0000218 00000004`00000000 00000000`00040fff 00000000`00040000 : nt!ExpSystemErrorHandler2+0xae6
fffffadf`e842a6f0 fffff800`013343fc : 00000000`c0000218 00000000`00000001 00000000`00000001 00000000`00040000 : nt!ExpSystemErrorHandler+0xd4
fffffadf`e842a730 fffff800`012663e6 : 00000000`c0000218 00000000`00000001 fffffadf`00000001 00000000`00040000 : nt!ExpRaiseHardError+0xf4
fffffadf`e842aa70 fffff800`012f2153 : 00000000`c0000218 00000000`00000001 00000000`00000001 fffffadf`e842abe8 : nt!ExRaiseHardError+0x1d1
fffffadf`e842ab60 fffff800`012b226e : 00000000`00000000 fffffadf`eaade870 00000000`00000080 fffffadf`eaade870 : nt!CmpLoadHiveThread+0x2e3
fffffadf`e842ad70 fffff800`01044416 : fffff800`01174180 fffffadf`eaade870 fffffadf`ea4d6870 fffffadf`ea4d6908 : nt!PspSystemThreadStartup+0x3e
fffffadf`e842add0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16

STACK_COMMAND: kb

FOLLOWUP_IP:
nt!ExpSystemErrorHandler2+ae6
fffff800`01101e00 cc int 3

SYMBOL_STACK_INDEX: 1

SYMBOL_NAME: nt!ExpSystemErrorHandler2+ae6

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nt

IMAGE_NAME: ntoskrnl.exe

DEBUG_FLR_IMAGE_TIMESTAMP: 42436096

FAILURE_BUCKET_ID: X64_0xC0000218_nt!ExpSystemErrorHandler2+ae6

BUCKET_ID: X64_0xC0000218_nt!ExpSystemErrorHandler2+ae6

Followup: MachineOwner
---------

kd>
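
If you end up doing this a lot, the whole session above can be run in one shot by passing the symbol path with -y and the commands with -c.  Here is a hedged Python sketch that builds that command line.  It assumes cdb is on your PATH, and the function names are mine.

```python
import subprocess


def cdb_command(dump_path, sym_path, cdb="cdb"):
    """Build an argument list that analyzes a dump and then quits."""
    return [
        cdb,
        "-z", dump_path,         # open a dump file
        "-y", sym_path,          # symbol search path
        "-c", "!analyze -v; q",  # commands to run at startup, then quit
    ]


def analyze(dump_path, sym_path):
    """Run cdb and return its text output (needs the debuggers installed)."""
    result = subprocess.run(
        cdb_command(dump_path, sym_path),
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

Calling analyze(r"c:\WINDOWS\Minidump\Mini030506-01.dmp", "https://msdl.microsoft.com/download/symbols") should hand back roughly the same text as the interactive session, ready to be saved or searched.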

Now what

Well, I can't sit here and show you every command known to man, but you can now take the information you have and start searching the internet for clues as to why your machine crashed.  Like I said, there really isn't much more information in a minidump, and if you are continually crashing you should try to grab a kernel dump.  My bugcheck was a c0000218, which by running !analyze -vv turns out to be:

 0xc0000218 - {Registry File Failure} The registry cannot load the hive (file): %hs or its log or alternate. It is corrupt, absent, or not writable.
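
A side note for the nerds: !analyze called this an "Unknown bugcheck code" because, as far as I can tell, the value is really an NTSTATUS code (STATUS_CANNOT_LOAD_REGISTRY_FILE) passed along by the hard error path.  NTSTATUS values pack a severity, a facility and a code into 32 bits, and this little Python sketch pulls them apart.  The function name is mine.

```python
def decode_ntstatus(status):
    """Split a 32-bit NTSTATUS value into its bit fields."""
    severities = ["success", "informational", "warning", "error"]
    return {
        "severity": severities[(status >> 30) & 0x3],  # top two bits
        "customer": bool((status >> 29) & 0x1),        # set for non-Microsoft codes
        "facility": (status >> 16) & 0xFFF,            # which component raised it
        "code": status & 0xFFFF,                       # the error number itself
    }


if __name__ == "__main__":
    print(decode_ntstatus(0xC0000218))
```

For 0xC0000218 the severity comes out as "error" with facility 0 and code 0x218, which is why every status you see starting with C is bad news.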

Looking at the call stack above you can see nt!CmpLoadHiveThread+0x2e3 which looks about right, trying to load a registry hive.  I went to support.microsoft.com and plugged in c0000218 and it came back with this article: https://support.microsoft.com/kb/830084/en-us 
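
If you want to pull the function names out of a saved STACK_TEXT block instead of eyeballing it, a regular expression does the trick.  This is only a sketch that assumes the line layout shown in the output above, and the names in it are mine.

```python
import re

# Matches the trailing "module!Function+0xoffset" of a stack line.
FRAME_RE = re.compile(r":\s+(\w+)!(\w+)(?:\+0x([0-9a-fA-F]+))?\s*$")


def parse_frames(stack_text):
    """Return (module, function, offset) tuples, innermost frame first."""
    frames = []
    for line in stack_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            module, function, offset = match.groups()
            frames.append((module, function, int(offset, 16) if offset else 0))
    return frames


# Three lines copied from the STACK_TEXT output above.
SAMPLE = """\
fffffadf`e842a498 fffff800`01101e00 : 00000000`0000004c 00000000`c0000218 fffffadf`e842a528 fffffadf`e9ef73a0 : nt!KeBugCheckEx
fffffadf`e842aa70 fffff800`012f2153 : 00000000`c0000218 00000000`00000001 00000000`00000001 fffffadf`e842abe8 : nt!ExRaiseHardError+0x1d1
fffffadf`e842ab60 fffff800`012b226e : 00000000`00000000 fffffadf`eaade870 00000000`00000080 fffffadf`eaade870 : nt!CmpLoadHiveThread+0x2e3
"""

if __name__ == "__main__":
    for module, function, offset in parse_frames(SAMPLE):
        print("%s!%s+0x%x" % (module, function, offset))
```

The frames come back in the same order the debugger prints them, so the first entry is where the machine died and the later ones are the callers that got it there.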

Stay tuned

This blog will consist of advanced troubleshooting techniques and other obscure FREE tools you can use to analyze information.

 
