While I long for the day when I no longer experience the effects of buggy software, there’s something rewarding about solving my own troubleshooting cases. In the process, I often come up with new techniques to add to my bag of tricks and to share with you in my “Case of the Unexplained…” presentations and blog posts. The other day I successfully closed an especially interesting case that opened when Internet Explorer (IE) crashed as I was reading a web page:
Whenever I experience a crash, whether it’s the system or an application, I always take a look at it. There’s no guarantee, but many times after spending just a few minutes I find clues that point at an add-on as the cause and ultimately a fix or workaround. In most cases when it’s an application crash, the faulty process is obvious and I simply launch Windbg (from the free Debugging Tools for Windows package that comes with the Windows SDK and Windows DDK), attach it to the process, and start investigating.
Sometimes however, the faulting process isn’t obvious, like was the case when I saw the IE crash dialog. That’s because I was running IE8, which has a multi-process model where different tabs are hosted in different processes:
I had multiple tabs open as usual, so I had to figure out which IE process of the four that were running (in addition to the parent broker instance) was the one that had crashed. I could have taken the brute-force approach of attaching to each process in turn and searching for the faulting thread, but there’s fortunately a simpler and more direct way to identify the target process.
When a process crashes, the Windows Error Reporting (WER) service launches its own process, called WerFault, in the session of the crashed process to display the error dialog to the user running the session and to generate a crash dump file. So that WerFault knows which process is the one that crashed, the WER service passes the process ID (PID) of the target on WerFault’s command line. You can easily view the command line with Process Explorer. Because I always have Process Explorer running with its icon visible in the tray area of the taskbar, I clicked on the icon to open it and found the WER process in the process tree:
I double-clicked on it to open the process properties dialog and the command line revealed the process ID of the problematic IE process:
Now that I knew it was process 4440 in which I was interested, I started Windbg, pressed F6 to open the process selection dialog, and double-clicked on Iexplore.exe process 4440. With Windbg attached, my next step was to locate the thread that had faulted so that I could examine its stack for signs of a buggy add-on. In some cases, relying on Windbg’s built-in crash analysis heuristics, which you can trigger with the !analyze command, will do the job for you, but it didn’t this time. Finding the faulting thread is fairly straightforward, though.
First, go to Windbg’s View menu and open both the Processes and Threads and the Call Stack dialogs, arranging them side by side. The goal is to find the thread that has functions with the words fault, exception, or unhandled in their names. You can quickly do this by selecting each thread in the Processes and Threads window, pressing Enter, and then scanning the stack that appears in the Call Stack window. After doing this for the first few threads, I came across the thread I was looking for, revealed by functions all over its stack containing the telltale strings:
Unfortunately, I was at an apparent dead end as far as fingering an add-on: all the DLLs shown in the call stack were Microsoft’s. There was one indicator that there might be an add-on hidden from view though, and that was the text reporting that Windbg couldn’t find symbols for at least some of the stack’s frames, so was forced to make guesses about the stack’s layout and was showing an address that didn’t lie within any DLL:
This happens when a DLL uses frame pointer omitted (FPO) calling conventions, which in the absence of symbolic information for the DLL prevents the debugger from finding stack frames just by following the frame-pointer chain. The return addresses for the functions the thread invoked must be on the stack (unless they were overwritten by the bug that caused the crash), but Windbg’s heuristics couldn’t locate them.
There’s a Windbg command that you can use in these cases to hunt for the missing frame function addresses, the Display Words and Symbols command. If you’re debugging a 32-bit process, use the dds version of the command and if it’s a 64-bit process use dqs. You can also use dps (Display Pointer Symbols), which will interpret the function addresses as the appropriate size for a 32-bit or 64-bit process. The address to give to the command as the starting point should be the address of the stack frame immediately above the one where Windbg got lost. To see the address, click on the Addrs button in the call stack dialog:
The address on the frame in question was 2cbc5c8:
I passed it to dds as the argument and pressed enter:
The first page of results didn’t list any functions besides the expected one, KiUserException. I hit the enter key again without typing another command, because for address-based commands like dds, that tells Windbg to repeat the last the last command at the address where it left off. The second page of results yielded something more interesting, the name of a DLL I wasn’t familiar with:
An easy way to see version information for a module without leaving Windbg is to use the lm (List Modules) command. The output of that command told me that Yt.dll (the name of the DLL is the text to the left of the “!”) was part of the Yahoo Toolbar:
This came as a surprise because the system on which the crash occurred was my home gaming system, a computer that I’d only had for a few weeks. The only software I generally install on my gaming systems are Microsoft Office and games. I don’t use browser toolbars and if I did, would obviously use the one from Bing, not Yahoo’s. Further, the date on the DLL showed that it was almost two years old. I’m pretty diligent about looking for opt-out checkboxes on software installers, so the likely explanation was that the toolbar had come onto my system piggybacking on the installation of one of the several video-card stress testing and temperature profiling tools I used while overclocking the system. I find the practice of forcing users to opt-out annoying and not giving them a choice even more so, so was pretty annoyed at this point. A quick trip to the Control Panel and a few minutes later and my system was free from the undesired and out-of-date toolbar.
Using a couple of handy troubleshooting techniques, within less than five minutes I had identified the probable cause of the crash I experienced, made my system more reliable, and probably even improved its performance. Case closed.