Blue Screen 0x000000F4 "CRITICAL_OBJECT_TERMINATED" ... and you thought F4 was just another key on your keyboard!

Sometimes, when you have been working hard on an issue and success finally arrives, it's hard to keep it to yourself. You stand up and rejoice, but the real joy comes when you share it! So here is another post from my able friend.

[Today's post comes to us courtesy of Nandan Sheth]

Hi guys,

A week or so ago, I got a case where the server would not boot in normal mode. When we tried to do so, the server would get as far as “Applying Computer Settings”. And then bang! It would bring us face to face with the feared Blue Screen of Death, BSOD, or whatever else you may want to call it. The truth is, the server just didn’t want to start up in the normal mode.

The technical information displayed on the screen was unknown to me when I started working on the case:

Stop Error: 0x000000F4 {0x00000003, 0x8a0ce34e, 0x8a72e331, 0x80073e32}

CRITICAL_OBJECT_TERMINATED

Digging a little into the blue screen, here is some information I found:

Parameter 1: 0x00000003

This parameter tells us what type of an object is being terminated. If the value is 0x3, the terminating object is a process. If the value is 0x6, the terminating object is a thread.

Parameter 2: 0x8a0ce34e

This is the address of the terminating object.

Parameter 3: 0x8a72e331

This points to the image file name of the terminating object.

Parameter 4: 0x80073e32

This is a pointer to an ASCII string that contains an explanatory message.

Typically, this stop error is generated with a first parameter 0x3 when csrss.exe crashes. This is a core OS process, and has to be running at all times.
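To make the parameter breakdown above a little more concrete, here is a minimal Python sketch that labels the four values from the stop error. The function name and the dictionary layout are my own invention for illustration, not any Microsoft API, and the only object types it knows are the two mentioned above (0x3 and 0x6).

```python
# Hypothetical helper for illustration only; not a Microsoft tool or API.
# It labels the four parameters of a 0x000000F4 stop error.
OBJECT_TYPES = {0x3: "process", 0x6: "thread"}


def describe_stop_f4(params):
    """Return a readable summary of the four 0xF4 bug-check parameters."""
    if len(params) != 4:
        raise ValueError("expected exactly four parameters")
    p1, p2, p3, p4 = params
    return {
        # Parameter 1: type of the terminating object
        "object_type": OBJECT_TYPES.get(p1, "unknown object type"),
        # Parameters 2-4 are addresses; we only format them as hex
        "object": f"0x{p2:08x}",
        "image_name": f"0x{p3:08x}",
        "message": f"0x{p4:08x}",
    }


summary = describe_stop_f4([0x00000003, 0x8A0CE34E, 0x8A72E331, 0x80073E32])
print(summary["object_type"])  # process
```

For the values in this case, the first parameter decodes to a process, which matches what you would expect when a core process such as csrss.exe goes down.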

So, what was the last change made on the server? Automatic Updates had installed the patches described in KB925876 and KB939653. These updates had already been uninstalled before the support incident was opened, though.

As far as going into the safe mode was concerned, I got an answer before I could ask a question. We were already booted in the safe mode when we started working on the issue.

So, we couldn’t go into the normal mode, but we could go into the safe mode. My first thought here was a 3rd party application installed on the server. A quick inventory showed me we had quite a few 3rd party services running on the system:

Symantec Corporate Edition

Symantec Mail Security for Exchange v6

APC Power Chute

VNC

All of these are known to have caused a problem at some time or other. So, we used MSCONFIG to disable the 3rd party applications and services along with all the startup items on the server. Fairly confident we’d nailed the issue, we rebooted. But we were no farther along than where we had started, for the server went to a blue screen a little after it finished preparing the network settings. Back in the safe mode, we researched a bit. We started considering the possibility that we might have a problem with the hardware. So we inventoried the hardware on the server:

Dell Poweredge 1600 SC

Circ ATA 100 C/H S SCSI Controller

3 hard drives on a hardware RAID

4 GB RAM

At this point, we were just lost. With 4 GB, there was no way we could get a full memory dump. So, we went about tweaking the system.

1. Limit the memory usage on the server to 2 GB or less. To do this, we can add the /maxmem switch to the boot entry in boot.ini, and specify the required value. We used /maxmem=2000. The value specified is always in MB.

2. Next, check to see if the paging file is only on the System drive. This can be done from the properties of My Computer. If you have a distributed paging file, delete the paging file on your data drive, and set it all up on the system drive.

3. Also make sure that the page file size is at least 1 MB more than the actual RAM on the server.

4. Reboot the server (and as in this case, go back into the safe mode).

5. Go back into the properties of My Computer, and set the computer for a complete memory dump.

NOTE: KB130536 is a good article to go through if you are configuring a system for a memory dump. It provides a good guideline on how to configure the system to gather a memory dump.
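As a rough illustration of steps 1 and 3 above, here is a small Python sketch. It appends the /maxmem switch to a boot.ini entry and works out the smallest page file that satisfies the "RAM plus 1 MB" rule. The function names and the sample boot entry are made up for this example; on a real server you would edit boot.ini itself and take the RAM figure from the machine.

```python
import re


def add_maxmem(boot_entry, megabytes):
    """Return the boot.ini entry with /maxmem=<MB> appended.

    Any /maxmem switch already on the line is removed first, so the
    function can be applied repeatedly without stacking switches.
    """
    cleaned = re.sub(r"\s*/maxmem=\d+", "", boot_entry, flags=re.IGNORECASE)
    return f"{cleaned.rstrip()} /maxmem={megabytes}"


def min_pagefile_mb(ram_mb):
    """Smallest page file (in MB) under the 'RAM + 1 MB' rule from step 3."""
    return ram_mb + 1


# Hypothetical boot entry, for illustration only.
entry = r'multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect'
print(add_maxmem(entry, 2000))
print(min_pagefile_mb(4096))  # 4097
```

To go back to the full amount of RAM afterwards, you simply take the /maxmem switch off the line again.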

Geared up to collect a memory dump, we rebooted the server. But the system didn’t bug out on us. It actually proceeded to the desktop. Ecstatic, we decided to reboot the server one more time. This time too, it came up to the desktop. So, as a next step, we removed the /maxmem=2000 switch from boot.ini. We rebooted the server with the full 4 GB RAM. And, it came back to the desktop.

It seemed unreal. But we were there, so we decided to test the services on the server. The first thing we did was to access network shares on the server. As soon as we did this, we knew we were in for a long night, since the server crashed as soon as we tried to go into a network share from a client computer.

Back to square one, we went back into the safe mode, added the /maxmem switch and made sure we had all the settings for the memory dump. And then we tried to reboot into the normal mode. We got the blue screen all right, but there was no dump. The system just sat there, waiting for us to do something to it. We tried one more time to go into the normal mode and get ourselves a memory dump, but it didn’t work. Had we booted up successfully with the system limited to using just 2 GB RAM, we would have proceeded with a memory check.

NOTE: You can download the MemCheck utility from https://oca.microsoft.com/en/windiag.asp. This is a fairly simple tool, and the download page provides you with a usage guide as well.

The next step was to go back into the safe mode. From here, we ran MSINFO32 on the server. This would provide us a list of system start drivers. These are the drivers that are being initialized during the “Applying Computer Settings” part of the boot process.

[Screenshot: MSINFO32 list of system drivers in Safe Mode]

On the list of system start drivers, we sorted the drivers by the Start_Mode column, so we could group the system start drivers together. The next step was to start the server in Safe Mode with Networking. Here, we ran MSINFO32 one more time, and checked the system start drivers. 

[Screenshot: MSINFO32 list of system drivers in Safe Mode with Networking]

Using the two MSINFO32 options, we basically narrowed down to the system start drivers that we could do without in the normal mode of operation. After comparing the two screens, we disabled the following drivers through the registry:

HKLM\SYSTEM\CurrentControlSet\Services

ASPI32

CDAUDIO

Changer

FIPS

I2omgmt

Lbrtfdc

Mnmdd

PCIDump

Serial

Sfloppy

TGA

For all of these drivers, we changed the Start value to 4, which disables the driver. We then tried to reboot the server, but it crashed again.
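The comparison of the two MSINFO32 driver lists can be thought of as a simple set operation. The Python sketch below is only one plausible reading of it: it assumes each list is a set of (name, start mode, state) rows and treats a driver as a candidate when it is a System start driver that is not running in either safe mode boot. The row layout, the selection rule and the sample states are all assumptions for illustration; they are not an export of the real server.

```python
def idle_system_drivers(rows):
    """Names of System-start drivers that are not running in one snapshot."""
    return {
        name
        for name, start_mode, state in rows
        if start_mode == "System" and state != "Running"
    }


def disable_candidates(safe_mode, safe_mode_networking):
    """Drivers idle in both snapshots, i.e. ones we can likely do without."""
    both = idle_system_drivers(safe_mode) & idle_system_drivers(safe_mode_networking)
    return sorted(both)


# Made-up sample rows: (driver name, start mode, state)
safe_mode = [
    ("Serial", "System", "Stopped"),
    ("Sfloppy", "System", "Stopped"),
    ("Tcpip", "System", "Stopped"),
    ("Disk", "Boot", "Running"),
]
safe_mode_networking = [
    ("Serial", "System", "Stopped"),
    ("Sfloppy", "System", "Stopped"),
    ("Tcpip", "System", "Running"),
    ("Disk", "Boot", "Running"),
]

print(disable_candidates(safe_mode, safe_mode_networking))  # ['Serial', 'Sfloppy']
```

Whatever rule you use, the point is the same as in the manual comparison: end up with a short, explicit list of drivers before you touch the registry.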

Also, by this time, /maxmem or no /maxmem, we were getting the blue screen; and we weren’t getting the dump! Next, we went back into the safe mode. We went back to MSINFO32. The information we now looked at was under loaded modules. What were we looking for? Non-Microsoft DLLs that had been loaded into memory. We found only one:

[Screenshot: MSINFO32 Loaded Modules view]

A quick search showed us that navlogon.dll is a Symantec library. We went into Windows Explorer, searched for the file, and renamed it to navlogon.old. Since we were renaming a library owned by Symantec, we also changed the start type for the following Symantec drivers:

Symevent

NAVAP

NAVENG

NAVEX15

Symevent is a Symantec AV filter driver. KB816071 is a good article on the Microsoft Knowledge Base that talks about filter drivers for different software. Once again, changing the start type of these drivers to 4 is what you need to do to disable them. We then rebooted the server in the normal mode, with no luck. This brought us to a point where we felt certain we didn’t have a software issue on the system. There was something else, something pertaining to hardware, that was causing the problem.
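If you have more than a couple of drivers to disable, typing the Start value by hand gets tedious. The Python sketch below builds the text of a .reg file that sets Start to 4 for each driver name under the Services key. The function name is hypothetical, and the sketch only produces the text; it does not touch the registry. You would want to export the existing keys first, so that the original start types can be restored later.

```python
SERVICES_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"


def build_disable_reg(driver_names):
    """Return .reg file text that sets Start=4 (disabled) for each driver."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for name in driver_names:
        lines.append(f"[{SERVICES_KEY}\\{name}]")
        lines.append('"Start"=dword:00000004')
        lines.append("")  # blank line between keys
    return "\n".join(lines)


# The Symantec drivers named in this post
print(build_disable_reg(["Symevent", "NAVAP", "NAVENG", "NAVEX15"]))
```

The same function works for the earlier list of system start drivers; only the names change.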

Our next step in troubleshooting the issue was a hardware clean boot. This is a process where you disable all hardware on the system, leaving only the most essential hardware untouched. So we went into the Device Manager, and disabled everything except the system board, the RAID controller, the hard drives, the keyboard and the mouse. For video, we used the basic VGA display. And we were still unable to boot normally. Once again, we got to “Applying Computer Settings” and then crashed.

[Screenshots: Device Manager with devices disabled for the hardware clean boot]

Please note that the screenshots above were taken from a working system. Accordingly, not all of the hardware shows up as disabled, since the change only comes into effect after a reboot. There is just one hardware component here that has been disabled but still shows up as enabled. This is the ConfigMgr Remote Control Driver under Display Adapters. The devices under the following categories have been left untouched:

Computer

Disk Drives

Human Interface Devices

IDE ATA/ATAPI Controllers

Keyboards

Mice and Other Pointing Devices

Processors

During the course of troubleshooting, we disabled the Network Cards on the server in question. Here, I’m actually RDP’d into a test server, so, I left the NIC untouched. The Sound, Video and Game controllers list some codecs that cannot be disabled. Also, under the System Devices, the devices that don’t have an X on them are core system devices, and they don’t have an option to be disabled.

So what brought us to the desktop? After a lot of nothing, we thought of looking at the event logs. I didn’t want someone reading me the events logged on the server over the phone. So we reconfigured a client to connect directly to the router and go on the internet. From the Microsoft download website, we downloaded MPS Reports for Setup and Performance. We then copied the file onto a USB drive, and copied it from there onto the server. Once we ran the MPS Reports, we copied the resultant CAB file onto the USB drive, and brought it in house for analysis. We didn’t have to look very long. A quick look at the system logs gave us good information to go with.

NOTE: MPS Reports are available for download on https://www.microsoft.com/downloads/details.aspx?FamilyID=cebf3c7c-7ca5-408f-88b7-f9c79b7306c0&DisplayLang=en. There are different executables you could download, based on the nature of the problem you are working on. Using the executable for the right specialty is important, so the correct set of tests required to gather diagnostic information would be run on the server.

It turned out that the system was crashing as soon as the Network Cards were initialized. Running MSINFO32 earlier had already told us that SP2 for Windows 2003 had been installed. This got us thinking. I’d heard of networking problems on SBS 2003 after installing SP2 for Windows 2003. These problems are very well documented in KB936594. But could the same changes help us here? After trying so many things, we decided a couple of registry changes couldn’t do much harm, even if they didn’t help us. So, we made changes to the registry. Changes that we’ve been recommending, even making, on so many servers every day. We edited the following registry values:

HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\EnableRSS – This was set to 1. We changed it to 0.

HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\EnableTCPChimney – This was set to 1. We changed it to 0.

HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\DisableTaskOffload – We had to create this REG_DWORD value and set it to 1.
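For anyone who would rather script these three changes than click through regedit, here is a minimal Python sketch that prints the equivalent reg add commands. The function and constant names are my own; the value names and data are the ones listed above. The sketch only builds the command strings, so you can review them before running anything on a server.

```python
TCPIP_PARAMS = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# Value name -> REG_DWORD data, as listed above
SETTINGS = {
    "EnableRSS": 0,
    "EnableTCPChimney": 0,
    "DisableTaskOffload": 1,
}


def reg_add_commands(settings=SETTINGS, key=TCPIP_PARAMS):
    """Build one 'reg add' command per value; /f overwrites without prompting."""
    return [
        f'reg add "{key}" /v {name} /t REG_DWORD /d {value} /f'
        for name, value in settings.items()
    ]


for command in reg_add_commands():
    print(command)
```

Note that reg add creates a value if it does not exist yet, which covers the DisableTaskOffload case. As in our case, the server needs a reboot afterwards.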

Little did we know that these changes would call for the last reboot, for as soon as we implemented these changes, the server came up just fine with the /maxmem switch. We then slowly proceeded to undo the changes we had made all this while. It was slow going, but we got it done soon enough. All hardware and all software returned to normal functionality. We were still booting up fine. All services were working. Email was working. There was nothing the end user could complain about. And I could get back to the other cases I had forgotten about.

Now, looking back, there are some things worth thinking about.

1. If you have SP2 installed on the server, make sure you follow that up with the patch from KB936594, or implement the changes manually.

2. Always try and stick to the basics. We always had full access to the event logs, and the first time I thought about looking at them was seven hours into troubleshooting.

3. The more complex an issue seems to be, the simpler the resolution turns out to be.

All said and done, this was one great learning experience.