“Devil’s in the details…”

Hi, my name is Kim and I’m a support engineer on the Core Performance team. I joined this group about six months ago and noticed that there are a variety of ways to gather the same data. I thought I’d share some of the tricks I’ve learned for gathering memory dumps.

 

You could say I'm bad at fractions. I liked to explain my last job as:

1. Half technical.

2. Half process. 

3. Half babysitting (no one mentions the third half. Fight Club Rules I guess) 

 

I recently tried to explain my new role as:

1. One-third "knowing how things Should work"

2. One-third "knowing what data to capture when it's Not working"

3. One-third trying to figure out the difference.

4. (Fight club rules apply for the fourth-third)

 

Knowing how things should work depends on the issue at hand. Typically you can be as generic or as granular as the situation requires. For example: all I need to know about my car is that I turn the key and it starts. That’s how it should work. When it doesn’t start I need to know more: Is it out of gas? Bad spark plug? Alternator? Battery? Bad key-chip? The more you know, the more you can eliminate until you pinpoint an area to dig into.

 

Once we have an area to focus on, we typically need to capture data. Capturing data can be broken down into two areas: a snapshot of a single moment, and a collection of snapshots over time. The best example of a single snapshot is a memory dump.

 

The concept is basic: whatever the computer is doing at any one moment in time, freeze it, and put all that info into a file. In practice there are several ways to take that picture.

 

The old dog - Control Scroll Scroll

The most common, hands-on way to force a memory dump is to configure the server to dump on a specific keystroke combination: holding down the right CTRL key and pressing the SCROLL LOCK key twice.

 

PS/2 Keyboard

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters

Name : CrashOnCtrlScroll
Data Type : REG_DWORD
Value : 1

 

  • A reboot is required before this becomes active. 

 

USB Keyboard

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters

Name : CrashOnCtrlScroll
Data Type : REG_DWORD
Value : 1

  • A reboot is not required, but the keyboard must be unplugged and plugged back in.
  • Control Scroll Scroll can be used at any time, but it is typically used when the server is non-responsive, or "in-state".
  • In general this should work for any OS. Some operating systems may need hotfixes.
  • The dump that is created is a Stop 0xE2. 
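
If you'd rather not click through regedit on every box, the two keyboard settings above can be sketched as a generated .reg file. This is only a rough illustration: the key paths and value name are the ones listed above, while the helper name render_reg_file and the idea of covering both drivers in one file are my own assumptions, not an official tool.

```python
# Sketch: build the text of a .reg file that turns on CrashOnCtrlScroll
# for both keyboard drivers. Helper names here are hypothetical.

KEYBOARD_KEYS = {
    "PS/2": r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters",
    "USB": r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters",
}


def render_reg_file(keys=KEYBOARD_KEYS, value_name="CrashOnCtrlScroll", data=1):
    """Return .reg file text that sets value_name as a REG_DWORD under each key."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for label, key in keys.items():
        lines.append(f"; {label} keyboard")  # ';' starts a comment in .reg files
        lines.append(f"[{key}]")
        # .reg files write a DWORD as eight hex digits
        lines.append(f'"{value_name}"=dword:{data:08x}')
        lines.append("")
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(render_reg_file())
```

Importing the resulting file does not change the activation rules: the PS/2 setting still needs a reboot, and the USB setting still needs the keyboard re-plugged.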

 

Remote Old Dog

When we don’t have the option to connect a keyboard directly, we can configure the system to crash remotely via a Non-Maskable Interrupt (NMI).

 

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl

Name : NMICrashDump
Data Type : REG_DWORD
Value : 1

 

  • A reboot is required before this becomes active
  • The NMI must be enabled in the BIOS or via Integrated Lights Out (iLO) Web interface.
  • Since there is no local connection to the server, this is typically used when the server shows behaviors such as:
    • Not responsive to ping
    • Cannot connect to the server via terminal services
    • Cannot connect via UNC path
    • Cannot connect via remote registry (connecting to the server via event viewer or server management console etc) 
  • It's always good to try multiple methods of connecting to the server and note what works and what doesn’t.
  • In general this should work for any OS that has compatible hardware (NMI aware/iLO)
  • The dump that is created is a Stop 0x80
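
Since this method is all about servers you can't walk up to, it can help to script the registry change ahead of time. Here's a minimal sketch that builds the reg add command line for the NMICrashDump value above. The function names are hypothetical, and the command only runs if you explicitly ask for it on a Windows box (from an elevated prompt).

```python
import subprocess
import sys

# Key and value name as listed above; HKLM is the short form reg.exe accepts.
CRASH_CONTROL = r"HKLM\SYSTEM\CurrentControlSet\Control\CrashControl"


def reg_add_dword(key, name, data):
    """Build the argv for a `reg add` call that writes a REG_DWORD value."""
    return ["reg", "add", key, "/v", name, "/t", "REG_DWORD", "/d", str(data), "/f"]


def enable_nmi_crash_dump(run=False):
    """Return the command; execute it only when run=True on Windows."""
    cmd = reg_add_dword(CRASH_CONTROL, "NMICrashDump", 1)
    if run and sys.platform == "win32":
        subprocess.run(cmd, check=True)  # requires administrator rights
    return cmd


if __name__ == "__main__":
    print(" ".join(enable_nmi_crash_dump()))
```

Setting the value this way doesn't remove the other requirements: the server still needs a reboot, and NMI still has to be enabled in the BIOS or iLO.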

 

Just Do It!

There's a tool called NotMyFault that will crash the box on demand. 

Click Start

Locate and right-click Command Prompt

Select Run as administrator.

Type NotMyFault.exe /crash

 

  • No registry changes are needed to get the server to crash. 
  • This is used when the server is behaving badly:
    • Out of resource errors
    • Very slow to respond
    • Generally, anything that is not normal server operation
  • The server must not be completely locked up (we need to run the executable to force the dump).
  • In general this should work for any OS.
  • The dump that is created is a Stop 0xD1.
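
With three methods come three different Stop codes, so here's a small lookup sketch for working out how a dump was most likely forced. The bugcheck names are the standard ones for these codes; the table and function names are just for illustration. Keep in mind that 0x80 and 0xD1 can also come from genuine hardware or driver problems, so treat the answer as a hint, not proof.

```python
# Sketch: map the Stop codes from this post back to the method that forces them.
FORCED_DUMP_CODES = {
    0xE2: ("MANUALLY_INITIATED_CRASH", "keyboard: right CTRL + SCROLL LOCK twice"),
    0x80: ("NMI_HARDWARE_FAILURE", "NMI sent from the BIOS or iLO"),
    0xD1: ("DRIVER_IRQL_NOT_LESS_OR_EQUAL", "NotMyFault.exe /crash"),
}


def describe_stop_code(code):
    """Return a one-line description of a Stop code seen in a forced dump."""
    name, method = FORCED_DUMP_CODES.get(
        code, ("unknown", "not one of the forced-dump methods covered here")
    )
    return f"Stop 0x{code:X} ({name}): {method}"


if __name__ == "__main__":
    for code in (0xE2, 0x80, 0xD1):
        print(describe_stop_code(code))
```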

 

Whatchu Talkin About?

NotMyFault.exe can be triggered automatically when an event is recorded in the event log. For example, if you're getting

Event ID: 2019
Source: Srv
Description: The server was unable to allocate from the system nonpaged pool because the pool was empty.

intermittently, the issue may no longer be present by the time we get around to forcing the dump. In that case we can set up a trigger to call NotMyFault as soon as 2019 pops up its pretty little head.

 

Example 1:

Setting up event triggering on Windows Server 2003

  • In Notepad type the line NotMyFault.exe /crash
  • Save the file as NMF.bat
  • Open a Command Prompt
  • Run the command

eventtriggers /create /tr "Non Paged Pool Event" /eid 2019 /so SRV /tk \\server\share\NMF.bat

https://technet.microsoft.com/en-us/library/bb490901.aspx

 

Example 2:

Setting up event triggering on Windows Server 2008

  • In Notepad type the line NotMyFault.exe /crash
  • Save the file as NMF.bat
  • In Administrative Tools open Task Scheduler
  • Select Create Basic Task
  • On the Task Trigger page select "When a specific event is logged"
  • On the When a Specific Event Is Logged page select

Log: System

Source: SRV

Event ID: 2019

  • On the Action page select "Start a program"
  • Select the location of the NMF.Bat file.

 

  • No registry changes are needed to get the server to crash.
  • This is used when we want to capture a dump as soon as a specific event is triggered
  • In general this should work for any OS.
  • The dump that is created is a 0xD1
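
The Task Scheduler clicks in Example 2 can probably be scripted as well, using schtasks with an ONEVENT trigger. The sketch below only builds the command line. The task name and the C:\Tools\NMF.bat path are made-up placeholders, and the provider name in the query is an assumption: check the XML view of a real 2019 event on your server to see exactly how the source is spelled.

```python
def event_query(provider, event_id):
    """Build an XPath event filter matching one provider and event ID."""
    return f"*[System[Provider[@Name='{provider}'] and (EventID={event_id})]]"


def schtasks_on_event(task_name, channel, provider, event_id, program):
    """Build the argv for a schtasks task that starts when the event is logged."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,   # name shown in Task Scheduler
        "/TR", program,     # what to run, here the NMF.bat wrapper
        "/SC", "ONEVENT",   # trigger on an event log entry
        "/EC", channel,     # event log (channel) to watch
        "/MO", event_query(provider, event_id),
    ]


if __name__ == "__main__":
    # Hypothetical task name and batch file location
    cmd = schtasks_on_event("NMF on 2019", "System", "srv", 2019, r"C:\Tools\NMF.bat")
    print(cmd)
```

Printing the list instead of running it is deliberate: a task like this crashes the server the moment the event fires, so it's worth reading the command twice before creating it from an elevated prompt.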

 

That special case: Event 333

The server can be