Back in “the old days”, you could use a ball-point pen to break into the debugger. No, I haven’t stayed too long at the fair – you could use the tip of the ball-point pen to short the nearest pair of pins to create a hardware crash dump. Obviously this isn’t recommended (or supported!), but as the old adage goes, necessity is the mother of invention. Since then, the introduction of hardware dump switches (which are usually referred to as NMI switches) eliminated the need to dig out your trusty pen and start poking around on your systems.
On the Performance team, we request server dumps routinely to troubleshoot a variety of issues. Normally we use the CrashOnCtrlScroll key combination (hold down the right CTRL key and press SCROLL LOCK twice) to capture a manual memory dump. However, there have been instances where the server does not respond to this key combination. In these cases, the NMI switch can be your best friend …
So what exactly is NMI? If you recall our post on IRQL, the Interrupt Request Level defines the hardware priority at which a processor operates. Code running at a given level can mask (or block) lower-priority interrupt requests so that the processor can finish the task at hand. NMI, the Non-Maskable Interrupt, is basically “God Mode” for interrupt requests, and by extension the processor. NMI requests cannot be blocked and are reserved for very high priority tasks. These tasks are invoked whenever there is a serious system error that requires immediate attention to prevent data loss or data corruption. Some examples that you might have seen in the past include memory parity errors, bus timeouts where an add-on card may be defective and has stopped responding, or, in some very rare cases, a software program generating an NMI. The NMI signal tells the processor to drop whatever it is doing and service the NMI request. The NMI request sent by the NMI switch causes the server to bugcheck. The resultant bugcheck may be one of the following:
- STOP 0x00000080 (NMI_HARDWARE_FAILURE)
- STOP 0x000000C2 (BAD_POOL_CALLER)
- STOP 0x000000E2 (MANUALLY_INITIATED_CRASH)
As you can see, when the server is in such a severe hang that keyboard input no longer works, the NMI switch can be extremely useful in capturing troubleshooting data. A word of warning though, because we encounter this scenario more often than you might think: if CrashOnCtrlScroll doesn’t work, and neither does your NMI switch, you should ensure that you have the appropriate registry modification on your system:
Key: HKLM\System\CurrentControlSet\Control\CrashControl
Value Name: NMICrashDump
Value Type: REG_DWORD
Value Data: 1
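If you would rather check this value from a script than open the registry editor, something along these lines may help. It is a minimal sketch, not a supported tool: it assumes Python is installed on the server and uses the standard-library winreg module (Windows only), and the function names read_nmi_crash_dump and nmi_dump_enabled are made up for this illustration.

```python
import sys

# Key path (relative to HKLM) and value name taken from the registry entry above.
KEY_PATH = r"System\CurrentControlSet\Control\CrashControl"
VALUE_NAME = "NMICrashDump"


def read_nmi_crash_dump():
    """Return the NMICrashDump DWORD, or None if it is missing or not a DWORD.

    Hypothetical helper for illustration; only works on Windows.
    """
    import winreg  # standard library, but only available on Windows

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, value_type = winreg.QueryValueEx(key, VALUE_NAME)
    except FileNotFoundError:
        # Either the key or the value does not exist.
        return None
    return value if value_type == winreg.REG_DWORD else None


def nmi_dump_enabled(value):
    """True only when the value data is exactly 1, as described above."""
    return value == 1


if __name__ == "__main__":
    if sys.platform != "win32":
        print("This check only makes sense on a Windows system.")
    else:
        current = read_nmi_crash_dump()
        if nmi_dump_enabled(current):
            print("NMICrashDump is set to 1.")
        else:
            print(f"NMICrashDump is {current!r}; expected REG_DWORD 1.")
```

The script only reads the value, so it is safe to run on a production server. Setting the value is left to your usual registry change process.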
If everything is configured correctly, and you can’t generate a dump using the NMI switch, then it’s probably time to start calling your hardware vendor and having them run a thorough check on your system.
Until next time …