Matthew Reynolds here. My job is to make Windows sing (figuratively) in large enterprises.
If you have a machine that freezes, you may need to generate a memory dump in order to find the cause. If you can generate the memory dump before calling Microsoft support, you might speed up your diagnosis.
Use this technique if…
· The machine becomes unresponsive (but doesn’t crash to a blue screen) such that you cannot use other diagnostic tools
· The problem is likely to happen again in the future so you have a chance to configure the machine for next time
If you are thinking to yourself now, “what about live remote kernel debug?”, or “what about subtle differences between binary versions”, or “page file sizes are a many-nuanced topic” you are not wrong—you are just reading the wrong post. Exhaustive documentation exists at https://support.microsoft.com/en-us/kb/969028 and linked friends. These cover many more options, edge cases, virtualization and so on. I am writing this post because I recently found that my customers and I needed a quick “try this first” reference for ordinary PCs and servers (https://youtu.be/pjvQFtlNQ-M).
Step 1: Configure the Automatic (or Kernel) memory dump setting and page file
Of the various memory dump styles “Kernel” is often the best balance between size and usefulness.
Starting with Windows 8 / Server 2012 the “Automatic” option is a great way to get a Kernel memory dump. The Automatic option is described here: http://blogs.technet.com/b/askcore/archive/2012/09/12/windows-8-and-windows-server-2012-automatic-memory-dump.aspx. Essentially you just choose the Automatic options for both memory dump configuration and page file size.
For Windows 7 / Server 2008 R2 use the “Kernel” option instead, with either a system-managed page file or a page file larger than the installed RAM.
Other dump modes such as Mini or Full might be used in consultation with a support engineer.
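The dump type chosen in Step 1 is stored in the registry, so it can also be set without clicking through System Properties. Here is a minimal Python sketch (not a procedure from this post) that only builds the text of a .reg fragment. It assumes the CrashDumpEnabled value under the CrashControl key, where 2 selects Kernel and 7 selects Automatic. The function name and mode names are made up for illustration, and the page file setting is deliberately left out.

```python
# Sketch: build a .reg fragment that selects the memory dump type.
# Assumption: CrashDumpEnabled uses 2 = Kernel and 7 = Automatic.
# build_dump_mode_reg and the mode names are illustrative, not an official tool.

CRASH_CONTROL_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"

DUMP_MODES = {
    "kernel": 2,     # suggested above for Windows 7 / Server 2008 R2
    "automatic": 7,  # suggested above for Windows 8 / Server 2012 and later
}


def build_dump_mode_reg(mode: str) -> str:
    """Return .reg file text that sets CrashDumpEnabled for the given mode."""
    value = DUMP_MODES[mode.lower()]
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{CRASH_CONTROL_KEY}]",
        f'"CrashDumpEnabled"=dword:{value:08x}',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the fragment so it can be reviewed before importing it anywhere.
    print(build_dump_mode_reg("automatic"))
```

Save the output as a .reg file and try it on a test machine first. The sketch writes nothing to the registry itself.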
Step 2: Trigger the crash dump
Option A – NMICrashDump (good for remotely managed server class hardware)
Some server hardware provides the ability to trigger a crash (to get a memory dump) using a hardware interrupt. Typically this would be triggered using a hardware level remote management interface.
This approach is described here: https://support.microsoft.com/en-us/kb/927069.
Essentially you set the NMICrashDump registry value and then use the hardware specific remote management interface to trigger the crash.
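As a rough illustration of the registry half of Option A, the Python sketch below builds (but does not run) a reg.exe command line. It assumes NMICrashDump is a REG_DWORD set to 1 under the CrashControl key, as the KB article describes. The helper name is hypothetical, and the hardware-specific trigger is outside its scope.

```python
# Sketch: build the "reg add" command that enables NMI-triggered crash dumps.
# Assumption: NMICrashDump is a REG_DWORD set to 1 under the CrashControl key.
# nmi_crash_dump_command is a hypothetical helper name.

CRASH_CONTROL = r"HKLM\SYSTEM\CurrentControlSet\Control\CrashControl"


def nmi_crash_dump_command(enable: bool = True) -> list:
    """Return the argument list for reg.exe; nothing is executed here."""
    return [
        "reg", "add", CRASH_CONTROL,
        "/v", "NMICrashDump",
        "/t", "REG_DWORD",
        "/d", "1" if enable else "0",
        "/f",  # overwrite an existing value without prompting
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it elevated.
    print(" ".join(nmi_crash_dump_command()))
```

Because the function returns a list rather than executing anything, the command can be logged or reviewed before it is pasted into an elevated prompt.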
Option B – CrashOnCtrlScroll (good for laptops and PC / workgroup-server class hardware)
“CrashOnCtrlScroll” (https://msdn.microsoft.com/en-us/library/windows/hardware/ff545499(v=vs.85).aspx) is a technique where the keyboard driver and kernel conspire to crash the machine (to get a memory dump) when a magic key sequence is detected. This is like a Windows Internals version of up, up, down, down, left, right, left, right, B, A… (http://en.wikipedia.org/wiki/Konami_Code).
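To make the setting concrete, here is a small Python sketch that generates .reg text enabling CrashOnCtrlScroll. It assumes the value lives under each keyboard driver's Parameters key, with i8042prt covering PS/2 keyboards and kbdhid covering USB keyboards. The function name is made up, and you should check which driver your hardware actually uses.

```python
# Sketch: .reg text enabling CrashOnCtrlScroll for the keyboard drivers in use.
# Assumption: the value lives under Services\<driver>\Parameters, where
# i8042prt serves PS/2 keyboards and kbdhid serves USB keyboards.
# crash_on_ctrl_scroll_reg is a hypothetical helper name.

SERVICES = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"
KEYBOARD_DRIVERS = ("i8042prt", "kbdhid")


def crash_on_ctrl_scroll_reg(drivers=KEYBOARD_DRIVERS) -> str:
    """Return .reg text that sets CrashOnCtrlScroll=1 for each driver."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for driver in drivers:
        lines.append(f"[{SERVICES}\\{driver}\\Parameters]")
        lines.append('"CrashOnCtrlScroll"=dword:00000001')
        lines.append("")  # blank line between registry keys
    return "\n".join(lines)


if __name__ == "__main__":
    print(crash_on_ctrl_scroll_reg())
```

After importing the fragment and restarting, the default sequence is to hold the rightmost Control key and press Scroll Lock twice.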
Some keyboards and KVMs prevent the default Control + Scroll Lock + Scroll Lock sequence from working. Where the heck is Scroll Lock on my tiny tablet keyboard?
Fortunately you can change the magic keys. The CrashOnCtrlScroll article linked above alludes to this but leaves much of the implementation to the reader’s imagination. I typically start with examples that others have figured out, like http://random-tutorials.blogspot.com/2012/08/manual-crash-dumps-on-windows.html, which is where the Control + D + D configuration in my registry came from. Be careful: Control + D + D is much more likely to be hit accidentally than Control + Scroll Lock + Scroll Lock.
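The alternate-key mechanism can be sketched in Python as follows. This is an illustration under stated assumptions, not a copy of my registry. It assumes the values live under each keyboard driver's crashdump key, that Dump1Keys is a bitmask of the modifier keys to hold, and that Dump2Key is an index into the scancode table printed in the CrashOnCtrlScroll article. The helper names are hypothetical. The index for your chosen key is passed in rather than guessed, so look it up in that table.

```python
# Sketch: compute Dump1Keys and emit .reg text for an alternate key sequence.
# Assumptions: the values live under Services\<driver>\crashdump, Dump1Keys is
# a bitmask of modifier keys, and Dump2Key is an index into the scancode table
# from the CrashOnCtrlScroll article (the caller supplies it; none is guessed).

MODIFIER_BITS = {
    "right_shift": 0x01,
    "right_ctrl": 0x02,
    "right_alt": 0x04,
    "left_shift": 0x10,
    "left_ctrl": 0x20,
    "left_alt": 0x40,
}

SERVICES = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"


def dump1keys_mask(modifiers) -> int:
    """OR together the bits for every modifier key that must be held."""
    mask = 0
    for name in modifiers:
        mask |= MODIFIER_BITS[name]
    return mask


def alternate_keys_reg(driver: str, modifiers, dump2key_index: int) -> str:
    """Return .reg text for one keyboard driver (e.g. i8042prt or kbdhid)."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{SERVICES}\\{driver}\\crashdump]",
        f'"Dump1Keys"=dword:{dump1keys_mask(modifiers):08x}',
        f'"Dump2Key"=dword:{dump2key_index:08x}',
        "",
    ]
    return "\n".join(lines)
```

Keeping the modifier names symbolic makes it harder to choose a combination by accident, which matters given how easily a short sequence can be typed in normal use.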
Step 3: Retrieve the file and get it to an expert for analysis
Copy or move the memory dump file (located by default at %SystemRoot%\memory.dmp) as needed. If the original hang was blocking boot or logon you may have to use an alternative boot path such as Safe Mode to get there. In my world the target audience for the memory dump is usually an escalation level expert deep inside Microsoft support: https://support.microsoft.com.
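Before shipping a multi-gigabyte file to someone, a quick sanity check can save a round trip. The Python sketch below is one way to do that. It assumes kernel-mode dump files begin with the ASCII signature PAGEDU64 (64-bit) or PAGEDUMP (32-bit), and the helper name is hypothetical. It reports the size, the modification time, and whether the header looks like a kernel dump.

```python
# Sketch: sanity-check a dump file before copying it anywhere.
# Assumption: kernel-mode dumps begin with the ASCII signature "PAGEDU64"
# (64-bit) or "PAGEDUMP" (32-bit). inspect_dump is a hypothetical helper.
import os

KERNEL_DUMP_SIGNATURES = {b"PAGEDU64": "64-bit", b"PAGEDUMP": "32-bit"}


def inspect_dump(path: str) -> dict:
    """Return size, modification time, and detected dump flavour (or None)."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "modified": info.st_mtime,  # compare with the time you forced the crash
        "kind": KERNEL_DUMP_SIGNATURES.get(header),
    }


if __name__ == "__main__":
    # %SystemRoot% is only expanded by os.path.expandvars on Windows itself.
    default_path = os.path.expandvars(r"%SystemRoot%\memory.dmp")
    if os.path.exists(default_path):
        print(inspect_dump(default_path))
```

A modification time older than your forced crash usually means you are looking at a stale dump from an earlier incident.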
In case you decide to have a go at debugging it using windbg.exe or other tools (https://support.microsoft.com/en-us/kb/315263) keep in mind that the cause of your crash is already known. You triggered it manually. I stress this because many debugging tools or guides (e.g., !analyze) assume that you are trying to learn the cause of the crash and will simply report that the crash was triggered by whichever method you used.
Instead your goal is to use the memory dump to find the cause of the unresponsiveness which began prior to the crash. This is going to involve looking for locks, IRPs, critical sections, hung threads, etc. If only there were a cheat code…
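For readers who do want to poke at the dump themselves, the Python sketch below assembles a kd.exe command line that runs a handful of hang-oriented debugger commands and then quits. The choice of commands is my assumption of a reasonable first pass, not an official triage procedure. The helper name is hypothetical, and the dump and symbol paths are supplied by the caller.

```python
# Sketch: assemble a kd.exe command line that runs a few hang-oriented
# commands against the dump. The command selection is an assumption meant as
# a starting point; build_kd_command is a hypothetical helper.

HANG_TRIAGE_COMMANDS = [
    "!analyze -v",   # will mostly confirm the crash was manually initiated
    "!locks",        # kernel resources (ERESOURCE locks) that are held
    "!running -ti",  # threads on each processor, with stacks, idle included
    "!process 0 0",  # brief list of every process in the dump
    "!irpfind",      # outstanding IRPs (can be slow on large dumps)
]


def build_kd_command(dump_path: str, symbol_path: str) -> list:
    """Return the kd.exe argument list; nothing is executed here."""
    script = "; ".join(HANG_TRIAGE_COMMANDS + ["q"])  # q quits when done
    return ["kd.exe", "-z", dump_path, "-y", symbol_path, "-c", script]


if __name__ == "__main__":
    # Example paths only; adjust both to your environment.
    print(build_kd_command(
        r"C:\Windows\memory.dmp",
        r"srv*C:\symbols*https://msdl.microsoft.com/download/symbols",
    ))
```

The output of these commands is raw material rather than an answer. Interpreting which lock or IRP is actually the culprit is where the expert comes in.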
Up, up, down, down, left, right, left, right, B, A (and call us)!
-Matthew “Glamour Shots” Reynolds