Server Hung/Becoming Unresponsive

A server hang is typically defined as a condition where a machine is non-responsive locally or over the network.

Hard Hang (not necessarily referring to hardware):

  • The server is not accessible using any remote functionalities – RDP, Citrix, etc…
  • Remote console access (DRAC, iLO, RSA) is possible, but the operating system does not respond to any command, for example “Ctrl+Alt+Del”.
  • If the server is a virtual machine, the hypervisor console does not respond to Ctrl+Alt+Del, or the hypervisor performance monitoring tools show no activity.
  • A test ping to the server fails and the administrative shares (\\ServerName\c$) cannot be accessed.

Soft Hang:

  • The server is not accessible using any remote functionalities – RDP, Citrix, etc…
  • You are able to send the “Ctrl+Alt+Del” command through the console, but the credentials box/Winlogon GINA never comes up or comes up slowly.
  • Ping responses may be normal, may drop intermittently, or may show high network latency.
  • Access to an administrative share (\\ServerName\c$) either fails or works slowly.
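
The two lists above can be condensed into a rough triage rule. The sketch below is only an illustration: the function name and its three inputs are hypothetical, and it assumes the server has already been reported as hung (remote access such as RDP or Citrix is failing).

```python
def classify_hang(ping_replies, os_accepts_cad, admin_share_reachable):
    """Rough triage for a server that is already reported as hung.

    Assumption: a hang is "hard" only when none of the three signs of
    life from the lists above is present; anything else counts as "soft".
    """
    signs_of_life = (ping_replies, os_accepts_cad, admin_share_reachable)
    return "soft" if any(signs_of_life) else "hard"


# No ping, no reaction to Ctrl+Alt+Del, no admin share: hard hang.
print(classify_hang(False, False, False))
# Ping still answers (even if slowly) but nothing else works: soft hang.
print(classify_hang(True, False, False))
```

In practice the symptoms are often mixed, so treat the result as a starting point for choosing the data collection method, not as a diagnosis.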

Reference: PRF: Server Hang (Pre-Windows Server 2008+)

If you are currently experiencing a hang and are considering opening a support incident with Microsoft, please prepare for the following: troubleshooting server hangs, memory leaks, or resource depletion can be a difficult and time-consuming process that involves multiple attempts to collect the right data. Ensuring that you have collected the data, and that the data is valid, before engaging support will greatly reduce the time spent by both you and the support engineer in identifying the source of the issue.

  • If the machine is currently in the hung state, we will ask you to configure data collection based on the steps provided in this blog.
  • If the machine is not in the hung state but you anticipate another occurrence, we will again ask you to configure data collection based on the steps in this blog.
  • Once you have completed the data collection steps defined in this blog and have a dump file you would like Microsoft Support to review, please verify the data using the validation steps listed in this blog.

Note: Server hangs, memory leaks, and resource depletion are often related to third-party products. In some cases Support can identify the third-party product involved, but cannot provide a resolution other than uninstalling the product or contacting the vendor. If you suspect that your issue is related to a third-party product, we highly recommend that you contact the vendor to confirm that there are no known issues, that you have the latest updates, and that the vendor is available to collaborate with Microsoft should the issue be traced to the product.

You must restart the system after any change to the registry or the pagefile. The only exception is when you use VMware snapshots or suspend states.

1. Prerequisites for Memory Dump

Applies to Physical machines and Virtual Machines.

A- NMI or Keyboard Key Combination

The Non-Maskable Interrupt (NMI) crash is not enabled by default on a Windows operating system. Create the following registry entry to enable it.

Location – HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl
Name – NMICrashDump
Type – REG_DWORD
Value – 1

PS/2 Keyboard
Location – HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\i8042prt\Parameters
Name – CrashOnCtrlScroll 
Type – REG_DWORD
Value – 1

USB Keyboard
Location – HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\kbdhid\Parameters
Name – CrashOnCtrlScroll
Type – REG_DWORD
Value – 1

Note: If you choose the keyboard-initiated crash, we recommend setting both of the last two registry entries, because the host may not recognize one of the two keyboard types (USB or PS/2).
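
As a convenience, the three entries above can be turned into reg.exe commands. The following is a minimal sketch that only prints the command lines so they can be reviewed before being run from an elevated command prompt. The helper name reg_add_command is hypothetical; the keys and value names are the ones listed above, written with the HKLM abbreviation.

```python
# Registry values from section 1-A: (key, value name). All are REG_DWORD = 1.
CRASH_TRIGGER_VALUES = [
    (r"HKLM\SYSTEM\CurrentControlSet\Control\CrashControl", "NMICrashDump"),
    (r"HKLM\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters", "CrashOnCtrlScroll"),
    (r"HKLM\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters", "CrashOnCtrlScroll"),
]


def reg_add_command(key, name, data=1):
    """Build a 'reg add' command line for a REG_DWORD value (/f overwrites silently)."""
    return f'reg add "{key}" /v {name} /t REG_DWORD /d {data} /f'


# Print the commands instead of executing them, so nothing changes by accident.
for key, name in CRASH_TRIGGER_VALUES:
    print(reg_add_command(key, name))
```

A restart is still required after the values are created.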

B- Type of Dump

On Windows versions prior to Windows 8/2012, the Full memory dump option is not available in the user interface. On any Windows version (including Windows 8/2012 and 8.1/2012 R2), you can select the Full memory dump option by setting the following registry value to 1. If there is not enough space on the local drives, you may set the value to 2 (Kernel memory dump) instead; however, the user-mode (application) portion of memory will not be captured:

Location – HKLM\SYSTEM\CurrentControlSet\Control\CrashControl
Name – CrashDumpEnabled
Type – REG_DWORD
Value – 1

Note: On Windows 8/2012 and above you can change the option using the User Interface (See Point 1-C)

C- Pagefile

The pagefile should be the size of the physical RAM + 100 MB. If the pagefile is set exactly equal to the amount of RAM, there is a good chance that the dump file will be corrupted.

Set up the pagefile by going to Control Panel > System and Security > System, then click Advanced system settings > Settings (under Performance) > Advanced > Change. Select the drive that will host the pagefile, then select Custom size. Once the sizes are entered, click Set, then click OK and close the settings dialogs.

Example: the pagefile is set on the E: drive with an initial size of 196608 MB (192 GB) and a maximum size of 196708 MB.
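
The arithmetic behind the example can be sketched as follows. The function name is hypothetical, and the choice of initial size = RAM and maximum size = RAM + 100 MB simply mirrors the example above.

```python
def pagefile_sizes_mb(ram_gb, margin_mb=100):
    """Return (initial_mb, maximum_mb) for a custom-size pagefile.

    Mirrors the example above: initial size equals physical RAM,
    maximum size adds the 100 MB margin recommended for the dump.
    """
    ram_mb = ram_gb * 1024
    return ram_mb, ram_mb + margin_mb


initial_mb, maximum_mb = pagefile_sizes_mb(192)
print(initial_mb, maximum_mb)  # 196608 196708
```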

If there is not enough space on the C: drive to host both the pagefile and the memory dump (twice the size of RAM in total), you may want to change the memory dump location. To set up a different memory dump location, use the user interface or the registry:

User Interface:

To change these settings, go to Control Panel > System and Security > System. Click Advanced system settings. Under Startup and Recovery, click Settings.

Registry:

Location – HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl
Name – DumpFile
Value – Change %SystemRoot% to a local Drive letter
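
The sketch below illustrates the space check and the registry change. It assumes that DumpFile still holds its default value, %SystemRoot%\MEMORY.DMP (type REG_EXPAND_SZ). The function names and the drive letter E are hypothetical, and the command is only printed, not executed.

```python
DEFAULT_DUMP_FILE = r"%SystemRoot%\MEMORY.DMP"  # assumed default DumpFile value
CRASH_CONTROL_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\CrashControl"


def needs_relocation(ram_mb, free_c_mb):
    """True when C: cannot hold pagefile plus dump (about twice the RAM size)."""
    return free_c_mb < 2 * ram_mb


def relocated_dump_file(drive_letter, current=DEFAULT_DUMP_FILE):
    """Replace %SystemRoot% in the DumpFile value with a drive letter."""
    letter = drive_letter.rstrip(":\\").upper()
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError(f"not a drive letter: {drive_letter!r}")
    return current.replace("%SystemRoot%", letter + ":")


if needs_relocation(ram_mb=196608, free_c_mb=300000):
    new_path = relocated_dump_file("E")
    print(f'reg add "{CRASH_CONTROL_KEY}" /v DumpFile /t REG_EXPAND_SZ /d "{new_path}" /f')
```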

Important: If the issue occurs on a physical server, make sure the Automated System Recovery feature (if applicable) is disabled in the BIOS. The recovery mechanism can restart the server prematurely while the system is still writing memory to the pagefile during the crash/bugcheck.

2. How to trigger the dump collection.

NMI Method

Remote console access tools such as DRAC, iLO, and RSA can send the NMI through an option that is often found under ‘Diagnostics’. If that option is not available, the hardware may also have a physical NMI button.

Keyboard (Ctrl+ScrLk) Method

The keyboard crash can be initiated by using the following hotkey sequence: Hold down the rightmost CTRL key, and press the SCROLL LOCK key twice.

Forcing a System Crash from the Keyboard

Virtual Machines on Hyper-V 2012 R2 Only

You can generate an NMI call using PowerShell directly on the host:

PS C:\Windows\system32> Debug-VM -Name "VM Name" -InjectNonMaskableInterrupt -ComputerName Hostname

Get a kernel dump of a 2012 R2 Hyper-V server with Powershell

Virtual Machines (VMware)

VMware snapshot and suspend state files are a copy of physical memory and can be converted into a full memory dump. If any issue is encountered while creating the snapshot or suspend state, try the steps above or contact VMware support.

The ‘vmss2core_win.exe’ tool converts .vmsn/.vmss files into a memory dump: https://labs.vmware.com/flings/vmss2core

For guest operating systems up to Windows 7/2008 R2, use: vmss2core_win -W snapshot.vmsn (or Suspend.vmss)

For guest operating systems Windows 8/2012 and above, use: vmss2core_win -W8 snapshot.vmsn (or Suspend.vmss)

Note: Copy the ‘vmss2core’ tool to a Windows machine along with the snapshot/suspend state file.
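
The choice between the two switches can be sketched as a small helper that builds the argument list. The function name is hypothetical; the executable name and the -W / -W8 switches are the ones given above, and nothing is executed here.

```python
def vmss2core_args(state_file, windows8_or_later):
    """Build the vmss2core command line for a .vmsn or .vmss file.

    -W  : guest is Windows 7/2008 R2 or earlier
    -W8 : guest is Windows 8/2012 or later
    """
    if not state_file.lower().endswith((".vmsn", ".vmss")):
        raise ValueError("expected a .vmsn snapshot or .vmss suspend state file")
    flag = "-W8" if windows8_or_later else "-W"
    return ["vmss2core_win.exe", flag, state_file]


print(" ".join(vmss2core_args("Suspend.vmss", windows8_or_later=True)))
```

The resulting list could be handed to subprocess.run on the Windows machine that holds the tool and the state file.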

3. Data Check/Sanity Check.

A- Checking the memory.dmp output file

Once the memory dump is generated, there is a chance that it is corrupted after the reboot. To check whether the dump is readable, use a tool called ‘Dumpchk’, which verifies that the data can be read. Dumpchk is part of the Debugging Tools for Windows, available from the Windows SDK:

Windows Software Development Kit (SDK) for Windows 8.1

Usage:

From an elevated command prompt, change directory to the folder that contains dumpchk.exe and run ‘dumpchk [Path to Dump]’.
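
Before running Dumpchk, a quick pre-check can rule out a file that is obviously not a memory dump. The sketch below relies on the fact that kernel and complete memory dumps normally begin with the signature PAGEDUMP (32-bit) or PAGEDU64 (64-bit). The function name is hypothetical, and the check looks only at the first eight bytes, so it does not replace Dumpchk.

```python
def dump_signature(path):
    """Return '32-bit' or '64-bit' if the file starts with a memory dump
    signature, or None if the header is not recognized."""
    with open(path, "rb") as dump:
        header = dump.read(8)
    if header == b"PAGEDU64":
        return "64-bit"
    if header == b"PAGEDUMP":
        return "32-bit"
    return None
```

For example, dump_signature(r"E:\MEMORY.DMP") returning None would suggest the file is truncated or is not a kernel/complete dump at all.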

B- Data review and Analysis

  • Option 1

Compress the memory dump using either the Windows built-in compression tool (Right Click > Send to > Compressed (Zipped) Folder) or any third-party compression solution. If the file size after compression is lower than 8 GB, you can obtain a preliminary analysis using our free Memory Dump Diagnostic website:

Diagnostic Packages

Please note the report analysis is automated and may not be accurate. If you are not satisfied with the report then a support case will need to be opened.

  • Option 2

Review the dump using the Windows Debugger included in the “Debugging Tools for Windows” (SDK):

Windows Software Development Kit (SDK) for Windows 8.1

Open the debugger, go to File > Symbol File Path, and enter the path to the symbol server together with a local folder in which to save the symbols (example below).

SRV*your local symbol folder*http://msdl.microsoft.com/download/symbols

Where your local symbol folder is any drive or share that is used as a symbol destination.
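
As a small illustration of the format, the helper below assembles the symbol path string from a local folder. The function name and the folder C:\Symbols are hypothetical; the server URL is the one shown above.

```python
MS_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def symbol_path(local_folder, server=MS_SYMBOL_SERVER):
    """Build an SRV*<local cache>*<symbol server> string for the debugger."""
    return f"SRV*{local_folder}*{server}"


print(symbol_path(r"C:\Symbols"))
```

The same string can also be placed in the _NT_SYMBOL_PATH environment variable so that the debugger picks it up at startup.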

Source:  Use the Microsoft Symbol Server to obtain debug symbol files

Important: Option 2 requires medium to advanced debugging skills.