Just a quick note on a specific issue we’ve been seeing lately. It is really the only issue with Windows 2012 servers hanging or becoming unresponsive that we’ve had, almost a year into the release of this OS.
On Windows 2012 running on HP servers that use the October 15th 2012 version of HpCISSs2.sys, those servers will run into an issue where we have an IO stall accessing the disk. This stall happens when we hit HpCISSs2.sys, and it never recovers. The first time it happens the servers don’t actually hang, but that is when the traffic jam starts to back up. Think of a car accident that blocks the highway 5 miles down the road: you wouldn’t stop moving right away, but eventually you’re going to be stopped in the resulting line of traffic.
One of the symptoms you may see is that the server shows a grey screen instead of the logon option. You may not be able to RDP to the server, but if you test ports such as 3389, 135 and 445 with PortQry they will be open. Remote WMI queries may fail. Ping will work. File shares will not be accessible.
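For anyone who wants to script the port test rather than run it by hand, here is a minimal sketch in Python. It is not PortQry and only attempts a plain TCP connect, which is enough to show the "ports answer but the server is unusable" pattern. The function name check_tcp_ports and the host name in the usage comment are made up for illustration.

```python
import socket


def check_tcp_ports(host, ports, timeout=3.0):
    """Return {port: True/False} depending on whether a TCP connect succeeds.

    This only proves the TCP stack accepts the connection. On a server
    in the hung state described above the ports can still show as open
    even though RDP logon and file shares do not work.
    """
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout or DNS failure
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


# Example usage ("server01" is a placeholder host name):
#   for port, is_open in check_tcp_ports("server01", [3389, 135, 445]).items():
#       print(port, "open" if is_open else "closed/filtered")
```

If all three ports report open while RDP and the file shares are unreachable, that matches the symptoms above, but it is not proof of this specific driver issue on its own.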
For those familiar with the Windows debugger the driver version info is:
14: kd> lmvm
start end module name
Loaded symbol image file: HpCISSs2.sys
Image name: HpCISSs2.sys
Timestamp: Mon Oct 15 14:09:41 2012 (507C5165)
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4
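The hex value in parentheses on the Timestamp line is the link time stored in the driver's PE header, in seconds since January 1, 1970 UTC. If you only have the hex value, a short Python sketch like the one below converts it to a date so you can compare it against the October 15th 2012 build. The function name decode_pe_timestamp is made up for illustration.

```python
from datetime import datetime, timezone


def decode_pe_timestamp(hex_value: str) -> datetime:
    """Convert a PE header TimeDateStamp (hex string) to a UTC datetime."""
    seconds = int(hex_value, 16)  # seconds since 1970-01-01 00:00:00 UTC
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Value taken from the debugger output above
    print(decode_pe_timestamp("507C5165"))  # 2012-10-15 18:09:41+00:00
```

The result is in UTC, while the debugger prints the time in the local time zone of the machine running it. That is presumably why the output above shows 14:09:41 for the same value.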
From Microsoft diagnostics reports (MPS or MSDT) we see the file as:
STORAGE (SCSI) HPCISSS2.SYS Hewlett-Packard Company 126.96.36.199 10/15/2012 153 KB (156,992 bytes) Smart Array P410i Controller
Although MSinfo32 may show a different date, the version number in these cases has been 188.8.131.52:
Driver c:\windows\system32\drivers\hpcisss2.sys (184.108.40.206, 153.31 KB (156,992 bytes), 7/2/2013 12:28 AM)
Customers with this issue have followed what HP has documented here for a separate issue involving HpCISSs2.sys on 2012.
If you have that version of the driver, are running Windows 2012, and are running into this behavior, please contact HP and open a ticket with them to investigate this further.
To get conclusive data that confirms this is the issue, a kernel memory dump is the best route. Since servers are typically set up for a kernel dump by default, the only change that would be required is to set up the system to generate that dump on an NMI signal.
Needing only a kernel dump rather than a full memory dump greatly reduces the size of the resulting dump file, and dump size is becoming problematic on systems with a large amount of RAM.