Editor's note: Courtesy of our friends over at http://mcpmag.com, Clint Huffman, a Microsoft Premier Field Engineer, offers his take on how to talk intelligently with SAN administrators and vendors when disk performance issues surface. Here’s a teaser. Please make sure you check out the full article.
Traditionally, disk-queue-related performance counters such as "Avg. Disk Queue Length", "% Idle Time", and "% Disk Time" have been staples in the IT professional's tool belt. They have great value when analyzing single-spindle disks, but are less effective when more spindles are added to a LUN or when spindles are shared between LUNs. For example, if "Avg. Disk Queue Length" is 2 and there are 10 spindles behind the LUN, then the LUN should have no problem handling the load. This would be like having 10 check-out lines with only 2 people in line. Likewise, "% Idle Time" and "% Disk Time" are simply measures of how often the disk queue is completely empty or not empty, respectively.
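The check-out line analogy can be put into numbers. The short Python sketch below divides the queue length by the spindle count; the function name `queue_per_spindle` is hypothetical, and the calculation assumes the load is spread evenly across all spindles, which may not hold when spindles are shared between LUNs.

```python
def queue_per_spindle(avg_disk_queue_length: float, spindles: int) -> float:
    """Average outstanding I/O requests per spindle.

    Assumption: requests are spread evenly across every spindle behind the LUN.
    """
    if spindles < 1:
        raise ValueError("spindles must be at least 1")
    return avg_disk_queue_length / spindles


# The example from the text: "Avg. Disk Queue Length" of 2 with 10 spindles.
per_spindle = queue_per_spindle(2, 10)
print(f"{per_spindle:.1f} outstanding requests per spindle")
```

A result well below one request per spindle corresponds to the picture of 10 check-out lines with only 2 people waiting.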
“Avg. Disk sec/Read” and “Avg. Disk sec/Write” are performance counters that measure the I/O request packet response times for read and write operations respectively. Response times are our best indicator of poor disk performance because the response times reliably increase when the disk subsystem is overwhelmed.
The following table shows the access times in milliseconds and I/Os per second (IOPS) for common hard drives. Access times are the longest that any I/O request should take to respond on the given hardware. Hardware and software features of the disk subsystem, such as short stroking and cache, can dramatically improve these response times and throughput. For example, my 5400 RPM USB disk drive can sustain 150 IOPS and stay under 5 ms average response times.
|Device||IOPS*||Access Time (ms)*|
|3.5” floppy disk USB drive||8||120|
|5400 RPM hard disk||59||17|
|7200 RPM hard disk||77||13|
|10K RPM hard disk||125||8|
|15K RPM hard disk||143||7|
|solid state drive (SSD)||5000||0.2|
* Does not reflect actual products.
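The IOPS figures in the table appear to track the reciprocal of the access time: a device that needs a given number of milliseconds per request can complete about 1000 divided by that number of requests each second. The Python sketch below checks that reading against the table. It assumes one request is serviced at a time with no help from cache or queuing, and the names `ACCESS_TIME_MS` and `iops_from_access_time` are illustrative only.

```python
# Access times in milliseconds, copied from the table above.
ACCESS_TIME_MS = {
    "3.5-inch floppy disk USB drive": 120,
    "5400 RPM hard disk": 17,
    "7200 RPM hard disk": 13,
    "10K RPM hard disk": 8,
    "15K RPM hard disk": 7,
    "solid state drive (SSD)": 0.2,
}


def iops_from_access_time(access_time_ms: float) -> int:
    """Estimate IOPS as 1000 ms divided by the per-request access time.

    Assumption: requests are serviced strictly one at a time, uncached.
    """
    return round(1000 / access_time_ms)


for device, access_ms in ACCESS_TIME_MS.items():
    print(f"{device}: about {iops_from_access_time(access_ms)} IOPS")
```

Under that assumption, the estimates match the IOPS column for every row of the table.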
Based on the table above, we generally use sustained values of 15 ms or more as a warning threshold and sustained values of 25 ms or more as a critical threshold for disk response times using the “Avg. Disk sec/Read” and “Avg. Disk sec/Write” performance counters.
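As a rough illustration, the thresholds can be applied to a window of counter samples. The Python sketch below is not taken from the article: the function name `classify_response_times` is hypothetical, and it interprets "sustained" as every sample in the window being at or above the threshold, which is only one possible reading. The counters report seconds, so the thresholds are expressed in seconds as well.

```python
WARNING_SEC = 0.015   # 15 ms warning threshold from the text
CRITICAL_SEC = 0.025  # 25 ms critical threshold from the text


def classify_response_times(samples_sec: list[float]) -> str:
    """Classify a window of "Avg. Disk sec/Read" or "Avg. Disk sec/Write" samples.

    Assumption: "sustained" means that every sample in the window meets the
    threshold, so the lowest sample decides the result.
    """
    if not samples_sec:
        raise ValueError("at least one sample is required")
    lowest = min(samples_sec)
    if lowest >= CRITICAL_SEC:
        return "critical"
    if lowest >= WARNING_SEC:
        return "warning"
    return "ok"


# Hypothetical samples, in seconds, collected at a fixed interval.
print(classify_response_times([0.004, 0.006, 0.005]))  # ok
print(classify_response_times([0.018, 0.021, 0.030]))  # warning
print(classify_response_times([0.031, 0.027, 0.040]))  # critical
```

Using the lowest sample keeps a single spike from raising an alert; a stricter or looser definition of "sustained" would change only that one line.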
Note: All of the counters mentioned in this article are found on the LogicalDisk and PhysicalDisk counter objects.
For complete details, check out the full article at: http://mcpmag.com/articles/2011/05/12/how-to-speak-san-ish.aspx