DiskSpd, PowerShell and storage performance: measuring IOPs, throughput and latency for both local disks and SMB file shares

1. Introduction

 

I have been doing storage-related demos and publishing blog posts with storage performance numbers for a while, and I commonly get questions such as “How do you run these tests?” or “What tools do you use to generate IOs for your demos?”. While it’s always best to use a real workload to test storage, sometimes that is not convenient. In the past, I frequently used and recommended SQLIO, a free tool from Microsoft for simulating IOs. However, Microsoft recently released a better tool called DiskSpd. It is flexible enough to simulate many different types of workloads, and you can run it on a physical host or a virtual machine against all kinds of storage, including local disks, LUNs on a SAN, Storage Spaces or SMB file shares.

2. Download the tool

 

To get started, you need to download and install DiskSpd. You can get the tool from https://aka.ms/DiskSpd. It comes in the form of a ZIP file that you can open and copy to a local folder. The ZIP file actually contains 3 subfolders with different versions of the tool: amd64fre (for 64-bit systems), x86fre (for 32-bit systems) and armfre (for ARM systems). This allows you to run it on pretty much every Windows version, client or server.

In the end, you really only need one of the DiskSpd.EXE files included in the ZIP (the one that best fits your platform). If you’re using a recent version of Windows Server, you probably want the version in the amd64fre folder. In this blog post, I assume that you copied the correct version of DiskSpd.EXE to the C:\DiskSpd local folder.
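
As a small illustration of picking the folder, the sketch below maps the value of the PROCESSOR_ARCHITECTURE environment variable to the three subfolders named above. The function name Get-DiskSpdFolder is hypothetical, and the mapping assumes the usual values Windows reports (AMD64, x86, ARM); anything else is rejected rather than guessed.

```powershell
# Hypothetical helper: map a Windows architecture string to the ZIP subfolder.
function Get-DiskSpdFolder {
    param(
        # Defaults to what the current session reports, e.g. AMD64 or x86.
        [string]$Architecture = $env:PROCESSOR_ARCHITECTURE
    )

    switch ($Architecture) {
        'AMD64' { 'amd64fre' }   # 64-bit systems
        'x86'   { 'x86fre' }     # 32-bit systems
        'ARM'   { 'armfre' }     # ARM systems
        default { throw "No DiskSpd folder known for architecture '$Architecture'." }
    }
}

# Example: which folder to copy diskspd.exe from on this machine.
# Get-DiskSpdFolder
```

Keep in mind that a 32-bit PowerShell session on 64-bit Windows reports x86, so run this from a 64-bit prompt if you want the amd64fre answer.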

If you're a developer, you might also want to take a look at the source code for DiskSpd. You can find that at https://github.com/microsoft/diskspd.

3. Run the tool

 

When you’re ready to start running DiskSpd, you want to make sure there’s nothing else running on the computer. Other running processes can interfere with your results by putting additional load on the CPU, network or storage. If the disk you are using is shared in any way (like a LUN on a SAN), you want to make sure that nothing else is competing with your testing. If you’re using any form of IP storage (iSCSI LUN, SMB file share), you want to make sure that you’re not running on a network congested with other kinds of traffic.

WARNING: You could be generating a whole lot of disk IO, network traffic and/or CPU load when you run DiskSpd. If you’re in a shared environment, you might want to talk to your administrator and ask permission. This could generate a whole lot of load and disturb anyone else using other VMs in the same host, other LUNs on the same SAN or other traffic on the same network.

WARNING: If you use DiskSpd to write data to a physical disk, you might destroy the data on that disk. DiskSpd does not ask for confirmation. It assumes you know what you are doing. Be careful when using physical disks (as opposed to files) with DiskSpd.

NOTE: You should run DiskSpd from an elevated command prompt. This will make sure file creation is fast. Otherwise, DiskSpd will fall back to a slower method of creating files. In the example below, when you're using a 1TB file, that might take a long time.

From a classic command prompt or a PowerShell prompt, issue a single command line to start getting some performance results. Here is your first example using 8 threads of execution, each generating 8 outstanding random 8KB unbuffered read IOs:

PS C:\DiskSpd> C:\DiskSpd\diskspd.exe -c1000G -d10 -r -w0 -t8 -o8 -b8K -h -L X:\testfile.dat

Command Line: C:\DiskSpd\diskspd.exe -c1000G -d10 -r -w0 -t8 -o8 -b8K -h -L X:\testfile.dat

Input parameters:

        timespan: 1
-------------
duration: 10s
warm up time: 5s
cool down time: 0s
measuring latency
random seed: 0
path: 'X:\testfile.dat'
think time: 0ms
burst size: 0
software and hardware cache disabled
performing read test
block size: 8192
using random I/O (alignment: 8192)
number of outstanding I/O operations: 8
stride size: 8192
thread stride size: 0
threads per file: 8
using I/O Completion Ports
IO priority: normal

Results for timespan 1:
*******************************************************************************

actual test time: 10.01s
thread count: 8
proc count: 4

CPU | Usage | User | Kernel | Idle
-------------------------------------------
0| 5.31%| 0.16%| 5.15%| 94.76%
1| 1.87%| 0.47%| 1.40%| 98.19%
2| 1.25%| 0.16%| 1.09%| 98.82%
3| 2.97%| 0.47%| 2.50%| 97.10%
-------------------------------------------
avg.| 2.85%| 0.31%| 2.54%| 97.22%

Total IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 20480000 | 2500 | 1.95 | 249.77 | 32.502 | 55.200 | X:\testfile.dat (1000GB)
1 | 20635648 | 2519 | 1.97 | 251.67 | 32.146 | 54.405 | X:\testfile.dat (1000GB)
2 | 21094400 | 2575 | 2.01 | 257.26 | 31.412 | 53.410 | X:\testfile.dat (1000GB)
3 | 20553728 | 2509 | 1.96 | 250.67 | 32.343 | 56.548 | X:\testfile.dat (1000GB)
4 | 20365312 | 2486 | 1.94 | 248.37 | 32.599 | 54.448 | X:\testfile.dat (1000GB)
5 | 20160512 | 2461 | 1.92 | 245.87 | 32.982 | 54.838 | X:\testfile.dat (1000GB)
6 | 19972096 | 2438 | 1.90 | 243.58 | 33.293 | 55.178 | X:\testfile.dat (1000GB)
7 | 19578880 | 2390 | 1.87 | 238.78 | 33.848 | 58.472 | X:\testfile.dat (1000GB)
-----------------------------------------------------------------------------------------------------
total: 162840576 | 19878 | 15.52 | 1985.97 | 32.626 | 55.312

Read IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 20480000 | 2500 | 1.95 | 249.77 | 32.502 | 55.200 | X:\testfile.dat (1000GB)
1 | 20635648 | 2519 | 1.97 | 251.67 | 32.146 | 54.405 | X:\testfile.dat (1000GB)
2 | 21094400 | 2575 | 2.01 | 257.26 | 31.412 | 53.410 | X:\testfile.dat (1000GB)
3 | 20553728 | 2509 | 1.96 | 250.67 | 32.343 | 56.548 | X:\testfile.dat (1000GB)
4 | 20365312 | 2486 | 1.94 | 248.37 | 32.599 | 54.448 | X:\testfile.dat (1000GB)
5 | 20160512 | 2461 | 1.92 | 245.87 | 32.982 | 54.838 | X:\testfile.dat (1000GB)
6 | 19972096 | 2438 | 1.90 | 243.58 | 33.293 | 55.178 | X:\testfile.dat (1000GB)
7 | 19578880 | 2390 | 1.87 | 238.78 | 33.848 | 58.472 | X:\testfile.dat (1000GB)
-----------------------------------------------------------------------------------------------------
total: 162840576 | 19878 | 15.52 | 1985.97 | 32.626 | 55.312

Write IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 0 | 0 | 0.00 | 0.00 | 0.000 | N/A | X:testfile.dat (1000GB)
1 | 0 | 0 | 0.00 | 0.00 | 0.000 | N/A | X:testfile.dat (1000GB)
2 | 0 | 0 | 0.00 | 0.00 | 0.000 | N/A | X:testfile.dat (1000GB)
3 | 0 | 0 | 0.00 | 0.00 | 0.000 | N/A | X:testfile.dat (1000GB)
4 | 0 | 0 | 0.00 | 0.00 | 0.000 | N/A | X:testfile.dat (1000GB)
5 | 0 | 0 | 0.00 | 0.00 | 0.000 | N/A | X:testfile.dat (1000GB)
6 | 0 | 0 | 0.00 | 0.00 | 0.000 | N/A | X:testfile.dat (1000GB)
7 | 0 | 0 | 0.00 | 0.00 | 0.000 | N/A | X:testfile.dat (1000GB)
-----------------------------------------------------------------------------------------------------
total: 0 | 0 | 0.00 | 0.00 | 0.000 | N/A

  %-ile | Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
min | 3.360 | N/A | 3.360
25th | 5.031 | N/A | 5.031
50th | 8.309 | N/A | 8.309
75th | 12.630 | N/A | 12.630
90th | 148.845 | N/A | 148.845
95th | 160.892 | N/A | 160.892
99th | 172.259 | N/A | 172.259
3-nines | 254.020 | N/A | 254.020
4-nines | 613.602 | N/A | 613.602
5-nines | 823.760 | N/A | 823.760
6-nines | 823.760 | N/A | 823.760
7-nines | 823.760 | N/A | 823.760
8-nines | 823.760 | N/A | 823.760
max | 823.760 | N/A | 823.760

NOTE: The -w0 is the default, so you could skip it. I'm keeping it here to be explicit about the fact we're doing all reads.

For this specific disk, I am getting 1,985 IOPS, 15.52 MB/sec of average throughput and 32.626 milliseconds of average latency. I’m getting all that information from the “total:” line of the Total IO table above.
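
If you want those summary numbers in a script rather than reading them by eye, the “total:” row can be split on its column separators. The sketch below does that for the row copied from the Total IO table above and cross-checks the MB/s column against IOPS times the 8KB block size. The variable and property names are made up for illustration, and the parsing assumes the text layout shown in this post.

```powershell
# The "total:" line of the Total IO table from the first run above.
$line = 'total: 162840576 | 19878 | 15.52 | 1985.97 | 32.626 | 55.312'

# Drop the label, split on the column separator, and trim each field.
$fields = ($line -replace '^total:\s*', '') -split '\|' | ForEach-Object { $_.Trim() }

$total = [pscustomobject]@{
    Bytes        = [long]$fields[0]
    IOs          = [int]$fields[1]
    MBPerSec     = [double]$fields[2]
    IOPS         = [double]$fields[3]
    AvgLatencyMs = [double]$fields[4]
}

# Cross-check: IOPS times the 8KB block size should match the MB/s column.
# 1MB is 1048576 bytes in PowerShell, which is the unit the output appears to use.
$blockBytes = 8192
$computedMBPerSec = [math]::Round($total.IOPS * $blockBytes / 1MB, 2)

$total
"Computed throughput: $computedMBPerSec MB/s"
```

The computed value rounds to the same 15.52 MB/s that DiskSpd reports, which is a handy check that you are reading the right columns.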

That average latency looks high for small IOs (even though this is coming from a set of HDDs), but we’ll examine that later.

Now let’s try another command using sequential 512KB reads on that same file. I’ll use 2 threads with 8 outstanding IOs per thread this time:

PS C:\DiskSpd> C:\DiskSpd\diskspd.exe -c1000G -d10 -w0 -t2 -o8 -b512K -h -L X:\testfile.dat

Command Line: C:\DiskSpd\diskspd.exe -c1000G -d10 -w0 -t2 -o8 -b512K -h -L X:\testfile.dat

Input parameters:

        timespan: 1
-------------
duration: 10s
warm up time: 5s
cool down time: 0s
measuring latency
random seed: 0
path: 'X:\testfile.dat'
think time: 0ms
burst size: 0
software and hardware cache disabled
performing read test
block size: 524288
number of outstanding I/O operations: 8
stride size: 524288
thread stride size: 0
threads per file: 2
using I/O Completion Ports
IO priority: normal

Results for timespan 1:
*******************************************************************************

actual test time: 10.00s
thread count: 2
proc count: 4

CPU | Usage | User | Kernel | Idle
-------------------------------------------
0| 4.53%| 0.31%| 4.22%| 95.44%
1| 1.25%| 0.16%| 1.09%| 98.72%
2| 0.00%| 0.00%| 0.00%| 99.97%
3| 0.00%| 0.00%| 0.00%| 99.97%
-------------------------------------------
avg.| 1.44%| 0.12%| 1.33%| 98.52%

Total IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 886046720 | 1690 | 84.47 | 168.95 | 46.749 | 47.545 | X:\testfile.dat (1000GB)
1 | 851443712 | 1624 | 81.17 | 162.35 | 49.497 | 54.084 | X:\testfile.dat (1000GB)
-----------------------------------------------------------------------------------------------------
total: 1737490432 | 3314 | 165.65 | 331.29 | 48.095 | 50.873

Read IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 886046720 | 1690 | 84.47 | 168.95 | 46.749 | 47.545 | X:\testfile.dat (1000GB)
1 | 851443712 | 1624 | 81.17 | 162.35 | 49.497 | 54.084 | X:\testfile.dat (1000GB)
-----------------------------------------------------------------------------------------------------
total: 1737490432 | 3314 | 165.65 | 331.29 | 48.095 | 50.873

Write IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 0 | 0 | 0.00 | 0.00 | 0.000 | N/A | X:\testfile.dat (1000GB)
1 | 0 | 0 | 0.00 | 0.00 | 0.000 | N/A | X:\testfile.dat (1000GB)
-----------------------------------------------------------------------------------------------------
total: 0 | 0 | 0.00 | 0.00 | 0.000 | N/A

  %-ile | Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
min | 9.406 | N/A | 9.406
25th | 31.087 | N/A | 31.087
50th | 38.397 | N/A | 38.397
75th | 47.216 | N/A | 47.216
90th | 64.783 | N/A | 64.783
95th | 90.786 | N/A | 90.786
99th | 356.669 | N/A | 356.669
3-nines | 452.198 | N/A | 452.198
4-nines | 686.307 | N/A | 686.307
5-nines | 686.307 | N/A | 686.307
6-nines | 686.307 | N/A | 686.307
7-nines | 686.307 | N/A | 686.307
8-nines | 686.307 | N/A | 686.307
max | 686.307 | N/A | 686.307

With that configuration and parameters, I got about 165.65 MB/sec of throughput with an average latency of 48.095 milliseconds per IO. Again, that latency sounds high even for 512KB IOs and we’ll dive into that topic later on.
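
A quick way to sanity-check the two results so far is Little's Law: the average number of IOs in flight equals IOPS multiplied by average latency. The sketch below applies it to the two “total:” lines, assuming the queues stayed full for the whole run. The function name is hypothetical.

```powershell
# Little's Law sanity check: IOs in flight = IOPS x average latency (in seconds).
function Get-AverageOutstandingIO {
    param(
        [double]$IOPS,          # "I/O per s" from the total line
        [double]$AvgLatencyMs   # "AvgLat" from the total line, in milliseconds
    )
    [math]::Round($IOPS * $AvgLatencyMs / 1000, 1)
}

# Random 8KB run: -t8 -o8, so 8 x 8 = 64 IOs configured in flight.
$randomRun = Get-AverageOutstandingIO -IOPS 1985.97 -AvgLatencyMs 32.626

# Sequential 512KB run: -t2 -o8, so 2 x 8 = 16 IOs configured in flight.
$sequentialRun = Get-AverageOutstandingIO -IOPS 331.29 -AvgLatencyMs 48.095

"Random run:     $randomRun in flight (64 configured)"
"Sequential run: $sequentialRun in flight (16 configured)"
```

Both results land close to the configured 64 and 16 outstanding IOs, so the reported IOPS and latency figures are at least consistent with each other.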

5. Understand the parameters used

Now let’s inspect the parameters on those DiskSpd command lines. I know it’s a bit overwhelming at first, but you will get used to it. And keep in mind that, for DiskSpd parameters, lowercase and uppercase mean different things, so be very careful.
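
Once these commands move into scripts, building the switches from named values can make case mistakes less likely. Here is a minimal sketch of that idea. The function name and its parameter names are chosen here for illustration and are not part of DiskSpd; the switches it emits are only the ones already used in the commands above.

```powershell
# Hypothetical helper: assemble DiskSpd arguments from named values.
function New-DiskSpdArgument {
    param(
        [string]$FileSize,        # -c, e.g. 1000G
        [int]$DurationSec,        # -d
        [int]$WritePercent = 0,   # -w
        [int]$Threads,            # -t
        [int]$Outstanding,        # -o
        [string]$BlockSize,       # -b, e.g. 8K
        [switch]$Random,          # -r
        [string]$Path             # target file
    )

    $list = @("-c$FileSize", "-d$DurationSec")
    if ($Random) { $list += '-r' }
    $list += "-w$WritePercent", "-t$Threads", "-o$Outstanding", "-b$BlockSize"
    $list += '-h', '-L'   # lowercase h and uppercase L, exactly as in the commands above
    $list += $Path
    $list
}

# Settings for the first example (random 8KB reads), passed by splatting.
$settings = @{
    FileSize = '1000G'; DurationSec = 10; Random = $true
    Threads = 8; Outstanding = 8; BlockSize = '8K'; Path = 'X:\testfile.dat'
}
$diskSpdArgs = New-DiskSpdArgument @settings

# Show the resulting command line.
# To run it for real: & C:\DiskSpd\diskspd.exe @diskSpdArgs
"diskspd.exe $($diskSpdArgs -join ' ')"
```

Because the switch letters are written once inside the helper, the rest of a script only deals with descriptive names such as Threads or BlockSize.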

Here is the explanation for the parameters used above:

PS C:\> C:\DiskSpd\diskspd.exe -c1G -d10 -r -w0 -t8 -o8 -b8K -h -L X:\testfile.dat

Parameter | Description | Notes
-c | Size of file used. | Specify the number of bytes or use suffixes like K, M or G (KB, MB, or GB). You should use a large size (all of the disk) for HDDs, since small files will show unrealistically high performance (short stroking).
-d | The duration of the test, in seconds. | You can use 10 seconds for a quick test. For any serious work, use at least 60 seconds.
-w | Percentage of writes. | 0 means all reads, 100 means all writes, 30 means 30% w