Storage Spaces Direct throughput with iWARP

Hello, Claus here again. It has been a while since my last post, and a few things have changed in the meantime. Windows Server has moved into the Windows and Devices Group, and we have moved to a new building with a better café but a worse view 😊. On a personal note, you may see me waddling the hallways, as I have had foot surgery.

At Microsoft Ignite 2016, I did a demo (at the 28-minute mark) in the "Meet Windows Server 2016 and System Center 2016" session, showing how Storage Spaces Direct can deliver massive amounts of IOPS to many virtual machines with various Storage QoS settings. If you haven't seen it yet, I encourage you to watch it, or go watch it again 😊. The demo used a 16-node cluster connected over iWARP with 40GbE Chelsio T580CR adapters and showed more than 6M read IOPS. Since then, Chelsio has released its 100GbE T6 adapter, and we wanted to see what kind of network throughput this new adapter makes possible.

We used the following hardware configuration:

  • 4 nodes of Dell R730xd
    • 2x Intel Xeon E5-2660 v3, 2.6GHz, 10 cores/20 threads
    • 256GiB DDR4 2133MHz (16x 16GiB DIMMs)
    • 2x Chelsio T6 100GbE NIC (PCIe 3.0 x16), one port connected on each, QSFP28 passive copper cabling
    • Performance Power Plan
    • Storage:
      • 4x 3.2TB NVMe Samsung PM1725 (PCIe 3.0 x8)
      • 4x SSD + 12x HDD (not in use: all load from Samsung PM1725)
    • Windows Server 2016 + Storage Spaces Direct
      • Cache: Samsung PM1725
      • Capacity: SSD + HDD (not in use: all load from cache)
      • 4x 2TB 3-way mirrored virtual disks, one per cluster node
      • 20 Azure A1-sized VMs (1 vCPU, 1.75GiB RAM) per node
      • OS High Performance Power Plan
    • Load:
      • DISKSPD workload generator
      • VM Fleet workload orchestrator
      • 80 virtual machines, each with a 16GiB file in its VHDX
      • 512KiB 100% random read at a queue depth of 3 per VM
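To put the hardware list in perspective, the Python sketch below works out the theoretical link bandwidth of the main components. It is a back-of-envelope calculation under stated assumptions, not a measurement: it assumes PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, treats 100GbE as 12.5GB/s of line rate per port, uses decimal gigabytes, and ignores Ethernet, RDMA and SMB protocol overhead. The constant and function names are hypothetical.

```python
# Back-of-envelope bandwidth budget for the 4-node cluster listed above.
# Assumptions (not stated in the post): PCIe 3.0 = 8 GT/s per lane with
# 128b/130b encoding, decimal GB, and no Ethernet/RDMA/SMB protocol overhead.

GB = 1e9  # decimal gigabyte

NODES = 4
NICS_PER_NODE = 2      # two Chelsio T6 NICs, one 100GbE port connected on each
NIC_GBITS = 100
NVME_PER_NODE = 4      # Samsung PM1725 cache devices, PCIe 3.0 x8 each

VMS = 80               # 20 VMs per node
QUEUE_DEPTH = 3        # outstanding I/Os per VM
IO_BYTES = 512 * 1024  # 512KiB reads


def pcie3_gb_per_s(lanes: int) -> float:
    """Per-direction PCIe 3.0 bandwidth in GB/s after 128b/130b line encoding."""
    return lanes * 8e9 * (128 / 130) / 8 / GB


nic_line_rate = NIC_GBITS * 1e9 / 8 / GB      # per 100GbE port
node_network = NICS_PER_NODE * nic_line_rate  # per node
cluster_network = NODES * node_network        # whole cluster

nic_slot = pcie3_gb_per_s(16)                 # PCIe 3.0 x16 NIC slot
nvme_link = pcie3_gb_per_s(8)                 # PCIe 3.0 x8 NVMe device
node_nvme = NVME_PER_NODE * nvme_link         # all cache devices in one node

# Read data in flight across the whole fleet at any instant.
in_flight_mib = VMS * QUEUE_DEPTH * IO_BYTES / 2**20

if __name__ == "__main__":
    print(f"100GbE line rate per port:  {nic_line_rate:.2f} GB/s")
    print(f"NIC line rate per node:     {node_network:.2f} GB/s")
    print(f"NIC line rate, cluster:     {cluster_network:.2f} GB/s")
    print(f"PCIe 3.0 x16 NIC slot:      {nic_slot:.2f} GB/s per direction")
    print(f"NVMe link budget per node:  {node_nvme:.2f} GB/s")
    print(f"Read data in flight:        {in_flight_mib:.0f} MiB")
```

Under these assumptions, the cluster has about 100GB/s of aggregate NIC line rate, and each PCIe 3.0 x16 slot (about 15.75GB/s per direction) exceeds the 12.5GB/s of a single 100GbE port, so the slot itself should not limit the NIC.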

We did not configure DCB (PFC) in our deployment, since it is not required in iWARP configurations.

Below is a screenshot from the VMFleet Watch-Cluster window, which reports IOPS, bandwidth and latency.


As you can see, the aggregate bandwidth exceeded 83GB/s, which is very impressive. Each VM achieved more than 1GB/s of throughput, and the average read latency stayed below 1.5ms.
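The reported numbers can be cross-checked against each other with Little's Law, which relates the number of I/Os in flight, their size and their latency to the sustained throughput. The Python sketch below is a hedged sanity check, not part of VM Fleet: it assumes each VM keeps exactly three 512KiB reads outstanding at all times and that "GB" in the Watch-Cluster output means decimal gigabytes. The function and constant names are hypothetical.

```python
# Cross-check of the reported per-VM throughput using Little's Law:
#   throughput = (I/Os in flight * I/O size) / average latency
# Assumptions: every VM keeps exactly QUEUE_DEPTH reads outstanding at all
# times, and "GB" in the Watch-Cluster output means decimal gigabytes.

IO_BYTES = 512 * 1024      # 512KiB random reads
QUEUE_DEPTH = 3            # outstanding I/Os per VM
VMS = 80
LATENCY_S = 1.5e-3         # reported average latency was below this value
AGGREGATE_BYTES_S = 83e9   # reported aggregate bandwidth was above this value


def littles_law_throughput(queue_depth: int, io_bytes: int, latency_s: float) -> float:
    """Bytes per second sustained with queue_depth I/Os of io_bytes in flight."""
    return queue_depth * io_bytes / latency_s


# Lower bound per VM: latency is below 1.5 ms, so throughput is at least this.
per_vm_from_latency = littles_law_throughput(QUEUE_DEPTH, IO_BYTES, LATENCY_S)

# The reported aggregate spread evenly over all VMs.
per_vm_from_aggregate = AGGREGATE_BYTES_S / VMS

# Read IOPS implied by the aggregate bandwidth at 512KiB per I/O.
aggregate_iops = AGGREGATE_BYTES_S / IO_BYTES

if __name__ == "__main__":
    print(f"Per VM via Little's Law: {per_vm_from_latency / 1e9:.3f} GB/s")
    print(f"Per VM via aggregate:    {per_vm_from_aggregate / 1e9:.3f} GB/s")
    print(f"Implied aggregate IOPS:  {aggregate_iops:,.0f}")
```

Both estimates land at roughly 1.04 to 1.05GB/s per VM, so the bandwidth, latency and queue-depth figures are consistent with each other. At 512KiB per read, 83GB/s corresponds to roughly 158,000 read IOPS across the cluster.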

Let me know what you think.

Until next time