Storage Spaces Direct in Technical Preview 4


Hello, Claus here again. This time we are going to take a look at two of the key enhancements to Storage Spaces Direct arriving in Windows Server Technical Preview 4: Multi-Resilient Virtual Disks and ReFS Real-Time Tiering. Combined, these two address two key issues we have in Windows Server 2012/2012 R2 Storage Spaces. The first issue is that parity spaces only work well for archival/backup workloads; they do not perform well enough for workloads such as virtual machines. The second issue is that the tiering mechanism is 'after-the-fact' tiering: the system collects information about hot and cold user data, but only moves that data in and out of the faster tier as a scheduled task, based on this historical information.

I suggest reading my blog post on the Software Storage Bus and the Storage Bus Cache, which explains how both of them work; they sit underneath the virtual disks and file systems discussed here.

Multi-Resilient Virtual Disks

A multi-resilient virtual disk is a virtual disk where one part is mirrored and another part is erasure coded (parity).


Figure 1: Virtual disk with both a mirror tier and a parity tier.

To arrive at this configuration, the administrator defines two tiers, just like in Windows Server 2012 R2; this time, however, the tiers are defined by their resiliency setting rather than by media type. Let's take a look at a PowerShell example for a system with SATA SSD and SATA HDD (the Technical Preview 4 deployment guide also includes an example for an all-flash system with NVMe + SATA SSD):

 

# 1: Enable Storage Spaces Direct

Enable-ClusterS2D

# 2: Create storage pool

New-StoragePool -StorageSubSystemFriendlyName *cluster* -FriendlyName S2D -ProvisioningTypeDefault Fixed -PhysicalDisk (Get-PhysicalDisk | ? CanPool -eq $true)

# The below step is not needed in a flat (single tier) storage configuration

Get-StoragePool S2D | Get-PhysicalDisk |? MediaType -eq SSD | Set-PhysicalDisk -Usage Journal

# 3: Define Storage Tiers

$MT = New-StorageTier -StoragePoolFriendlyName S2D -FriendlyName MT -MediaType HDD -ResiliencySettingName Mirror -PhysicalDiskRedundancy 2

$PT = New-StorageTier -StoragePoolFriendlyName S2D -FriendlyName PT -MediaType HDD -ResiliencySettingName Parity -PhysicalDiskRedundancy 2

# 4: Create Virtual Disk

New-Volume -StoragePoolFriendlyName S2D -FriendlyName <VirtualDiskName> -FileSystem CSVFS_ReFS -StorageTiers $MT,$PT -StorageTierSizes 100GB,900GB

The first two steps enable Storage Spaces Direct and create the storage pool. In the third step we define the two tiers. Notice that we use the "ResiliencySettingName" parameter in the definition of the tiers, where the MT tier has "ResiliencySettingName" set to "Mirror" and the PT tier has "ResiliencySettingName" set to "Parity". When we subsequently create the virtual disk we specify the size of each tier, in this case 100GB of mirror and 900GB of parity, for a total virtual disk size of 1TB. ReFS uses this information to control its write and tiering behavior (which I will discuss in the next section).
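
If you want to sanity-check the result, a query along these lines shows the per-tier resiliency settings and sizes of the new virtual disk (a sketch; the exact pipeline support and property names may vary by build):

# Inspect the tiers of the virtual disk created above
Get-VirtualDisk -FriendlyName <VirtualDiskName> | Get-StorageTier | Select-Object FriendlyName, ResiliencySettingName, Size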

The overall footprint of this virtual disk on the pool is 100GB * 3 (for the three-copy mirror) + 900GB * 1.57 (for 4+3 erasure coding), which totals ~1.7TB. Compare this to a similarly sized 3-copy mirror virtual disk, which would have a footprint of 3TB.
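
As a quick back-of-the-envelope check, you can do the same arithmetic in PowerShell and compare it against what the pool reports (a sketch; I am assuming the FootprintOnPool property is exposed on your build):

# Estimated footprint using the multipliers above
$footprint = (100GB * 3) + (900GB * 1.57)   # mirror copies + 4+3 erasure coding overhead
"{0:N2} TB" -f ($footprint / 1TB)           # ~1.67 TB

# Compare with what the pool reports for the virtual disk
Get-VirtualDisk -FriendlyName <VirtualDiskName> | Select-Object FriendlyName, Size, FootprintOnPool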

Also notice that we specified "MediaType" as HDD for both tiers. If you are used to Windows Server 2012 R2 you might think this is an error, but it is on purpose. For all intents and purposes the "MediaType" is irrelevant here, as the SSD devices are already claimed by the Software Storage Bus and Storage Bus Cache, as discussed in this blog post.
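
A quick way to see this on your own system is to list the physical disks in the pool and confirm how they are claimed (a sketch using the standard Storage module cmdlets):

# SSDs should show Usage = Journal (claimed by the cache); HDDs remain capacity devices
Get-StoragePool S2D | Get-PhysicalDisk | Sort-Object Usage | Select-Object FriendlyName, MediaType, Usage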

ReFS Real-Time Tiering

Now that we have created a multi-resilient virtual disk, let's discuss how ReFS operates on it. ReFS always writes into the mirror tier. If the write is an update to data sitting in the parity tier, the new write still goes into the mirror tier and the old data in the parity tier is invalidated. This behavior ensures that writes always land as mirror operations, which perform best, especially for random IO workloads like virtual machines, and require the least CPU resources.


Figure 2: ReFS write and data rotation.

 

The write will actually land in the Storage Bus Cache, below the file system and virtual disk. The beauty of this arrangement is that there is no fixed relationship between the mirror tier and the caching devices, so if you happen to define a virtual disk with a mirror tier that is much larger than its actual working set, you are not wasting valuable caching resources.

ReFS rotates data from the mirror tier into the parity tier in larger sequential chunks as needed and performs the erasure coding computation at rotation time. Because the rotation occurs in larger chunks, it skips the write-back cache and is written directly to the capacity devices, which is fine since it is sequential IO and has far less impact, especially on rotational disks. Also, the larger writes overwrite entire parity stripes, eliminating the read-modify-write cycles that smaller writes to a parity space would otherwise incur.

Conclusion

They say you cannot have your cake and eat it too; in this case, however, you can have capacity efficiency with multi-resilient virtual disks and good performance with ReFS real-time tiering. These features are introduced in Technical Preview 4, and we expect to continue to improve performance as we move towards Windows Server 2016.

 

 


Comments (18)

  1. Ash says:

    On your mirrored tier PowerShell command example, I believe you left off the value for the PhysicalDiskRedundancy attribute.

    It should be set to 2 for a 3-way mirror.

    So the correct command should be:

    $MT = New-StorageTier -StoragePoolFriendlyName S2D -FriendlyName MT -MediaType HDD -ResiliencySettingName Mirror -PhysicalDiskRedundancy 2

  2. Jacob says:

    Not trying to sound harsh, but this really still seems to just be a lame workaround for the absolutely woeful performance of parity in Storage Spaces.

  3. Michael says:

    Does this new tiering model work for traditional shared storage spaces or just storage spaces direct?

    Is it possible to have more than 2 tiers (say mirror and parity ssd tiers as well as mirror and parity hdd tiers)?

  4. Ash says:

    Claus,

    Attempting to create a multi-resilient VD using the steps defined above and am not having any success. 4 node cluster with 6 HDDs per node. S2D has been enabled and I see 24 physical disks from any node. FaultDomainAwareness is StorageScaleUnit for the pool.

    When I run the New-Volume command above I receive this error:

    New-Volume: Not Supported
    Extended Information: There are not enough eligible physical resources in the storage pool to create the specified virtual disk configuration.

    Recommended Actions:
    -Choose a combination of FaultDomainAwareness and NumberOfDataCopies (or PhysicalDiskRedundancy) supported by the storage pool.
    -Choose a value for NumberOfColumns that is less than or equal to the number of physical disks in the storage fault domain selected for the virtual disk.

    Activity ID: {cafc433a-4c6a-4ace-91d5-4a0e0097516e}

    I have tried the following solutions and received the same error as above:
    1. Creating the virtual disk using New-VirtualDisk instead of New-Volume cmdlet
    2. Manually specifying the column count to minimum value (1 for 3 way mirror, 7 for dual parity)
    3. Changing redundancy to two way mirror and single parity and using minimum column count (1 for mirror, 3 for parity)

  5. clausjor says:

    @Michael – Yes, Multi-Resilient Virtual Disks and ReFS Real-Time Tiering also work in Shared Storage Spaces.

  6. John says:

    This seems very much like what Compellent have been doing for years. We have had to pay through the nose for it so good to see it become more available.

  7. Cloud-Ras says:

    Totally ReFS-noob questions 😉
    What volume size does ReFS support to?
    Is there anything to gain in speed vs. NTFS?

  8. Andrew A says:

    I’m running into some confusion around doing a multi-node S2D cluster. When I run new-storagepool as per your example, it tells me I don’t have enough physical disks available. I notice that in get-storagesubsystem I have a read-only value for PhysicalDisksPerStoragePoolMin of 3. Given that I have 3 nodes who each have a physical disk where "CanPool" is $true, I would have thought that I’d meet the requirement. Should new-storagepool span all 3 nodes and thus have enough disks available? I also notice that if I do a get-storagesubsystem on the "clustered windows storage on S2d-cluster" and pipe to get-physicaldisk, I don’t see any disks. However get-storagesubsystem "Windows Storage on Server" piped to get-physicaldisk shows me the disk I’m hoping to pool. Are my poolable disks in the wrong subsystem? Any way to move them?

  9. Adam E says:

    @Andrew
    S2D requires 4 nodes (servers). While I have not seen any official reason as to why, it is likely due to the erasure coding algorithm in use (parity). Technically the TP build will let you create an S2D mirror (not parity) with as few as two nodes, though this is not recommended. A three-node minimum would make sense for mirrored spaces to me (so there is a quorum if a node fails), but they may keep the official minimum at 4 for consistency. Hopefully Claus will continue to shed light on the configuration scenarios.

    @Claus
    Any better info on how column count affects the real time tiering? How about performance metrics for different combinations of disk types (NVMe+HDD vs SSD + HDD, can all three be used?)? These would be helpful in hardware selection leading up to release.

  10. Deagle says:

    This also works on one node! I finally may have a replacement for ZFS.

    One question, how do you grow the hybrid virtual disk?

  11. Deagle says:

    Update:
    I used "Resize-StorageTier" on the parity tier and it grew the disk. But it looks like the volume is "fixed" provisioned and can’t be grown. Any workarounds?

  12. Thildemar says:

    @Claus
    Any guidance on the use of journal disks and caches with SSD/NVMe tiers? If you were to create a tier with mirrored SSD and mirrored HDD (say 100GB/900GB), would you want/need journal disks? How about read/write cache?

    Also seeing the same issue as @Ash above when trying to create volumes in TP4 even with plenty of disks in Pool…

  13. clausjor says:

    @Tildemar

    We generally see the ‘not enough eligible disks’ error in one of these conditions:
    1. There is insufficient capacity to create the desired volume. The actual footprint on the pool depends on the number of mirrored copies etc.
    2. You need at least four nodes to do parity (aka erasure coding).
    3. The HDDs show up with MediaType as ‘Unspecified’. We have seen this with some HBA controllers, and it can typically be rectified with a firmware update. We have also seen this in VM setups. You can work around it by running the following command after creating the storage pool:

    Get-StorageSubSystem *cluster* | Get-PhysicalDisk | ? MediaType -eq Unspecified | Set-PhysicalDisk -MediaType HDD

  14. clausjor says:

    @ Adam E

    We are working through characterizing various hardware configurations, and I hope that we will have a blog post on that at a later time.

  15. Michael Gray says:

    Please take a look at the guide here:
    https://technet.microsoft.com/en-us/library/dn789160.aspx for more information on resizing a storage tier, and getting a virtual disk and volume to expand into the newly-available space.
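
    For anyone following along, a rough sketch of the sequence (assuming the standard Storage module cmdlets; the tier and virtual disk names below are placeholders you would look up first with Get-StorageTier and Get-VirtualDisk):

    # 1: Grow the parity tier of the virtual disk (use Get-StorageTier to find its actual friendly name)
    Resize-StorageTier -FriendlyName <ParityTierName> -Size 2TB

    # 2: Extend the partition and volume into the newly available space
    $partition = Get-VirtualDisk -FriendlyName <VirtualDiskName> | Get-Disk | Get-Partition | Where-Object Type -Eq Basic
    $maxSize = ($partition | Get-PartitionSupportedSize).SizeMax
    $partition | Resize-Partition -Size $maxSize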

  16. BriKuz says:

    Multiple issues with only 7 of 8 HDDs being available in the storage subsystem pool (all 4 caching SSDs show up). Is there a way to add another disk to a pool after running Enable-ClusterS2D?

    I have actually had this running three times already WITHOUT the cache drives… but no joy now.

  17. bviktor says:

    I’m sorry for being rude but this mirror-tier vs parity-tier thing really takes the cake. This IS the definition of polishing a turd. I can’t believe what I just read. Here, read this instead:

    http://vault-tec.info/post/131991367071/storage-spaces-vs-zfs

    and any one of you with a minimal amount of common sense will realize that parity is beyond all hope. You forget the fact that the parity RAID levels were fundamentally broken RIGHT FROM THE START. Then you added a BAD implementation of it and now you’re expecting good results. No, just no. Parity spaces is useless, even for backups. I’m serious, I’m using it daily with 12x8TB drives. It’s awful.

    ZFS was designed so well that it completely eliminates all the problems the original parity levels had. So seriously, why don’t you implement ZFS? Are you too proud for it, or what? Is that your answer to customers? There’s literally NO REASON not to use something that’s already solved these problems a DECADE ago. Heck, I don’t even care if you rewrite it from scratch and name it MSFS or whatever you want, just fix this already!

    In fact, after writing the blogpost I linked above, I was contacted by a Microsoft engineer and was told that he’s really interested and will let me know how to proceed further soonish. Not a single word ever since. It’s a shame, really.

    This is the single biggest MISSED OPPORTUNITY for Windows Server, really. It’d be THE choice for off-the-shelf storage solutions. You could have it ALL. Too bad you’re too stubborn to actually fix problems and instead give us solutions that are more complex to implement (=more prone to errors), more expensive AND slower. Ridiculous.

    If you actually had ZFS, I could build 100TB HA storage clusters under $20k. Instead, I’ll buy an Oracle ZFS Appliance for $100k. How much of a selling point would this be? I’m baffled that you really can’t see this. You have CIFS, you have AD integration, you have NFS, you have iSCSI, you have LACP, you have MPIO, you have clustering. You have ALL the required underlying parts in the world to provide enterprise storage, except this most BASIC thing, i.e. a usable parity level is missing.

    You couldn’t make this any more wrong.

    1. ARM says:

      @bviktor
      Have you actually tried the new multi-resilient virtual disk implementation in Storage Spaces Direct? In a 4 node configuration I am getting over 100K IOPs per node with SATA SSDs and SATA HDDs with multi resilient virtual disks (20% mirror tier, 80% parity tier).

      The issues with parity Storage Spaces are well documented, which is why they are not recommended for VM workloads, but this new implementation has demonstrated, to me at least, that it is completely viable.