Windows Storage Server 2012 R2 – New Storage Spaces Functionality and Improved Data Deduplication

12/18/2013

Hi Folks –

If you’ve been following my blog, you know that I’m working my way through the list of my top 10 new features in Windows Storage Server 2012 R2. In this post, I’ll examine how we made it even easier to cost-effectively scale your storage capacity. This is enabled through enhancements in two key areas:

Storage Spaces
Data Deduplication

Let’s take a closer look at each of these technologies and the enhancements in Windows Server 2012 R2.

Storage Spaces

Storage Spaces, a technology introduced in Windows 8, Windows Server 2012, and Windows Storage Server 2012, enables you to virtualize storage by grouping industry-standard disks into storage pools, and then creating virtual disks (called storage spaces) from the available capacity in those storage pools. Storage Spaces is manageable through the Windows Storage Management API and Windows PowerShell, and through the File and Storage Services user interface in Server Manager. Storage Spaces is fully integrated with Failover Clustering for high availability, and with Cluster Shared Volumes (CSV) for scale-out deployments.

Enhancements to Storage Spaces in Windows Server 2012 R2 (and Windows Storage Server 2012 R2) include:

Storage tiers, which enable the creation of virtual disks composed of two tiers of storage: a solid-state drive (SSD) tier for frequently accessed data, and a hard disk drive (HDD) tier for less frequently accessed data. Storage Spaces transparently moves data at a sub-file level between the two tiers based on how frequently data is accessed. As a result, storage tiers can dramatically increase performance for the most used (“hot”) data by moving it to SSD storage, without sacrificing the ability to store large quantities of data on inexpensive HDDs.
Write-back cache, which improves performance by buffering small random writes (which often dominate common enterprise computing workloads) to existing SSDs in a storage pool before writing them to traditional HDDs.
Parity space support for failover clusters, which enables you to create parity spaces on failover clusters. (Parity spaces are recommended for sequential writing operations and archival data.)
Dual parity, which stores two copies of the parity information on a parity space, helping protect you from two simultaneous disk failures.
Faster storage space rebuilds, which reduce the time it takes to rebuild a storage space after a disk failure by using spare capacity in the pool instead of a single hot spare.
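
To make the storage tiers idea concrete, here is a toy Python sketch of heat-based placement. It is not how Storage Spaces is implemented: the slab names, the access counts, and the `place_slabs` function are all hypothetical. The sketch only shows the principle of ranking pieces of data by how often they are accessed and keeping the hottest ones on the SSD tier.

```python
def place_slabs(access_counts, ssd_slots):
    """Split slab ids into (ssd, hdd) sets.

    access_counts: dict mapping a hypothetical slab id to its access count.
    ssd_slots: how many slabs fit on the (assumed) SSD tier.
    """
    # Hottest first; ties are broken by slab id so the result is deterministic.
    ranked = sorted(access_counts, key=lambda slab: (-access_counts[slab], slab))
    ssd = set(ranked[:ssd_slots])
    hdd = set(ranked[ssd_slots:])
    return ssd, hdd


if __name__ == "__main__":
    counts = {"slab-a": 120, "slab-b": 3, "slab-c": 45, "slab-d": 0}
    ssd, hdd = place_slabs(counts, ssd_slots=2)
    print(sorted(ssd))  # ['slab-a', 'slab-c']
    print(sorted(hdd))  # ['slab-b', 'slab-d']
```

Because placement is decided per slab rather than per file, a large file can have its busy regions on SSD while the rest stays on HDD, which is the point of moving data at a sub-file level.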

I predict that Storage Spaces will be very popular. As I mentioned in a previous blog post, it’s one component of a great recipe for cost-effective, highly available storage. When you combine Storage Spaces with Clustering and certified JBODs, you get a dynamic, self-healing data storage solution that’s easy to deploy and manage.

My ideal scenario is to use a Windows Storage Server cluster-in-a-box to host the disks, and to make that storage accessible to Hyper-V hosts and SQL Server using the SMB 3.0 protocol. This is compelling because it’s super easy to set up and leverages my existing investments in IP and Ethernet-based network infrastructure.

More information on Storage Spaces can be found here.

A list of frequently asked questions (FAQ) on Storage Spaces can be found here.

You can find a list of certified JBODs under the Storage Spaces Category in the Windows Server Catalog.

Data Deduplication

Data Deduplication, which was introduced in Windows Server 2012 (and Windows Storage Server 2012), has quickly become one of its leading features—and a “standard consideration” when deploying file servers. After all, who doesn’t want the option to store more raw data in the same physical space by simply flipping a switch?

In various deployments, we saw decreases in required disk space of up to 90 percent. Some sample results for specific workloads include:

A 30-50 percent increase in storage efficiency when deduplicating home directory shares.
Up to a 50 percent increase in storage efficiency when deduplicating group file/collaboration shares.
Up to a 70 percent increase in storage efficiency when deduplicating software deployment shares.
Up to a 90 percent increase in storage efficiency when deduplicating VHD libraries.
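
Here is a quick back-of-the-envelope calculation in Python for what these percentages mean in practice. It assumes each figure is read as a reduction in required disk space (as in the sentence that introduces the list), and the function names and the 10 TB example are mine, purely for illustration. Real savings depend on your data.

```python
def physical_tb_needed(logical_tb, savings_percent):
    """Disk space needed after deduplication, for a given savings percentage."""
    if not 0 <= savings_percent < 100:
        raise ValueError("savings_percent must be in the range [0, 100)")
    return logical_tb * (100 - savings_percent) / 100


def effective_capacity_tb(physical_tb, savings_percent):
    """Logical data that fits on a volume, for a given savings percentage."""
    if not 0 <= savings_percent < 100:
        raise ValueError("savings_percent must be in the range [0, 100)")
    return physical_tb * 100 / (100 - savings_percent)


if __name__ == "__main__":
    # Upper-bound figures from the list above, applied to 10 TB of data.
    for workload, pct in [("home directories", 50),
                          ("software deployment", 70),
                          ("VHD libraries", 90)]:
        print(workload, physical_tb_needed(10, pct), "TB on disk")
    # home directories 5.0 TB on disk
    # software deployment 3.0 TB on disk
    # VHD libraries 1.0 TB on disk
    print(effective_capacity_tb(1, 90))  # 10.0
```

Note how the benefit grows faster than the percentage: 50 percent savings doubles what a volume can hold, while 90 percent savings multiplies it by ten.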

In Windows Server 2012 R2 and Windows Storage Server 2012 R2, we made Data Deduplication even more powerful and useful by supporting a key new scenario:

Deduplication of live VHDs for Virtual Desktop Infrastructure (VDI) workloads, which enables the use of Data Deduplication to optimize virtual disks for running VDI workloads—provided that the storage and compute nodes for the VDI infrastructure are connected remotely via the SMB protocol.

To enable support for VDI workloads with adequate performance and availability, we made several lower-level improvements related to Data Deduplication:

Support for Cluster Shared Volumes (CSVs), as required to support the Scale Out File Server (SoFS) architecture recommended for highly available storage of server application data—including VDI.
Deduplication of open files. In Windows Server 2012, Data Deduplication focused on files at rest and skipped any file that was in active use. To enable deduplication of live VDI VMs, Data Deduplication in Windows Server 2012 R2 supports open files, with negligible impact on read/write performance.
Faster and more efficient deduplication, which enables the storage server to keep up with the I/O patterns for VDI workloads, as compared to traditional file share workloads. Informal internal testing at Microsoft shows that deduplication is now 33-50 percent faster, depending on the specific I/O patterns.
Faster write performance for deduplicated files, which improves performance across all workloads.
Faster and more efficient read performance for VDI files, as enabled by forced caching for Hyper-V I/O. For VDI workloads, this means that the use of data deduplication (and its associated chunk cache) can actually deliver better read performance than for non-deduplicated data.
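
If you are wondering why VDI workloads deduplicate so well, the following toy Python sketch may help. It is a simplified illustration of chunk-based deduplication in general, not the Windows implementation: the `ChunkStore` class, the tiny fixed chunk size, and the sample "disk images" are all assumptions made for brevity. Each unique chunk is stored once, and every file is just a list of references to chunks.

```python
import hashlib


class ChunkStore:
    """Toy deduplicating store: unique chunks kept once, files hold references."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size  # unrealistically small, for illustration
        self.chunks = {}              # digest -> chunk bytes
        self.files = {}               # file name -> ordered list of digests

    def write(self, name, data):
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if an identical one is not already present.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


if __name__ == "__main__":
    store = ChunkStore(chunk_size=4)
    # Two hypothetical desktop images that share most of their content.
    store.write("desktop1.vhd", b"AAAABBBBCCCC")
    store.write("desktop2.vhd", b"AAAABBBBDDDD")
    print(store.read("desktop2.vhd"))  # b'AAAABBBBDDDD'
    print(store.stored_bytes())        # 16 bytes stored for 24 bytes of data
```

The shared chunks are also the ones read most often, which gives an intuition for why a cache of popular chunks can help read performance when many similar desktops boot from the same storage.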

When combined, these improvements to Data Deduplication enable massive storage cost savings for VDI deployments and make it possible to leverage the superior I/O performance of solid-state drives (SSDs) without investing in massive storage arrays.

For more information on these enhancements to Data Deduplication and how to deploy it for VDI storage, see Matthias Wollnik’s blog articles here and here.

Final Thoughts

There is a great debate going on about the value and utility of Storage Spaces, as compared to traditional RAID systems. When you use Storage Spaces in high-throughput configurations, you will want to use mirrored configurations and fast SSDs to absorb random writes. Mirroring the drives might cost more than using a RAID adapter, but will probably cost less than buying HBAs for each cluster node and a self-contained external RAID system.

When you use Windows Server 2012 R2 and implement Storage Spaces on an attached JBOD, you get great cost-efficiency. And when you turn on Data Deduplication, your data volumes will typically be reduced by 50 percent or more, which will mitigate the cost of additional drives for mirroring. Now you can get low cost, high density, and high performance at the same time, which is a great combination!
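
The trade-off between mirroring and deduplication comes down to simple arithmetic, sketched below in Python. The function name and the 10 TB figure are hypothetical, and the model is deliberately rough: it ignores pool metadata and reserve capacity, and it assumes the savings rate applies evenly to all of the data.

```python
def raw_capacity_tb(logical_tb, data_copies, savings_percent):
    """Rough raw disk capacity needed for mirrored, deduplicated data.

    data_copies: 1 for no mirroring, 2 for a two-way mirror, and so on.
    savings_percent: assumed reduction in data volume from deduplication.
    """
    if data_copies < 1:
        raise ValueError("data_copies must be at least 1")
    if not 0 <= savings_percent < 100:
        raise ValueError("savings_percent must be in the range [0, 100)")
    return logical_tb * data_copies * (100 - savings_percent) / 100


if __name__ == "__main__":
    print(raw_capacity_tb(10, 1, 0))   # 10.0  no mirror, no deduplication
    print(raw_capacity_tb(10, 2, 0))   # 20.0  two-way mirror only
    print(raw_capacity_tb(10, 2, 50))  # 10.0  two-way mirror plus 50 percent savings
```

Under these assumptions, a 50 percent reduction from deduplication cancels out the extra drives a two-way mirror requires, and anything beyond 50 percent puts you ahead.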

Cheers,
Scott M. Johnson
Senior Program Manager
Windows Storage Server
@supersquatchy
