Hyper-V backup at Private Cloud scale

Update Rollup 3 (UR3) for DPM 2012 R2 brings key enhancements to VM backup that help ensure backup SLAs are met and make backups much more efficient in large virtualized deployments. The update aims to minimize the impact of backup on the production storage infrastructure in private cloud deployments (thousands of VMs) running Windows Server 2012 R2.

We support both of the following Hyper-V deployment configurations:

  • VMs hosted on a Hyper-V cluster with storage on SMB shares backed by a Scale-Out File Server cluster (Hyper-V over SOFS)
  • VMs hosted on a Hyper-V cluster with storage on Cluster Shared Volumes (Hyper-V over CSV)

Scale testing on SOFS

We ran extensive scale tests, taking continuous daily backups for 3 weeks with virtualized DPM servers. The protected VMs ran Windows Server 2012 R2 as the guest OS. Their workloads spanned multiple I/O profiles (SQL OLTP, Exchange, File Server, Video Streaming, SQL Decision Support System).

Here are the details of the Hyper-V over SOFS deployment:

Configuration                    Hyper-V over SOFS
Number of Hyper-V hosts          24
VM memory (RAM)                  2-8 GB
VM disk size                     120 GB (20 GB OS + 100 GB data)
Total number of VMs              1,000
VM churn per day                 5%
SOFS cluster nodes               4
Number of virtual DPM servers    8
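For a rough sense of load, the table above implies the following averages. This is a sketch only. The daily churn line assumes that the 5% churn applies to each VM's full 120 GB disk, which the table does not state explicitly. The per-host and per-server figures assume VMs are spread evenly.

$$
\begin{aligned}
\text{VMs per Hyper-V host (average)} &= 1000 / 24 \approx 41.7 \\
\text{VMs per DPM server (average)} &= 1000 / 8 = 125 \\
\text{Daily churn (assumed basis)} &= 1000 \times 120\ \text{GB} \times 0.05 = 6000\ \text{GB} = 6\ \text{TB}
\end{aligned}
$$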

We scale tested with each DPM server protecting between 50 and 250 VMs. The DPM VMs were deployed in a scale-out configuration and protected VMs from the same Hyper-V cluster nodes. We evaluated the results against the following criteria:

  • Backup success rate per day – the percentage of VMs with a successful backup on a given day.
  • Overall backup success rate – the percentage of successful backups across all VMs over the 3-week test period.
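Written as formulas, one reading of these two definitions is shown below. Here d is a day within the 21-day test period, and one backup per VM per day is assumed.

$$
\begin{aligned}
R_{\text{day}}(d) &= \frac{\text{VMs with a successful backup on day } d}{\text{total protected VMs}} \times 100\% \\
R_{\text{overall}} &= \frac{\text{successful backups over the 21 days}}{\text{backups attempted over the 21 days}} \times 100\%
\end{aligned}
$$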

We achieved more than 98% on both metrics. This means that more than 20,000 backup jobs ran successfully during the 3-week period. The few errors we encountered were known, auto-recoverable failures, such as "Out of storage space" and "Retry-able VSS errors".
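The 20,000 figure follows from the SOFS configuration. Assuming one backup per VM per day over 21 days:

$$
\begin{aligned}
N_{\text{jobs}} &= 1000\ \text{VMs} \times 1\ \tfrac{\text{backup}}{\text{day}} \times 21\ \text{days} = 21{,}000 \\
N_{\text{successful}} &> 0.98 \times 21{,}000 = 20{,}580
\end{aligned}
$$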

Stress testing on SOFS

We stress tested Hyper-V backups at a different scale (2 DPM servers protecting 500 VMs), taking 8 backups a day (one every 3 hours) for more than a week. The 3-minute video published with this post shows the backup in action.
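The stress-test load works out as follows. The per-server line assumes the 500 VMs were split evenly across the 2 DPM servers, and the weekly total counts only the first 7 days.

$$
\begin{aligned}
\text{Backups per VM per day} &= 24\ \text{h} / 3\ \text{h} = 8 \\
\text{Backup jobs per day (all VMs)} &= 500 \times 8 = 4000 \\
\text{Backup jobs per DPM server per day} &= 4000 / 2 = 2000 \\
\text{Backup jobs over 7 days} &= 4000 \times 7 = 28{,}000
\end{aligned}
$$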

Scale testing on CSV

We also scale tested Hyper-V over CSV and obtained similar results. Here are the details of that deployment:

Configuration                    Hyper-V over CSV
Number of Hyper-V hosts          12
VM memory (RAM)                  1-8 GB
VM disk size                     50 GB (20 GB OS + 30 GB data)
Total number of VMs              600
SAN make/model                   Dell Compellent SC8000
Number of CSVs                   12
Number of virtual DPM servers    2
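For comparison with the SOFS test, the CSV configuration implies the following averages. This sketch assumes VMs are spread evenly across hosts, CSVs, and DPM servers. Note that 300 VMs per DPM server is above the 50-250 range used per server in the SOFS test.

$$
\begin{aligned}
\text{VMs per DPM server} &= 600 / 2 = 300 \\
\text{VMs per Hyper-V host} &= 600 / 12 = 50 \\
\text{VMs per CSV} &= 600 / 12 = 50 \\
\text{Provisioned VM disk in total} &= 600 \times 50\ \text{GB} = 30{,}000\ \text{GB} = 30\ \text{TB}
\end{aligned}
$$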

 

DPM Deployment

For virtualized DPM deployments, we recommend provisioning backup storage as VHDs residing on Scale-Out File Server (SOFS) shares.

A suggested configuration for each DPM server is:

Virtual processors    4
RAM                   8 GB
NIC                   10 Gbps
Storage               20 TB (20 x 1 TB dynamic VHDs on an SMB share)

This configuration has a few advantages:

  1. A virtualized DPM setup allows easy scale-out
  2. The SOFS cluster provides storage resiliency
  3. VHDs used as backup storage provide flexibility for data growth

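Additionally, some customers told us they need to run backups only during off-peak hours, so UR3 introduces a backup window for VM data sources. You can set the backup window using PowerShell as shown below. "DPMServer01" and "VM-PG" are placeholder names for your DPM server and protection group. Make sure the backup schedule aligns with the StartTime parameter used in Set-DPMBackupWindow.

# DPMServer01 and VM-PG are placeholders; substitute your own server and protection group names
$pg = Get-DPMProtectionGroup -DPMServerName "DPMServer01" | Where-Object { $_.FriendlyName -eq "VM-PG" }
$mpg = Get-DPMModifiableProtectionGroup -ProtectionGroup $pg
Set-DPMBackupWindow -ProtectionGroup $mpg -StartTime 23:00 -DurationInHours 6
Set-DPMProtectionGroup -ProtectionGroup $mpg   # commit the change to the protection group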
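With a StartTime of 23:00 and a DurationInHours of 6, the window crosses midnight. A scheduled backup time such as 23:00 falls inside it, while a time such as 06:00 falls outside it:

$$
23{:}00 + 6\ \text{h} = 29{:}00 \equiv 05{:}00 \pmod{24\ \text{h}} \quad (\text{window: 23:00 to 05:00 the next day})
$$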

Now that you have seen scalable VM backup in action, try it out yourself. Installation instructions for this DPM update are provided in KB 2966014.