This is the first in a series of posts discussing the new features in the Volume Shadow Copy Service (VSS) for Windows Server 2008 R2 and Windows 7. Before we dive in, it’s important to note that all existing VSS solutions for Vista and Server 2008 will work as-is on Windows 7 and Server 2008 R2, respectively.
In Server 2008 R2, we’ve enabled a hardware shadow copy scenario commonly known as LUN Resync or LUN Revert, which restores a volume from an existing shadow copy. This article describes the use cases and gives an overview of the APIs we’ve introduced to support the scenario.
The desire for quicker recoveries, and thus less downtime in the event of data loss, has driven the adoption of disk-based recovery points. It also doesn’t hurt that disk storage is increasingly inexpensive and often more manageable than tape, especially during recoveries. As a result, enterprise and even SMB customers often rely on disk backups for the first tier of data protection, and sometimes as their only backup.
LUN Resync is a fast recovery scheme that leverages storage array capabilities to restore data from a shadow copy. In a typical scenario, an application administrator might create a hardware shadow copy of a line-of-business application database every day and keep recovery points for up to a week.
Recovery from data loss: In the event of a data integrity issue or hardware failure, the administrator invokes a resynchronization (resync) from the shadow copy to restore the data to either the original LUN or an entirely new LUN. Note that although VSS shadow copies are created at volume granularity, for a hardware shadow copy the array actually creates a shadow copy of the LUN that contains the volume.
The shadow copy can be either a full clone or a differential shadow copy. In either case, at the end of the resync operation the destination LUN has the same contents as the shadow copy LUN. During the resync, the array performs a block-level copy from the shadow copy LUN to the destination LUN.
Most arrays allow production I/O to resume shortly after the resync operation begins. While the resync is in progress, reads are redirected to the shadow copy LUN and writes go to the destination LUN. As a result, normal operations can resume within seconds, even when very large data sets are being recovered.
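To make the read/write redirection during a resync concrete, here is a minimal Python model of a resync running while production I/O continues. It is a hypothetical sketch, not how any particular array works: the ResyncSession class, the block-list representation, and the rule that reads of blocks already copied or newly written are served from the destination (rather than the shadow copy) are all assumptions made for illustration.

```python
# Hypothetical model of an array-side LUN resync, for illustration only.
# Real arrays do this in firmware; nothing here is a VSS or array API.

class ResyncSession:
    """Block-level resync from a shadow copy LUN to a destination LUN.

    While the background copy runs, production I/O is allowed: reads of
    blocks not yet copied are served from the shadow copy LUN, and writes
    go to the destination LUN.
    """

    def __init__(self, shadow_lun, dest_lun):
        if len(shadow_lun) != len(dest_lun):
            raise ValueError("LUNs must have the same number of blocks")
        self.shadow = list(shadow_lun)  # private copy: the shadow copy is never modified
        self.dest = dest_lun            # modified in place
        self.synced = [False] * len(dest_lun)
        self.cursor = 0                 # next block for the background copy

    def read(self, block):
        # Blocks not yet copied (and not overwritten) come from the shadow copy.
        if self.synced[block]:
            return self.dest[block]
        return self.shadow[block]

    def write(self, block, data):
        # Writes land on the destination; mark the block as synced so the
        # background copy does not overwrite the new data later.
        self.dest[block] = data
        self.synced[block] = True

    def copy_step(self, n=1):
        # Copy up to n blocks that have not yet been copied or written.
        copied = 0
        while copied < n and self.cursor < len(self.dest):
            b = self.cursor
            if not self.synced[b]:
                self.dest[b] = self.shadow[b]
                self.synced[b] = True
                copied += 1
            self.cursor += 1
        return self.done()

    def done(self):
        return all(self.synced)
```

In this model, blocks written by production I/O during the resync are skipped by the background copy, so the destination ends up identical to the shadow copy except where new data has already been written.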
Creation of a test bed: LUN Resync can also be used to seed a test bed with live data. In this scenario, a hardware shadow copy of live data is created and then resynchronized to a newly created LUN in the test environment. The test LUN is then available for testing, and the resync can be repeated as needed to refresh the data.
LUN Resync is a fast restore mechanism that works together with the other VSS APIs to provide integrated fast recovery. Its benefits include the following.
I/O for recovery is offloaded from the host: During a resync, the heavy lifting of restoring data is offloaded to the array (which performs a block-level copy), freeing up the host system.
Integration with the conventional restore workflow: The LUN Resync APIs in Server 2008 R2 allow a fast restore to be performed much like any traditional VSS restore. For example, it is now possible to signal an application’s VSS writer with a pre-restore event, restore the volume, and then send a post-restore notification, all from the same interface. VSS also places a clustered shared disk in maintenance mode for the duration of a resync operation.
Forward compatibility: Some applications today use array-specific APIs to perform the resync operation. However, this approach relies on reverse engineering the shadow copy metadata to convert a shadow copy LUN into a normal LUN at the end of the resync, which has limitations. Moreover, any approach that doesn’t use the documented VSS APIs is likely to break if the implementation changes significantly in a future OS release.
LUN Resync APIs and workflow
The following is a high-level view of the changes required in requesters and hardware providers to support the new scenario; details on each API are in the VSS documentation on MSDN. Sample code for requesters and providers is available in the vshadow and vsssampleprovider samples, respectively, in the Windows SDK for Windows 7 and Server 2008 R2.
For a requester, the workflow is very similar to that in a conventional restore.
1. Initialize the IVssBackupComponentsEx3 interface for restore using the backup components document saved during backup.
2. Select the appropriate components for restore if the operation involves writers.
3. Add pairs of shadow copy and destination volumes to the recovery set using AddSnapshotToRecoverySet. This establishes the mapping between the source shadow copy and destination volumes.
4. Invoke pre-restore if the operation involves writers. This is a cue for writers to release handles or close databases in preparation for an impending restore.
5. Invoke RecoverSet to initiate the resync operation, which results in a call to ResyncLuns on the appropriate hardware providers. RecoverSet hands back an IVssAsync interface that the requester uses to track the status of the operation.
6. Once the RecoverSet operation is complete, invoke post-restore if writers are involved in the restore. This is a cue for writers to perform any fix-ups, such as applying logs, before normal operations resume.
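The requester sequence above can be sketched as a mock that enforces the order of the steps. This is a hypothetical Python model, not the VSS COM interface: the ResyncRequester class and its snake_case methods are illustrative stand-ins for InitializeForRestore, component selection, AddSnapshotToRecoverySet, PreRestore, RecoverSet, and PostRestore, and the provider callback stands in for ResyncLuns.

```python
# Hypothetical mock of the requester-side LUN Resync sequence. It records calls
# and enforces their order; it is NOT the VSS COM API.

class ResyncRequester:
    def __init__(self, provider_resync):
        self.provider_resync = provider_resync  # stand-in for ResyncLuns
        self.stage = "created"
        self.involves_writers = False
        self.recovery_set = []  # (shadow copy ID, destination volume) pairs
        self.log = []

    def _require(self, *allowed):
        if self.stage not in allowed:
            raise RuntimeError(f"out of order: stage is {self.stage!r}")

    def initialize_for_restore(self, backup_doc):
        # Step 1: load the backup components document saved during backup.
        self._require("created")
        self.backup_doc = backup_doc
        self.stage = "initialized"
        self.log.append("InitializeForRestore")

    def select_component(self, name):
        # Step 2: only needed when writers are involved.
        self._require("initialized")
        self.involves_writers = True
        self.log.append(f"Select {name}")

    def add_to_recovery_set(self, shadow_id, dest_volume):
        # Step 3: map a shadow copy to its destination volume.
        self._require("initialized")
        self.recovery_set.append((shadow_id, dest_volume))
        self.log.append(f"AddSnapshotToRecoverySet {shadow_id}->{dest_volume}")

    def pre_restore(self):
        # Step 4: writers release handles or close databases.
        self._require("initialized")
        self.stage = "pre_restored"
        self.log.append("PreRestore")

    def recover_set(self):
        # Step 5: with writers involved, PreRestore must have run first.
        if self.involves_writers:
            self._require("pre_restored")
        else:
            self._require("initialized", "pre_restored")
        if not self.recovery_set:
            raise RuntimeError("recovery set is empty")
        self.log.append("RecoverSet")
        status = self.provider_resync(list(self.recovery_set))
        self.stage = "recovered"
        return status  # the real call hands back an IVssAsync to track progress

    def post_restore(self):
        # Step 6: writers perform fix-ups such as applying logs.
        self._require("recovered")
        self.stage = "done"
        self.log.append("PostRestore")
```

The ordering checks capture the main point of the workflow: when writers are involved, the resync sits between the pre-restore and post-restore notifications, just as a file-level restore would.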
Hardware providers that support resync must implement the method IVssHardwareSnapshotProviderEx::ResyncLuns. VSS maps the shadow copy and destination volumes specified in the AddSnapshotToRecoverySet calls to source and destination LUNs, which it passes to the provider in the ResyncLuns call.
Note that resyncs can only be performed from transportable hardware shadow copies.
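The volume-to-LUN mapping that VSS performs before calling ResyncLuns, together with the transportable-only restriction, can be modeled roughly as follows. This is a hypothetical sketch: the ShadowCopy record, the lookup tables, and map_recovery_set are assumptions for illustration and do not correspond to VSS data structures.

```python
# Hypothetical model of the volume-to-LUN mapping done before ResyncLuns.
# None of these names are VSS structures or APIs.
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowCopy:
    shadow_id: str
    lun: str             # LUN that backs the hardware shadow copy
    transportable: bool  # resync is only allowed from transportable copies


def map_recovery_set(pairs, shadows, volume_to_lun):
    """Turn (shadow ID, destination volume) pairs into (source LUN, dest LUN) pairs."""
    lun_pairs = []
    for shadow_id, dest_volume in pairs:
        sc = shadows[shadow_id]
        if not sc.transportable:
            raise ValueError(
                f"{shadow_id}: resync requires a transportable hardware shadow copy")
        lun_pairs.append((sc.lun, volume_to_lun[dest_volume]))
    return lun_pairs


shadows = {"snap-1": ShadowCopy("snap-1", "LUN-7", True)}
print(map_recovery_set([("snap-1", "E:")], shadows, {"E:": "LUN-2"}))
# [('LUN-7', 'LUN-2')]
```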
Resync vs. Swap
LUN Swap is a fast recovery scenario that VSS has supported since Windows Server 2003 SP1. In a swap, the shadow copy is first imported and then converted into a read-write volume using IVssBackupComponentsEx2::BreakSnapshotSetEx. The conversion is irreversible; afterward, the volume and its underlying LUN can no longer be managed with the VSS APIs.
So how does a resync compare with a swap? Are there benefits to using one over the other?
Reuse of shadow copy: A resync does not alter the shadow copy, so the same shadow copy can be used several times. In a swap, the shadow copy can be used for recovery only once. This matters to the most safety-conscious administrators: with LUN Resync, the entire restore operation can be retried if something goes wrong the first time around.
Differentiated storage: At the end of a swap, the shadow copy LUN is used for production I/O, so it must use the same quality of storage as the original production LUN to avoid degrading performance after the recovery. With a resync, the shadow copy can be kept on cheaper storage, reducing the cost of maintaining recovery points.
In-place recovery: A swap doesn’t require the existence of a destination LUN. This can be an advantage when the destination LUN is unusable and needs to be recreated.
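The trade-offs between resync and swap can be summarized in a small hypothetical model. Here swap stands in for converting the shadow copy with BreakSnapshotSetEx and resync for a block copy onto a destination LUN; the Lun and ShadowCopy classes and the storage tier labels are illustrative assumptions, not VSS APIs.

```python
# Hypothetical comparison of resync and swap; neither function is a VSS API.
from dataclasses import dataclass, field


@dataclass
class Lun:
    name: str
    tier: str                      # e.g. "fast" production storage, "archive" storage
    blocks: list = field(default_factory=list)


@dataclass
class ShadowCopy:
    lun: Lun
    consumed: bool = False         # True once converted by a swap


def resync(shadow, dest):
    # Block copy onto an existing destination LUN; the shadow copy is untouched,
    # so this can be repeated, and production stays on the destination's tier.
    if shadow.consumed:
        raise RuntimeError("shadow copy is no longer under VSS control")
    dest.blocks[:] = shadow.lun.blocks
    return dest


def swap(shadow):
    # Irreversible conversion: the shadow copy LUN itself becomes production,
    # so no destination LUN is needed, but the copy can be used only once.
    if shadow.consumed:
        raise RuntimeError("shadow copy is no longer under VSS control")
    shadow.consumed = True
    return shadow.lun
```

Note that swap takes no destination argument, mirroring the in-place recovery point above, while resync can be retried because it never modifies the shadow copy.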
In subsequent posts, we’ll cover other new features, including a new lightweight API for VSS writers, some handy diskshadow scripts for LUN Resync and other scenarios, and API updates. Stay tuned.