Back in 2001, I worked for a 300-employee company in Palo Alto as their Systems Engineer, and one of the things that gave me the most headaches was backups. We had Windows Server across the board, but the backup application was from a third party, which required specific agents for SQL Server, Oracle, Exchange, etc. Some applications were actually easy to back up while running (like SQL Server), but others were a real pain because of open files, inconsistent states, etc. We had a joke about how easy it was to back up, as long as you never had to restore anything later.
In a few cases, the only way I could get a consistent backup was stopping all services during the entire backup process. One of those applications was a document management system where the files were stored in a shared folder but the metadata was in a database. For that one, we did a backup to disk and then to tape, just so we could run the critical downtime part faster and perform the tape part after service was restored. The process was very complex and involved backup agents, pre-backup scripts and post-backup scripts. I always had to check the logs in the morning (especially on Mondays) to see if everything went well and we would actually cheer when it did.
Looking back at it, that was no surprise. Backing up an application like that document management system required coordination of components from a lot of different vendors, including the OS, the backup software, the database, the document management system and the SAN. No wonder we had such a hard time making it all work.
In the last few years, however, a lot has changed. Microsoft has included a few technologies in Windows Server that facilitate the conversation between these different components so they can work better together. The main component behind all this is the Volume Shadow Copy Service (VSS), introduced in Windows Server 2003. The idea is actually quite simple: create a Windows service that is able to coordinate the actions required to create a consistent shadow copy (also known as a snapshot or a point-in-time copy) of the data you want to back up. You can then use those shadow copies as your backup, or you can copy them to another disk or to tape as required, without affecting the running application at that point.
Needless to say, that required a lot of buy-in from all the hardware and software vendors involved, but Microsoft has a tradition of creating platforms and providing the required APIs for third parties to build upon. The end result was pretty compelling. If your operating system, applications, backup software and SAN manufacturer all support VSS, you can create flexible storage solutions that can easily be protected, without having to stop servicing clients.
From an end-user perspective, though, it took quite a while to get there. Back in 2003, when this was first introduced, it was very common to have pieces of the puzzle missing. Your backup software supported VSS, but you were running Exchange Server 2000, which didn't. Or you had Exchange 2003, which supports VSS, but your SAN manufacturer did not support VSS yet. You get the idea. These days, everyone is on board with VSS, since the benefits are just too good to ignore. There was also a lot of learning involved, since users did not really understand the details of VSS and exactly why a specific combination of hardware and software would not work.
There are four basic parts of a VSS solution that need to be in place for a complete system to work: the VSS coordination service, the VSS requester, the VSS writer and the VSS provider. In order to understand how they work together, you need to distinguish each component’s role.
- The VSS coordination service is part of the operating system and was first shipped with Windows Server 2003. The coordinator's role, as the name implies, is to make sure the other components can communicate with each other properly and work together.
- The VSS requester is the software that commands the actual creation of shadow copies (or other high-level operations like importing, breaking or deleting them). Typically, this is the backup application itself. The backup tool in Windows Server is a VSS requester, and so is the System Center Data Protection Manager application. Third-party VSS requesters include virtually all backup software that runs on Windows Server 2003.
- The VSS writer is the component that guarantees that we have a consistent data set to back up. This is typically provided as part of your application software, like SQL Server or Exchange Server. A VSS writer for the basic file system is included with Windows Server. Third-party VSS writers are included with many applications for Windows Server that need to guarantee data consistency at the time of a backup, like Oracle.
- The VSS provider is the component that creates the shadow copy and preserves the data as it was at that consistent point in time. This can be done in software or in hardware. Windows Server includes a VSS provider that uses copy-on-write. If you use a SAN, you probably want to install the SAN vendor's VSS provider, so you have a more efficient way to split your shadow copies without putting any extra burden on the operating system itself. SANs are really good at that.
Now that you understand each component’s role in the process, you can probably guess how to create a backup (or shadow copy) using the VSS technology. Here’s my oversimplified explanation of how it happens, using a database backup as an example:
- Using the backup software (VSS requester), you command the start of the database backup (creation of a shadow copy).
- The database software (VSS writer) “freezes” the database, making sure that it is in a consistent state until further notice (hold your breath!).
- The SAN hardware (VSS provider) creates a snapshot of the data.
- The database software (VSS writer) is notified that the shadow copy is done and it’s OK to write to the database again.
- The backup software (VSS requester) tells you that the shadow copy was successfully created.
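To make that sequence a bit more concrete, here is a toy simulation of the five steps. It is only a sketch: the real VSS API is COM-based (the requester side goes through `IVssBackupComponents`), and the names `DatabaseWriter`, `SanProvider` and `backup` below are hypothetical, made up just for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatabaseWriter:
    """Hypothetical VSS writer: holds the application's writes (freeze/thaw)."""
    name: str
    frozen: bool = False
    log: List[str] = field(default_factory=list)

    def freeze(self) -> None:
        self.frozen = True
        self.log.append(f"{self.name}: frozen")

    def thaw(self) -> None:
        self.frozen = False
        self.log.append(f"{self.name}: thawed")


@dataclass
class SanProvider:
    """Hypothetical VSS provider: takes the actual snapshot of a volume."""
    snapshots: List[str] = field(default_factory=list)

    def create_snapshot(self, volume: str) -> str:
        snap_id = f"{volume}@{len(self.snapshots) + 1}"
        self.snapshots.append(snap_id)
        return snap_id


def backup(writer: DatabaseWriter, provider: SanProvider, volume: str) -> str:
    """Hypothetical requester/coordinator flow for a single shadow copy."""
    # Step 1: the requester asks for a shadow copy (calling this function).
    # Step 2: the writer freezes the database in a consistent state.
    writer.freeze()
    try:
        # Step 3: the provider snapshots the volume while writes are held.
        snap_id = provider.create_snapshot(volume)
    finally:
        # Step 4: the writer may write again, even if the snapshot failed.
        writer.thaw()
    # Step 5: the requester reports the new shadow copy.
    return snap_id


if __name__ == "__main__":
    writer = DatabaseWriter("SalesDB")
    provider = SanProvider()
    print(backup(writer, provider, "E:"))  # prints E:@1
    print(writer.log)  # prints ['SalesDB: frozen', 'SalesDB: thawed']
```

The `try`/`finally` is the important design choice here: whatever happens during the snapshot, the writer must be thawed, or the application would stay frozen indefinitely.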
The critical part of this operation, when the VSS writer is told to hold all writes, can last only a few seconds. During that period, all IO operations are simply queued and are completed only after the shadow copy is done. Because that window is so short, creating a shadow copy does not significantly impact the performance of the production system. If the system is unable to queue the IO requests during that period, or if the window lasts longer than 10 seconds, the shadow copy creation will simply fail and will have to be retried later.
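Here is a toy model of that hold-writes window, assuming a plain dictionary of blocks stands in for the disk. `WriteGate` and `ShadowCopyTimeout` are hypothetical names, not part of any Windows API; the 10-second limit is the one described above, and the clock can be swapped out so the timeout is easy to test.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Optional, Tuple


class ShadowCopyTimeout(Exception):
    """Raised when writes were held longer than the allowed window."""


class WriteGate:
    """Hypothetical model of the hold-writes window (not the real VSS API).

    While writes are held, they are queued instead of applied. On release,
    the queue is flushed in arrival order. If the window exceeded `limit`
    seconds, the shadow copy is treated as failed and should be retried.
    """

    def __init__(self, limit: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.clock = clock
        self.disk: Dict[int, bytes] = {}
        self.pending: Deque[Tuple[int, bytes]] = deque()
        self.held_since: Optional[float] = None

    def hold(self) -> None:
        self.held_since = self.clock()

    def write(self, block: int, data: bytes) -> None:
        if self.held_since is None:
            self.disk[block] = data              # normal path: write now
        else:
            self.pending.append((block, data))   # held: queue the IO

    def release(self) -> None:
        if self.held_since is None:
            raise RuntimeError("writes are not being held")
        elapsed = self.clock() - self.held_since
        self.held_since = None
        # Queued writes are always applied, so no IO is lost either way.
        while self.pending:
            block, data = self.pending.popleft()
            self.disk[block] = data
        if elapsed > self.limit:
            raise ShadowCopyTimeout(
                f"writes held for {elapsed:.1f}s (limit {self.limit}s)")
```

Note that even when the window runs too long, the queued writes still reach the disk; only the shadow copy is discarded, which matches the idea that the production system keeps running and the backup is simply retried.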
The entire process is orchestrated by the VSS coordination service and it’s more complex than it appears. For instance, you need to enumerate all the volumes related to the database in question. The VSS writer also needs some time (before the critical 10 seconds) to commit all transactions to disk before telling the coordinator that it’s ready to go. You also need to tell the VSS provider in advance that you’re about to create a shadow copy of a volume, so it can properly prepare for it and do it quickly.
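Enumerating the volumes is easy to underestimate: a database often keeps its data files and its log files on different volumes, and all of them must be captured at the same instant. The sketch below only illustrates the idea of collecting the distinct volumes behind a set of files. It assumes plain drive-letter paths (real VSS code also has to deal with volume GUID paths and mount points), and both `volumes_for_files` and the file names are hypothetical.

```python
import ntpath
from typing import Iterable, List


def volumes_for_files(paths: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated drive roots that hold `paths`.

    Simplification: only the drive-letter prefix of each Windows path is
    used; mount points and volume GUID paths are not resolved.
    """
    volumes = set()
    for path in paths:
        drive, _ = ntpath.splitdrive(path)
        if not drive:
            raise ValueError(f"not an absolute drive path: {path!r}")
        volumes.add(drive.upper() + "\\")
    return sorted(volumes)


if __name__ == "__main__":
    # Hypothetical data and log files for a single database.
    db_files = [r"E:\Data\Sales.mdf", r"F:\Logs\Sales_log.ldf",
                r"e:\Data\Sales2.ndf"]
    print(volumes_for_files(db_files))  # prints ['E:\\', 'F:\\']
```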
As you probably figured out by now, splitting a shadow copy of the data actually involves a little trick. You don't have enough time to create a physical copy of an entire terabyte-sized database in under 10 seconds, even using the fastest disks around. What the VSS provider typically does is mark all the data blocks currently in use by that volume, so that it can keep a copy of the "old state" if that data needs to be overwritten after the shadow copy is completed. SAN-based systems also have built-in abilities to create snapshots, which implement that behavior. Some systems will immediately start a background copy of the data to another volume and, once that copy is complete, put the volume back in a regular state with no need for that extra tracking. The built-in VSS provider in Windows Server can keep many shadow copies of the same volume, setting aside a portion of the hard drive just for those old blocks.
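The copy-on-write trick is easier to see in a toy example. This is a minimal sketch under simplifying assumptions: the volume is a Python list of blocks, each snapshot gets its own in-memory "diff area", and `CowVolume` is a hypothetical name. The real Windows provider is far more space-efficient than saving a block into every snapshot's diff area, but the principle is the same: taking the snapshot copies nothing, and old blocks are saved only when they are about to be overwritten.

```python
from typing import Dict, List


class CowVolume:
    """Toy copy-on-write volume illustrating the software-provider idea."""

    def __init__(self, blocks: List[bytes]) -> None:
        self.blocks = list(blocks)
        self.diff_areas: List[Dict[int, bytes]] = []  # one per snapshot

    def snapshot(self) -> int:
        # Instant: no data is copied when the snapshot is taken.
        self.diff_areas.append({})
        return len(self.diff_areas) - 1

    def write(self, index: int, data: bytes) -> None:
        old = self.blocks[index]
        for diff in self.diff_areas:
            # Save the old contents only on the first overwrite per snapshot.
            diff.setdefault(index, old)
        self.blocks[index] = data

    def read_snapshot(self, snap_id: int, index: int) -> bytes:
        # If the block changed after this snapshot, its diff area holds the
        # old contents; otherwise the live block is still the snapshot view.
        return self.diff_areas[snap_id].get(index, self.blocks[index])


if __name__ == "__main__":
    vol = CowVolume([b"a", b"b", b"c"])
    first = vol.snapshot()
    vol.write(0, b"A")
    print(vol.read_snapshot(first, 0), vol.blocks[0])  # prints b'a' b'A'
```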
As you can see, the Volume Shadow Copy Service (VSS) infrastructure in Windows Server is quite important. Many other products and technologies build on top of it to create a great backup experience in Windows. For instance, System Center Data Protection Manager (DPM) is a Microsoft product that takes VSS to the next level by providing continuous data protection. DPM acts as a VSS requester and not only creates shadow copies of specific sets of data, but also ships those changes to another server, where shadow copies from multiple servers and applications are managed centrally. But that is an entirely different subject…
For more information about Volume Shadow Copy Service (VSS), please check this white paper:
You can also find developer-focused information on VSS at: