Thank you, Chris, for this excellent write-up on backing up virtual environments!
One of the most important things to remember about backup best practices in virtual environments is that virtual environments are not physical environments. That may sound obvious, but it matters, because the two have different architectures. In a physical environment, one operating system runs on one set of hardware, and that one-hardware, one-disk, one-operating-system relationship demands a specific architectural design for the software used to back it up. Backup software designers built that design around agents: software installed on each machine to handle the interaction between the backup application and the physical system being protected. This agent-based approach was incredibly successful for a very long time. Decades! It is still successful in physical environments today and probably represents the best possible backup solution for them.
The problem is that virtual environments are not physical environments, and the world of IT is headed for the virtual environment. Virtual environments differ because the hypervisor, whether VMware or Hyper-V, provides a layer of abstraction between the underlying hardware and the operating systems that run on top of it. The consequence of this architectural change is that the agent-based approach of the past simply doesn’t fit the virtual world. The reason is not that you couldn’t force the old model into a virtual environment by installing an agent in every virtual machine and then monitoring, managing, administering, and maintaining all of those agents.
The challenge here is that doing so would demand a dramatic amount of additional work to get backup operations running properly, and frankly it is not necessary. VMware and Microsoft, the two major players in the hypervisor space with ESX and Hyper-V respectively, have each recommended that we not install agents in the virtual machines! Instead, they recommend connecting to an open set of APIs, using standards that let backup software interact with a virtual machine through its underlying ESX or Hyper-V host rather than from inside the guest. The host also provides the change-tracking mechanisms that data protection software relies on. Agentless data protection is a big deal.
The Agentless Backup Approach
When we think about Hyper-V, we want to take an agentless approach to backup, replication, restoration, monitoring, and management. That way we make the most of the capabilities Microsoft has built into the hypervisor while minimizing the resource impact that data protection has on the virtual machines themselves. The Microsoft VSS process can image a virtual machine in its entirety, along with its associated binary, configuration (XML), snapshot, settings, and any other virtual machine files, producing a complete copy of the virtual machine and its data for backup or other data protection uses. The cool thing is that all of this happens without installing any agent inside the virtual machine. Of course, it depends on a standards-based approach: tools built to work directly with VSS and with the way Hyper-V is designed.
Agentless doesn’t necessarily mean that no agents are used anywhere in the architecture. It means that no agents are installed inside the virtual machines. In most cases, the software that protects a Hyper-V environment has some component that is installed or configured on the Hyper-V host itself. These “agents” (and I use the term loosely) run alongside the Windows operating system that supports the Hyper-V host, usually in the form of drivers and/or services, so they are not really agents in the traditional sense. The key is that none of these installed components go into the virtual machines. That means no additional overhead for the running virtual machine, its application workflow, or its services, and no additional administrative time and resources spent updating and managing agents in every guest.
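To put rough numbers on that administrative difference, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the agent-based model needs one agent in every VM and the agentless model needs one set of components on every Hyper-V host; the function name and the host sizes are hypothetical.

```python
from __future__ import annotations


def components_to_maintain(vms_per_host: list[int], in_guest_agents: bool) -> int:
    """Count the backup components an administrator must deploy, patch, and monitor.

    Illustrative assumptions only: the agent-based model installs one agent in
    every VM; the agentless model installs one component set on every Hyper-V host.
    """
    if in_guest_agents:
        return sum(vms_per_host)  # one agent inside every virtual machine
    return len(vms_per_host)      # one host-level component set (drivers/services) per host


# Hypothetical cluster: number of VMs running on each of three Hyper-V hosts.
hosts = [20, 25, 15]
print(components_to_maintain(hosts, in_guest_agents=True))   # 60
print(components_to_maintain(hosts, in_guest_agents=False))  # 3
```

Under those assumptions, a small three-host cluster goes from 60 in-guest agents to look after down to 3 host-level installs, and the gap grows with every VM added.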
The VSS process
Microsoft has a really cool feature called the Volume Shadow Copy Service (VSS), and it is the basis for agentless backup of VMs in Hyper-V. VSS is not new; it has been around since the early 2000s, shipping with Windows XP and Windows Server 2003. It was initially designed to provide just what its name suggests: shadow copies, point-in-time copies of volumes that also make previous versions of files available on Windows Server. Today we rely on that same service (VSSVC.exe) to make image copies of virtual machines in Hyper-V. It’s important to have a basic understanding of how VSS works, so let’s talk about it now.
VSS is made up of three essential components: the VSS service, the VSS requester, and the VSS writer. A fourth piece, the VSS provider, creates the shadow copy itself; Windows includes a default provider.
The VSS service is responsible for taking requests from a VSS requester and coordinating the other components to fulfill them. In this case, the requests are for image copies of virtual machines. VSS is installed with each version of Windows Server.
The VSS requester formulates a request to the VSS service for an image of a specific virtual machine. The requester is typically part of a backup application written by a third-party backup vendor, although Microsoft also ships requesters of its own, such as Windows Server Backup. You can even write your own VSS requester; Microsoft provides code samples and guidance for anyone interested.
The VSS writer is responsible for making sure the requested data is in a consistent state when the shadow copy is taken. It tells the requester which files belong to it, briefly holds (freezes) writes while the provider creates the shadow copy, and then releases (thaws) them; the requester then copies the data out of the shadow copy. Different writers handle different kinds of data. For example, to make an image of a virtual machine running on Hyper-V, VSS uses the Hyper-V VSS writer, which prepares the virtual machine requested by the requester so that a consistent image of it can be captured.
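How the VSS components divide the work can be sketched as a toy model in Python. This is not the real VSS interface, which is a COM API used from C++ (`CreateVssBackupComponents` plus `IVssBackupComponents` methods such as `GatherWriterMetadata`, `StartSnapshotSet`, `AddToSnapshotSet`, `PrepareForBackup`, and `DoSnapshotSet`); every class, method, and VM name below is hypothetical. The sketch only illustrates the order of events in the actual VSS design: the requester asks the writer which files make up the VM, the service has the writer freeze the VM, the provider takes a point-in-time copy, the writer thaws, and the requester copies data out of that copy while the VM keeps running.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VirtualMachine:
    name: str
    files: dict[str, bytes]  # file name -> contents (toy stand-in for VHDX, XML, etc.)
    frozen: bool = False     # True while writes are held for the shadow copy


class HyperVWriterModel:
    """Hypothetical stand-in for the Hyper-V VSS writer.

    It reports which files belong to a VM and quiesces the VM (freeze/thaw)
    so the shadow copy is consistent. It does not copy any data itself.
    """

    def __init__(self, vms: list[VirtualMachine]) -> None:
        self.vms = {vm.name: vm for vm in vms}

    def describe(self, vm_name: str) -> list[str]:
        return sorted(self.vms[vm_name].files)

    def freeze(self, vm_name: str) -> None:
        self.vms[vm_name].frozen = True

    def thaw(self, vm_name: str) -> None:
        self.vms[vm_name].frozen = False


class ProviderModel:
    """Hypothetical provider: takes the point-in-time copy of the files."""

    def create_shadow_copy(self, vm: VirtualMachine) -> dict[str, bytes]:
        if not vm.frozen:
            raise RuntimeError("shadow copy must be taken while the writer has frozen the VM")
        return dict(vm.files)  # later changes to vm.files do not affect this copy


class VssServiceModel:
    """Hypothetical coordinator: freeze -> shadow copy -> thaw, in that order."""

    def __init__(self, writer: HyperVWriterModel, provider: ProviderModel) -> None:
        self.writer = writer
        self.provider = provider
        self.events: list[str] = []

    def create_snapshot(self, vm_name: str) -> dict[str, bytes]:
        vm = self.writer.vms[vm_name]
        self.writer.freeze(vm_name)
        self.events.append("freeze")
        try:
            snapshot = self.provider.create_shadow_copy(vm)
            self.events.append("shadow copy")
            return snapshot
        finally:
            self.writer.thaw(vm_name)  # always thaw, even if the copy fails
            self.events.append("thaw")


class BackupRequesterModel:
    """Hypothetical requester (the backup application)."""

    def __init__(self, service: VssServiceModel) -> None:
        self.service = service

    def back_up(self, vm_name: str) -> dict[str, bytes]:
        wanted = self.service.writer.describe(vm_name)    # which files make up the VM
        snapshot = self.service.create_snapshot(vm_name)  # ask the service for a shadow copy
        return {name: snapshot[name] for name in wanted}  # copy data out of the shadow copy


vm = VirtualMachine("EXCH01", {"EXCH01.vhdx": b"disk v1", "EXCH01.xml": b"<config/>"})
service = VssServiceModel(HyperVWriterModel([vm]), ProviderModel())
backup = BackupRequesterModel(service).back_up("EXCH01")
vm.files["EXCH01.vhdx"] = b"disk v2"  # the VM keeps running and changing
print(backup["EXCH01.vhdx"])          # b'disk v1'
print(service.events)                 # ['freeze', 'shadow copy', 'thaw']
```

The `try`/`finally` makes sure the toy VM is always thawed, even if the copy fails, so a failed backup never leaves it with its writes on hold.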
For more information on the VSS process please see the following link to Technet.Microsoft.com. http://technet.microsoft.com/en-us/library/cc785914(v=WS.10).aspx
Agentless backup is cool, the VSS process is cool, and new ways to implement the 3-2-1 rule are cool, but none of it makes any difference if we can’t get the data back quickly. The defining point of any disaster recovery plan is the ability to recover the data. When we recover data, it’s important not only to know where the data is located, but also to understand and be able to work with the format it is stored in, and to use newer capabilities to enable advanced recovery options at a moment’s notice. Virtual machines are built to run application workloads, and those workloads support many individual users. A virtual machine running Microsoft Exchange provides email services to the users in an organization, and those users do not want downtime on the virtual machine that supports their email. In the event of data loss, small or large, we as administrators need a way to recover email items directly from the backup into the running virtual machine that hosts Exchange. The data protection market has changed dramatically over the past two years, with companies focusing more and more on application-specific tools and less and less on legacy methods of data restoration.
With innovative tools like Veeam Explorer for Exchange, an organization can handle a request from a user who needs to recover an accidentally deleted email message and its attachment. The tool mounts the Exchange database (.edb) file directly from within the backup file. Once it is mounted, the helpdesk professional can search for the message, or simply select the user’s mailbox and browse to it. The message can then be restored to the running Exchange VM, emailed directly back to the user, or saved as an .msg or .pst file. All of this can be done in seconds, while the user is still on the phone and while the Exchange server keeps running and providing services to the rest of the network.
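The idea behind item-level restore can be illustrated with a toy model in Python. This is not Veeam’s API and does not read a real Exchange database; `Message`, `find_items`, `restore_items`, and the mailbox data are all hypothetical. It only shows the principle described above: search a read-only copy of the data taken from the backup, then copy just the selected item into the live store, leaving every other mailbox untouched instead of restoring the whole VM.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    subject: str
    attachment: str | None = None


def find_items(backup: dict[str, list[Message]], mailbox: str, text: str) -> list[Message]:
    """Search one mailbox in the read-only backup copy by subject text."""
    return [m for m in backup.get(mailbox, []) if text.lower() in m.subject.lower()]


def restore_items(live: dict[str, list[Message]], mailbox: str, items: list[Message]) -> int:
    """Copy selected items into the live mailbox store, skipping items already present.

    Only the named mailbox is modified; other mailboxes are left alone."""
    target = live.setdefault(mailbox, [])
    restored = 0
    for item in items:
        if item not in target:
            target.append(item)
            restored += 1
    return restored


# Hypothetical data: the backup still has Alice's deleted message; the live store does not.
backup = {"alice": [Message("Q3 budget", "budget.xlsx"), Message("Lunch?")],
          "bob": [Message("Status report")]}
live = {"alice": [Message("Lunch?")],
        "bob": [Message("Status report"), Message("New mail")]}

hits = find_items(backup, "alice", "budget")
print(restore_items(live, "alice", hits))  # 1
```

Skipping items that are already present makes a repeated restore harmless, which matters when a helpdesk request is retried.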
This new paradigm of agentless, application-level data protection is changing the way we think about data protection and disaster recovery in virtual environments. Best of all, it’s free!
Get the Veeam Backup Free Edition tools at http://www.veeam.com.