Whoops....glad I created a checkpoint

Sorry for the gap between posts. I was on the east coast for the Thanksgiving holiday and then stayed for the week to talk to some customers about our future road map and get their feedback. Anyhow, today we're going to talk about checkpoints.

One of the things that I personally used to find confusing about Virtual Server was the use of both differencing disks and undo disks. Both were used to do pretty much the same thing - roll back a VM to a previous state - but neither was particularly easy to use. After some deliberation with customers, we decided we'd support only one of the two in SCVMM, and we chose differencing disks since they provide a superset of the functionality of undo disks. If you try to import a VM with undo disks into VMM, it will prompt you to either discard the undo disk or merge it first.

The second key decision we made was with respect to the granularity of differencing disk creation. Although Virtual Server allows you to create a differencing disk at the VHD level, we decided to allow creation of differencing disks only at the VM level. This simplifies much of the management and frankly, if you create a differencing disk on only a subset of the VHDs associated with a VM and then try to delete/recover a subset of the VHDs, you're taking your life into your own hands. The whole point of reverting a VM back in time is to revert the entire system state, application state and data. If you recover only a subset of your drives, there's a high probability that these three components will be out of sync and the VM won't run properly.

The last thing we had to figure out (and probably the most important) was how to make differencing disks easier to use. Managing them through the Virtual Server web UI isn't particularly easy, and as an administrator you need to create and migrate the differencing disks on your own through the file system. To simplify this experience, we created the UI concept of a "checkpoint". A checkpoint is a logical entity that refers to a set of differencing disks associated with a VM, and SCVMM manages the disks and their relationships for you. If you right-click on a VM and click "Manage checkpoints", this opens up our checkpoint management dialog.

When you create a new checkpoint in our UI or use the New-VMCheckpoint cmdlet, underneath the covers SCVMM creates a differencing disk for each of the VHDs associated with the VM that you are checkpointing. If you go to Windows Explorer and take a look, you'll see that the new disks are there; the naming convention we use is the parent VHD name + the date/timestamp + a VHD sequence number. SCVMM stores the fact that a new checkpoint exists in our database, but it can also parse the VHD name to figure out that a differencing disk is actually associated with a checkpoint object. (If you import VMs that already have differencing disks, SCVMM won't recognize them as checkpoints since the metadata is missing.)
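As a quick sketch, creating a checkpoint from script looks something like this. (This is illustrative only: "VM01" is a placeholder VM name, and the -VM parameter is an assumption - check the cmdlet help on your install.)

```powershell
# Find the VM to checkpoint ("VM01" is a hypothetical name)
$vm = Get-VM | where { $_.Name -eq "VM01" }

# Create the checkpoint - SCVMM makes one differencing disk per VHD
# on the VM and records the checkpoint in its database
New-VMCheckpoint -VM $vm

# List the checkpoints SCVMM now tracks for this VM
Get-VMCheckpoint -VM $vm
```

If you then browse to the VM's folder in Windows Explorer, you should see the new differencing disks named per the convention above.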

With Virtual Server, the VM needs to be stopped in order for checkpoints to work. If the VM is running and has Virtual Machine Additions installed, SCVMM will perform a shutdown for you; otherwise it is stopped directly. With the upcoming hypervisor, the VM can be checkpointed even while running, which is a nice improvement. The hypervisor UI will call checkpoints "snapshots", but in order to be consistent and compatible with our first version, we decided to keep the term 'checkpoint' so we can use the same cmdlet names and the same UI.

You can also 'merge' checkpoints in our UI. We called this 'delete' in our beta since technically it removes the differencing disk chain by merging all changes into a single VHD, but for obvious reasons this caused users to think that data would actually be lost. If you merge a checkpoint, all later checkpoints must also be merged, and you'll get a warning to that effect if you try this in the UI.

Restore has similar semantics to merge, but it actually does delete the differencing chain, reverting the state of a VM to the point in time represented by the checkpoint. When you restore to a checkpoint, we keep the diff disk/checkpoint for the point you restored to. This enables training lab scenarios where you build a set of VMs, have students run through their training, and then revert back to the original VMs over and over again.
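The training-lab pattern can be sketched with the cmdlets described here. (The VM names are placeholders, and `$_.VM.Name` assumes the checkpoint's VM property exposes a Name - adjust to whatever your object model actually surfaces.)

```powershell
# Revert a set of lab VMs to their most recent checkpoint after each class.
# Because restore keeps the checkpoint you reverted to, this loop can be
# run again after every training session.
foreach ($name in "Lab01", "Lab02", "Lab03") {
    Get-VMCheckpoint -MostRecent |
        where { $_.VM.Name -eq $name } |
        Restore-VMCheckpoint
}
```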

One thing we know from customers about restores is that people typically restore to the most recent checkpoint (not always, but this is by far the most common case). Taking this into account, we added a '-MostRecent' parameter to the Get-VMCheckpoint cmdlet so that scripts have easy access to the most recently created checkpoint. If you want to restore a VM to its most recent checkpoint, you'd run something like this:

Get-VMCheckpoint -MostRecent | where { $_.VM -eq "VM01" } | Restore-VMCheckpoint

Of course, you'd change the name of the VM to the one you're targeting. If you don't specify a particular VM, this command will restore all of your VMs to their most recent checkpoints, so be careful.

One of the parameters for New-VMCheckpoint that we regrettably didn't get time to include in V1 is the -RunAsynchronously flag. This means that in a script, checkpoint operations run serially. As a workaround, the UI does let you select multiple VMs and create a new checkpoint, which launches several checkpoint jobs in parallel. Such is the nature of shipping software: you try to get everything in, but ultimately you have to make some tough calls. Given that the hypervisor has a much improved mechanism for differencing disk creation, checkpointing will get even better in our next release, but it's already proved pretty popular in V1.
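In script, the serial behavior just means checkpointing several VMs is a plain loop; here's a minimal sketch (VM names are placeholders, and the -VM parameter is an assumption):

```powershell
# Without -RunAsynchronously, each New-VMCheckpoint call blocks until its
# checkpoint job completes, so these run one after another.
foreach ($name in "VM01", "VM02", "VM03") {
    $vm = Get-VM | where { $_.Name -eq $name }
    New-VMCheckpoint -VM $vm
}
```

If you need the jobs in parallel today, multi-select the VMs in the UI instead.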

Rakesh