Disk Image Backups and Multi-Master Databases (or: how to avoid early retirement)

Hi folks, Ned here again. We published a KB a while back around the dangers of using virtualized snapshots with DFSR:

Distributed File System Replication (DFSR) no longer replicates files after restoring a virtualized server’s snapshot

Customers have asked me some follow-up questions that I address today. Not because the KB is missing info (it’s flawless, I wrote it ;-P) but because they were now nervous about their DCs and backups. With good reason, it turns out.

Today I discuss the risks of restoring an entire disk image of a multi-master server. In practical Windows OS terms, this refers to Domain Controllers, servers running DFSR, or servers running FRS; the latter two servers might be member servers or also DCs. All of them use databases to interchange files or objects with no single server being the only originator of data.

The Dangerous Way to Back Up Multi-Master Servers

  • Backing up only a virtualized multi-master server’s VHD file from outside the running OS. For example, running Windows Server Backup or DPM on a Hyper-V host machine and backing up all the guest VHD files. This includes full volume backups of the Hyper-V host.
  • Backing up only a multi-master server’s disk image from outside the running OS. For example, running a SAN block-based backup that captures the server’s disk partitions as raw data blocks and does not run a VSS-based backup within the running server OS.

Note: It is ok to take these kinds of outside backups as long as you are also getting a backup that runs within the running multi-master guest computers. Naturally, this internal backup requirement makes the outside backup redundant though.

What happens

What’s the big deal? Haven’t you read somewhere that we recommend VSS full disk backups?

Yes and no. And no. And furthermore, no.

Starting in Windows Server 2008, we incorporated special VSS writer and Hyper-V integration components to prevent insidiously difficult-to-fix USN issues that came from restoring domain controllers as “files”. Rather than simply chop a DC off at the knees with USN Rollback protection, the AD developers had a clever idea: the integration components tell the guest OS that the server is a restored backup, and the guest resets its invocation ID.

After restore, you’ll see this Directory Services 1109 event when the DC boots up:

[Image: Directory Services event 1109]

This only prevents a problem; it’s not the actual solution. The restored DC immediately replicates inbound from a partner and discards all of the local differences that came from the restored “backup”. Anything created on that DC after the backup that never replicated outbound is lost forever. It’s quite like these “oh crap” steps we have here for the truly desperate who are fighting snapshot USN rollbacks: much better than nothing.
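To make the invocation ID idea concrete, here is a toy Python model of two DCs. It is a rough sketch and not how AD is actually implemented: the `DC` class, the `simulate` function, and the user names are all hypothetical, each object carries only an originating invocation ID and USN, and a partner accepts a change only if that USN is higher than the highest one it has already seen from that invocation ID. The run without the reset shows the silent divergence that USN rollback protection exists to stop; it does not model the protection itself.

```python
import copy
import uuid


class DC:
    """Toy domain controller; a simplification, not real AD internals."""

    def __init__(self, name):
        self.name = name
        self.invocation_id = uuid.uuid4().hex
        self.usn = 0
        self.objects = {}  # object name -> (originating invocation ID, originating USN)
        self.utd = {}      # invocation ID -> highest originating USN seen so far

    def create(self, obj):
        # An originating write is stamped with the current invocation ID and next USN.
        self.usn += 1
        self.objects[obj] = (self.invocation_id, self.usn)
        self.utd[self.invocation_id] = self.usn

    def pull_from(self, partner):
        # Accept only changes newer than what we already know from that originator.
        for obj, (inv, usn) in partner.objects.items():
            if usn > self.utd.get(inv, 0):
                self.objects[obj] = (inv, usn)
        for inv, usn in partner.utd.items():
            self.utd[inv] = max(self.utd.get(inv, 0), usn)


def simulate(reset_invocation_id):
    dc1, dc2 = DC("DC1"), DC("DC2")
    dc1.create("alice")
    dc1.create("bob")
    dc2.pull_from(dc1)
    backup = copy.deepcopy(dc1)   # the disk image is taken here
    dc1.create("carol")
    dc2.pull_from(dc1)            # carol reached the partner
    dc1.create("dave")            # dave never replicated outbound

    restored = copy.deepcopy(backup)
    if reset_invocation_id:
        restored.invocation_id = uuid.uuid4().hex
    restored.pull_from(dc2)       # inbound replication after the restore
    restored.create("erin")       # new work on the restored DC
    dc2.pull_from(restored)
    return set(restored.objects), set(dc2.objects)
```

Calling `simulate(False)` leaves the partner without "erin", because the restored DC reuses a USN the partner believes it has already seen. Calling `simulate(True)` leaves both DCs with the same objects. In both runs "dave" is gone, which is the "lost forever" part.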

Now things get crummy:

  • This VSS+Hyper-V behavior only works if you back up the running Windows Server 2008 and 2008 R2 DC guests. If backed up while turned off, the restore will activate USN rollback protection as noted in KB875495 (events 2095, 1113, 1115, 2103) and trash AD on that DC.
  • Windows Server 2008 and 2008 R2 only implement this protection as part of Hyper-V integration components, so third-party full disk image restores or other virtualization products have to implement it themselves. They may not, which activates USN rollback protection as noted in KB875495 (events 2095, 1113, 1115, 2103) and trashes AD on that DC.
  • Windows Server 2003 DCs do not have this restore capability even as part of Hyper-V. Restoring their VHD as a file immediately invokes USN rollback protection as noted in KB875495 (events 2095, 1113, 1115, 2103), again leading to trashed AD on that DC.
  • DFSR (for SYSVOL or otherwise) does not have this restore capability in any OS version. Restoring a DFSR server’s VHD file or disk image leads to the same database destruction as noted in KB2517913 (events 2212, 2104, 2004, 2106).
  • FRS (for SYSVOL or otherwise) does not have this restore capability in any OS version. Restoring an FRS server’s VHD file or disk image does not stop FRS replication for new files. However, all subfolders under the FRS-replicated folder (such as SYSVOL) – along with their file and folder contents – disappear from the server. This deletion will not replicate outbound, but if you add a new DC and use this restored server as a source DC, the new DC will have inconsistent data. There is no indication of the issue in the event logs. Files created in those subfolders on working servers will not replicate to this server, nor will their parent folders. To repair the issue, perform a “D2 burflag” operation on the restored server for all FRS replicas, as described in KB290762.
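If you want to triage a restored server quickly, the event IDs in the list above can be put in a small lookup. This is only a sketch: the `SIGNATURES` table and `triage` function are hypothetical names, the "AD" and "DFSR" keys are informal labels rather than real event log names, and the IDs are just the ones quoted in this post. Event IDs are not unique across event sources, so confirm which log an event came from before trusting a match.

```python
# Event IDs quoted in this post, grouped by the component that logs them.
SIGNATURES = {
    "AD": {
        1109: "invocation ID reset after restore (expected)",
        2095: "USN rollback protection (KB875495)",
        1113: "USN rollback protection (KB875495)",
        1115: "USN rollback protection (KB875495)",
        2103: "USN rollback protection (KB875495)",
    },
    "DFSR": {
        2212: "DFSR database damage after image restore (KB2517913)",
        2104: "DFSR database damage after image restore (KB2517913)",
        2004: "DFSR database damage after image restore (KB2517913)",
        2106: "DFSR database damage after image restore (KB2517913)",
    },
    # FRS logs nothing for this failure, so there is nothing to match on.
    "FRS": {},
}


def triage(component, event_ids):
    """Return the distinct diagnoses that match the given event IDs."""
    known = SIGNATURES.get(component, {})
    return sorted({known[event_id] for event_id in event_ids if event_id in known})
```

An empty result for FRS does not mean the server is healthy; per the last bullet, you have to compare the replicated folder contents against a working server.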

Multi-master databases are some of the most complex software in the world, and one-size-fits-all backup and restore solutions are not appropriate for them.

The Safe Way to Back Up Multi-Master Servers

When dealing with any Windows server that hosts a multi-master database, the safest method is taking a full/incremental backup (specifically including System State) using VSS within the running operating system itself. System State backs up all aspects of a DC (including SYSVOL on DFSR or FRS), but does not include custom DFSR or FRS replicated folders, which is why we recommend full/incremental backups for all the volumes. This goes for virtualized guests or physical servers. Avoid relying solely on techniques that involve backing up the entire server as a single virtualized guest VHD file or backing up the raw disk image of that server. As I’ve shown above, doing so makes the backups easier, but it makes the restore much harder.
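As one possible illustration of an in-guest backup, the sketch below builds a Windows Server Backup `wbadmin start backup` command line from Python. Treat it as an assumption-laden example rather than a recommended procedure: the helper names, the `E:` target, and the `D:` data volume are hypothetical, and the `-systemState` switch is not available on every Windows version, so check the `wbadmin` help on your own servers first.

```python
import subprocess
import sys


def build_backup_command(target, data_volumes=(), include_system_state=True):
    """Build a wbadmin command for a backup that runs inside the guest OS."""
    cmd = [
        "wbadmin", "start", "backup",
        f"-backupTarget:{target}",
        "-allCritical",  # volumes that hold operating system state
        "-vssFull",      # full VSS backup rather than a copy backup
        "-quiet",        # no interactive prompts
    ]
    if include_system_state:
        cmd.append("-systemState")  # assumption: supported on your OS version
    if data_volumes:
        # Volumes holding custom DFSR or FRS content that System State does not cover.
        cmd.append("-include:" + ",".join(data_volumes))
    return cmd


def run_backup(cmd):
    """Run the backup; wbadmin only exists on Windows and needs elevation."""
    if sys.platform != "win32":
        raise RuntimeError("wbadmin is only available on Windows")
    return subprocess.run(cmd, check=True)
```

For example, `build_backup_command("E:", ["D:"])` produces the argument list for a backup of the critical volumes, System State, and a `D:` data volume to `E:`. Nothing runs until you pass that list to `run_backup` on the server itself.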

And when it gets to game time, the restore is what keeps you employed: your boss doesn’t care how easy you made your life with backups that don’t work.

Final thoughts

Beware any vendor that claims to do zero-impact server restores like those I described in the “Dangerous” section. Make them prove it: have them restore a single domain controller in a two-DC domain, where you created new users and group policies after the backup, without any issues. Don’t take the word of some salesman; make them demonstrate that scenario. You don’t want to build your backup plans around something that doesn’t work as advertised.

Our fearless writers are banging away on TechNet as I write this to ensure we’re not giving out any misleading info around virtualized server backups and restores. If you find any articles that look scary, please feel free to send us an email and I’ll see to the edits.

Until next time.

– Ned “one of these servers is not like the other” Pyle