How to Virtualize Active Directory Domain Controllers (Part 1)

06/10/2010

Hello everyone, this is Shravan from the Active Directory team and Jason from the System Center VMM team here at Microsoft. We will be discussing a scenario that comes up often: how to migrate Active Directory domain controllers to a virtualized system.

Why Now?

Reduce cost! Reduce cost! Reduce cost! It’s an old refrain. When it reaches the folks who run large data centers, it becomes a big push to consolidate the space, money and energy consumed by big, beefy servers. Virtualization is a good way to optimize the use of server resources, but data center administrators need to be cautious as they proceed. So let’s discuss some of the common concerns about virtualized domain controllers: when, where and how to move them to virtual hardware.

How to plan?

When introducing virtualized DCs, think about them the same way you think about scalability planning for physical DCs, with the extra dimension of the virtualization platform. Conventional wisdom says not to put all your eggs in one basket and to avoid single points of failure as much as possible. Some examples of single points of failure for physical DCs are as follows:

All DCs in the same data center
All DCs on the same network switch
All DCs on the same power grid
All DCs on the same make/model of hardware

Administrators have learned to avoid these pitfalls by planning resources adequately. The same applies to virtualized DCs. Here are some examples of single points of failure specific to virtualized DCs:

Multiple DCs on a common virtual host server
Multiple DCs using the same hard disk spindle
Multiple DCs using the same network adapter on a virtual host
Multiple DCs hosted on different hosts but relying on a single UPS for power failures
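One way to spot these shared dependencies is to keep a small inventory of what each virtual DC relies on and look for overlaps. The following is only a minimal sketch in Python: the inventory, the DC names and the resource names (HV-A, array-1, UPS-1 and so on) are made-up examples, not output from any real tool.

```python
from collections import defaultdict

# Hypothetical inventory: each virtual DC and the infrastructure it depends on.
inventory = [
    {"dc": "DC01", "host": "HV-A", "disk": "array-1", "nic": "HV-A/nic0", "ups": "UPS-1"},
    {"dc": "DC02", "host": "HV-A", "disk": "array-1", "nic": "HV-A/nic1", "ups": "UPS-1"},
    {"dc": "DC03", "host": "HV-B", "disk": "array-2", "nic": "HV-B/nic0", "ups": "UPS-1"},
]


def shared_dependencies(records, min_dcs=2):
    """Return {(kind, name): [dc, ...]} for every resource used by min_dcs or more DCs."""
    users = defaultdict(list)
    for record in records:
        for kind, name in record.items():
            if kind != "dc":
                users[(kind, name)].append(record["dc"])
    return {resource: dcs for resource, dcs in users.items() if len(dcs) >= min_dcs}


shared = shared_dependencies(inventory)
for (kind, name), dcs in shared.items():
    print(f"{kind} {name} is shared by {', '.join(dcs)}")
```

In this made-up inventory, DC01 and DC02 share a host and a disk array, and all three DCs depend on the same UPS, so the UPS is the dependency that could take every DC down at once.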

One of the most obvious single points of failure is the machine on which all the virtualized solutions run, or the virtualization solution itself. When either fails, all virtual machines hosted by that machine go offline. This might sound scary, but the risk is relatively easy to handle. Redundant capacity and regular backups of the virtualized operating systems (together with the virtualized applications) are a safeguard against data loss and downtime due to this single point of failure.

Another question is the order in which to virtualize the DCs in the hub and branch sites. The same considerations that went into deciding the number of physical DCs in each site need to be revisited. Some cases may call for a specific plan. Our general recommendation is to start by optimizing the number of DCs needed in the branch office sites, testing the load-bearing capacity at each step, and then virtualize the DCs in the hub site. Working in this bottom-up fashion ensures you don’t starve the branch sites while virtualizing your hub DCs. As always, nothing beats comprehensive testing in your own environment, as one size may not fit all.

Pardon the geek-speak while we review some performance considerations. The peak and steady-state load generated by a collection of VM guests should not exceed the capabilities of the virtual host computer and network infrastructure. Specifically, the collection of VM guests should not exceed the capabilities of the CPU, disk subsystem, memory, and network bandwidth of a common host computer. Some load scenarios exceed what a DC on a single physical computer can service, so multiple physical or virtual computers may be required. For instance, suppose we have one virtual server hosting individual virtual machines in the following roles:

Domain Controller (DC)
Exchange front-end server
Exchange back-end server
SQL Server

The peak load on the DC as a guest does not depend merely on the authentication traffic coming to the DC; the cumulative load on the virtual server also affects the capacity available to the DC. Therefore, please take the total load on the virtual server into account.
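To make the cumulative-load point concrete, here is a rough sketch in Python that adds up the peak demand of the four guests listed above and compares it with what the host can supply. All of the numbers, the Load type and the oversubscribed function are illustrative assumptions; real figures would come from your own performance monitoring.

```python
from dataclasses import dataclass, fields


@dataclass
class Load:
    cpu_pct: float       # percent of the host's total CPU
    memory_gb: float
    disk_iops: float
    network_mbps: float


# Hypothetical host capacity and per-guest peak demand (made-up numbers).
HOST_CAPACITY = Load(cpu_pct=100, memory_gb=64, disk_iops=5000, network_mbps=1000)
GUEST_PEAKS = {
    "DC": Load(20, 4, 600, 100),
    "Exchange front-end": Load(25, 8, 400, 300),
    "Exchange back-end": Load(35, 24, 2500, 300),
    "SQL Server": Load(30, 32, 2000, 200),
}


def oversubscribed(host, guest_peaks):
    """Return {resource: (total_peak, capacity)} where summed peaks exceed the host."""
    over = {}
    for field in fields(Load):
        total = sum(getattr(guest, field.name) for guest in guest_peaks.values())
        capacity = getattr(host, field.name)
        if total > capacity:
            over[field.name] = (total, capacity)
    return over


result = oversubscribed(HOST_CAPACITY, GUEST_PEAKS)
for resource, (total, capacity) in result.items():
    print(f"{resource}: combined peak {total} exceeds host capacity {capacity}")
```

Summing peaks assumes the worst case in which every guest peaks at the same moment, which is deliberately conservative. With these sample numbers the DC on its own is tiny, yet the host is oversubscribed on CPU, memory and disk once the other guests are counted.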

We have not seen any specific issues with any roles (FSMO, GC, DNS, RODC etc.) running on virtual servers. Even so, please take load and criticality into consideration before you decide whether to switch them to virtual or keep them as physical servers.

Regardless of the virtual host software product that you are using, here are some rules for hosting virtualized DC guests on VM hosts, mostly of the “don’t do this” variety. These rules include but are not limited to the following:

Do not stop or pause domain controllers.
Do not restore snapshots of domain controller role computers. This action causes an update sequence number (USN) rollback that can result in permanent inconsistencies between domain controller databases. USN rollback is discussed further in this blog.
Do not perform ONLINE physical-to-virtual (P2V) conversions. All P2V conversions for domain controller role computers should be done in OFFLINE mode. System Center Virtual Machine Manager enforces this for Hyper-V. Please read further to understand the difference between ONLINE and OFFLINE modes for P2V. For information about other virtualization software, see the vendor documentation. The exception is tools such as disk2vhd, which convert the DC while the source stays online, because the resulting virtual DC is not turned on while attached to the production network.
Configure virtualized domain controllers to synchronize with a time source in accordance with the recommendations for your hosting software. For Microsoft Virtual Server or Hyper-V server, turn off host time synchronization from the properties of the VM.
If you do not have an uninterruptible power supply (UPS) for your VM hosts or for the storage where the Active Directory database resides, ensure that write caching is disabled on the virtual machine’s host computer. Please refer to this link for additional guidance. Conversely, if write caching needs to stay enabled on the VM host that hosts the DC, install a UPS to avoid damage to the DC(s).
Virtual DCs are subject to the same backup requirements as physical DCs. Please refer to this TechNet article for details.
Be careful when adding the Virtual Server host as a member of the same domain as the guest DCs it hosts: you may run into a chicken-and-egg problem if no DC is available when the host boots.

For more considerations about running domain controllers in virtual machines, see Microsoft Knowledge Base article 888794. Also, see the following TechNet article for additional information:

Deployment Considerations for Virtualized Domain Controllers
https://technet.microsoft.com/en-us/library/dd348449(WS.10).aspx

Two methods of DC virtualization

With all that behind us, let’s dig deeper into the two methods of introducing virtualized domain controllers into an environment.

1. DCPromo

Stand up a member server in the virtual environment and run dcpromo. Configure it as an additional domain controller so that it replicates the data from another DC in the same domain. If you want to reuse the name of one of the physical DCs, you must first demote the physical DC. Then rename the virtual server while it is still a member server, and promote it as a domain controller. If you choose to reuse the name of an existing DC, ensure that you allow end-to-end AD replication of the demotion to complete before running dcpromo on the virtualized guest.

2. Physical-to-Virtual (P2V)

As per the VMM 2008 glossary, “physical-to-virtual machine (P2V) conversion [describes] the process of creating a virtual machine by copying the configuration of a functioning physical computer.” In simple terms, we convert a physical domain controller server to a virtual domain controller guest using a P2V tool.

Today SCVMM (System Center Virtual Machine Manager) is available from Microsoft, as are similar third-party P2V tools that you run against a physical server to convert it to a virtual server. Conceptually, the tool performs a backup of the physical server and restores the machine to virtual hardware. The end result is a virtual domain controller that looks and acts like the original. You then turn off the physical DC that was converted and connect the virtual DC to the network, and clients don’t see any difference in authentication functionality.

Since most of us are familiar with the dcpromo promote/demote process, we will focus on the second method, the P2V tool. If the P2V conversion goes as expected and there are no problems afterwards, there is no service outage other than the time during which the P2V tool performs the backup/restore. A USN rollback will occur if for some reason you move back to the physical DC after you have performed the P2V process and the new virtualized DC has replicated with other DCs. So don’t ever do it.

What’s USN ROLLBACK?

Back to the geek-speak: Active Directory Domain Services (AD DS) uses update sequence numbers (USNs) to keep track of replication of data between domain controllers. Each time that a change is made to data in the directory, the USN is incremented to indicate that a change has been made. For each directory partition that a destination domain controller stores, USNs are used to track the latest originating update that the domain controller has received from each source replication partner, as well as the status of every other domain controller that stores a replica of the directory partition. When a domain controller is restored after a failure, it queries its replication partners for changes with USNs that are greater than the USN of the last change it has recorded from each partner. USN rollback occurs when the normal updates of the USNs are circumvented and a domain controller tries to use a USN that is lower than its latest update.

If you are still wondering why we are talking about USN rollback in connection with our P2V tool, remember that the tool performs a backup of the physical DC and restores it to the virtual DC. If the virtual DC has replicated with the rest of the DCs and we then try to reinstate the physical DC and bring it online, it will detect that the highest USN it has for itself is lower than what the other DCs have recorded for it. When this happens, the physical DC detects that it’s in a USN rollback state, stops replication, and pauses the Netlogon service on machine startup. A USN rollback can also occur on the virtual DC if the physical DC isn’t turned off immediately after the P2V tool finishes taking its backup of the original.
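The sequence of events can be illustrated with a deliberately simplified toy model in Python. This is not how AD DS is implemented (the real mechanism involves invocation IDs and up-to-dateness vectors); the ToyDC class, its method names and the UsnRollbackDetected exception are all invented for this sketch. It only shows why a partner that has already seen USN 5 from a DC is a problem for a reinstated copy of that DC that is back at USN 3.

```python
import copy


class UsnRollbackDetected(Exception):
    """Raised when a partner has already seen a higher USN than this DC now holds."""


class ToyDC:
    """Toy stand-in for a domain controller; not the real AD DS data structures."""

    def __init__(self, name):
        self.name = name
        self.highest_usn = 0   # last USN assigned to a local change
        self.changes = {}      # usn -> description of the originating change
        self.high_water = {}   # partner name -> highest USN received from it

    def write(self, description):
        self.highest_usn += 1
        self.changes[self.highest_usn] = description

    def changes_since(self, partner_mark):
        # A partner asking for changes above a USN we have never issued means
        # some other copy of this DC already handed out those USNs.
        if partner_mark > self.highest_usn:
            raise UsnRollbackDetected(
                f"{self.name}: partner has seen USN {partner_mark}, "
                f"local highest USN is {self.highest_usn}"
            )
        return {usn: text for usn, text in self.changes.items() if usn > partner_mark}

    def pull_from(self, source):
        mark = self.high_water.get(source.name, 0)
        received = source.changes_since(mark)
        self.high_water[source.name] = source.highest_usn
        return received


physical = ToyDC("DC1")
partner = ToyDC("DC2")
for i in range(3):
    physical.write(f"change {i + 1}")
partner.pull_from(physical)            # partner's mark for DC1 is now 3

virtual = copy.deepcopy(physical)      # stand-in for the P2V backup/restore
virtual.write("change 4 on virtual")
virtual.write("change 5 on virtual")
partner.pull_from(virtual)             # partner's mark for DC1 is now 5

physical.write("change made after switching back")  # physical reuses USN 4
try:
    partner.pull_from(physical)
    detected = False
except UsnRollbackDetected as err:
    detected = True
    print(err)
```

In this toy run the reinstated physical DC has reused USN 4 for a different change than the one the partner already received from the virtual copy. The check in changes_since is what stops the two databases from silently diverging.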

Please refer to the following TechNet link for a detailed understanding of USN rollback: https://technet.microsoft.com/en-us/library/dd348479(WS.10).aspx

NOTE: In Windows Server 2003 (SP1) and later, USN rollback will be detected and replication will be stopped before divergence in the forest is created, in most cases. For Windows 2000 Server, the updates in Microsoft Knowledge Base article 885875 must be installed to enable this detection. Remember that Win2000 support ends on July 13, 2010 though, so your real answer here is to not be running it at all!

The supported recovery options when in USN Rollback state are pretty limited - you have to forcibly demote the DC, perform a metadata cleanup and re-promote the domain controller.

How to P2V Domain Controllers

While writing this post, we ran a number of tests with different combinations of hardware, FSMO roles, GC, domains etc. We will be sharing our takeaways from these experiments. For those who are unfamiliar with SCVMM as a product and how P2V works, the SCVMM P2V process is documented in detail at the following links:

P2V: How to Perform a Conversion
https://technet.microsoft.com/en-us/library/cc917882.aspx

P2V: Converting Physical Computers to Virtual Machines in VMM
https://technet.microsoft.com/en-us/library/cc764232.aspx

One of our customers shared the following link with us, which outlines VMware’s P2V method using online migration: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006996

Please note that ONLINE mode keeps the source and target running at the same time and is not recommended. With this method, it’s up to the administrator to keep the network cable disconnected on the respective machines to keep them isolated. Many of our customers find that keeping the new target virtual DC completely isolated from the source physical DC is easier said than done. As VMware identifies, there is a big risk of USN rollback if the machines are not isolated. We have seen a number of customers try to perform an ONLINE P2V and end up in a USN rollback state, leading to forced demotion of the problem DCs.

This is a good place to mention our support policy for third-party virtualization software:

897615 Support policy for Microsoft software running in non-Microsoft hardware virtualization software
https://support.microsoft.com/default.aspx?scid=kb;EN-US;897615

By now, you should be able to identify some of the benefits and pitfalls of going virtual with your domain controllers. Next time we will go into the details of how to perform the OFFLINE P2V migration of domain controllers using SCVMM, including the requirements on the source machines and destination servers and how to identify suitable candidates to move over to the virtual world.