How We Solved Problems With Exchange Running on a Domain Controller in EBS

There have been quite a few discussions, blogs, and articles on the challenges of deploying and running Exchange on a domain controller. In 2006, Robert Moir wrote a detailed blog post titled “Running Exchange on a Domain Controller,” in which he addressed, at a high level, some of the performance and security challenges of running Exchange and Active Directory on the same server. Note that this article is not about deploying Exchange Server 2007 into an Active Directory environment in general, which is the subject of Marc Grote's article on how Exchange Server 2007 extends the Active Directory schema. The distinction we are making here is that we are running Exchange on the domain controller itself.

Small Business Server (SBS) was the first Microsoft product to officially deploy and support Exchange and Active Directory on a domain controller; however, because of the product's single-domain-controller design, it did not face some of the additional challenges described in this article. During the development of Essential Business Server (EBS), which deploys Exchange 2007 on a domain controller in an environment with multiple domain controllers, we faced a new set of challenges and developed resolutions for them. These resolutions can also be applied to any enterprise-level deployment of Exchange 2007 on a domain controller. In mid-size and enterprise domains, it is strongly recommended to keep at least one replica domain controller for fault tolerance and load balancing. However, because of replication latency between domain controllers, deploying and maintaining Exchange on a replica domain controller becomes more challenging. Moreover, Exchange disaster recovery on a domain controller differs significantly from recovery on a member server.

One of the main challenges we faced in running Exchange 2007 on a domain controller was recovery after a disaster. Recovering Exchange installed on a replica domain controller requires a special procedure that is quite different from any other topology. Exchange 2007 Setup supports /mode:RecoverServer, which automatically deploys the server binaries and recovers all server settings from Active Directory; however, the Exchange server's computer object must be preserved, untouched, in Active Directory for the recovery to succeed. Standard Active Directory guidance points the other way: it does not recommend “taking over an existing slot” by re-using the existing computer object when re-promoting a failed domain controller. The recommended approach for recovering a failed (or unsuccessfully demoted) domain controller is to run ntdsutil.exe and clean up the leftover computer object, the related NTDS Settings object, the replication connections, and all other traces of the failed domain controller from Active Directory. Because this removes the computer object, following that approach renders the Exchange server unrecoverable.

In EBS, we solve this problem by examining the state of Active Directory after the disaster, transferring (or, if needed, seizing) the FSMO roles, and cleaning up only the server's NTDS Settings object. This object can be found under the Sites container in the configuration partition of Active Directory. We leave the computer object untouched and join the new server to the domain with the same name as the failed server. We then promote the new server to a domain controller, preserving all settings and permissions on the computer object, and attempt an Exchange recovery. With this approach, both the domain controller and the Exchange server recover successfully. EBS provides this solution under the hood through its recovery process, so users need not worry about the details.
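The objects involved in this recovery can be made concrete through their distinguished names (DNs). The Python sketch below is an illustration, not EBS code: it computes the DN of the NTDS Settings object that is deleted and of the computer object that is kept. It assumes the computer object sits in its default location, the Domain Controllers OU, and the server, site, and domain names used in the usage example are hypothetical.

```python
from typing import Dict, List


def dn_from_dns(dns_name: str) -> str:
    """Convert a DNS domain name such as 'contoso.com' into 'DC=contoso,DC=com'."""
    return ",".join(f"DC={label}" for label in dns_name.split("."))


def ntds_settings_dn(server: str, site: str, forest_root_dns: str) -> str:
    """DN of a domain controller's NTDS Settings object.

    It sits under the server object in the Sites container, which lives in the
    configuration partition (CN=Configuration under the forest root domain).
    """
    config_dn = f"CN=Configuration,{dn_from_dns(forest_root_dns)}"
    return f"CN=NTDS Settings,CN={server},CN=Servers,CN={site},CN=Sites,{config_dn}"


def computer_object_dn(server: str, domain_dns: str) -> str:
    """Assumed (default) location of a DC's computer account: the Domain Controllers OU."""
    return f"CN={server},OU=Domain Controllers,{dn_from_dns(domain_dns)}"


def cleanup_plan(server: str, site: str, forest_root_dns: str,
                 domain_dns: str) -> Dict[str, List[str]]:
    """Split the failed server's objects into what is removed and what is preserved."""
    return {
        # Only the NTDS Settings object is removed.
        "delete": [ntds_settings_dn(server, site, forest_root_dns)],
        # The computer object stays untouched so Setup /mode:RecoverServer can succeed.
        "keep": [computer_object_dn(server, domain_dns)],
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    plan = cleanup_plan("EXCH-DC01", "Default-First-Site-Name", "contoso.com", "contoso.com")
    for action, dns in plan.items():
        for dn in dns:
            print(f"{action}: {dn}")
```

Note that the configuration partition is rooted at the forest root domain, which is why the sketch takes the forest root and the server's own domain as separate inputs.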

Another challenge we faced in running Exchange on a domain controller was the service dependency between Active Directory and Exchange 2007 services. This dependency can prevent some of the core Exchange services (such as the Transport service and the Information Store) from starting properly after a reboot. During a reboot, all Active Directory and Exchange 2007 services on the server are restarted. On startup, some Active Directory services, such as the Kerberos Key Distribution Center (KDC), take longer to reach the running state when the domain controller is deployed in a domain with other replicas. This is probably related to the fact that the booting domain controller must complete a full replication cycle before the KDC serves clients and issues tickets. In the meantime, the Microsoft Exchange Active Directory Topology service attempts to contact the closest KDC, which in this case is the one on the local host; the result is a hard-to-detect race condition. In a domain with multiple domain controllers, it is very likely that the local KDC will not respond in time for the Exchange Active Directory Topology service, which then fails to start and gives up. Many key Exchange 2007 services, such as Transport and the Information Store, depend on the Exchange Active Directory Topology service, so they all fail to start as well. Note that this problem is visible only when running Exchange on a domain controller, and it is more pronounced when running Exchange on a replica domain controller.

There are a few ways to solve this race condition. In EBS, we chose to change the start type of all Exchange services from Automatic to Automatic (Delayed Start). This change gives the KDC plenty of time to come up before the Exchange Active Directory Topology service and the dependent Exchange services start querying it.
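A minimal sketch of the start-type change, assuming Windows Server 2008 (where delayed automatic start is available) and the built-in sc.exe tool, whose `start= delayed-auto` value selects Automatic (Delayed Start). The service list is an illustrative subset of Exchange 2007 service short names, not the full set EBS modifies, and the `--apply` argument is this script's own convention: without it, the script only prints the commands.

```python
import subprocess
import sys
from typing import List, Sequence

# Illustrative subset of Exchange 2007 service short names (not the full EBS list).
EXCHANGE_SERVICES = [
    "MSExchangeADTopology",  # Microsoft Exchange Active Directory Topology
    "MSExchangeTransport",   # Microsoft Exchange Transport
    "MSExchangeIS",          # Microsoft Exchange Information Store
]


def delayed_start_commands(services: Sequence[str]) -> List[List[str]]:
    """Build one sc.exe command per service that sets Automatic (Delayed Start).

    sc.exe expects 'start=' and its value as separate arguments.
    """
    return [["sc.exe", "config", name, "start=", "delayed-auto"] for name in services]


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_commands(delayed_start_commands(EXCHANGE_SERVICES),
                 dry_run="--apply" not in sys.argv[1:])
```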

Another problem with running Exchange on a domain controller surfaces during the installation of Exchange 2007. During installation, Exchange 2007 first prepares Active Directory and extends the schema. To do this, it targets the schema master, one of the FSMO roles, which is held by only one domain controller in the entire forest and may even be in a remote location. After the schema has been extended, Exchange proceeds with the remainder of the installation; however, there is no guarantee that the remaining Active Directory changes will target the same domain controller, especially when Exchange is being installed on a domain controller itself. Depending on replication delays and on the proximity of the FSMO role owners to the local domain controller, the Exchange 2007 installation can fail intermittently with “object not found” and “no permission” errors, because changes made during earlier stages of the installation have not yet replicated to the local domain controller. Exchange 2007 setup is very resilient, so almost any failure can be retried until it succeeds. However, in a domain with multiple domain controllers and FSMO roles spread across them, installing Exchange 2007 on a domain controller can leave the administrator baffled by seemingly random failures before it finally succeeds. To eliminate this problem, we temporarily gather all FSMO roles on the local domain controller and target the Exchange 2007 installation at it.
More specifically, after promoting the local domain controller, we ensure that a full replication cycle has completed and that the domain controller is properly advertising and responding. We then transfer the FSMO roles to the local domain controller, ensure that all other domain controllers (at least those in the local Active Directory site) are aware of the change in role ownership, and install Exchange 2007 with the /DC:<local DC name> switch. At this point, we are confident that all Exchange changes will target the local domain controller, and Exchange setup completes faster (because its directory traffic no longer crosses the network) and without random failures.
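The ordering described above can be sketched in Python as follows. This illustrates the sequence only and is not EBS's implementation. The four callables are hypothetical hooks that, in practice, might wrap tools such as repadmin, dcdiag, or the roles menu of ntdsutil, and the Exchange role list passed to Setup is an assumption supplied by the caller.

```python
from typing import Callable, List

# The five FSMO roles: two forest-wide, three per domain.
FSMO_ROLES = (
    "schema master",
    "domain naming master",
    "RID master",
    "PDC emulator",
    "infrastructure master",
)


def prepare_exchange_install(
    local_dc: str,
    exchange_roles: str,
    replication_converged: Callable[[str], bool],
    dc_advertising: Callable[[str], bool],
    transfer_role: Callable[[str, str], None],
    site_dcs_agree: Callable[[str, str], bool],
) -> List[str]:
    """Run the pre-install steps in order and return the Setup command line.

    Every callable is a hypothetical hook supplied by the caller.
    Raises RuntimeError as soon as a precondition is not met.
    """
    # 1. A full replication cycle must have completed after promotion.
    if not replication_converged(local_dc):
        raise RuntimeError(f"{local_dc}: full replication cycle not complete")
    # 2. The DC must be advertising and answering requests.
    if not dc_advertising(local_dc):
        raise RuntimeError(f"{local_dc}: not yet advertising as a domain controller")
    # 3. Gather every FSMO role locally and confirm the site has seen each change.
    for role in FSMO_ROLES:
        transfer_role(role, local_dc)
        if not site_dcs_agree(role, local_dc):
            raise RuntimeError(f"site DCs do not yet see {local_dc} as {role}")
    # 4. Pin Exchange setup to the local domain controller.
    return ["Setup.com", "/mode:Install", f"/roles:{exchange_roles}", f"/DC:{local_dc}"]
```

Passing the checks in as functions keeps the ordering logic separate from the tooling, so each step fails fast with a clear message instead of letting setup run into a replication race.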

Although running a domain controller and Exchange 2007 on the same server can be resource-intensive, it does reduce network traffic somewhat, provided that the domain controller is also a global catalog (GC). Exchange 2007 generates a large amount of traffic to the closest global catalog, and that network traffic is eliminated when Exchange is installed on a global catalog server. Therefore, it is important to make the local domain controller a GC if you plan to install Exchange on it.
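Whether a domain controller is a GC is recorded in the `options` attribute of its NTDS Settings object, where bit 0x1 (NTDSDSA_OPT_IS_GC) marks a global catalog; `repadmin /options` reports this flag as IS_GC. The small Python helpers below assume the caller has already read that integer (for example, over LDAP) and only interpret or set the bit. In practice, you would normally enable the GC with the Global Catalog check box in the NTDS Settings properties in Active Directory Sites and Services rather than writing the attribute directly.

```python
# Bit flags of the 'options' attribute on a DC's NTDS Settings (nTDSDSA) object.
NTDSDSA_OPT_IS_GC = 0x1                  # this DC hosts the global catalog
NTDSDSA_OPT_DISABLE_INBOUND_REPL = 0x2   # inbound replication disabled
NTDSDSA_OPT_DISABLE_OUTBOUND_REPL = 0x4  # outbound replication disabled


def is_global_catalog(options: int) -> bool:
    """True if the IS_GC bit is set in an NTDS Settings 'options' value."""
    return bool(options & NTDSDSA_OPT_IS_GC)


def with_global_catalog(options: int) -> int:
    """Return the options value with the IS_GC bit set, leaving other flags intact."""
    return options | NTDSDSA_OPT_IS_GC
```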

 

Alireza Farhangi