Exchange 2007 SP1 Cluster Install/Recovery Issues

One of the problems that I’ve seen customers encounter several times now, when creating a new Clustered Mailbox Server (CMS) or attempting to recover an existing CMS, is a failure during setup when the Network Name resource is brought online.  The specific error is as follows:

The computer account '<Exchange Virtual Server Name>' was created on the domain controller \\<pdc emulator name>, but has not replicated to the desired domain controller (<local DC name>) after waiting approximately 60 seconds. Please wait for the account to replicate and re-run setup /newcms.

As most (if not all) of you probably already know, the Cluster portion of setup is completely separate from installing the Mailbox role.  The Mailbox role can be installed as part of the same instance of setup, but that installation will complete prior to the Clustered Mailbox Server portion of setup.  The issue described above appears to occur only on Exchange 2007 clusters running on Windows 2003.

The issue stems from how the Windows 2003 Cluster service works.  When Exchange setup runs, it calls into the Cluster API to create the computer account for the Exchange CMS.  It is important to note that Exchange does not create the computer account; the Cluster service does.  The Cluster service contacts the PDC Emulator for the domain and creates the computer account on that Domain Controller.  If the PDC Emulator FSMO role is held by a Domain Controller in a different Active Directory site than the one where you are installing the Exchange cluster, there may be a delay before that computer account replicates to a local Domain Controller.  The delay matters because Exchange setup now uses a Domain Controller in the local AD site, rather than also going to the PDC Emulator automatically.  This change was first made in Exchange 2007 SP1.  Given the above, let’s consider the following chain of events.

1. Exchange setup calls into the Cluster API to create the computer account for the CMS.
2. The Cluster API contacts the DC holding the PDC Emulator role and creates the computer account.
3. Exchange setup contacts a DC in the local AD site and checks for the existence of the CMS computer account.

At this point, if either the computer account doesn’t exist, or if the computer account exists but is disabled on the local DC, then setup will fail with the above error.  Re-enabling the computer account on the local DC will not fix this issue, as the computer account was created or reset on the PDC Emulator, and the account on the local DC no longer matches what is on the PDC Emulator.
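The chain of events above can be sketched as a toy model.  This is only an illustration of the timing problem, not how setup or the Cluster service is actually implemented: the class, the function name, the site names, and the replication delays are all hypothetical, and the only number taken from the error message is the roughly 60 second wait.

```python
from dataclasses import dataclass, field


@dataclass
class DomainController:
    """Hypothetical stand-in for a DC; accounts maps account name -> enabled."""
    name: str
    site: str
    accounts: dict = field(default_factory=dict)


def run_newcms(creating_dc, checking_dc, account, replication_delay_s, timeout_s=60):
    """Model steps 1-3: create on one DC, then check for the account on another.

    replication_delay_s is an assumed figure for how long the new account
    takes to reach checking_dc. timeout_s mirrors the ~60 seconds in the error.
    """
    # Steps 1 and 2: the Cluster service creates (or resets) the account.
    creating_dc.accounts[account] = True

    # Replication only matters when setup checks a different DC.
    delay = 0 if creating_dc is checking_dc else replication_delay_s
    if delay <= timeout_s:
        checking_dc.accounts[account] = True

    # Step 3: setup needs the account to exist AND be enabled on the local DC.
    return checking_dc.accounts.get(account, False)


pdc = DomainController("pdc-emulator", site="HQ")
local = DomainController("local-dc", site="Branch")
print(run_newcms(pdc, local, "CMS1", replication_delay_s=900))    # remote PDC Emulator
print(run_newcms(local, local, "CMS1", replication_delay_s=900))  # account created locally
```

In this model a stale, disabled copy on the local DC (an entry set to False) fails the check in exactly the same way as a missing account, which matches the two failure cases described above.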

Further complicating the issue is that when you re-run setup /newcms, the entire procedure appears to be repeated.  If you watch in Cluster Administrator, you will see that the group that was created for the CMS, along with all resources inside the group, is deleted.  At this point in setup, the only resources in that group are the IP Address resource and the Network Name resource.  The CMS group is then re-created, along with the IP Address resource and the Network Name resource.  Re-creating the Network Name resource causes the computer account to be reset on the PDC Emulator, and causes it to be disabled on any other DCs (that is my understanding, at least).  So you’re right back to square one.

How do we get around this?  Pre-staging the computer account on the PDC Emulator and allowing it to replicate to the local DC does not work, because the Cluster service deletes and re-creates the pre-staged account during setup.  I have found the following two workarounds, either of which should allow setup to continue past this point.

1. Move the PDC Emulator role to a Domain Controller in the local AD Site.

Pros: Relatively easy to do, should not cause any additional issues with Active Directory.
Cons: Requires a functional design change to Active Directory Infrastructure.  May require approval of multiple teams.

2. Block the Cluster service from communicating with the PDC Emulator.

Pros: Works except in rare situations where the PDC Emulator is in a different AD site, but the same IP subnet.  Fairly easy to implement.
Cons: Not “officially” supported or tested by the Exchange Product Group.

Let’s talk a little more about workaround 2.  What does this involve?  Actually, just a few minor changes.  You need to modify the local Hosts file on the Exchange Server (located in C:\Windows\System32\Drivers\Etc – look for the file Hosts with no extension), and add the following entries to the bottom of the file.

127.0.0.1    PDCEmulator.FQDN.com
127.0.0.1    PDCEmulatorNetBIOS

Save the Hosts file, then run ipconfig /flushdns and nbtstat -R.
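If you would rather script the Hosts file change than edit it by hand, a minimal sketch follows.  The function name add_entries is my own, and PDCEmulator.FQDN.com and PDCEmulatorNetBIOS are the same placeholders used above, to be replaced with your PDC Emulator’s real names.  The function only builds the new file text; it does not write to the real Hosts file, which would require administrative rights.

```python
LOOPBACK = "127.0.0.1"


def add_entries(hosts_text, pdc_fqdn, pdc_netbios):
    """Return hosts_text with loopback entries for both names appended.

    Names that already have a 127.0.0.1 entry are skipped, so running
    this twice does not duplicate lines.
    """
    existing = set()
    for line in hosts_text.splitlines():
        # Ignore comments, then split "address name [aliases...]".
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == LOOPBACK:
            existing.update(name.lower() for name in fields[1:])

    new_lines = [
        f"{LOOPBACK}    {name}"
        for name in (pdc_fqdn, pdc_netbios)
        if name.lower() not in existing
    ]
    if not new_lines:
        return hosts_text

    # Make sure the appended entries start on their own line.
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + "\n".join(new_lines) + "\n"


# Example with the placeholder names from this post.
sample = "127.0.0.1 localhost\n"
print(add_entries(sample, "PDCEmulator.FQDN.com", "PDCEmulatorNetBIOS"))
```

You would still need to save the result to the Hosts file and run the two cache-flush commands afterwards.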

Run setup again, and the operation should be successful.  What’s the difference, you ask?  If you prevent the Cluster service from communicating with the PDC Emulator, it will fall back to using a local DC.  When it creates the computer account on a local DC, intra-site replication is almost instantaneous, so Exchange is able to find the correct computer account on the DC that setup chooses, and setup can continue past this point.
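If setup still fails at the same point, one thing to rule out is that the Hosts entries are not taking effect.  The sketch below checks whether a name resolves to the loopback address using Python’s standard socket.gethostbyname.  Treat it as a rough sanity check only: it assumes that Python’s view of name resolution on the node is close to what the Cluster service sees, which I have not verified, and the helper name resolves_to_loopback is my own.

```python
import socket


def resolves_to_loopback(name, resolver=socket.gethostbyname):
    """Return True if name resolves to 127.0.0.1, False otherwise.

    resolver is injectable so the check can be exercised without
    touching real name resolution.
    """
    try:
        return resolver(name) == "127.0.0.1"
    except OSError:
        # socket.gaierror (name not found) is a subclass of OSError.
        return False


# Placeholder names from this post; substitute your PDC Emulator's names.
for name in ("PDCEmulator.FQDN.com", "PDCEmulatorNetBIOS"):
    status = "blocked" if resolves_to_loopback(name) else "NOT blocked"
    print(f"{name}: {status}")
```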

As mentioned above, this issue can be encountered when setting up a new cluster (setup /newcms), or if you are recovering an existing cluster (setup /recoverCMS).  The second scenario would be especially common if you are testing your DR procedures, and performing a failover/activation to an SCR Target.

This issue should not be present in Windows 2008, as the Cluster service is smarter: when it detects that the PDC Emulator is in a different site, it should automatically use a local DC for operations such as this.