After Booting into DSRM DC takes a long time to become available

I recently ran into an interesting issue and thought it would be useful to share the information. While working through some DR scenarios in a test environment, I noticed that the DCs were taking a long time to boot into normal mode after booting into DSRM. This was true even if I did nothing in DSRM other than log on and restart.

 

DCs were Windows Server 2003 SP2 x64.

Originally there were 2 DCs in the environment, but the delay was more pronounced with more DCs (3 onwards, i.e. more replication partners).

IPSec was used between DCs (configured via Group Policy using certificates; the same issue was seen using Kerberos).

 

After rebooting from DSRM, the domain controller takes an unusually long time to allow logon at the console: 6-15 minutes from “Applying Network Settings” to a usable logon prompt, depending on the number of DCs in the environment. “Applying Network Settings” takes approximately 5-6 minutes, and when the logon dialog box appears it usually does not display the logon domain for about another 5-10 minutes, after which a user can log on. When carrying out the same procedure on x86 DCs the delays were considerably shorter; I haven't had time to investigate that difference further.

 

Testing, log analysis and research showed that this is expected behaviour. The high-level theory of what is occurring is as follows.

  • IPSec is implemented using a domain-based IPSec policy with certificates; the scenario plays out the same way if Kerberos authentication is used.
  • When a DC is booted into DSRM, Group Policy is cleared with the exception of the IPSec policy.
  • When the DC restarts into normal mode, the server waits for initial synchronization. In this case the initial synchronization fails because IPSec initialises but its configuration is not fully applied, so the DC cannot communicate with its partner DCs.
  • As the certificate details are stored in Active Directory, the server can't read them until the initial synchronization process has completed and Directory Services has fully initialised locally (the same applies for Kerberos).
  • DNS also cannot load the directory-integrated zones because Directory Services is not available, so if the DC uses itself for DNS lookups these will fail.
  • Once the initial sync process completes (times out), the domain controller can read the information from the local Active Directory, load the DNS zones and get the certificate information.
  • IPSec can then complete initialisation, the DC applies policy, and life becomes good again.

During testing I tried a number of things to resolve the issue, 3 of which worked (listed below). Only item 2 is a satisfactory fix for most production environments if you seriously need IPSec between DCs.

  1. Disable the IPSec Policy.
  2. Use Pre-shared keys in the IPSec policy settings rather than Certificates or Kerberos.
  3. Set the registry value "Repl Perform Initial Synchronizations" to zero to prevent initial synchronization (see https://support.microsoft.com/kb/305476). This is obviously not advised in a production environment, but it did prevent the issue from occurring during testing.
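
For anyone wanting to reproduce item 3 in a lab, here is a minimal Python sketch of how the registry change could be scripted. It is not what I ran during testing. It assumes the value lives under the NTDS Parameters key as a REG_DWORD, as described in the KB article above, and the function name build_reg_command is purely illustrative. By default the script only prints the reg.exe command; it runs it only on Windows when passed --apply.

```python
import subprocess
import sys
from typing import List

# Assumption: key path and value type as described in KB 305476.
NTDS_PARAMETERS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters"
VALUE_NAME = "Repl Perform Initial Synchronizations"


def build_reg_command(value: int = 0) -> List[str]:
    """Build the reg.exe argument list that sets the initial-sync value.

    A value of 0 tells the DC not to wait for initial synchronization.
    """
    return [
        "reg", "add", NTDS_PARAMETERS_KEY,
        "/v", VALUE_NAME,
        "/t", "REG_DWORD",
        "/d", str(value),
        "/f",  # overwrite an existing value without prompting
    ]


if __name__ == "__main__":
    cmd = build_reg_command(0)
    # Show the command so it can be reviewed before anything is changed.
    print(subprocess.list2cmdline(cmd))
    # Only touch the registry on Windows, and only when explicitly asked to.
    if sys.platform == "win32" and "--apply" in sys.argv:
        subprocess.run(cmd, check=True)
```

As with the manual change, treat this as a test-lab aid only and revert the value once testing is finished.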