Disaster Recovery Site and Active Directory (Part 1 of 3)


Hi All,

Michael here again, and this time I would like to talk a little bit about Active Directory replication and Disaster Recovery sites.

Since the not-so-recent events of 9/11, many companies have invested time and money in designing and implementing Disaster Recovery solutions at a separate physical location, so that the organization can seamlessly fail over to that site and keep the business working. (Saying "seamlessly" after a disaster strikes is kind of a bad choice of words, but hey… don't shoot the messenger.)

One aspect that needs to be considered is the oh-so-important but often overlooked matter of client logon. In the first part of this post we will discuss how clients logging on to the network are affected by a failover.

In the second part we'll review how to fail over clients when we have multiple sites with Domain Controllers and we want the client to fail over to the best site possible. In the third part we'll see the effect this may have on Domain Controller replication in the organization, and how to properly configure and test the failover scenarios.

 

Client logon scenario:

In our sample organization we have a hub site and a DR site both of which contain Domain Controllers. In addition we have several branch sites which don't contain any DCs and rely on the Automatic Site Coverage feature to provide the closest DC for authentication to clients. (More information on the Automatic Site Coverage can be found here - http://technet.microsoft.com/en-us/library/cc978016.aspx)

image

In this scenario the required behavior (and please note I say the required behavior and not expected behavior – that will be explained later) is for clients to authenticate to the Domain Controllers in the DR site in case the Domain Controllers in the Hub site have failed (or the site link from a specific Branch to the Hub site has failed).

 

So, we have clients located at a branch site relying on the Automatic Site Coverage feature of the Domain Controllers in order to find the closest DC.

In reality this would look similar to this:

image

Child-DC01 (located in the Hub site) is performing Automatic Site Coverage for the Branch and Branch2 sites, while Child-DC02, which is located in the DR site, is not.


Now comes the interesting part…

What happens when Child-DC01 fails?

Based on http://support.microsoft.com/kb/314861 (and http://technet.microsoft.com/en-us/library/cc759550(WS.10).aspx), the client would then fall back to the generic list of all Domain Controllers in the domain:

image

So in our "simple" scenario (and I say simple because there's more to it in a second :) ) the client would fall back to that list and successfully find a Domain Controller in the domain. Since the only option left is the DC in the DR site, Child-DC02, we're good!

Now, you remember (well, it's just in the line above, if you don't then you have more serious things to worry about than this post… ) me saying there's more??

So here's more:

How do I make a client fail over to the DR site if I have a third DC in another site?

In this scenario we have a Branch site which, just as in the previous example, is auto site covered by the DC in the Hub site. We also have a second branch site, called Branch2, which does contain a DC (child-dc03), and the DRP site, which contains child-dc02:

image

In this scenario we need to consider costs. Assuming BASL (Bridge All Site Links) is enabled, meaning all site links are transitive, the following is the list of costs from the perspective of the Branch site:

Branch –> Hub = 100

Branch –> DRP = 110 (the site links are transitive, so we add the costs of Branch –> Hub and Hub –> DRP: 100 + 10).

Branch –> Branch2 = 200 (Branch –> Hub plus Hub –> Branch2: 100 + 100).
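
To make the arithmetic concrete, here is a minimal Python sketch that sums link costs along the cheapest path between sites. It only illustrates how bridged (transitive) site links add up; it is not how Active Directory itself computes costs. The `SITE_LINKS` table and the `cheapest_costs` helper are names made up for this example, and the link costs are the ones from the scenario above.

```python
import heapq

# Site links and costs from the example scenario, treated as bidirectional.
SITE_LINKS = {
    ("Branch", "Hub"): 100,
    ("Hub", "DRP"): 10,
    ("Hub", "Branch2"): 100,
}


def cheapest_costs(source, links):
    """Return the lowest cumulative link cost from source to each reachable site."""
    graph = {}
    for (site_a, site_b), cost in links.items():
        graph.setdefault(site_a, []).append((site_b, cost))
        graph.setdefault(site_b, []).append((site_a, cost))

    best = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, site = heapq.heappop(queue)
        if cost > best.get(site, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, link_cost in graph.get(site, []):
            new_cost = cost + link_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return best


costs = cheapest_costs("Branch", SITE_LINKS)
for site, cost in sorted(costs.items(), key=lambda item: item[1]):
    if site != "Branch":
        print(f"Branch -> {site} = {cost}")
```

With the three links above, the sketch lists Hub first, then DRP, then Branch2, which is the order we would like clients to follow.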

Obviously we would prefer to go where it's cheaper (cost ultimately reflects WAN link bandwidth, latency, and the other considerations that drive cost selection).

But based on what we saw previously, the client would get all DCs from the DNS query, including child-dc03, which we don't want the client to use.

image

Looking at a netmon trace the DNS result would be similar to:

image

This is the default Netlogon DC locator behavior. If the DCs in the client's site fail (DCs performing Automatic Site Coverage for a site are considered to be in that site), the client falls back to querying the generic list of DCs for the domain (_ldap._tcp.dc._msdcs.domain.name).
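
The fallback order can be sketched in a few lines of Python. This is a simplified simulation, not the real Netlogon code: the domain name `child.example.com`, the `DNS` dictionary standing in for the DNS zone, and the `locate_dcs` helper are all assumptions made for illustration. Only the two SRV record name formats are taken from how the DC locator actually queries DNS.

```python
DOMAIN = "child.example.com"  # hypothetical domain name used for illustration


def site_srv_name(site, domain=DOMAIN):
    """SRV record name for the DCs covering a specific site."""
    return f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}"


def generic_srv_name(domain=DOMAIN):
    """SRV record name for the generic list of all DCs in the domain."""
    return f"_ldap._tcp.dc._msdcs.{domain}"


# Simulated DNS zone: child-dc01 covers the Branch site through Automatic
# Site Coverage, and all three DCs register the generic record.
DNS = {
    site_srv_name("Branch"): ["child-dc01"],
    generic_srv_name(): ["child-dc01", "child-dc02", "child-dc03"],
}


def locate_dcs(client_site, alive, dns=DNS):
    """Try the site-specific list first, then fall back to the generic list.

    Returns the record name that produced an answer and the responsive DCs.
    """
    for name in (site_srv_name(client_site), generic_srv_name()):
        candidates = [dc for dc in dns.get(name, []) if dc in alive]
        if candidates:
            return name, candidates
    return None, []


# Normal operation: the client is served by the DC covering its site.
print(locate_dcs("Branch", {"child-dc01", "child-dc02", "child-dc03"}))

# child-dc01 is down: the client falls back to the generic list and may
# land on child-dc03 just as easily as on child-dc02.
print(locate_dcs("Branch", {"child-dc02", "child-dc03"}))
```

Note that nothing in the fallback step looks at site link costs, which is exactly why child-dc03 shows up as a candidate.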

To resolve this situation we can turn to a solution that was documented long ago:

 

DnsAvoidRegisterRecords

As explained in the Branch Offices Guide (http://technet.microsoft.com/en-us/library/cc749944.aspx) we can prevent the domain controllers at the branch site from registering the generic SRV records.

The recommended configuration in a branch office deployment is as follows:

  • On all branch office domain controllers, add all entries that do not have "AtSite" as part of the mnemonic, to the value of the registry key, except the DsaCname.

  • On hub domain controllers, do not use the registry key. This allows the domain controller to register all records.

Creating the DnsAvoidRegisterRecords registry entry on the branch DC (child-dc03) with the following values:

Dc
Gc
Kdc
LDAP
Rfc1510Kdc
Rfc1510UdpKdc
Rfc1510Kpwd
Rfc1510UdpKpwd

image

results in only the Hub and DR DCs (child-dc01 and child-dc02) being listed in the generic list of Domain Controllers for the domain:

image

Which leaves the client with only one option if the Hub site DCs fail… the DR site DCs.
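
As a rough illustration of the effect, the Python sketch below models which DCs end up in the generic list once child-dc03 avoids registering its generic records. The `DCS` table and the helper functions are made up for this example, and only two mnemonics are modelled: "Dc" (the generic _ldap._tcp.dc._msdcs record) and "DcAtSite" (its site-specific counterpart). Comparing mnemonics case-insensitively is an assumption of the sketch.

```python
# Values placed in DnsAvoidRegisterRecords on the branch DC, as listed above.
AVOID_ON_BRANCH = [
    "Dc", "Gc", "Kdc", "LDAP",
    "Rfc1510Kdc", "Rfc1510UdpKdc", "Rfc1510Kpwd", "Rfc1510UdpKpwd",
]

# Hypothetical model of the three DCs in the scenario.
DCS = {
    "child-dc01": {"site": "Hub", "avoid": []},
    "child-dc02": {"site": "DRP", "avoid": []},
    "child-dc03": {"site": "Branch2", "avoid": AVOID_ON_BRANCH},
}


def registers(dc, mnemonic, dcs=DCS):
    """True if the DC still registers the record behind this mnemonic."""
    avoided = {entry.lower() for entry in dcs[dc]["avoid"]}
    return mnemonic.lower() not in avoided


def generic_dc_list(dcs=DCS):
    """DCs that appear in the generic (non-site-specific) DC list."""
    return sorted(dc for dc in dcs if registers(dc, "Dc", dcs))


def site_dc_list(site, dcs=DCS):
    """DCs that appear in the site-specific DC list for the given site."""
    return sorted(
        dc for dc, info in dcs.items()
        if info["site"] == site and registers(dc, "DcAtSite", dcs)
    )


print(generic_dc_list())        # child-dc03 is no longer in the generic list
print(site_dc_list("Branch2"))  # but it still serves clients in its own site
```

The point of leaving the "AtSite" mnemonics out of the avoid list is visible here: clients in Branch2 still find child-dc03 through its site-specific records, while clients elsewhere never see it in the generic list.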

 

That's it for Part 1. In Part 2 we'll talk about failover scenarios for client sites which DO contain Domain Controllers, but those have failed.

Part 2 - http://blogs.technet.com/b/isrpfeplat/archive/2011/12/04/disaster-recovery-site-and-active-directory-part-2-of-3.aspx

Michael.

 

Comments (6)

  1. michdu says:

    Hey Dani,

    By default (and in operating systems older than Windows Vista) the Try Next Closest Site feature doesn't exist for Netlogon (as opposed to DFS with the SiteCostedReferrals setting you mentioned), meaning that the client will always fall back from its own site's SRV record list in DNS to the generic SRV record list in DNS.

    The client has no knowledge of the organization's site topology and costs, and hence cannot "decide" which is the closest site, or, if that one is not available, which is the next closest site, and so on.

    Refer to the next post about the Try Next Closest Site feature introduced in Vista which would allow the clients to fallback from their site to the next closest site (based on site link costs).

    And Yes, organizations which have firewalls between the different sites in their topology should consider the design of their Active Directory for exactly that.

  2. Mike Kline says:

    Really good series so far Michael…nice work.  You are right about post 9/11 DR planning and COOP sites etc.  There have been hundreds of millions of dollars (maybe more) spent on hardware, labor, and planning.

    Hopefully these sites are never needed.

  3. michdu says:

    Thanks Mike… Indeed hopefully those are never needed…

  4. Anonymous says:

    Hi Michael,

    I've read the post and I have a question regarding the fail-over process of logon requests.

    If I'm not mistaken, during the logon process the client queries DNS (after obtaining its site) with a site-specific query to receive the most appropriate DC for its location (I'm talking about first logon). Considering this, what kind of response will the client get if the "Closest DC" is unavailable? Won't the client try to query for another DC that would be next closest by cost? Does the "SiteCostedReferrals" value have anything to do with it (I know it's used in the Namespace context, but hey, it might have something to do with it)?

    In an environment with a 2003 functional level this scenario seems "problematic" to say the least. If the Firewalling in the environment is designed according to site topology and clients are blocked from accessing sites they're not supposed to access this could be a huge problem.

    Thanks.

  5. michdu says:

    A good design is not a "workaround" but should incorporate a solution for this type of scenario.

    Knowing Israeli customers, a question about the "not supposed to access" part of your sentence comes to mind.

    Who decided they're not supposed to access? Was there a real threat that you were trying to protect against when blocking access? As opposed to some firewall guy's New Year's wish for 50 new rules in the firewall that got the sites blocked from each other…

    Give me your threat analysis and convince me these sites should be blocked from each other and I'll give you a solution in Windows XP/2003 time as well, agree?

    P.S – let's continue this offline 😉

  6. Anonymous says:

    I've already read the other parts.

    It seems there's no apparent workaround for this issue prior to Windows Vista/2008, is there?
