So in Part 2 of the series we're going to discuss the following scenario:
The challenge in this scenario is to make the clients in the different Branch sites fail over to their closest Hub sites.
It's obvious that HUB is the best option for failover from Branch when the DC in Branch (Child-DC03) fails, and HUB2 is the best option for Branch2 when the DC in Branch2 (Child-DC04) fails.
The problem is, as we learned in the previous post (http://blogs.technet.com/b/isrpfeplat/archive/2011/12/01/disaster-recovery-site-and-active-directory-part-1.aspx), that the client doesn't really care about sites and costs, all it cares about is DNS.
So in this scenario the client would query the generic list of SRV records for the domain (_ldap._tcp.dc._msdcs.domain.name) and would be able to use any of the DCs.
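To make the two kinds of lookup concrete, here is a minimal Python sketch that builds the record names. The helper names generic_dc_srv and site_dc_srv are made up for this illustration; the name formats themselves are the ones used in this post (the site-specific form shows up in the network trace further down).

```python
def generic_dc_srv(domain: str) -> str:
    """SRV name that lists every DC in the domain, regardless of site."""
    return f"_ldap._tcp.dc._msdcs.{domain}"


def site_dc_srv(site: str, domain: str) -> str:
    """SRV name that lists only the DCs registered for a single site."""
    return f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}"


print(generic_dc_srv("domain.name"))
print(site_dc_srv("Branch2", "domain.name"))
```

The generic name carries no site information at all, which is why a client that falls back to it can land on any DC in the domain.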
So why not use DNSAvoidRegisterRecords?
The problem is that in this scenario we can prevent Child-DC03 (Branch) and Child-DC04 (Branch2) from registering in DNS by using the DNSAvoidRegisterRecords solution, but… we would still always have Child-DC01 (HUB) and Child-DC02 (HUB2), since those are located in the hub sites and we want the clients to be able to find them no matter what. Plus, if I were to remove some of them, which ones? The DCs in site HUB? That would mean clients in the Branch site would always log on to the DCs in HUB2, and vice versa.
So how can we make the client actually figure out the best option for itself if the DC in its own site fails?
The solution is…. (drum roll).
Try Next Closest Site Feature
Warning: If you're not running 2008 DC/Vista clients or later you can stop reading and use the time to start planning your upgrade…
As described here - http://technet.microsoft.com/en-us/library/cc733142(WS.10).aspx
The behavior of Try Next Closest Site is:
- Try to find a domain controller in the same site.
- If no domain controller is available in the same site, try to find a domain controller in the next closest site. A site is closer if it has a lower site-link cost than another site with a higher site-link cost.
- If no domain controller is available in the next closest site, try to find any domain controller in the domain.
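The three steps above can be sketched in a few lines of Python. This is only an illustration of the ordering: the site-link costs are invented (the post does not give the real values), the names SITE_LINK_COST, next_closest_site and search_order are hypothetical, and in reality it is the DC that works out the next closest site and hands it to the client.

```python
# Hypothetical site-link costs, invented for illustration.
# Only the relative ordering matters: each branch is cheaper to reach
# from its own hub than from the other hub.
SITE_LINK_COST = {
    "Branch": {"HUB": 100, "HUB2": 300},
    "Branch2": {"HUB": 300, "HUB2": 100},
}

ANY_DC = "<any DC in the domain>"


def next_closest_site(client_site, costs=SITE_LINK_COST):
    """Return the neighbouring site with the lowest cost, or None."""
    neighbours = costs.get(client_site, {})
    if not neighbours:
        return None
    return min(neighbours, key=neighbours.get)


def search_order(client_site, costs=SITE_LINK_COST):
    """List where the client looks for a DC, in order of preference."""
    order = [client_site]
    closest = next_closest_site(client_site, costs)
    if closest is not None:
        order.append(closest)
    order.append(ANY_DC)
    return order


print(search_order("Branch"))
print(search_order("Branch2"))
```

With these made-up costs, a client in Branch tries Branch, then HUB, then anything; a client in Branch2 tries Branch2, then HUB2, then anything.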
Important note: The NextClosestSite feature is redundant with the AutoSiteCoverage feature. If the site where the client is located is automatically covered by DCs in another site, the domain controllers performing the AutoSiteCoverage are already in the next closest site.
So basically, what this feature does is enable the client to use the NextClosestSiteName attribute returned in the response to the DC Locator query issued by the client:
As you can see, this example was from a client located in the Branch site, which got the HUB site as its NextClosestSiteName. The following example is from a client located at the Branch2 site:
As we expected, the client located in Branch2 is getting HUB2 as its NextClosestSiteName.
This attribute is returned to every client OS running version 6.0 or later, meaning Vista and later. But… the client will not make any use of this information automatically.
Enabling the client for Try Next Closest Site
In order for the client to actually use the information it receives in the NextClosestSiteName attribute, we need to enable Try Next Closest Site in a GPO; otherwise it would just revert to the default behavior we saw in Part 1, which is to query the list of generic SRV records for the domain.
If you have DCs Running 2008 or 2008 R2 and clients running Vista or Win7, you can just enable this to be the default behavior for netlogon by using a simple GPO setting.
The setting is located under Computer Configuration/Policies/Administrative Templates/System/Netlogon/DC Locator DNS Records: set Try Next Closest Site to Enabled.
With this setting enabled, clients looking for DCs always try to locate a domain controller in the nearest site first (based on site-link costs and topology) before failing over to the generic list.
So what actually happens when a DC fails?
In order to test this behavior, let's shut down the AD services on the DC in the client's site (Child-DC04, since our test client is in Branch2) and perform a secure channel reset (by running nltest /sc_reset:Domain.name).
After we have enabled this feature in GPO and applied it to our clients the result of this test would be:
1. The client issues a query to _ldap._tcp.Branch2._sites.dc._msdcs.domain.name (Frame 203). Since none of the DCs in the site are available, our client receives the DC's IP address from DNS (Frame 204) but gets no response to the LDAP UDP ping sent to the domain controller (Frame 207).
2. The client falls back to the generic list of domain controllers and issues a DNS query to _ldap._tcp.dc._msdcs.domain.name (Frame 240). It receives the IP addresses of the DCs registered in the domain's generic SRV list (Frame 241).
3. The client sends the UDP LDAP ping (Frame 244) to one of the DCs (in our case Child-DC02) and receives a response with the Netlogon:LogonSAMLogonResponseEX message containing the NextClosestSiteName (Frame 247).
4. The client performs a DNS query based on the NextClosestSiteName information, _ldap._tcp.HUB2._sites.dc._msdcs.domain.name (Frame 251), and receives a response containing the DCs in site HUB2 (Frame 252).
5. The client sends the UDP LDAP ping to a DC in site HUB2 (Frame 253). It receives a valid response from Child-DC02 and now knows that this DC is in its closest site (Frame 254).
6. The client sets up the secure channel (this is what I actually asked it to do by running nltest /sc_reset) against a DC in the closest site.
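The sequence in the trace can be replayed with a toy model in Python. Treat it as a sketch under assumptions, not as how Netlogon is implemented: the DNS zone is a plain dictionary built from the DC-to-site layout described in this post, DCs are tried in list order (the real locator honors SRV priority and weight), and the names DNS, NEXT_CLOSEST, first_responder and locate_dc are all hypothetical.

```python
DOMAIN = "domain.name"
GENERIC = f"_ldap._tcp.dc._msdcs.{DOMAIN}"


def site_record(site):
    """Site-specific SRV name for the given site."""
    return f"_ldap._tcp.{site}._sites.dc._msdcs.{DOMAIN}"


# Toy DNS zone using the DC-to-site layout from this post.
DNS = {
    site_record("HUB"): ["Child-DC01"],
    site_record("HUB2"): ["Child-DC02"],
    site_record("Branch"): ["Child-DC03"],
    site_record("Branch2"): ["Child-DC04"],
    GENERIC: ["Child-DC01", "Child-DC02", "Child-DC03", "Child-DC04"],
}

# What a DC would return as NextClosestSiteName for each client site.
NEXT_CLOSEST = {"Branch": "HUB", "Branch2": "HUB2"}


def first_responder(record, alive, trace):
    """Query one SRV record and ping its DCs in list order."""
    trace.append(f"DNS query: {record}")
    for dc in DNS.get(record, []):
        if dc in alive:
            trace.append(f"LDAP ping {dc}: response")
            return dc
        trace.append(f"LDAP ping {dc}: no response")
    return None


def locate_dc(client_site, alive):
    """Return (chosen DC or None, trace) with Try Next Closest Site on."""
    trace = []
    dc = first_responder(site_record(client_site), alive, trace)
    if dc is not None:
        return dc, trace
    # Own site is down: fall back to the generic list to reach any DC.
    any_dc = first_responder(GENERIC, alive, trace)
    if any_dc is None:
        return None, trace
    next_site = NEXT_CLOSEST.get(client_site)
    if next_site is None:
        return any_dc, trace
    trace.append(f"{any_dc} returns NextClosestSiteName={next_site}")
    dc = first_responder(site_record(next_site), alive, trace)
    # If the next closest site is also down, settle for the DC we reached.
    return (dc if dc is not None else any_dc), trace


chosen, steps = locate_dc(
    "Branch2", alive={"Child-DC01", "Child-DC02", "Child-DC03"}
)
for step in steps:
    print(step)
print("Secure channel with:", chosen)
```

With Child-DC04 down, the model queries the Branch2 record, falls back to the generic list, learns that HUB2 is the next closest site, and ends up on Child-DC02, which is the same outcome as in the capture.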
So by using the Try Next Closest Site feature, we have enabled our clients to fail over to the sites closest to them instead of randomly choosing an available DC in the domain from the generic SRV record list.
That's it for Client logon and failover between sites. In Part 3 we will discuss the Domain Controller replication failover between sites in the different scenarios.
See you soon