How to detect applications using "hardcoded" DC name or IP?

You look at Windows Server 2012 R2 and you tell yourself: "it would be nice if I could leverage all those new features". Then you remember...

  • Adding new domain controllers is usually not a problem. And if you want to add your new DCs smoothly, without impacting the existing environment, you can follow this excellent post which, despite its age, is still valid for Windows Server 2012 R2: Minimizing Risk During AD Upgrades.
  • Removing the old ones is what worries you. "What if I have applications explicitly using one specific domain controller's name or IP?" Well, unless you reuse the same name and the same IP address for your new domain controller, it might break things. And breaking things isn't fun...

How can we do it without breaking things?

First, it is important that all applications consuming Active Directory data (for authentication as well as for data storage) are configured so that they are not bound to a specific DC. Being proactive means two things:

  1. Communicate with the application owners and educate them about the magic the NetLogon service performs. If possible, build the list of all business-critical apps, sit down with the team in charge of administering them and try to determine how their apps discover domain controllers.
  2. When acquiring new software, ask the vendors whether their applications discover a domain controller through the Windows API or require a hardcoded configuration. And be careful! Specifying the FQDN of the domain might bring some flexibility, but it does not necessarily mean that the application uses the Windows API to discover domain controllers. We'll discuss this later in this article.

Second, we can try to detect which applications are using this kind of hardcoded configuration. This is a tough one. You cannot just look at the logs of the domain controllers, because the decision to use a specific DC is made on the client side. Enabling LDAP logging will basically just list all your active clients, with no way to tell whether a request comes from a hardcoded app or from a regular Windows client. When replacing a DC with a new one that has a new name, you might be tempted to create a DNS alias pointing the old name to the new DC. It might do the trick for the application, but it's really just punting: you will have to maintain that DNS record. Moreover, some features such as LDAPS or Kerberos could break with this DNS spoofing workaround, because the alias is neither in the new DC's certificate nor in its service principal names. It looks like a goner...

The ldap://contoso.com illusion

It is actually also true for \\contoso.com, but that is less relevant for the purpose of this post. When we use the FQDN of the domain in an application's connection string, we could assume that we are relying only on the DNS resolution of contoso.com, and therefore performing a simple A lookup of the domain name (returning, in round robin, all the DCs registering their LdapIpAddress record). That is not the case. Well, not always. On a Windows client, when running [ADSI]"LDAP://contoso.com/DC=contoso,DC=com" in a PowerShell console for example, the ADSI component, like other Windows LDAP clients, leverages the DsGetDcName function to find the closest domain controller. It will not use the (same as parent folder) record that you see in your DNS console.

Give it a try:

  1. Empty your resolver's cache: ipconfig /flushdns
  2. Start a network capture
  3. Run the following command from a newly opened PowerShell console: [ADSI]"LDAP://contoso.com/DC=contoso,DC=com"
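If you repeat this test often, the steps above can be wrapped in a small script. This is a minimal sketch, assuming an elevated PowerShell console on a domain-joined Windows machine (`netsh trace` needs admin rights); the function names `ConvertTo-LdapDomainPath` and `Invoke-DcLocatorTest` and the default trace file path are hypothetical. Run it from a newly opened console, as in step 3.

```powershell
# Hypothetical helper: builds the ADSI path used in step 3 from a domain FQDN,
# e.g. contoso.com -> LDAP://contoso.com/DC=contoso,DC=com
function ConvertTo-LdapDomainPath {
    param([Parameter(Mandatory)] [string] $DomainName)
    $dn = ($DomainName.Split('.') | ForEach-Object { "DC=$_" }) -join ','
    "LDAP://$DomainName/$dn"
}

# Hypothetical wrapper for steps 1 to 3 (Windows only, elevated console)
function Invoke-DcLocatorTest {
    param(
        [Parameter(Mandatory)] [string] $DomainName,
        [string] $TraceFile = "$env:TEMP\dclocator.etl"
    )
    ipconfig /flushdns | Out-Null                                      # 1. empty the resolver cache
    netsh trace start capture=yes "tracefile=$TraceFile" | Out-Null    # 2. start a capture
    try {
        $entry = [ADSI](ConvertTo-LdapDomainPath -DomainName $DomainName)
        $entry.psbase.RefreshCache()                                   # 3. force the actual LDAP bind
    }
    finally {
        netsh trace stop | Out-Null                                    # stop the capture even if the bind fails
    }
    "Capture saved to $TraceFile"
}

# Usage: Invoke-DcLocatorTest -DomainName contoso.com
```

The `RefreshCache()` call is there because `[ADSI]` alone only builds the object; the bind (and therefore the DC discovery) happens when the object is first used.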

What are you seeing? You are using the classic DCLocator discovery mechanism described in the Domain Controller Locator section of https://technet.microsoft.com/en-us/library/cc759550(v=ws.10).aspx. So it leverages SRV records, and not the (same as parent folder) thingy you would expect.
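To see which records DCLocator asks for without a capture, you can build the SRV names yourself and resolve them. A minimal sketch, assuming the DnsClient module (Windows 8 / Windows Server 2012 and later) for `Resolve-DnsName`; `Get-DcLocatorSrvName` and `Get-AdvertisedDc` are hypothetical helpers that only cover the generic domain controller records (the Dc and DcAtSite mnemonics from KB 306602), not the GC, PDC or Kerberos ones. On a Windows client, `nltest /dsgetdc:contoso.com` shows which DC DsGetDcName actually returned.

```powershell
# Hypothetical helper: SRV names DCLocator queries to find a DC of a domain.
# The site-specific name (DcAtSite) is used when the client knows its site,
# the domain-wide name (Dc) lists every DC registering that record.
function Get-DcLocatorSrvName {
    param(
        [Parameter(Mandatory)] [string] $DomainName,
        [string] $SiteName
    )
    if ($SiteName) { "_ldap._tcp.$SiteName._sites.dc._msdcs.$DomainName" }
    "_ldap._tcp.dc._msdcs.$DomainName"
}

# Windows only: resolves those names and lists the advertised DCs
# (only SRV answers carry a NameTarget; additional A records are dropped)
function Get-AdvertisedDc {
    param([Parameter(Mandatory)] [string] $DomainName, [string] $SiteName)
    Get-DcLocatorSrvName -DomainName $DomainName -SiteName $SiteName |
        ForEach-Object { Resolve-DnsName -Name $_ -Type SRV -DnsOnly } |
        Where-Object { $_.NameTarget } |
        Select-Object Name, NameTarget, Port, Priority, Weight
}

# Usage (hypothetical site name): Get-AdvertisedDc -DomainName contoso.com -SiteName Paris
```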

"So what? Even if I were using the (same as parent folder) record, in the end I would still find and use a domain controller." Sometimes netmask ordering is generous and gives you a pretty close target (see https://support.microsoft.com/kb/842197 for more info; this behavior is pretty much the same for more recent versions of the OS). The problem is that you might think that, because AD is highly resilient, an application using Active Directory through its FQDN inherits that resiliency. This is not always true. If the application is leveraging the Windows API to find a domain controller, it is fine: if one domain controller goes down, it will find another one (there might be some timeouts depending on the app, but they should be manageable). However, if the application is relying on the DNS round robin of the domain FQDN and the DC it is currently pointing at goes down, the app is likely broken, because the DNS cache keeps handing it the address of the failed DC. I will write another post about it. For now, I just want to raise awareness of this problem so that you can check your apps.
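To make the difference concrete, here is a minimal sketch of the pattern a resilient application can follow: ask the Windows DC Locator for a DC and, when that DC fails, ask again with rediscovery forced instead of retrying a cached address. `Invoke-WithDcFailover` and `Get-LocatedDc` are hypothetical names. `Get-LocatedDc` assumes a domain-joined Windows machine where `System.DirectoryServices` is available, and uses `DomainController.FindOne` with `LocatorOptions.ForceRediscovery`, which as far as I know goes through DsGetDcName.

```powershell
# Hypothetical helper: runs an LDAP action against the DC the locator returns.
# If that DC fails, it asks the locator again with rediscovery forced, instead of
# retrying the same cached address like a DNS round robin client would.
function Invoke-WithDcFailover {
    param(
        [Parameter(Mandatory)] [scriptblock] $Locate,  # param([bool] $Force) -> DC name
        [Parameter(Mandatory)] [scriptblock] $Action,  # param([string] $Dc) -> result
        [int] $MaxAttempts = 2
    )
    $force = $false
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        $dc = & $Locate $force
        try {
            return & $Action $dc
        }
        catch {
            if ($attempt -eq $MaxAttempts) { throw }   # give up and rethrow the last error
            $force = $true                             # next round: bypass the cached DC
        }
    }
}

# Windows only: asks the DC Locator (through System.DirectoryServices) for a DC
function Get-LocatedDc {
    param([Parameter(Mandatory)] [string] $DomainName, [bool] $Force = $false)
    $ctx = New-Object System.DirectoryServices.ActiveDirectory.DirectoryContext('Domain', $DomainName)
    $type = [System.DirectoryServices.ActiveDirectory.DomainController]
    if ($Force) {
        $type::FindOne($ctx, [System.DirectoryServices.ActiveDirectory.LocatorOptions]::ForceRediscovery).Name
    }
    else {
        $type::FindOne($ctx).Name
    }
}

# Usage (hypothetical domain):
# Invoke-WithDcFailover `
#     -Locate { param($force) Get-LocatedDc -DomainName contoso.com -Force $force } `
#     -Action { param($dc) ([ADSI]"LDAP://$dc/DC=contoso,DC=com").psbase.RefreshCache(); $dc }
```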

Hope

Well, it's not really hope, it's more about the method. You know that Windows clients leverage the DCLocator process that we just talked about. It means they use specific DNS records to locate domain controllers. Those SRV records are registered by the NetLogon service of each domain controller (the NetLogon service also registers a few A records, such as the (same as parent folder) one or some GC-related ones). Without those records, the DCs cannot be located and therefore cannot be used by DCLocator clients. And THAT is the trick. Here is one method:

  1. Add new domain controllers in your environment (same OS version or new OS version if you are confident about application compatibility).
  2. Mask the old domain controllers in DNS, which means removing everything registered by their NetLogon service (well, not everything: the GUID records are used for replication, so we must keep those).
  3. Wait until the clients' caches expire and TADA! Every LDAP query and every authentication request you see reaching the masked DC comes from an application or server that does not leverage DCLocator and possibly has a hardcoded configuration. Because the hidden domain controllers are still running and replicating, the hardcoded applications can keep using them.
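Before starting to wait, you can check that a masked DC has really disappeared from the locator records. A minimal sketch: `Test-DcAdvertised` is a hypothetical helper that only compares SRV targets with the DC's FQDN; the commented usage assumes `Resolve-DnsName` (DnsClient module) and checks a single record name, so repeat it for the other records you masked (site-specific, GC, Kerberos). Querying one of your DNS servers directly with `-Server` avoids being fooled by a cached answer.

```powershell
# Hypothetical helper: $true if $DcFqdn is still the target of one of the SRV
# records passed in (objects exposing a NameTarget property, as Resolve-DnsName returns)
function Test-DcAdvertised {
    param(
        [Parameter(Mandatory)] [string] $DcFqdn,
        [object[]] $SrvRecord = @()
    )
    $target = $DcFqdn.TrimEnd('.')
    foreach ($record in $SrvRecord) {
        # Ignore non-SRV entries and compare names without the trailing dot
        if ($record.NameTarget -and $record.NameTarget.TrimEnd('.') -eq $target) {
            return $true
        }
    }
    $false
}

# Usage (Windows; hypothetical DC and DNS server names):
# $records = Resolve-DnsName -Name '_ldap._tcp.dc._msdcs.contoso.com' -Type SRV -DnsOnly -Server dns01.contoso.com
# Test-DcAdvertised -DcFqdn 'dc01.contoso.com' -SrvRecord $records   # should be $false once masked
```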

Step by step, ooh baby

Of course, as usual, make sure you understand everything in this article, that you have a valid backup, and that you test it in your lab first! If you are not feeling it, ask for assistance or, even better, ask for a PFE!

  1. Add new domain controllers

    I don't think I need to describe that one. You just launch a bunch of dcpromo; ideally, add one new domain controller for each domain controller you are planning to hide (dcpromo being another nostalgic reference, this time from the 2000s, for deploying and configuring Active Directory Domain Services from your Server Manager console). If you do need assistance to add new domain controllers, you had better stop here and ask for external assistance.

  2. Mask the old domain controllers

    Be careful! If you hide almost all, or all, of your domain controllers by mistake, you might cause a serious outage! One way to do it is to create a group policy and link it to the Domain Controllers OU.

    1. Create an empty group policy and disable its user configuration settings.

    2. Remove Authenticated Users from the security filtering. Instead, we will manually add the computer accounts of the domain controllers we need to hide. "Why not create a group and add those DC accounts to that group?" Because a group membership change requires restarting the domain controller we want to hide, and since every domain controller is potentially hardcoded somewhere, you want to avoid any sort of service disruption. You don't have to go all in. It is even better to go slowly, starting with one or a few domain controllers: it will take time to parse the logs anyway. Why not start with the domain controllers whose name or IP (or both) you have kept for years?

    3. Edit the group policy and find the following setting: Computer Configuration > Policies > Administrative Templates > System > Net Logon > DC Locator DNS Records > Specify DC Locator DNS records not registered by the DCs. Enable it and, in the text field, type the mnemonics of all the records you don't want to see in DNS (those mnemonics are explained here: https://support.microsoft.com/kb/306602). Type the following (the separator is a space character): LdapIpAddress Ldap LdapAtSite Pdc Gc GcAtSite GcIpAddress Kdc KdcAtSite Dc DcAtSite Rfc1510Kdc Rfc1510KdcAtSite GenericGc GenericGcAtSite Rfc1510UdpKdc Rfc1510Kpwd Rfc1510UdpKpwd. Do not add DsaCname to this list: this record is used for replication.

  3. Wait

    You have several things to wait for.

    1. If you have a multi-site environment, you have to wait for replication to converge: the group policy must be replicated to the affected DCs to take effect.

    2. The group policy refresh interval on domain controllers is 5 minutes (unless it has been changed in your domain), so it can take up to 5 minutes for the setting to apply.

    3. Then, the NetLogon service refreshes its records, possibly only once every 24 hours: it first updates them 5 minutes after the service starts, then doubles the interval (10 minutes, then 20 minutes, and so on) as long as there is no error, until the interval reaches 24 hours.

    4. Then, for the clients using the FQDN of the domain without leveraging DCLocator, you have to wait until the TTL of the records expires. By default, it is 10 minutes.

    5. Then you have to wait until all Windows clients pick other domain controllers. By default, since Windows Vista, clients rediscover a domain controller every 12 hours, so you have to wait 12 hours. You still have Windows XP or Windows Server 2003? This is a tricky one: if you have deployed KB 939252, you wait 12 hours. If you haven't deployed it... Well, the XP/2003 client will not refresh its domain controller selection unless the currently selected domain controller becomes unreachable or you restart the machine (actually, just restarting the NetLogon service is enough). Your machines will restart at some point because of updates and software management, so you can also wait until the next cycle.

  4. Enable and collect the logs

    Let's focus on LDAP logging. You need the time and the source IP of each call. Set the "16 LDAP Interface Events" value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics to 5. Note that you could also do this in the group policy we created, by using the Group Policy Preferences feature to modify this registry value. From now on, every LDAP call is logged in the Directory Service event log, and we will look at event IDs 1138 and 1139.

    Note that you also get the security context of each call: for instance, a call made by the built-in Administrator shows its SID, the one ending in -500. You also see the type of LDAP operation, such as ldap_search or ldap_bind. Do not rely on how the Event Viewer GUI parses these events: it sometimes gets things wrong.

    This can generate a lot of logs, so you might consider increasing the default size of the Directory Service event log to something way larger (as long as you stay under the recommended limits, which really only matter for Windows Server 2003 domain controllers: https://support.microsoft.com/kb/957662). You can also collect the logs remotely with a script at short intervals. I can share some code for that if you think it might be useful.

    You can also open perfmon and look at how many authentications per second and LDAP queries per second still reach the hidden domain controllers, to get an idea of the number of requests still arriving there. Once you think you have addressed all the servers and apps you saw in the logs, you can let this perfmon run for a little while. Keep in mind that other domain controllers still talk to the hidden ones, so you will still see some authentication activity (replication, monitoring apps and other internal calls).
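Parts of steps 2 and 4 can be scripted as well. This is a minimal sketch, not a replacement for the GPO: all function names are hypothetical, the remote calls assume PowerShell remoting to the DCs and admin rights, and the policy registry location (`HKLM\SOFTWARE\Policies\Microsoft\Netlogon\Parameters`, value `DnsAvoidRegisterRecords`) is my assumption of where the setting from step 2 lands, so double-check it in your lab. The 1 GB log size and the perfmon counters picked here are example choices.

```powershell
# Records to mask (step 2). DsaCname is deliberately absent: replication relies on it.
$HiddenDcMnemonics = @(
    'LdapIpAddress', 'Ldap', 'LdapAtSite', 'Pdc', 'Gc', 'GcAtSite', 'GcIpAddress',
    'Kdc', 'KdcAtSite', 'Dc', 'DcAtSite', 'Rfc1510Kdc', 'Rfc1510KdcAtSite',
    'GenericGc', 'GenericGcAtSite', 'Rfc1510UdpKdc', 'Rfc1510Kpwd', 'Rfc1510UdpKpwd'
)

# Builds the space-separated text for the GPO field and refuses DsaCname
function Get-HiddenDcPolicyText {
    param([string[]] $Mnemonic = $HiddenDcMnemonics)
    if ($Mnemonic -contains 'DsaCname') {
        throw 'DsaCname must stay registered: it is used for replication.'
    }
    $Mnemonic -join ' '
}

# Reads what the policy wrote on a DC (assumed registry location, see above)
function Get-AppliedHiddenDcPolicy {
    param([Parameter(Mandatory)] [string] $ComputerName)
    Invoke-Command -ComputerName $ComputerName -ScriptBlock {
        $key = 'HKLM:\SOFTWARE\Policies\Microsoft\Netlogon\Parameters'
        (Get-ItemProperty -Path $key -ErrorAction SilentlyContinue).DnsAvoidRegisterRecords
    }
}

# Step 4: LDAP interface logging at level 5 and a larger Directory Service log
# (set the registry value back to 0 when you are done)
function Enable-LdapInterfaceLogging {
    param([Parameter(Mandatory)] [string] $ComputerName, [long] $LogSizeBytes = 1GB)
    Invoke-Command -ComputerName $ComputerName -ArgumentList $LogSizeBytes -ScriptBlock {
        param($size)
        Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics' `
            -Name '16 LDAP Interface Events' -Value 5 -Type DWord
        wevtutil sl 'Directory Service' "/ms:$size"
    }
}

# Remaining LDAP and authentication activity on a hidden DC (one minute of samples)
function Get-HiddenDcActivity {
    param([Parameter(Mandatory)] [string] $ComputerName)
    Get-Counter -ComputerName $ComputerName -SampleInterval 5 -MaxSamples 12 -Counter @(
        '\NTDS\LDAP Searches/sec',
        '\NTDS\LDAP Client Sessions',
        '\Security System-Wide Statistics\Kerberos Authentications',
        '\Security System-Wide Statistics\NTLM Authentications'
    )
}

# Usage (hypothetical DC name):
# Get-HiddenDcPolicyText
# Get-AppliedHiddenDcPolicy -ComputerName dc01.contoso.com
# Enable-LdapInterfaceLogging -ComputerName dc01.contoso.com
# Get-HiddenDcActivity -ComputerName dc01.contoso.com
```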

Here is a simple PowerShell script you can use to list all the IPs:

# Lists the operation, caller and client address of the last 1000 LDAP calls (event 1139)
Get-WinEvent -ComputerName dc01.contoso.com -MaxEvents 1000 -FilterHashtable @{ LogName = "Directory Service"; ID = 1139 } |
    ForEach-Object {
        # Positions of the event 1139 insertion strings, as used in the original script
        [pscustomobject]@{
            "Operation" = [string] $_.Properties[0].Value
            "User"      = [string] $_.Properties[2].Value
            "IP:Port"   = [string] $_.Properties[3].Value
        }
    }

It's very basic, and you can improve it your way (for example, by filtering out the 127.0.0.1 address). You can also use the Windows Firewall logs on the DC, once logging of allowed connections is enabled.

And then parse the logs with, let's say, Power BI :)
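Logging of allowed connections can be turned on with, for example, `Set-NetFirewallProfile -Profile Domain -LogAllowed True`, and the log can be pre-filtered before it reaches Power BI. A minimal sketch: `Get-AdConnectionFromFirewallLog` is a hypothetical helper that reads the column layout from the log's own `#Fields:` header instead of hardcoding it, and keeps allowed, received connections to common AD ports (88, 389, 636, 3268, 3269); adjust the list to your needs.

```powershell
# Hypothetical helper: extracts allowed inbound connections to AD ports from
# Windows Firewall log lines, using the '#Fields:' header to name the columns.
function Get-AdConnectionFromFirewallLog {
    param(
        [string[]] $Line,
        [int[]] $Port = @(88, 389, 636, 3268, 3269)
    )
    $fields = $null
    foreach ($l in $Line) {
        if ($l -like '#Fields:*') {
            $fields = $l.Substring('#Fields:'.Length).Trim() -split '\s+'
            continue
        }
        # Skip everything before the header, blank lines and other comments
        if (-not $fields -or -not $l -or $l.StartsWith('#')) { continue }
        $values = $l.Trim() -split '\s+'
        $row = @{}
        for ($i = 0; $i -lt [Math]::Min($fields.Count, $values.Count); $i++) {
            $row[$fields[$i]] = $values[$i]
        }
        # ICMP entries have '-' as port, so only numeric destination ports are kept
        if ($row['action'] -eq 'ALLOW' -and $row['path'] -ne 'SEND' -and
            $row['dst-port'] -match '^\d+$' -and [int] $row['dst-port'] -in $Port) {
            [pscustomobject]@{
                Time     = "$($row['date']) $($row['time'])"
                Protocol = $row['protocol']
                SourceIp = $row['src-ip']
                DestPort = [int] $row['dst-port']
            }
        }
    }
}

# Usage (Windows, elevated; default log location):
# $log = Get-Content "$env:SystemRoot\System32\LogFiles\Firewall\pfirewall.log"
# Get-AdConnectionFromFirewallLog -Line $log | Group-Object SourceIp | Sort-Object Count -Descending
```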

Wait a minute, I recognize these IPs, it's Exchange!

It seems that Exchange Server 2016 does not rely solely on DNS to find domain controllers (my assumption is that it also lists them from the configuration partition), so if you have Exchange servers in the site, filter them out of the report you are generating.
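One way to do this filtering is to summarize the "IP:Port" values from the event 1139 script per client address, with an exclusion list for loopback and your Exchange servers. A minimal sketch: `Get-LdapClientSummary` is hypothetical, and it assumes IPv4 values in the "address:port" form, splitting on the last colon; check how IPv6 clients appear in your own events before relying on it.

```powershell
# Hypothetical helper: counts LDAP calls per client address from event 1139
# "IP:Port" strings, skipping the addresses listed in -ExcludeIp.
function Get-LdapClientSummary {
    param(
        [string[]] $IpPort,
        [string[]] $ExcludeIp = @('127.0.0.1')
    )
    $IpPort |
        Where-Object { $_ } |
        ForEach-Object {
            $i = $_.LastIndexOf(':')                 # assumes "a.b.c.d:port"
            if ($i -gt 0) { $_.Substring(0, $i) } else { $_ }
        } |
        Where-Object { $_ -notin $ExcludeIp } |
        Group-Object -NoElement |
        Sort-Object -Property Count -Descending |
        Select-Object -Property @{ Name = 'ClientIp'; Expression = { $_.Name } }, Count
}

# Usage (hypothetical DC and Exchange server addresses):
# $ipPort = Get-WinEvent -ComputerName dc01.contoso.com -FilterHashtable @{ LogName = 'Directory Service'; ID = 1139 } |
#     ForEach-Object { [string] $_.Properties[3].Value }
# Get-LdapClientSummary -IpPort $ipPort -ExcludeIp '127.0.0.1', '10.0.0.50', '10.0.0.51'
```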

What about NetBIOS name resolution?

Really? You also have WINS in your environment? And you are scared that your app is not only hardcoded to use the NetBIOS name of the domain but also relies on NetBIOS name resolution to discover a DC? I am sure you only wish to get rid of WINS! I will discuss this in another article. In the meantime, just make sure that the 1C records do not list the domain controllers you want to hide. Ping me if you want more details.

PS. Special thanks to my favorite Princ.: R. Stampfer for the pertinent comments 👩‍🎤