Optimizing your network to keep your DNS squeaky clean

I have run into this issue several times: My customer has a fairly large network, with several subnets. They have wireless clients unplugging from Ethernet and dashing to conference rooms, then closing their laptop and heading home. The client starts a VPN session and continues working… Then, their spouse needs to use the computer, so the VPN connection gets dropped. They reconnect again later.

So, how many DNS records does that make? Let’s count:

The Ethernet connection from the office has one. The wireless connection may have several, depending on how many subnets the worker crossed going to that conference room, and the VPN may have several more. Five, six, seven? Eight? Ten addresses? And there are addresses from their home network as well. Ouch. Which one does the machine really have? And what if they stop at a local coffee shop on the way? They may have an address from the coffee shop’s subnet as well. That’s a load of stuff to pick through. All validly registered, but only one address really works.

The worker calls the helpdesk to get assistance with some application on the machine. The helpdesk tries to connect, but finds that they are hitting someone else’s machine.

Time to call IT to get things cleaned up so the worker can do their job. IT has to open up the DNS console, find all the records for that machine, figure out which is the most recent, and delete the rest.
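That manual cleanup boils down to grouping the records by host name, keeping the newest one, and flagging the rest. Here is a minimal Python sketch of that logic, assuming the records have already been exported as (host, address, time stamp) tuples. The `split_stale` name and the sample data are hypothetical, and nothing here talks to a real DNS server.

```python
from collections import defaultdict
from datetime import datetime

def split_stale(records):
    """Return (keep, stale): the newest record per host, and all older ones."""
    by_host = defaultdict(list)
    for host, address, stamp in records:
        # DNS names are case-insensitive, so group on the lowercased name.
        by_host[host.lower()].append((host, address, stamp))

    keep, stale = [], []
    for entries in by_host.values():
        entries.sort(key=lambda r: r[2], reverse=True)  # newest first
        keep.append(entries[0])
        stale.extend(entries[1:])
    return keep, stale

# Hypothetical export for one laptop that roamed across three subnets.
records = [
    ("laptop42", "10.1.4.17", datetime(2009, 3, 2, 8, 15)),
    ("laptop42", "10.9.0.88", datetime(2009, 3, 2, 19, 40)),
    ("laptop42", "10.5.2.31", datetime(2009, 3, 2, 10, 5)),
]
keep, stale = split_stale(records)
```

Here `keep` holds the one record worth keeping and `stale` holds the candidates for deletion. Doing this by hand, many times a day, is exactly the chore the rest of this post is about eliminating.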

One of my customers was getting calls like that 8, 10, even 20 times a day. It was bad enough that they got orders to “Fix that problem!”

That’s where I came in. Since this is a production network, we had to carefully do this step by step, avoiding any disruption to operations.

This sort of problem requires a system-wide fix. Just turning on DNS Scavenging won’t do it. We need to actually manage the address space properly. And do it in a way that doesn’t stop business.

Step 1: Enable scavenging

Before starting anything, we made sure that scavenging was not enabled on ANY of the DNS servers in the environment.

I sent the customer info on scavenging – http://blogs.technet.com/networking/archive/2008/03/19/don-t-be-afraid-of-dns-scavenging-just-be-patient.aspx – which covers the basics. It lists the DNSCmd operation required to reset scavenging on all servers for individual zones. We enabled scavenging on one server, then turned that server’s scavenger off for the first step in the process.

Machines refresh their host records every 24 hours. Because of that, we did not want to lower the scavenging period below 2 days, so that we would not have to touch all the machines. Servers have to stay visible. Also, DNS time stamps are not updated in Active Directory until you turn on scavenging on a zone. So, we set the no-refresh and refresh intervals to 2 days each, and “enabled” scavenging on each of the zones where the user may appear (including reverse lookup zones).
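To make the arithmetic concrete: a dynamic record becomes eligible for scavenging once its time stamp is older than the no-refresh interval plus the refresh interval. The Python sketch below works that out for the 2-day settings. The function names are hypothetical, and representing a static record as `None` is an assumption made for illustration, not how the DNS server stores it.

```python
from datetime import datetime, timedelta

def scavenge_eligible_at(timestamp, no_refresh, refresh):
    """Earliest moment a dynamic record may be scavenged."""
    return timestamp + no_refresh + refresh

def is_stale(timestamp, now, no_refresh, refresh):
    """True if the record would be removed by a scavenging pass run at `now`."""
    if timestamp is None:
        # Assumption: None stands for a static record, which is never scavenged.
        return False
    return now > scavenge_eligible_at(timestamp, no_refresh, refresh)

two_days = timedelta(days=2)
last_refresh = datetime(2009, 3, 2, 9, 0)  # hypothetical time stamp
eligible = scavenge_eligible_at(last_refresh, two_days, two_days)
```

With both intervals at 2 days, a record that stops refreshing becomes eligible four days after its last time stamp. It is only actually removed when a scavenging pass next runs on a server whose scavenger is turned on.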

We let this run for several days to verify that time stamps were indeed being updated. (Slow and careful, like I said. We don’t want to lose the static records and so on, so careful analysis of the contents was done.)

A quick side note: the Windows Server 2008 management tools provide a much easier way of looking at the time stamps. They are displayed without having to dig into individual records, which makes the job easier.

Once it was clear that everything was updating properly, we pulled the trigger on the server selected to do the scavenging. We initially gave it a scavenging period of 3 days. So, over a weekend, it scavenged and removed a whole bunch of old records.

Step 2: Adjust DHCP credentials and configuration

The next step (done in parallel with the above operation) was to set up a dnsupdateproxy user on all of the customer’s DHCP servers. We dropped that user into AD and started testing. Knowledge Base article 932464 describes the cleanup interval used. KB article 837061 goes over the processing of expired pointer records. We bumped the queue length up to the maximum to make sure that the DHCP server could do the updates quickly. We also configured DHCP to always register the clients, and to delete the records when the lease expired.

The next step was to lower the lease period dramatically on the wireless and VPN subnets. The customer’s VPN solution was a major brand name concentrator, which used IAS for user authentication. We dropped the leases for those pools down to 3 hours. That means that after 3 hours, the records will vanish from DNS if the client has not renewed the lease.
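For a feel of what a 3-hour lease means in practice, here is a small Python sketch of the lease timeline. It assumes the RFC 2131 default renewal points (renew at 50% of the lease, rebind at 87.5%), which a DHCP server can override. The `lease_milestones` name and the sample grant time are hypothetical.

```python
from datetime import datetime, timedelta

def lease_milestones(granted, lease):
    """Key points in a DHCP lease, assuming RFC 2131 default T1/T2 ratios."""
    return {
        "renew": granted + lease * 0.5,     # T1: client first tries to renew
        "rebind": granted + lease * 0.875,  # T2: client tries any DHCP server
        "expires": granted + lease,         # lease is gone if never renewed
    }

# Hypothetical VPN session that picks up a lease at 6:00 PM.
milestones = lease_milestones(datetime(2009, 3, 2, 18, 0), timedelta(hours=3))
```

A connected client renews well before expiry, so its record stays put. A client that disconnects never renews, and its record becomes a removal candidate once the lease runs out. The exact moment of deletion still depends on the DHCP server’s cleanup processing described in the KB articles above.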

Step 3: Apply group policy

The company had specific OUs in AD for wireless and VPN users. We applied a group policy to prevent registration of these records in DNS. The reason for this: some clients were still registering records in DNS that they should not have been. VPN users needed to “log on to domain using dial up networking” to get the group policy applied. That also allowed distributing a registry change that would disable that registration in the future, as described in KB article 246804.

The group policy changes we made are shown in the screen shot below. This policy was applied to the OUs where the wireless capable machines were located, and in the VPN users OU. It would be possible to apply this to all client machines, but you would still need to have the servers refreshing their host and pointer records.


The Results?

The first week that the final changes were applied, the customer’s calls to helpdesk for this issue were down to one per day. After they found that there were work-at-home users who were not covered by the policy, they added that OU to the group, and there were no calls the next week.

In short, taking a bit of time to do this methodically netted a good result, and lower maintenance costs long term.

– Bill Blomgren