Hey everyone! PFE Tim Beasley here coming to you live from the warm, cozy sands of Bora Bora…Pfft yeah. I wish! … No I’m in Missouri…where it’s miserably winter outside. But I digress, I am writing this post to hopefully shed some light on a bizarre issue I recently faced at one of my dedicated (DSE) customer sites. I’d like to consider myself one that’s experienced as I’ve been working with Microsoft technologies for over 25 years now. Yet, I for one have never encountered this particular situation, so sit back, grab some popcorn, and hold on…because this is about to get real people.
During one of my regular DSE visits to my client, I was following up with what occurred that caused a Severity A support case to be opened. While gathering information and details, I was told “We had static reverse DNS records vanish.” I was like…Say what? Huh?! How in the world do STATIC records just vanish without someone deleting them?! Needless to say, I had them walk me through every step they took from beginning to end…and not one mention of someone deleting static records. Yet they swore up and down (literally) that they bloody well vanished, which is why they had to restore the original reverse zone from backup.
Okay, I’ll be your huckleberry…
Imagine finding yourself as an IT administrator faced with over 50,000 reverse DNS records that are placed comfortably in one single, large, super zone. For example’s sake, let’s say it’s 10.in-addr.arpa which happens to be an AD-integrated zone. Normally this is totally fine and actually recommended to do from our standpoint as it’s easier to manage. (Here’s a blog post on how to consolidate multiple reverse DNS zones by “GOATEEPFE” Ashley McGlone, in case you’re interested.) However, a decision is made to break up that super zone into smaller reverse zones for reasons that are, well, whatever that reason may be.
There’s a maintenance window coming up, and you’re probably thinking…“Okay, let’s create some smaller AD-integrated zones of the larger one.” But being a safe IT admin, you want to make sure you have a rollback plan in the event something unexpected happens, as there are a lot of applications/devices out there that rely on reverse DNS. What do you do? You want to take a backup of the existing super zone before you start? Good idea to be safe. Also, you think to yourself…“I’ll just create the smaller zones, and leave the big one too…that way I can simply delete the zones I create if something goes wrong and I need to revert my changes.” -The plot thickens…
Now it comes time for the actual work to be performed. New zones are created to match existing network blocks (let’s say 50 of them or so), and the original 10.in-addr.arpa super zone is left intact. You now watch some of the new zones start to get populated with reverse DNS records (PTRs) as registrations are renewed. You think this is a “mic drop moment”…and walk away exclaiming “SUCCESS! We did it! Pats on the back all around!” –What really happened in this particular case, is a little bomb just got triggered for devices and applications that rely on reverse DNS records, that just so happen to be statically configured…*gasp*
Shortly after you head home for the night, reports start coming in that some devices and applications aren’t working. (Imagine that.) Some initial investigation reveals that the devices/applications that are failing rely on reverse DNS records. Now what? Rollback plan! “Well, let’s undo what we did and go back to the original 10.in-addr.arpa zone that’s still there!” “Sounds good!” “Okay go!” You then begin to remove all the reverse DNS zones that were created, and a sigh of relief is had by all.
But wait, problems still exist! The same devices and apps still aren’t working? Say it isn’t so! The reverse zones were deleted from the environment, looking at the DNS management console you can see they are gone, AND you can see that the original zone is there. You undid what you originally did! You even have other machines working fine and able to do reverse lookups without a problem. What is going on here?! So why are those certain devices and apps not working as expected? You go look in the original reverse DNS zone of 10.in-addr.arpa…and the STATIC reverse DNS entries that correspond to the same devices/apps…are…gone. *gasp again*
Baffled beyond belief, you question why? “How is this happening? What do we do now?!” Okay calm down…you remember the backup of the original reverse zone you took initially? Lightbulb! “Let’s just restore the original zone file, cycle services, and the records will come back.” Believing you know the correct method of restoring an AD-integrated zone, you then stop the DNS server service on one of the DCs, copy the backup file to C:\Windows\System32\DNS on that DC, rename it accordingly to 10.in-addr.arpa.DNS and then start up the DNS server service again. You go look at the zone in the DNS management console and the records aren’t there. You hit refresh. Again…and again…still not there. ACK! Panic temporarily ensues, judgement is clouded and you cycle DNS services again a few more times. Records are still not there! However, event logs show 4004, 4013, and 4015. But some quick research turns up online posts saying those can be safely ignored. (Hint: that’s not the correct method for restoring an AD-integrated zone…the correct method is near the bottom of this blog.)
Ready to call for help? Or do you savvy IT/DNS admins out there think you know the answer and what to do at this point? *grin* My customer ended up calling into support at this point and opening up a SEV A case as multiple services were impacted. After many hours on the phone, CSS was able to finally get the records that were stored in AD, along with the backup file, to repopulate the original zone file across the DNS servers. However, the damage was done. From what I gathered, some static entries were also tombstoned at some point in time as well. By the time everything was restored, my customer experienced over 42 hours of downtime for those reverse records, which meant some services were hindered. My customer was lucky it only impacted a certain number of applications, but it could have been much, much worse had they experienced an enterprise-wide DNS failure.
If you’ve read this much you’ve noticed I still haven’t revealed any answers yet. Hopefully you haven’t ever experienced anything like this…but if you do, keep reading to figure out how to avoid some panic. Or, if you’re just wanting the nitty gritty, skip to the bottom. Let’s get to it!
The initial mistake here is believing that the original zone file had to be broken up into smaller ones. Had they followed recommended practices, this entire debacle could have been avoided. Point is, if you have a large super zone for reverse DNS records…leave it alone! And if you have tons of reverse zones, look to consolidating them following this. But if you insist on breaking things up into smaller reverse zones, you should watch the way you do it, especially if static records are involved. Okay, off my soapbox now.
I took the liberty of picking my jaw up off the floor when the customer told me what happened and everything that they went through. I had them send me the original zone backup file they used as well thinking there might be something strange in it. So, I had the zone backup file, along with my trusty lab machine, and got to work. Here come the screenshots!
Now let’s examine some of the details here. After all, the devil is in the details right? Make note the original 10.in-addr.arpa zone is an AD-integrated zone. No problem there, but as we know zone records for AD-integrated zones are stored in various AD partitions depending on how they are configured.
- Default Domain partition : “All domain controllers in the Active Directory domain”
- DomainDNSZones partition (Application partition) : “All DNS servers in the Active Directory domain”
- ForestDNSZones partition (Application partition): “All DNS servers in the Active Directory forest”
Additionally the new reverse zones created are also AD-integrated. Again, no issue. Or, is it? It is when it comes to the recovery method mentioned in the scenario above…I’ll get to that shortly. Remember the entries that disappeared were STATIC entries. Meaning someone manually created them in the original reverse DNS zone, and static entries are always well…static. That said, there are multiple ways various types of records can mysteriously “poof” away, such as duplicate zone creations, misconfigured scavenging settings, etc. (read more here) but this little particular nugget appears to encompass something entirely different. And so, in my lab I followed the steps my customer described taking during this unfortunate event:
Please note my customer’s environment is WS2008 R2, but I used WS2012 R2 in these screenshots. I also did these same steps in my lab using WS2008 R2, and the results were exactly the same.
I began with creating a fresh 10.in-addr.arpa reverse zone that is AD-Integrated, set to replicate between all DCs in the DOMAIN (DomainDNSZones partition in AD), along with a few static entries:
And for good measure, here they are reflected in ADSIEdit:
Okay good. That’s done, no problem there. I can do reverse lookups and records resolve no problem. Next, I’ll simply do what my customer did, and create 3 reverse zones that correspond to the subnets of the static entries I created.
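To make the relationship between a static entry and the smaller zone that would cover it concrete, here is a minimal Python sketch using the standard `ipaddress` module. The addresses are made-up examples rather than the lab's actual entries, and `slash24_zone` is a hypothetical helper name introduced only for illustration.

```python
import ipaddress


def ptr_owner_name(ip: str) -> str:
    """Return the full reverse-lookup (PTR) name for an IPv4 address."""
    return ipaddress.ip_address(ip).reverse_pointer


def slash24_zone(ip: str) -> str:
    """Return the name of the /24 reverse zone that would cover this address."""
    # Dropping the first label (the host octet) leaves the /24 zone name.
    return ptr_owner_name(ip).split(".", 1)[1]


# Hypothetical static entries, one per subnet.
for ip in ["10.1.2.3", "10.20.30.40", "10.200.0.5"]:
    print(ip, "->", ptr_owner_name(ip), "| /24 zone:", slash24_zone(ip))
```

Every one of these names also ends in `10.in-addr.arpa`, which is why the same record can belong either to the super zone or to a newly created, more specific zone.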
This is the point in the scenario where it became a “mic drop moment” and the IT crew left the building. All looks good right? At initial glance, you might think so. But let’s take a closer look…
In each of the “new” reverse zones I created, you will see empty zones illustrated in the below screenshot. Each one only contains an SOA and NS record…and that’s it. Oh, they will get populated with PTRs when clients start to re-register, but until that happens, they’ll remain empty.
Now here comes the pain!
Check out what the original 10.in-addr.arpa zone file looks like in DNS manager after a refresh…
Where in the world did the static entries go?! Hmmm, what are those folders? Are they in those little subfolders that got created?? Let’s look…
Ahhh…what about in AD? Surely they are in there! Take a gander…
Hurray! There they are along with the new zones…but…how come the static entries aren’t in DNS Manager? Also, when testing reverse lookups now, things are failing! The static entries that were resolvable before, are now no longer able to be found by the system even though they are in the AD partition. Chaos ensues….
Okay, let’s undo what I did initially and simply delete the reverse zones, and try reverse lookups using nslookup again:
Still no dice. But the reverse zones I created earlier are gone, effectively undoing what I did before right? Or…is it?
Diving a little deeper into the situation here, let’s cycle the DNS server service just to kick it and see if that helps. Hmmm….nope. Same result, no reverse lookup resolution. Let’s check event logs…DING! A CLUE!
Thankfully we have our first insight as to what’s going on. Event 4010…The system can’t create a resource record for the missing static entries. Wicked. You might be wondering why? The new reverse zones are there that correspond to the static entries that were created. And, the static entries still exist in AD. However, this particular event error indicates that ADDS isn’t responding to requests from the DNS Server service. Some of you might have come across this little nugget when migrating the _msdcs zone during a domain upgrade…(sound familiar?).
However, in the situation above they witnessed events 4004, 4013, and 4015. More often than not, this indicates that the “preferred” or primary DNS server in TCP/IP properties of the NIC on the DNS server (or DC) is pointing to itself. Ultimately when services start, AD can’t start because it’s hung up waiting on DNS to start, and because AD isn’t running, DNS can’t load the zones from AD. Ugly cycle really and causes unnecessary delays… It’s a good practice to configure all DCs to use the PDC emulator as primary DNS server (at the very least another DC other than itself), and then itself as secondary to avoid that. This is why in my lab environments I never saw the same events my customer did, as I followed this old practice, which happens to streamline the DNS infrastructure and allows for easier troubleshooting (not to mention decommissions and additions to the infrastructure). You will hear various ways that you should configure your DNS infrastructure, but I try to look at DNS with the KISS philosophy, because overly complicating things unnecessarily can turn into a quick mess.
Now, each of those subfolders in the original 10.in-addr.arpa zone are sometimes referred to as “delegated subfolders.” Take notice they got created the instant I configured the new reverse zones. They represent what servers have the authority or permission to create records. If you scroll back up and look at the figure that shows the contents of the subfolder, you’ll see a single NS record of the DNS server I used in the lab. Great! Now what? Well, what happens when we delete those delegated subfolders and cycle the DNS Server service? Hold on to your seats!
Look! The 4010 errors are clear…and…dun dun dunnnnnnnnnnnn!
The static entries are back (pulled from AD no less), reverse resolution is working again, and everything is hunky dory once more!
BUT WAIT! Hold the phone…this is not how the scenario above was described!!!
Exactly! This is how I came to resolving the problem in my lab environment, and frankly what my customer should have done to correctly roll back their environment too. What my customer ended up doing compounded the problem significantly. I’ll explain. When they deleted the 50 or so reverse zones from DNS, that’s all they did before trying to restore their original zone from a backup. No one bothered to look at the subfolders that got created when they built out the additional reverse DNS zones! Additionally, they could have also avoided this “big nasty” had they manually recreated the static entries in the new zones without deleting anything. But that would have required some due diligence and a thorough discovery first before making drastic changes. Hint Hint! Don’t let someone outside your org that doesn’t know the environment implement major config changes without knowing exactly what they are getting themselves into…Yes, I’m one who tells it like it is. :)
When you have reverse DNS zones that are smaller, aka more specific to a smaller subnet, like a /24 vs. a /8 subnet…then the DNS server will process name resolution requests to those more granular zones vs. the larger one. Plus, the delegated subfolders that get created refer clients back to the specific nameserver(s) that manage those subzones. Notice that there was only one NS record in the subfolder above in Figure 6? This means that when those new reverse zones were created to break up the larger one, the DNS server would process lookup requests by the referrals from the delegated subzone record to the NS server listed there, then on to those newer, more specific zones. Once it got to the newer zones, as the static records weren’t there…then reverse lookup fails. Hopefully that makes sense to you all.
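If it helps to see that logic spelled out, below is a toy Python model of the behavior described in this post. It is only a sketch under stated assumptions, not how DNS.EXE is implemented: `ToyDnsServer` and all of its methods are hypothetical names, the server always answers from the most specific zone it hosts, creating a child zone drops a delegation into the parent, and deleting the child zone leaves that delegation behind.

```python
from typing import Dict, List, Optional, Set


class ToyDnsServer:
    """Toy model: most-specific zone wins, and delegations hide parent records."""

    def __init__(self) -> None:
        self.zones: Dict[str, Dict[str, str]] = {}   # zone -> {owner: target}
        self.delegations: Dict[str, Set[str]] = {}   # zone -> delegated child names

    @staticmethod
    def _suffixes(name: str) -> List[str]:
        labels = name.split(".")
        return [".".join(labels[i:]) for i in range(len(labels))]

    def _best_zone(self, name: str, exclude: Optional[str] = None) -> Optional[str]:
        # Longest matching suffix first, so the most specific zone is chosen.
        for suffix in self._suffixes(name):
            if suffix in self.zones and suffix != exclude:
                return suffix
        return None

    def create_zone(self, zone: str) -> None:
        self.zones[zone] = {}
        self.delegations[zone] = set()
        parent = self._best_zone(zone, exclude=zone)
        if parent is not None:
            # The "delegated subfolder" appears in the parent automatically.
            self.delegations[parent].add(zone)

    def delete_zone(self, zone: str) -> None:
        # Only the zone goes away; the delegation in the parent stays put.
        del self.zones[zone]
        del self.delegations[zone]

    def delete_delegation(self, parent: str, child: str) -> None:
        self.delegations[parent].discard(child)

    def add_ptr(self, zone: str, owner: str, target: str) -> None:
        self.zones[zone][owner] = target

    def lookup(self, owner: str) -> Optional[str]:
        zone = self._best_zone(owner)
        if zone is None:
            return None
        # A name at or below a delegation point is hidden in this zone.
        if any(s in self.delegations[zone] for s in self._suffixes(owner)):
            return None
        return self.zones[zone].get(owner)


srv = ToyDnsServer()
name = "3.2.1.10.in-addr.arpa"
srv.create_zone("10.in-addr.arpa")
srv.add_ptr("10.in-addr.arpa", name, "app01.contoso.com.")
print("super zone only:       ", srv.lookup(name))
srv.create_zone("2.1.10.in-addr.arpa")
print("after adding /24 zone: ", srv.lookup(name))
srv.delete_zone("2.1.10.in-addr.arpa")
print("after deleting /24:    ", srv.lookup(name))
srv.delete_delegation("10.in-addr.arpa", "2.1.10.in-addr.arpa")
print("after removing deleg.: ", srv.lookup(name))
```

In this model the static record never leaves the super zone's data. It only becomes unreachable, first because the empty /24 zone is more specific, then because the leftover delegation still covers the name.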
Feel free to lab it up on your own and test various scenarios. Watch how the simplest action can either save or wreck an environment. For example test this in your own labs…Create a static entry that is in the original 10.in-addr.arpa. Then manually create a newer reverse zone…and then go delete the delegated subfolder that was created in the 10.in-addr.arpa zone. Does name resolution for that entry still work? Yep, as it should…but what if you delete the newer reverse zone again? Reverse resolution now fails for that entry, AND…the static entry won’t show back up in the 10.in-addr.arpa zone within the DNS MMC! But it’s still there in the AD partition if it’s an AD-integrated zone! At this point however, the original static entry now appears to be TOMBSTONED in AD…see?
Don’t let this fool you however, as it’s not actually AD tombstoned. The dNSTombstoned attribute means that the record was deleted from the DNS Management console MMC or simply scavenged, yet the object still exists in AD. However, DNS.EXE will no longer load the record. It’s basically giving the appearance the object was deleted from the MMC, but the reality is it was only hidden from DNS.EXE. If you see the “isDeleted” attribute containing information, then that means it’s actually tombstoned in AD.
You can also refer to this TechNet article which shows you how to track for deletion of DNS records for a more proactive approach to your environment and also quotes the following:
“When Active Directory deletes an object from the directory, it does not immediately remove the object from the database. Instead, Active Directory marks the object as deleted by setting the object’s isDeleted attribute to TRUE, stripping most of the attributes from the object, renaming the object, and then moving the object to a special container in the object’s naming context (NC) named CN=Deleted Objects. This object is called a tombstone and is used to replicate the object’s deletion throughout the Active Directory environment. Over time (default 60 days), the tombstone is removed and the object is truly gone from AD. DNS objects, however, have their own process of deletion – once the DNS zone is integrated in the Active Directory, all the DNS records become Active Directory objects but they get an attribute called “dNSTombstoned” attached to them.
A DNS record gets removed by either of the following methods:
- Scavenging
- Manual deletion
- When it gets a valid TTL update with TTL=0
- An LDAP delete command using interfaces such as ADSIEDIT or LDP
If the DNS record is getting deleted by any of the first 3 ways then the value of the dNSTombstoned attribute attached to it will become “TRUE”. In this scenario the records will still exist in Active Directory but DNS.exe will not load them in the MMC. This is because for DNS they are deleted, but for Active Directory they still exist as a valid AD object. We can still see them using ADSIEDIT. When the record is in this state in the Active Directory the value of dNSTombstoned can change to “FALSE” either when the host machine/DHCP sends an update for the record or by creating another record with the same name manually. When this happens, DNS.exe will start loading the record again in the MMC. If the DNS record is being deleted by the 4th method or if the record stays in the state of dNSTombstoned=TRUE for more than 7 days then it will be tombstoned (AD tombstoned) like any other AD object.”
I know what you’re probably thinking, I thought the same thing…“Can’t we just manually change the dNSTombstoned attribute back to ‘FALSE’ and it’ll reappear in the MMC?” Well for grins I tried it myself, and the answer is NO. To get it working again, the record must be restored from backup, refreshed by an update from the machine/DHCP, or simply recreated manually. Manually recreating the record triggers DNS to update the record attributes in AD. Only then will the value return as FALSE and show back up in the MMC of DNS Manager.
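The difference between the two kinds of "tombstoned" is easy to blur, so here is a small Python sketch that classifies a record from the two attributes discussed above. The function name, its parameters, and the state labels are hypothetical. The inputs are assumed to be values you read off the object yourself (for example in ADSIEdit), and the 7-day window is taken from the TechNet excerpt quoted in this post.

```python
DNS_TOMBSTONE_DAYS = 7  # window quoted from the TechNet article


def record_state(dns_tombstoned: bool, is_deleted: bool,
                 days_tombstoned: float = 0.0) -> str:
    """Classify a DNS record object from attribute values read out of AD."""
    if is_deleted:
        # Really AD tombstoned: the object sits in Deleted Objects.
        return "ad-tombstoned"
    if dns_tombstoned:
        if days_tombstoned > DNS_TOMBSTONE_DAYS:
            # Past the window, so an AD tombstone is expected to follow.
            return "dns-tombstoned-overdue"
        # Still a valid AD object, but DNS.EXE will not load it.
        return "dns-tombstoned"
    return "live"


print(record_state(dns_tombstoned=True, is_deleted=False))
print(record_state(dns_tombstoned=True, is_deleted=False, days_tombstoned=9))
print(record_state(dns_tombstoned=False, is_deleted=True))
```

A record in the plain `dns-tombstoned` state is the one that can come back through a client/DHCP update or by recreating it manually.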
PRO TIP: If the DNS record is either “dNSTombstoned” or AD tombstoned (aka “isDeleted”), then you can use “repadmin /showobjmeta”, which will show you the time/date that each attribute for the object was created, edited, or marked for deletion. This also shows the originating source DC of this change. Handy little command when troubleshooting.
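As a rough sketch of what that looks like in practice, the Python below builds the distinguished name of a record stored in one of the DNS application partitions and assembles the matching `repadmin /showobjmeta` command line. The helper names, the DC name `dc1`, and the `contoso.com` domain are all placeholders. The sketch only covers zones stored in DomainDnsZones or ForestDnsZones, and it prints the command rather than running it.

```python
from typing import List


def dns_node_dn(owner: str, zone: str, root_dn: str,
                partition: str = "DomainDnsZones") -> str:
    """Build the DN of a DNS record object in an application partition.

    owner is the record name relative to the zone, for example "3.2.1".
    """
    return f"DC={owner},DC={zone},CN=MicrosoftDNS,DC={partition},{root_dn}"


def showobjmeta_command(dc: str, object_dn: str) -> List[str]:
    """Return the argument list for: repadmin /showobjmeta <DC> <object DN>."""
    return ["repadmin", "/showobjmeta", dc, object_dn]


# Placeholder names: substitute your own DC, zone, and domain.
dn = dns_node_dn("3.2.1", "10.in-addr.arpa", "DC=contoso,DC=com")
print(" ".join(showobjmeta_command("dc1", f'"{dn}"')))
```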
This is starting to now look like what happened at my customer’s environment based on what information I could gather, as unfortunately I wasn’t directly involved.
Now let us skip to the “recover from the backup” part that was mentioned in the scenario. The method they chose for recovery was another mistake to try for an AD-integrated DNS zone. AD-integrated zones don’t pull the records from a file, they pull them from AD. Simply stopping the DNS Server service, placing a .DNS file inside the C:\Windows\System32\DNS directory and restarting the service will NOT work when you’re talking about AD-Integrated zones. Even running DNSCMD commands to add a zone with the /dsprimary flag ignores the files as well. What WILL work with an exported DNS file is creating a new zone using the DNS file as a standard primary zone…THEN converting it to an AD-Integrated zone afterwards.
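The "standard primary first, then convert" approach can be sketched as two `dnscmd` invocations. The Python below only assembles the command lines, it does not run them. It assumes the old copy of the zone has already been removed, that the backup file is already sitting in the server's DNS directory under the name you pass in, and that you will check the exact switches against the `dnscmd` help on your own server version before using them. The function name is hypothetical.

```python
from typing import List, Optional


def restore_commands(zone: str, backup_file: str,
                     server: Optional[str] = None) -> List[List[str]]:
    """Assemble dnscmd calls: load the file as a standard primary, then convert."""
    base = ["dnscmd"] + ([server] if server else [])
    return [
        # Step 1: create a file-backed primary zone and load the existing file.
        base + ["/ZoneAdd", zone, "/Primary", "/file", backup_file, "/load"],
        # Step 2: after validating the records, convert the zone to AD-integrated.
        base + ["/ZoneResetType", zone, "/DsPrimary"],
    ]


for cmd in restore_commands("10.in-addr.arpa", "10.in-addr.arpa.dns"):
    print(" ".join(cmd))
```

Keeping the two steps separate leaves room to validate the loaded records in the DNS console before anything is written back to AD.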
Related to the scenario above, there were several recovery options that include but are not limited to:
- Delete the existing reverse zones from DNS and AD…all of them…restore the original backup file to a new standard primary zone, validate the records were all there, convert it to an AD-Integrated zone…and then wait for replication to complete. (In my lab with their backup file of the zone using WS2008 R2 it took roughly 37 seconds to replicate all 50k+ records with 2 DCs/DNS servers configured with minimal resources.) This depends on convergence time in your environment, server hardware, etc.
- Restore from a system-state backup using Directory Services Restore Mode if DNS is running on a domain controller. Unless of course there is no valid backup…hopefully that’s not reality for you.
- Manually recreate the missing static records in each of the new zones…this of course assumes you have the details of each missing record from the due diligence I hinted at earlier…which wasn’t the case…and it’s also time consuming.
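Whichever option you pick, it helps to know exactly which PTR records the backup holds compared with what the server currently shows. The Python sketch below does a deliberately simple comparison. It assumes every PTR line in the exported file starts with its owner name and fits on one line, the sample data is invented, and `parse_ptrs` and `missing_ptrs` are hypothetical helper names.

```python
from typing import Dict


def parse_ptrs(zone_text: str) -> Dict[str, str]:
    """Collect owner -> target for single-line PTR records in zone file text."""
    ptrs: Dict[str, str] = {}
    for line in zone_text.splitlines():
        fields = line.split(";", 1)[0].split()   # drop comments, then tokenize
        if "PTR" in fields:
            idx = fields.index("PTR")
            # Need an owner before PTR and a target right after it.
            if idx >= 1 and idx + 1 < len(fields):
                ptrs[fields[0]] = fields[idx + 1]
    return ptrs


def missing_ptrs(backup: Dict[str, str], live: Dict[str, str]) -> Dict[str, str]:
    """Return the records that are in the backup but absent from the live zone."""
    return {owner: target for owner, target in backup.items() if owner not in live}


# Invented sample data standing in for an exported 10.in-addr.arpa file.
SAMPLE_BACKUP = """\
; hypothetical excerpt of an exported reverse zone file
@        NS   dc1.contoso.com.
3.2.1    PTR  app01.contoso.com.
40.30.20 PTR  db01.contoso.com.
5.0.200  1200 PTR  printer01.contoso.com.
"""
live_records = {"3.2.1": "app01.contoso.com."}

for owner, target in missing_ptrs(parse_ptrs(SAMPLE_BACKUP), live_records).items():
    print(f"missing: {owner}.10.in-addr.arpa -> {target}")
```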
To sum things up, this unfortunate scenario that plagued my customer for well over 40 hours could have been avoided from the get-go. Again, if there’s a large super-zone there’s no need to break it up. However, if you’re facing a potential resume generating event, know this: at the heart of the issue lies delegated folders that get created automatically when you try and split up a larger zone into smaller ones. The creation of this delegation record and its effect are not at all obvious. Most DNS admins are used to creating delegations themselves, so it’s odd that this one shows up all on its own. The quick course to resolution is to delete the subdomain, delete the delegation and reload the zone.
So, if you have decided to try and break up a super-zone and have issues…first verify that the delegated subfolders got created in the main AD-integrated zone after you added smaller AD-integrated zones. Delete the subdomains you created, delete all the delegated folders that got created, and reload the original zone. If some records are missing from DNS management console, then verify they exist in AD. If they do exist in AD, you might have to wait a bit for them to show back up in the DNS console. If they are missing entirely, then I would go down the road of using the backup file of the original zone. If you don’t have a backup file, you’re then limited to a Directory Services restore, or manually creating static records.
I hope that this blog post helps you all out there avoid falling into this trap…but if you do find yourself amongst your peers freaking out about vanishing static reverse DNS records, now you can calmly reply “I got this” and be the hero. Thanks for reading and have a blessed day!