Why you can still have duplicate SPNs in AD 2012 R2 and AD 2016

As an AD admin you are probably familiar with the problem of duplicate Service Principal Name (SPN) attributes. Need a refresher on Kerberos and SPNs? Read the famous blog post over at askds: Kerberos for the busy admin. If you have these duplicates, Kerberos fails for the affected accounts. It always fails, so there is no excuse for having duplicates around. The only impact of removing the wrong duplicate is that Kerberos starts working for the other account that kept the SPN. It is usually easy to spot which one is wrong: the unused account, the computer account instead of the service account, a migrated account, a disabled account, etc.

We have known about this problem since the early days of AD. The command line tool to manage SPNs is called setspn, and in Windows Server 2008 we added switches to check for duplicate SPNs in the domain and forest. Try it: run 'setspn -x -f' and see what you get. Clean output means no duplicates.
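If you prefer to run the same kind of check on data you have already exported, the idea fits in a few lines. This is only a minimal sketch: the account names and SPN values are made up, the `accounts` dictionary stands in for whatever export you use, and the function `find_duplicate_spns` is a hypothetical helper, not part of setspn. SPNs are compared case-insensitively here.

```python
from collections import defaultdict


def find_duplicate_spns(accounts):
    """Return {normalized SPN: sorted account names} for SPNs held by more than one account."""
    owners = defaultdict(set)
    for account, spns in accounts.items():
        for spn in spns:
            # Normalize so that "HTTP/portal" and "http/portal" count as the same SPN.
            owners[spn.strip().lower()].add(account)
    return {spn: sorted(names) for spn, names in owners.items() if len(names) > 1}


# Hypothetical export: account name -> values of servicePrincipalName.
accounts = {
    "CONTOSO\\svc-web": ["HTTP/portal.contoso.com"],
    "CONTOSO\\WEB01$": ["HOST/web01", "http/portal.contoso.com"],
    "CONTOSO\\SQL01$": ["HOST/sql01"],
}

for spn, names in find_duplicate_spns(accounts).items():
    print(f"{spn} is registered on: {', '.join(names)}")
```

An empty result corresponds to the clean output from setspn.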

Detection is fine, but prevention is better. In Windows Server 2012 R2 we took that step. For each update to an SPN or UPN we check in the GC if the update would create a duplicate. If not, we allow the update. If yes, we throw the appropriate error. There is a very nice and recent writeup on TechNet: SPN and UPN Uniqueness.

We have tried to catch all of the cases where an SPN or UPN is updated during normal operations, and succeeded pretty well. Examples: creating a new computer account, editing the attribute servicePrincipalName in some way, adding an SPN to a service account, etc. We did it so well that some of our customers had problems with a couple of corner cases where duplicate SPNs would be created (non-obviously) during normal operations, mostly in multi-domain forests:

  1. Domain join: you try to add a new computer to the domain, and the new computer would have an SPN that already exists in another domain in the forest. For instance, contoso.com\comp1 exists, and you try to add child.contoso.com\comp1. Both accounts would want the SPN HOST/comp1, which would be a duplicate.
  2. Intra-forest migration, where you migrate an existing account from one domain to the other while the old one gets left behind. Again, a duplicate with the SPN HOST/<computername>.
  3. You deleted a user. Then you realize your mistake, and create a new one with the same name and UPN. Your smarter colleague realizes that you have the AD recycle bin enabled, and simply restores the user. In this situation, the old and new users have different accounts (objects), but have the same UPN.
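The domain join case is easy to see once you write down which SPNs each account wants. The sketch below is a simplification under stated assumptions: it models only the short-name and fully qualified HOST SPNs (a real domain join registers more than these two), and `default_host_spns` is a hypothetical helper written for this illustration.

```python
def default_host_spns(computer_name, dns_domain):
    """Simplified model: the short-name and FQDN HOST SPNs of a computer account."""
    short = computer_name.lower()
    return {f"HOST/{short}", f"HOST/{short}.{dns_domain.lower()}"}


parent = default_host_spns("comp1", "contoso.com")
child = default_host_spns("comp1", "child.contoso.com")

# The fully qualified SPNs differ per domain, but the short-name SPN is shared.
print(sorted(parent & child))
```

The forest-wide uniqueness check trips on exactly that shared short-name value.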

These cases are also well known by now, and described in a KB: Duplicate SPN check on Windows Server 2012 R2-based domain controller causes restore, domain join and migration failures. The same KB also describes how to disable the check for duplicates in case you absolutely have to. An expensive migration that is blocked on this comes to mind. But otherwise, please leave it alone. The check for duplication is goodness.

So, once your forest is on Windows Server 2012 R2 or higher, there should be no new duplicate SPNs or UPNs. Existing duplicates are left untouched by the AD upgrade, but once cleaned up they won't come back. That is the advertised good news.

The bad news is that this is not the whole truth. The current checks rely on a query to the global catalog to check for duplicates before committing the modified SPN or UPN. If you stop and think about it you can see the weakness of this check, something I already mentioned in my previous post. What if the GC you are talking to does not have the latest data? What if you made a conflicting change on two DCs at the same time? That is the loophole. This can happen in two similar but not identical cases.

  1. A computer account is created on two different DCs at roughly the same time. Let's assume for clarity that these DCs are called DC1 and DC2 and that they live in different sites with a good bit of latency. The computer name is "conflict2", so both accounts have an SPN called HOST/conflict2. Each DC checks its local GC (itself) and finds no duplicates because replication has not happened yet, so the creation is allowed. Replication happens, one of the computer accounts gets renamed (CNF mangling)... and that's all. No errors because of duplicate SPNs, because there were no local updates. The only thing that happened was that another object came in that just accidentally had an SPN that already existed locally. There is no check for that, so you end up with duplicate SPNs.
  2. Two different objects are updated with an SPN called HTTP/random on two different DCs at roughly the same time. Same DC names as above, object names might be "indirect1" and "indirect2". Same checks, same failure to detect the duplicate. Replication happens, and nothing else. The unrelated objects are now present on both DCs with a duplicate SPN.
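The second case can be reproduced in a toy model. The sketch below is not how AD implements replication. It is an illustration under the assumption described above: the uniqueness check runs only for originating (local) writes and only sees the local replica, while inbound replicated changes are applied without any check. The class `DomainController` and its methods are hypothetical names for this example.

```python
class DomainController:
    """Toy DC: a local replica plus a queue of writes that have not replicated yet."""

    def __init__(self, name):
        self.name = name
        self.objects = {}   # object name -> set of SPNs in the local replica
        self.outbound = []  # originating writes waiting for replication

    def add_spn(self, obj, spn):
        # Originating write: the uniqueness check sees only the local replica.
        for other, spns in self.objects.items():
            if other != obj and spn in spns:
                raise ValueError(f"{self.name}: {spn} already registered on {other}")
        self.objects.setdefault(obj, set()).add(spn)
        self.outbound.append((obj, spn))

    def replicate_to(self, partner):
        # Inbound replication: changes are applied as-is, with no uniqueness check.
        for obj, spn in self.outbound:
            partner.objects.setdefault(obj, set()).add(spn)
        self.outbound.clear()

    def duplicate_spns(self):
        owners = {}
        for obj, spns in self.objects.items():
            for spn in spns:
                owners.setdefault(spn, []).append(obj)
        return {spn: sorted(objs) for spn, objs in owners.items() if len(objs) > 1}


dc1, dc2 = DomainController("DC1"), DomainController("DC2")

# Both writes pass the check because neither DC has seen the other's change yet.
dc1.add_spn("indirect1", "HTTP/random")
dc2.add_spn("indirect2", "HTTP/random")

# Replication in both directions: nothing complains.
dc1.replicate_to(dc2)
dc2.replicate_to(dc1)

print(dc1.duplicate_spns())
print(dc2.duplicate_spns())
```

After replication both toy DCs hold two objects with the same SPN. A third local attempt to register HTTP/random would now be rejected, but the existing duplicate stays until someone cleans it up.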

In my lab, it looks like this. Checking with setspn:

[Screenshot: spn-conflict-2016]

You see the two cases: the identically named computer accounts where one gets mangled with a CNF name after replication, and the differently named computers that invisibly share the same SPN. For completeness' sake, this is the more familiar view from ADUC:

[Screenshot: conflicted-accounts-2016]

I verified this behavior on a fully patched Windows Server 2012 R2 domain, and on another domain running Windows Server 2016. I have also seen it in real life at a customer, which is the reason for this blog post in the first place.

Management summary: that monthly check for duplicate SPNs and UPNs should remain.