I moved my PDCE role and accounts started locking out!

Hi, David here. We’ve seen a few cases on this now, so I wanted to put the word out and explain why it happens and how you can (very easily) prevent it from happening to you.

The scenario:

Imagine an ordinary domain admin. Let’s call him Fred. Fred has finally gotten the OK from his management to start deploying shiny new Windows Server 2008 R2 domain controllers (and the new hardware he wanted for the job). Fred brings up a DC and spends several weeks making sure that everything works. Then, confident in the stability of his new DC, he transfers the PDC Emulator role to it. Nothing explodes and everything appears to be good. Mission accomplished, Fred goes home and enjoys the rest of his weekend.

On Monday morning Fred gets to work and finds out that the help desk is swamped with calls from users whose accounts are locked out. Unlocking the accounts only seems to fix them temporarily, and then they get locked out again. Fred’s manager tells him to undo the change he made over the weekend, which he does. Desperate to figure out why his new DC betrayed him so horribly, Fred opens up a case with Microsoft for support and gets someone on our team.

Troubleshooting an account lockout:

Obviously this is a bad situation for Fred, but unfortunately it’s kind of hard to troubleshoot an account lockout without logs from while the problem was happening.

As an aside here, if you haven’t examined the Security Compliance Manager tool and its included docs, you should probably take a look. It lays out our recommendations around account lockout policies.

There are multiple tools for troubleshooting account lockouts, but sometimes it pays to go old-school. What we want for this are the Netlogon debug logs, which every domain administrator should be familiar with. Netlogon debug logging can show you all kinds of very useful information for troubleshooting authentication issues, particularly with NTLM authentication. In this situation, it shows us something very interesting if we look at the logs from the domain’s domain controllers while the accounts are being locked out:

[LOGON] SamLogon: Transitive Network logon of Domain\User from Computer successfully handled on DC (UseHub is FALSE).

I should mention here that Netlogon debug logging is NOT turned on by default, which is really just a holdover from the days when server processor speeds were measured in MHz. It can be a highly useful troubleshooting tool and everyone should know how to turn it on – documented here.

Here in DS, we spend a lot of time looking at Netlogon debug logs, and when we first saw the above line in the log, we were stumped as to where it was coming from. It’s not something that we normally see at all, and none of us could remember ever seeing it in a case before.

It turns out that this output only shows up in the log under a very specific set of circumstances: when the authenticating domain controller decides that it needs to bypass validating the password with the PDC after a bad password is received. When it makes this decision, it sets a parameter called UseHub to FALSE instead of the default of TRUE. Thankfully, it writes this in the Netlogon log for us to see; otherwise we’d never have had any clue what it was doing.
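If you want to check your own logs for this, here is a minimal sketch of a scanner. The pattern is modelled only on the single sample line quoted above, and the names `USEHUB_LINE` and `find_usehub_false` are made up for illustration. Real log lines normally carry extra prefix fields such as a timestamp, which is why the code searches within each line rather than matching from the start.

```python
import re

# Pattern based on the one sample line shown in this post (an assumption:
# other Netlogon builds may word or order these fields differently).
USEHUB_LINE = re.compile(
    r"\[LOGON\] SamLogon: Transitive Network logon of "
    r"(?P<domain>[^\\]+)\\(?P<user>\S+) from (?P<computer>\S+) "
    r"successfully handled on (?P<dc>\S+) \(UseHub is FALSE\)"
)


def find_usehub_false(lines):
    """Return one dict (domain, user, computer, dc) per matching log line."""
    hits = []
    for line in lines:
        match = USEHUB_LINE.search(line)  # search() tolerates any prefix
        if match:
            hits.append(match.groupdict())
    return hits
```

Feeding it the lines of a Netlogon debug log from each DC gives you a quick list of which users and client machines are hitting the bypass.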

Unfortunately the log didn’t tell us why it was happening, only that it did happen. But, after some snooping in source code and a few dozen emails, we discovered that this decision occurs when the PDC of the domain will not allow us to pass the client’s credentials because the client is using a LAN Manager authentication level that is not supported on the PDC.

Or, in normal language, what it means is that your LMCompatibility settings don’t match.

Why would this happen just by moving the PDC Emulator role?

As with every new operating system, we ship Windows 2008 R2 with enhanced security compared to its predecessors. Sometimes this security is accomplished by changing the way the OS works to make it harder to attack, or by turning off unnecessary services until they are needed. At other times, we simply change default settings on features that were present in previous OS versions, because the majority of the world can now support those higher settings. LMCompatibility is one of those settings. The default for Windows 2008 R2 is a setting of 3: Send NTLMv2 response only / Allow LM and NTLM.

In Fred’s case, it turned out that his XP clients all had a setting of 1: Send LM and NTLM responses, while his new PDC emulator had a setting of 5: Send NTLMv2 response only / Refuse LM and NTLM. It’s worth noting that these settings aren’t the defaults – someone had to choose to put them there. The clients couldn’t use NTLMv2 session security, which is why we couldn’t pass the user’s credentials to the 2008 R2 PDC Emulator for evaluation. The 2003 DCs, on the other hand, had a setting of 2: Send LM and NTLM, Use NTLMv2 if negotiated, so when the PDC was running Windows 2003, we didn’t have this problem. The new Windows 2008 R2 OS was not specifically the issue – the same thing would have happened with any version of Windows holding the PDCE role with that setting.
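To make the mismatch concrete, here is a rough sketch of how the levels interact. It uses the `LmCompatibilityLevel` registry values, which run from 0 to 5, with the names as they appear in the policy UI, so the numbers may not line up one-for-one with a position in a drop-down list. The function names are hypothetical, and the model is deliberately simplified to "what the client sends" versus "what the DC accepts".

```python
# LmCompatibilityLevel registry values and their policy names.
LEVEL_NAMES = {
    0: "Send LM & NTLM responses",
    1: "Send LM & NTLM - use NTLMv2 session security if negotiated",
    2: "Send NTLM response only",
    3: "Send NTLMv2 response only",
    4: "Send NTLMv2 response only. Refuse LM",
    5: "Send NTLMv2 response only. Refuse LM & NTLM",
}


def client_sends(level):
    """Response types a client at this level puts on the wire."""
    if level in (0, 1):
        return {"LM", "NTLM"}
    if level == 2:
        return {"NTLM"}
    return {"NTLMv2"}  # levels 3, 4 and 5


def dc_accepts(level):
    """Response types a domain controller at this level will accept."""
    accepted = {"LM", "NTLM", "NTLMv2"}
    if level >= 4:
        accepted.discard("LM")
    if level >= 5:
        accepted.discard("NTLM")
    return accepted


def can_authenticate(client_level, dc_level):
    """True if the DC accepts at least one response type the client sends."""
    return bool(client_sends(client_level) & dc_accepts(dc_level))
```

In this simplified model, `can_authenticate(1, 5)` is `False` while `can_authenticate(1, 2)` is `True`, which is the shape of what Fred ran into when the role moved.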

For normal Kerberos logons, we don’t care about LM Compatibility, but there are plenty of applications out there that will default to NTLM – and most applications will retry logons multiple times on your behalf without ever telling you that they’re doing it. In Fred’s environment, all it took was his Outlook clients, connecting to his Exchange CAS servers over HTTP and using NTLM to try to authenticate that connection. The users had changed their passwords that morning and the local DCs didn’t have the new password – so, the password that Outlook used looked “bad” to the local DC. Because of the LM Compatibility mismatch, we couldn’t talk to the PDC, and thus we ended up locking the account out.
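The chain of events above can be sketched as a toy model. This is not how Netlogon is actually implemented. The class name, the plain-text password comparison and the lockout threshold of 3 are all assumptions made purely for illustration.

```python
class LocalDC:
    """Toy model of a non-PDC domain controller handling NTLM logons."""

    def __init__(self, known_password, pdc_password, can_reach_pdc, threshold=3):
        self.known_password = known_password  # what this DC has replicated
        self.pdc_password = pdc_password      # what the PDC emulator holds
        self.can_reach_pdc = can_reach_pdc    # False when the levels mismatch
        self.threshold = threshold            # hypothetical lockout threshold
        self.bad_count = 0
        self.locked = False

    def logon(self, password):
        if self.locked:
            return "locked"
        if password == self.known_password:
            self.bad_count = 0
            return "ok"
        # Password looks bad locally: ask the PDC before counting it.
        if self.can_reach_pdc and password == self.pdc_password:
            return "ok"
        self.bad_count += 1
        if self.bad_count >= self.threshold:
            self.locked = True
            return "locked"
        return "bad password"
```

With `can_reach_pdc=True`, a client retrying the new password succeeds every time. With `can_reach_pdc=False`, the same silent retries burn through the threshold and the account locks, even though the user typed the right password.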

Solving the problem – the right way

So, Fred’s first inclination upon hearing from support about this might have been to reduce the security setting on the PDC emulator to make everything magically start working. And while this would have been effective, it would not have been the best solution from a security perspective. There are plenty of good reasons why you should want to use the strongest encryption and security algorithms on network communications, especially ones where your users’ passwords are being handed back and forth between computers for validation.

The right solution here is that Fred should be centrally managing his settings in a way that fits his network and enforces the best possible level of security. Fortunately there’s a group policy setting that enables him to do just that:

[Image: screenshot of the group policy setting]

This setting is located in Computer Configuration -> Windows Settings -> Security Settings -> Local Policies -> Security Options. Notice the very helpful text on the Explain tab that outlines the default settings.

So, if Fred was confident that all of the computers on his network supported NTLMv2, he could go ahead and use the policy to enforce the highest level of security on his entire network (Send NTLMv2 response only / Refuse LM and NTLM). Or, if he suspected that a few applications (or more likely, ancient operating systems) that haven’t quite been retired yet might have problems with NTLMv2, he could use the fourth option instead and just refuse LM connections. As a note here, every supported Windows OS version supports NTLMv2 – so the situations where you can’t use it should be very rare and only happen with specific third-party applications or OS platforms.

David “Fred Herring” Beach