Hi, Rob here. I thought I would show you how we in Microsoft Commercial Technical Support typically troubleshoot Kerberos authentication issues. This discussion should help you get more comfortable reading network traces of Kerberos authentication problems. There are other ways to troubleshoot Kerberos; for example, you could use the Kerberos event logging outlined in KB 262177. Although you could rely on that method, it usually takes longer to resolve the issue and involves some educated guesses that a network trace makes unnecessary.
I am going to lay out my lab configuration in case you want to reproduce the problem and look at the network traces on your own.
The root domain litwareinc.com has one domain controller in the domain, and one member server.
Domain Controller network configuration:
Host Name: LTWRE-RT-DC1
Member Server network configuration:
Host Name: LTWRE-RT-MEM1
The child domain litware-chld.litwareinc.com has one domain controller in the domain, and one member server.
Domain Controller network configuration:
Host Name: LTWRE-CHD-DC1
Member Server network configuration:
Host Name: LTWRE-CHD-MEM1
NOTE: I’m stating the obvious here, I know, but this configuration is for testing only. Having only one DC per domain usually means you’ll be rebuilding the forest at some point.
Network-based troubleshooting (network captures) is the fastest way to determine the problem, and by learning a few short filters you can effectively troubleshoot most Kerberos-related problems.
You can use any network capture utility that you feel comfortable with. I prefer Netmon, nmcap (part of Netmon 3.x) or netcap (XP and 2003 support tools) to collect the network trace, and I use Wireshark to view the network capture. This is in no way an endorsement of Wireshark – feel free to use Ethereal, Packetyzer, etc.
There is a service on the LTWRE-RT-MEM1 server that runs as the “LocalSystem” account. This service connects to a file share named “AppShare” on LTWRE-CHD-MEM1 to access some files. The service is failing to retrieve the files and returns the error “Access is denied”. However, when you access the share as a domain user on LTWRE-RT-MEM1, it works.
Auditing for Logon/Logoff was enabled on LTWRE-CHD-MEM1, so you start by examining the security event log.
When LITWAREINC\Administrator attempts to access the share, we get the following audit event:
Notice how the user that authenticated to the server is the “LITWAREINC\Administrator” account. It used NTLM authentication and the source machine name is LTWRE-RT-MEM1.
When the service attempts to access the share, we get the following audit event:
Notice that when the service attempts to authenticate to the server, it is doing so anonymously.
Hey, why is the computer authenticating to the other machine using NTLM authentication?
I thought we were in the 21st century with Kerberos authentication?
As it turns out, starting with Windows XP and Windows Server 2003, a computer (that is, a service running as LocalSystem) cannot use NTLM authentication with its own credentials when accessing a remote resource. If it falls back to NTLM, it uses Anonymous Logon credentials, and access typically fails.
That means we have to figure out why Kerberos authentication is failing on LTWRE-RT-MEM1 when accessing a share on LTWRE-CHD-MEM1.
Typically, when you troubleshoot using network captures, you want to install the network capture utility on both ends of the conversation to make sure that no network devices between the two systems (firewalls, routers, switches, VPN appliances, etc.) are manipulating the packets. We call this taking a double-sided trace.
When working with a customer, we typically request a double-sided network capture. In this scenario I would start by installing the network capture utility on the source and destination servers to see what is going on.
So the next question becomes: what are the steps to taking a good network capture?
Well, we want to see all name resolution, and we also want to make sure that we see the Kerberos tickets (authentication) in the capture. We also want to be able to reproduce the problem at will so that we can see it for ourselves.
So, how can we reproduce the problem?
1. Get a command prompt running as the “SYSTEM” account, so that you can attempt to access the remote system the same way the service does.
On Windows 2000, Windows XP, and Windows Server 2003, we can use the AT command to get a command prompt running as the “SYSTEM” account by typing the following command:
AT <Time in the future, 24-hour (military) format> /interactive "cmd.exe"
For example, if the time is currently 7:04 PM, you would type: AT 19:06 /interactive "cmd.exe"
Then at 7:06 PM you should see a command prompt pop up.
NOTE: You have to do this while logged on to the console session. If you are connected through RDP, you need to start the RDP session with the /console switch; otherwise you will never see the command window start.
2. Start the network capture utility.
3. Clear all name resolution cache as well as all cached Kerberos tickets.
- To clear the DNS name cache, type: IPConfig /FlushDNS
- To clear the NetBIOS name cache, type: NBTStat -R
- To clear Kerberos tickets, you will need KList.exe: KList purge
Run the commands above in the command prompt that came up as “SYSTEM”.
4. Now run a command that requires authentication to the target server. Either of the following will do:
5. Once you get the error message, stop and save the network captures.
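If you reproduce this often, the timing arithmetic and the cleanup commands from steps 1 and 3 can be captured in a small helper. This is a hypothetical Python sketch, not a Microsoft tool: `at_command` only formats a start time a couple of minutes ahead in the 24-hour HH:MM form that AT expects, and `CLEANUP_COMMANDS` lists the step 3 commands in the order you would run them in the SYSTEM prompt.

```python
from datetime import datetime, timedelta


def at_command(now: datetime, minutes_ahead: int = 2) -> str:
    """AT command that opens an interactive cmd.exe as SYSTEM a few minutes from now.

    AT expects the start time in 24-hour (military) HH:MM format.
    """
    start = now + timedelta(minutes=minutes_ahead)
    return f'AT {start:%H:%M} /interactive "cmd.exe"'


# Step 3: run these, in this order, in the SYSTEM command prompt.
CLEANUP_COMMANDS = [
    "IPConfig /FlushDNS",  # clear the DNS resolver cache
    "NBTStat -R",          # purge and reload the NetBIOS name cache
    "KList purge",         # discard cached Kerberos tickets (needs KList.exe)
]

if __name__ == "__main__":
    print(at_command(datetime.now()))
    for command in CLEANUP_COMMANDS:
        print(command)
```

Scheduling two minutes out simply leaves time to start the capture before the prompt appears; adjust `minutes_ahead` as needed.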
Reviewing the network capture:
If you are using Wireshark to view the trace, the filter is simple: “dns || kerberos || ip.addr==<IP address of target machine>” (Wireshark display filter protocol names are lowercase). Basically, this filter means “Show me all DNS queries and responses, all Kerberos authentication, and all packets sent to or from the target machine.”
It should look similar to this:
Once you have applied the filter, you should see all DNS traffic, Kerberos authentication (as well as packets that carry Kerberos tickets), and anything destined for the remote system.
Before we go too deep into the capture, we should probably cover, at a high level, the steps taken to connect to a remote file share.
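The same filter can be generated per target, which helps when you review several traces. This is a hedged Python sketch: `kerberos_trace_filter` and `tshark_argv` are hypothetical helper names, and the capture file name in the usage lines is made up. It assumes a tshark build that accepts `-r` (read a saved capture) and `-Y` (apply a display filter), as current versions do.

```python
import ipaddress


def kerberos_trace_filter(target_ip: str) -> str:
    """Wireshark display filter: all DNS, all Kerberos, and anything to/from the target."""
    ipaddress.ip_address(target_ip)  # raises ValueError for a malformed address
    # Display-filter protocol names are lowercase ("kerberos", not "Kerberos").
    return f"dns || kerberos || ip.addr=={target_ip}"


def tshark_argv(capture_file: str, target_ip: str) -> list[str]:
    """argv for reading a saved capture with tshark and applying the same filter."""
    return ["tshark", "-r", capture_file, "-Y", kerberos_trace_filter(target_ip)]


if __name__ == "__main__":
    print(kerberos_trace_filter("10.10.200.21"))
    print(tshark_argv("ltwre-rt-mem1.cap", "10.10.200.21"))  # hypothetical file name
```

Returning an argv list rather than one command string avoids shell quoting problems with the `||` operators if you pass it to `subprocess.run`.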
1. Resolve the host name for the target system to an IP address.
a. Look in the HOSTS file.
b. Query DNS.
c. Look in the LMHOSTS file.
d. Query WINS / NBNS.
2. Ping the remote system.
3. Negotiate an authentication protocol. Kerberos is preferred for Windows hosts.
4. Request a Kerberos ticket.
5. Perform an SMB “Session Setup AndX Request” and send authentication data (Kerberos ticket or NTLM response).
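The resolution order in step 1 is essentially a fallback chain: each source is tried in turn and the first answer wins, so a wrong answer from an early source (DNS, in this lab) means the later sources are never consulted. The Python sketch below models only that ordering; the dictionaries are hypothetical stand-ins for HOSTS, DNS, LMHOSTS, and WINS, not real resolver APIs.

```python
from typing import Callable, Optional

# A name-resolution source: takes a host name, returns an IP address or None.
Lookup = Callable[[str], Optional[str]]


def resolve(name: str, sources: list[tuple[str, Lookup]]) -> Optional[tuple[str, str]]:
    """Try each source in order and return (source, ip) for the first hit."""
    for source, lookup in sources:
        ip = lookup(name)
        if ip is not None:
            return source, ip
    return None  # every source missed: name resolution fails


if __name__ == "__main__":
    # Hypothetical contents; DNS answers, so LMHOSTS and WINS are never asked.
    hosts: dict[str, str] = {}
    dns = {"LTWRE-CHD-MEM1": "10.10.200.21"}
    lmhosts: dict[str, str] = {}
    wins = {"LTWRE-CHD-MEM1": "10.10.200.21"}
    order = [("HOSTS", hosts.get), ("DNS", dns.get),
             ("LMHOSTS", lmhosts.get), ("WINS", wins.get)]
    print(resolve("LTWRE-CHD-MEM1", order))  # ('DNS', '10.10.200.21')
```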
Let’s look at those steps in more detail.
Step 1 – resolve the name:
Remember, we ran “IPConfig /FlushDNS” so that we could see name resolution on the wire. Frame 1 is the query going out. Hmm, this looks kind of funny: we are querying for LTWRE-CHD-MEM1.litwareinc.com. Well, that part should be fine, I suppose, since the DNS server should not find the record. But wait: Frame 6 shows that the DNS server responded to the query with 10.10.200.21, and sure enough, that is the correct IP address for the target server.
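The litwareinc.com suffix in Frame 1 comes from DNS suffix expansion: LTWRE-RT-MEM1 is a member of litwareinc.com, so its DNS client appends that suffix to the single-label name LTWRE-CHD-MEM1. A simplified Python sketch, assuming the search list holds only the client's primary DNS suffix (real Windows clients also apply configured suffix search lists and devolution rules, which this ignores):

```python
def candidate_fqdns(name: str, suffix_search_list: list[str]) -> list[str]:
    """Names the DNS client queries for `name`, in order (simplified).

    Dotted names are treated as already qualified; a single-label name gets
    each suffix in the search list appended in turn.
    """
    if "." in name:
        return [name]
    return [f"{name}.{suffix}" for suffix in suffix_search_list]


if __name__ == "__main__":
    # LTWRE-RT-MEM1's primary DNS suffix is its own domain, litwareinc.com.
    print(candidate_fqdns("LTWRE-CHD-MEM1", ["litwareinc.com"]))
    # ['LTWRE-CHD-MEM1.litwareinc.com']
```

With litwareinc.com as the only suffix tried, whatever that zone answers for the name decides which server, and later which SPN, the client ends up using.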
Step 2 – ping the remote system:
Yep, the remote system is pingable; see the Echo Request and Echo Reply. So the system is up and available.
Step 3 – Negotiate Authentication:
Now we negotiate the authentication protocol, and the remote system responds; the response is the more important part. We see that the server supports MS KRB5, KRB5, and NTLMSSP; it even gives us the principal name of the system.
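At a high level, the client takes the mechanism list from the negotiate response and picks the first mechanism it prefers that the server also offers, which is why Kerberos is attempted before NTLMSSP. The Python sketch below is a simplification of SPNEGO mechanism selection, not Windows' implementation, and the client preference order in the usage line is an assumption. The OIDs are the standard identifiers Wireshark shows for these three mechanisms.

```python
from typing import Optional

# Mechanism OIDs as they appear in the SPNEGO mechTypes list.
MECH_OIDS = {
    "MS KRB5": "1.2.840.48018.1.2.2",
    "KRB5": "1.2.840.113554.1.2.2",
    "NTLMSSP": "1.3.6.1.4.1.311.2.2.10",
}


def choose_mechanism(client_prefs: list[str], server_mechs: list[str]) -> Optional[str]:
    """Return the client's most preferred mechanism that the server also offers."""
    offered = set(server_mechs)
    for mech in client_prefs:
        if mech in offered:
            return mech
    return None  # no common mechanism: negotiation fails


if __name__ == "__main__":
    advertised = ["MS KRB5", "KRB5", "NTLMSSP"]  # from the negotiate response
    print(choose_mechanism(["MS KRB5", "KRB5", "NTLMSSP"], advertised))  # MS KRB5
```

Note that choosing Kerberos here is only a first attempt; if the ticket request fails later, the client can still fall back to NTLMSSP.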
Step 4 – Request a Kerberos ticket:
Alright, now to the meat of Kerberos authentication and viewing it in a network trace. If you remember, we used the KList purge command to clear out all tickets on the system. That means LTWRE-RT-MEM1 first has to get a Ticket Granting Ticket (TGT), which is why you see the AS-REQ and AS-REP frames. If Kerberos ticketing is new to you, I would suggest reviewing the blog post on how Kerberos works.
Next, we see the TGS-REQ in Frame 18; let’s take a closer look at this packet in the details pane.
You can see that the system is handing its TGT to the Kerberos Key Distribution Center (KDC) under “padata: PA-TGS-REQ” section, and requesting a ticket for server “cifs/LTWRE-CHD-MEM1.litwareinc.com” in the LITWAREINC.COM realm (Windows Domain) under “KDC_REQ_BODY” section.
OK, so now we know that we are requesting a Kerberos ticket for “cifs/LTWRE-CHD-MEM1.litwareinc.com” in the litwareinc.com domain. This will not work, because the remote system actually lives in the “litware-chld.litwareinc.com” domain, and no account is registered with that service principal name. That is why the KDC responded with KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN. Again, if you do not understand this, please review the blog post on how Kerberos works.
Step 5 – Perform an SMB “Session Setup AndX Request”:
In the following frames we see:
- Frame 20 shows that, because Kerberos failed with an unknown service principal name, the NTLMSSP_NEGOTIATE authentication package is selected. Frame 21 shows the remote system sending the NTLMSSP_CHALLENGE back (this is typical).
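The failure comes down to two steps: the client builds the SPN from whatever name it resolved, and the KDC looks that SPN up. The Python sketch below models both under simplifying assumptions: the directory is a hypothetical, flat, forest-wide table of registered SPNs (real lookups involve the global catalog and cross-realm referrals), and only the cifs-to-HOST alias is modeled, since Active Directory computers register HOST/ SPNs and cifs/ requests are matched against them. It uses the child domain name litware-chld.litwareinc.com from the lab description; error code 7 is KDC_ERR_S_PRINCIPAL_UNKNOWN in RFC 4120.

```python
from typing import Optional

KDC_ERR_S_PRINCIPAL_UNKNOWN = 7  # RFC 4120: server not found in Kerberos database

# Simplification: only cifs/ is mapped onto HOST/ here.
ALIASED_TO_HOST = {"cifs"}


def spn_for_share(resolved_name: str, service: str = "cifs") -> str:
    """SPN the SMB client requests: service class plus the name it resolved."""
    return f"{service}/{resolved_name}"


def kdc_lookup(spn: str, registered: dict[str, str]) -> tuple[Optional[str], Optional[int]]:
    """Return (account, None) if some account owns the SPN, else (None, error code)."""
    service, _, host = spn.partition("/")
    candidates = [spn.lower()]  # SPN matching is case-insensitive
    if service.lower() in ALIASED_TO_HOST:
        candidates.append(f"host/{host}".lower())
    for candidate in candidates:
        if candidate in registered:
            return registered[candidate], None
    return None, KDC_ERR_S_PRINCIPAL_UNKNOWN


if __name__ == "__main__":
    # Hypothetical directory contents: the target's default HOST/ SPN (keys lowercase).
    registered = {"host/ltwre-chd-mem1.litware-chld.litwareinc.com": "LTWRE-CHD-MEM1$"}
    for name in ("LTWRE-CHD-MEM1.litwareinc.com",               # what DNS returned
                 "LTWRE-CHD-MEM1.litware-chld.litwareinc.com"):  # the real FQDN
        spn = spn_for_share(name)
        print(spn, kdc_lookup(spn, registered))
```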
- Frame 22 shows that the system sent no NTLM credentials to the remote system. It is authenticating as NT AUTHORITY\Anonymous.
- Frame 23 shows that the remote system allowed the session to be created.
- Frames 24 and 25 show that we do a Tree Connect to the IPC$ share and get a response.
- Frames 26 and 27 show that we connect to the SRVSVC named pipe and get STATUS_ACCESS_DENIED back.
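The fallback behind Frames 20 through 27, and behind the working user access from the problem description, boils down to a small decision table. The sketch below only encodes the rule this post describes for Windows XP and Windows Server 2003 clients; the function name and return strings are illustrative, not a Windows API.

```python
def server_sees(kerberos_succeeded: bool, runs_as_local_system: bool) -> str:
    """Identity the file server ends up with, per the XP/2003 behavior in this post."""
    if kerberos_succeeded:
        return "Kerberos (client's own identity)"
    if runs_as_local_system:
        # No NTLM with the computer's own credentials: the session is set up
        # anonymously (Frame 22), so the SRVSVC pipe returns access denied.
        return "NTLM as NT AUTHORITY\\ANONYMOUS LOGON"
    # A logged-on user falls back to NTLM with the user's own credentials.
    return "NTLM with the user's credentials"


if __name__ == "__main__":
    for local_system in (True, False):
        print(local_system, server_sees(kerberos_succeeded=False,
                                        runs_as_local_system=local_system))
```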
So where do you think things start to go wrong here in the trace?
If you answered DNS name resolution, you would be correct. If name resolution is not working properly in the environment, the application requesting a Kerberos ticket will actually request a service ticket for the wrong service principal name. Remember that the remote file server I am attempting to connect to is “ltwre-chd-mem1.litware-chld.litwareinc.com”; however, the DNS server found a record for “ltwre-chd-mem1.litwareinc.com”. Since we found the remote file server in the “litwareinc.com” domain, the Kerberos client requests a service ticket for “cifs/ltwre-chd-mem1.litwareinc.com”, as shown in the Kerberos ticket request, and the KDC responds with KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN.
I ran net view again, this time specifying the FQDN of LTWRE-CHD-MEM1, and WOW, look at the output:
That actually worked! So, how can we fix this problem?
Actually, there are several different ways to “fix” the problem:
a. Find out why DNS is resolving the machine name incorrectly.
i. Is there a host (A) or CNAME record for this name?
ii. Did you configure the DNS zone for WINS lookup?
b. Configure your application to use the FQDN of the system instead of its NetBIOS name.
c. We could add a Service Principal Name to LTWRE-CHD-MEM1 for “CIFS/LTWRE-CHD-MEM1.litwareinc.com”.
The best way to “fix” the problem is to actually fix DNS name resolution. By the way, the lab was configured with “WINS Lookup” enabled on the litwareinc.com DNS zone, which is why the DNS server could answer the query for LTWRE-CHD-MEM1.litwareinc.com even though the zone has no host record for that name. If Kerberos authentication is failing for the LocalSystem account, it is more than likely also failing when users connect to the remote system. However, users are not getting “Access is denied” because user accounts, unlike machine accounts, can fall back to NTLM and authenticate with their own credentials rather than as Anonymous.
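The WINS Lookup setting is what turned a missing record into a wrong answer. Below is a simplified Python model of how a zone with WINS lookup answers an A query. It assumes the zone is authoritative for the name, ignores delegation, and consults WINS only for names directly under the zone; the function name, parameters, and table contents are hypothetical.

```python
from typing import Optional


def answer_a_query(fqdn: str, zone: str, records: dict[str, str],
                   wins: dict[str, str], wins_lookup: bool) -> Optional[str]:
    """Return the address a zone would hand back for `fqdn`, or None (simplified).

    records: host (A) records keyed by lowercase label relative to the zone.
    wins:    WINS registrations keyed by uppercase NetBIOS name.
    """
    name, zone = fqdn.lower().rstrip("."), zone.lower().rstrip(".")
    if not name.endswith("." + zone):
        return None  # outside this zone
    label = name[: -(len(zone) + 1)]
    if label in records:
        return records[label]
    if wins_lookup and "." not in label:
        # No A record: forward the leftmost label to WINS as a NetBIOS name.
        return wins.get(label.upper())
    return None


if __name__ == "__main__":
    wins = {"LTWRE-CHD-MEM1": "10.10.200.21"}  # child server registered in WINS
    print(answer_a_query("LTWRE-CHD-MEM1.litwareinc.com", "litwareinc.com", {}, wins, True))   # 10.10.200.21
    print(answer_a_query("LTWRE-CHD-MEM1.litwareinc.com", "litwareinc.com", {}, wins, False))  # None
```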
If fixing the DNS problem is not possible, the next best solution is to make the application use the FQDN of the server. Keep in mind that the application vendor would need to be involved to implement this fix.
The least favorite method is to add the SPN to the destination server using the SetSPN.exe tool. It is the least favorite because you are adding a name from one domain to a machine account in another domain. What would happen if, in the future, you bring up a new computer in the root domain with the same name? Now you have a duplicate SPN, which will lead to other Kerberos authentication problems.
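For the FQDN option, the application-side change is often just qualifying the server name in the UNC path. A hypothetical Python helper (the application's real configuration format is unknown here), assuming the target lives in the child domain litware-chld.litwareinc.com from the lab description:

```python
def qualify_unc(unc: str, dns_domain: str) -> str:
    r"""Turn \\SERVER\share into \\SERVER.<dns_domain>\share.

    Paths whose host part already contains a dot are left untouched.
    """
    if not unc.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {unc!r}")
    host, sep, rest = unc[2:].partition("\\")
    if "." not in host:
        host = f"{host}.{dns_domain}"
    return "\\\\" + host + sep + rest


if __name__ == "__main__":
    print(qualify_unc(r"\\LTWRE-CHD-MEM1\AppShare", "litware-chld.litwareinc.com"))
```

With the fully qualified name, the client no longer relies on suffix expansion, so it requests the SPN for the server's real domain.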
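To see why the SetSPN approach is risky, consider what a duplicate check has to treat as equal. The Python sketch below is hypothetical: the account labels are made up, and it treats cifs/ as an alias of HOST/, mirroring how the KDC matches cifs/ requests against HOST/ registrations. On Windows Server 2008 and later, `setspn -X` performs a real duplicate search; this only illustrates the idea.

```python
from collections import defaultdict

# Only the alias that matters for this post is modeled; AD maps many more
# service classes onto HOST/ through its sPNMappings setting.
ALIASED_TO_HOST = {"cifs"}


def canonical_spn(spn: str) -> str:
    """Lowercase the SPN and rewrite aliased service classes to host/."""
    service, sep, rest = spn.partition("/")
    service = service.lower()
    if service in ALIASED_TO_HOST:
        service = "host"
    return f"{service}{sep}{rest.lower()}"


def find_duplicate_spns(accounts: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each canonical SPN claimed by more than one account to those accounts."""
    owners: dict[str, set[str]] = defaultdict(set)
    for account, spns in accounts.items():
        for spn in spns:
            owners[canonical_spn(spn)].add(account)
    return {spn: sorted(accts) for spn, accts in owners.items() if len(accts) > 1}


if __name__ == "__main__":
    accounts = {
        # Hypothetical child-domain server after the CIFS/ SPN was added with SetSPN.
        "child\\LTWRE-CHD-MEM1$": ["HOST/LTWRE-CHD-MEM1.litware-chld.litwareinc.com",
                                   "CIFS/LTWRE-CHD-MEM1.litwareinc.com"],
        # Hypothetical new root-domain computer with the same short name.
        "root\\LTWRE-CHD-MEM1$": ["HOST/LTWRE-CHD-MEM1.litwareinc.com"],
    }
    print(find_duplicate_spns(accounts))
```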
Well, I hope that you have learned a few new things like:
- How name resolution problems could cause Kerberos authentication to fail.
- How to easily filter network traces to confidently determine where Kerberos authentication is failing.
- How the SMB protocol and authentication look in a network trace.
Please keep in mind that there are several other ways name resolution could cause Kerberos authentication to fail. You could have static WINS entries in the database, or wrong entries in the HOSTS / LMHOSTS files. You could be failing because of a CNAME / “A” (host) record within your DNS zone, or simply because the DNS zone is configured for “WINS Lookup”.