In general, if you have an AD-related issue, the following logs are useful:
- Event logs from the affected machine(s)
- Component-specific debug logs from the affected machine(s) (Netlogon logs, Userenv logs, IIS logs, etc.)
- Network traces taken while the problem is happening
- Procmon traces that show file activity on the affected machine(s) covering the same time as the Network trace
In this blog entry I want to focus on #3, network traces: how to gather and analyze a useful one.
A trace by itself can be useful. For a trace to be REALLY useful, however, you need to make sure you’re:
- capturing network traffic on the NIC where the problem is occurring and WHEN it is occurring (this is harder than it sounds)
- allocating a sufficiently large capture buffer so the frames containing the problem don’t get overwritten
- tracing simultaneously from both endpoints where the problem occurs
- noting down what you’re doing during the trace and which error messages you’re seeing.
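As a rough illustration of the buffer-size point, here is a minimal Python sketch that assembles a `netsh trace start` command line. It assumes you are capturing with NetSH on Windows 7 or later; the helper name `netsh_trace_start_args`, the file path and the sizes are made up for the example, so check `netsh trace start ?` on your own machine before relying on the parameters.

```python
def netsh_trace_start_args(trace_file, max_size_mb=1024):
    """Build the argument list for starting a capture with 'netsh trace'.

    The helper and its defaults are illustrative only. Nothing is executed
    here; the list could be handed to subprocess.run() from an elevated
    prompt on the affected machine.
    """
    return [
        "netsh", "trace", "start",
        "capture=yes",                 # capture packets, not just ETW events
        f"tracefile={trace_file}",     # where the .etl file is written
        f"maxsize={max_size_mb}",      # capture file size limit in MB
        "filemode=circular",           # oldest frames are overwritten once full
        "overwrite=yes",               # replace an existing file of the same name
    ]


# Hypothetical path and size; pick a size that covers the whole repro window.
args = netsh_trace_start_args(r"C:\temp\client.etl", max_size_mb=2048)
print(" ".join(args))
```

Because circular mode overwrites the oldest frames once the file is full, the size you pick decides how far back in time the trace reaches when you finally stop it with `netsh trace stop`.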
A solitary trace without any description of what’s happening in it is like a box of chocolates: “you never know what you’re gonna get”.
A trace taken from both ends of the conversation, together with the event logs and the component-specific logs for the problem you’re troubleshooting, is worth its weight in gold, however (how much does a megabyte weigh anyway?).
At any rate, once you have a usable trace, you can start filtering and drilling down to specifics such as particular protocols or ports.
The most useful filters to put in from the AD perspective are:
- dcerpc (RPC traffic)
- kerberos (Kerberos ticket requests and other traffic containing Kerberos information)
- ldap or cldap (LDAP searches and writes over TCP or UDP)
- dns (not much goes on in AD without a preceding DNS query – make sure you flush the DNS cache of the client before starting though)
- smb (Group Policy being applied from Sysvol on a DC for example)
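The protocol names above can be combined into a single display filter. The following is a small Python sketch that does so, assuming Wireshark-style display filter syntax (`or`, `and`, `ip.addr ==`); the function name `build_display_filter` and the example address are invented for illustration, and Network Monitor uses its own filter syntax. Note also that in Wireshark, SMB 2.x traffic is matched by `smb2` rather than `smb`.

```python
# Protocol names taken from the list above.
AD_PROTOCOLS = ["dcerpc", "kerberos", "ldap", "cldap", "dns", "smb"]


def build_display_filter(protocols, host=None):
    """Join protocol names into one Wireshark-style display filter.

    If 'host' is given, the filter is narrowed to frames sent to or
    from that IP address.
    """
    proto_part = " or ".join(protocols)
    if host is None:
        return proto_part
    return f"ip.addr == {host} and ({proto_part})"


everything = build_display_filter(AD_PROTOCOLS)
# 192.0.2.10 is a documentation address standing in for a DC.
narrowed = build_display_filter(["kerberos", "dns"], host="192.0.2.10")

print(everything)  # dcerpc or kerberos or ldap or cldap or dns or smb
print(narrowed)    # ip.addr == 192.0.2.10 and (kerberos or dns)
```

Starting with the broad filter and then narrowing to one host and one or two protocols tends to be quicker than scrolling through the unfiltered trace.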
Other things to look for in the network traces are:
- Retransmissions of packets (if you have a trace from both sides you can see whether the packet is reaching the other side or is being eaten by a firewall in between)
- Packets leaving one end but never arriving at the other end
- Excessive Resets of TCP connections
- Excessive traffic coming from specific clients
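To make the two-sided comparison concrete, here is a simplified Python sketch of the checks above. It assumes the frames have already been exported from both traces into `Packet` tuples (a made-up structure, not the output format of any tool) and that no NAT device rewrites addresses or ports between the endpoints. Real analyzers also look at payload length and timing when flagging retransmissions, so treat this as an illustration of the idea only.

```python
from collections import Counter, namedtuple

# Hypothetical, stripped-down view of a TCP segment.
Packet = namedtuple("Packet", "src dst sport dport seq rst")


def lost_in_transit(sender_trace, receiver_trace):
    """Packets that left the sender but never show up at the receiver."""
    seen = set(receiver_trace)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(p for p in sender_trace if p not in seen))


def retransmitted(trace):
    """Packets that appear more than once in a single trace."""
    return [p for p, n in Counter(trace).items() if n > 1]


def resets_per_source(trace):
    """Number of TCP resets sent by each source address."""
    return Counter(p.src for p in trace if p.rst)


# Made-up sample data: a client talking LDAP (port 389) to a DC.
first = Packet("10.0.0.5", "10.0.0.1", 50000, 389, 1000, False)
second = Packet("10.0.0.5", "10.0.0.1", 50000, 389, 1100, False)
reset = Packet("10.0.0.1", "10.0.0.5", 389, 50000, 7000, True)

client_trace = [first, second, second, reset]  # 'second' was sent twice
server_trace = [first, reset]                  # 'second' never arrived

lost = lost_in_transit(client_trace, server_trace)
retrans = retransmitted(client_trace)
resets = resets_per_source(server_trace)
```

In this sample, `second` shows up both as retransmitted on the client and as missing on the server, which is the pattern you would expect when something in between is dropping the packet.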
At this point, you really need to have a good idea of what the component you’re troubleshooting is doing. With that in place you effectively have a triangulating device to zoom in on the problem, i.e. “What’s happening on the wire” (the network traces) + “What’s happening on the machine” (the component logs/event logs/procmon logs) + “What should be happening” (your knowledge of how the component should behave).
With those three views lined up, the majority of issues should be solvable with time, patience and good old troubleshooting intuition (“troubleshooting with your fingertips”).
Network Monitor Team blog:
Intro to filtering with Network Monitor 3.0
Capturing network traffic in Windows 7 with NetSH
Wireshark Network Protocol Analyzer
Troubleshooting IEEE 802.11 Wireless Access with Microsoft Windows
Troubleshooting the “RPC server is unavailable” error