Troubleshooting AD with Network Monitoring tools

In general, if you have an AD-related issue, the following logs are useful:

  1. Event logs from the affected machine(s)
  2. Component-specific debug logs from the affected machine(s) (Netlogon logs, Userenv logs, IIS logs, etc.)
  3. Network traces taken while the problem is happening
  4. Procmon traces that show file activity on the affected machine(s) covering the same time as the Network trace

In this blog entry I want to focus on #3: how to gather and analyze a useful network trace.

A trace by itself can be useful. For a trace to be REALLY useful, however, you need to make sure you are:

  • capturing network traffic on the NIC where the problem is occurring and WHEN it is occurring (this is harder than it sounds)
  • allocating a sufficiently large capture buffer so the frames that show the problem don't get overwritten
  • tracing simultaneously from both endpoints where the problem occurs
  • noting down what you're doing during the trace and which error messages you're seeing.
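
To get a feel for what "sufficiently large" means, here is a rough back-of-the-envelope sketch in Python. The function name, the headroom factor of 1.5 and the example numbers are my own assumptions for illustration, not values taken from any capture tool; measure the real load on the NIC before relying on it.

```python
def capture_buffer_mb(avg_mbit_per_s: float, seconds: float, headroom: float = 1.5) -> float:
    """Estimate the capture buffer (in megabytes) needed to hold a trace.

    avg_mbit_per_s: assumed average traffic on the NIC in megabits per second
    seconds:        how long the capture has to run to include the problem
    headroom:       safety factor for bursts (1.5 is an arbitrary choice)
    """
    raw_megabytes = avg_mbit_per_s * seconds / 8  # 8 bits per byte
    return raw_megabytes * headroom


# Example: 20 Mbit/s for two minutes is 300 MB of raw traffic,
# so with the 1.5 headroom factor the estimate is 450 MB.
print(capture_buffer_mb(20, 120))
```

If the estimate is larger than you can afford, capture for a shorter window or apply a capture filter so less traffic lands in the buffer.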

A solitary trace without any description of what's happening in it is like a box of chocolates - “you never know what you're gonna get” :-)

A trace taken from both ends of the conversation, together with the event logs and the component-specific logs for the problem you're troubleshooting, is worth its weight in gold, however (how much does a megabyte weigh anyway?).

At any rate, once you have a usable trace, you can start filtering and drilling down to specifics such as individual protocols or ports.

The most useful filters to put in from the AD perspective are:

  • dcerpc (RPC traffic)
  • kerberos (Kerberos ticket requests and other traffic containing Kerberos information)
  • ldap or cldap (LDAP searches and writes over TCP or UDP)
  • dns (not much goes on in AD without a preceding DNS query - make sure you flush the client's DNS cache with ipconfig /flushdns before starting, though)
  • smb (Group Policy being applied from Sysvol on a DC for example)
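
The filters above can be combined into a single expression. The sketch below assumes Wireshark-style display filter syntax (`||`, `&&`, `ip.addr`); Network Monitor uses its own filter language, so treat this as an illustration only. The helper name `build_filter` and the address 192.0.2.10 are made up for the example.

```python
# Protocol names from the list above, as Wireshark display filter keywords.
AD_PROTOCOLS = ["dcerpc", "kerberos", "ldap", "cldap", "dns", "smb"]


def build_filter(protocols, host=None):
    """Join protocol names with OR, optionally limited to one IP address."""
    expr = " || ".join(protocols)
    if host:
        # Parentheses keep the OR group together when combined with AND.
        expr = f"ip.addr == {host} && ({expr})"
    return expr


print(build_filter(AD_PROTOCOLS))
print(build_filter(["kerberos", "dns"], host="192.0.2.10"))
```

Note that in Wireshark the `smb` keyword matches only SMB version 1; SMB2 traffic from newer Windows versions needs `smb2` added to the list.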

Other things to look for in the network traces are:

  • Retransmissions of packets (with a trace from both sides you can see whether the packet reaches the other side or is being eaten by a firewall in between)
  • Packets leaving one end but never arriving at the other end
  • Excessive Resets of TCP connections
  • Excessive traffic coming from specific clients
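
To make the two-sided comparison concrete, here is a minimal Python sketch. It assumes both traces have already been exported into simple records; the `Frame` type, its fields and the sample data are hypothetical, and it ignores complications such as NAT rewriting addresses or sequence number wrap-around.

```python
from collections import Counter
from typing import NamedTuple


class Frame(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int
    seq: int
    flags: str        # TCP flags as letters, e.g. "S", "PA", "R"
    payload_len: int  # TCP payload bytes; 0 for pure ACKs


def key(f):
    """Identify a TCP segment by its endpoints and sequence number."""
    return (f.src, f.dst, f.sport, f.dport, f.seq)


def missing_at_receiver(sent, received):
    """Frames seen leaving the sender that never show up in the receiver's trace."""
    seen = {key(f) for f in received}
    return [f for f in sent if key(f) not in seen]


def retransmissions(frames):
    """Data segments that appear more than once in a single trace."""
    counts = Counter(key(f) for f in frames if f.payload_len > 0)
    return {k: n for k, n in counts.items() if n > 1}


def resets_per_source(frames):
    """Count TCP resets per sending address."""
    return Counter(f.src for f in frames if "R" in f.flags)


# Hypothetical data: the client sends the same LDAP segment twice,
# and only the following segment reaches the server.
first = Frame("10.0.0.5", "10.0.0.1", 49200, 389, 1000, "PA", 120)
second = Frame("10.0.0.5", "10.0.0.1", 49200, 389, 1120, "PA", 80)
client_trace = [first, first, second]
server_trace = [second]

print(missing_at_receiver(client_trace, server_trace))
print(retransmissions(client_trace))
```

A segment that is retransmitted on one side and missing on the other points at something in between dropping it, which is exactly the firewall case described above.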

At this point, you really need to have a good idea of what the component you're troubleshooting is doing. With that in place you effectively have a triangulating device to zoom in on the problem, i.e. “What's happening on the wire” (the network traces) + “What's happening on the machine” (the component logs/event logs/procmon logs) + “What should be happening” (your knowledge of how the component should behave).
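
One practical way to triangulate is to merge everything into a single timeline. The Python sketch below assumes each source has already been reduced to (timestamp, message) pairs; the function name, the optional clock-skew correction and the sample entries are all invented placeholders, not real log output.

```python
from datetime import datetime, timedelta


def build_timeline(sources, skew=None):
    """Merge events from several sources into one list sorted by time.

    sources: dict mapping a source name to a list of (datetime, message) pairs
    skew:    optional dict mapping a source name to a timedelta that corrects
             for clock differences between machines
    """
    skew = skew or {}
    merged = []
    for name, events in sources.items():
        offset = skew.get(name, timedelta(0))
        for ts, message in events:
            merged.append((ts + offset, name, message))
    return sorted(merged)


# Placeholder events for illustration only.
t0 = datetime(2010, 1, 1, 12, 0, 0)
sources = {
    "trace": [(t0 + timedelta(seconds=2), "DNS query sent"),
              (t0 + timedelta(seconds=5), "TCP reset received")],
    "eventlog": [(t0 + timedelta(seconds=6), "error logged by component")],
    "procmon": [(t0 + timedelta(seconds=1), "process started")],
}

for ts, name, message in build_timeline(sources):
    print(ts.time(), name, message)
```

Reading the merged list top to bottom shows which wire event precedes which log entry, and that is where your knowledge of how the component should behave comes in.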

With that in place – the majority of issues should be solvable with time, patience and good old troubleshooting intuition (“troubleshooting with your fingertips”).

Network Monitor Team blog:
http://blogs.technet.com/netmon/

Intro to filtering with Network Monitor 3.0
http://blogs.technet.com/netmon/archive/2006/10/17/into-to-filtering-with-network-monitor-3-0.aspx

Capturing network traffic in Windows 7 with NetSH
http://blogs.technet.com/mrsnrub/archive/2009/09/10/capturing-network-traffic-in-windows-7-server-2008-r2.aspx

Wireshark Network Protocol Analyzer
http://www.wireshark.org/

Troubleshooting Replication
http://technet.microsoft.com/en-us/library/cc755349(WS.10).aspx

Troubleshooting IEEE 802.11 Wireless Access with Microsoft Windows
http://technet.microsoft.com/en-us/library/bb457017.aspx

Troubleshooting the “RPC server is unavailable” error
http://blogs.technet.com/abizerh/archive/2009/06/11/troubleshooting-rpc-server-is-unavailable-error-reported-in-failing-ad-replication-scenario.aspx