Categorizing Packet Loss

I've frequently been asked to diagnose packet loss based only on a network trace. While a trace alone rarely pinpoints the exact cause, it can provide some valuable clues about where and why the packets are being lost.

The first step, if possible, is to get network traces from both endpoints in the conversation (and, if it's your lucky day, from any intermediate points where a trace can be run). Once you have these, find a retransmitted packet in one of the traces, then filter all of the traces down to just that packet. (In Ethereal, try a filter of tcp.seq==<sequence number> on each trace.) Compare the traces to see at which capture points the packet is present and at which it is absent; from this, you can start to narrow down where the packet is being lost.
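The comparison step amounts to walking the capture points in path order and noting the last one that saw the packet. The sketch below is only an illustration: the function name `locate_loss`, the capture-point names and the sequence numbers are hypothetical, and it assumes each trace has already been reduced to the set of sequence numbers it contains, with the same absolute sequence number appearing in every trace.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical representation: each capture point is (name, set of TCP
# sequence numbers seen in that trace), ordered from sender to receiver.
CapturePoint = Tuple[str, Set[int]]


def locate_loss(seq: int, path: List[CapturePoint]) -> Tuple[Optional[str], Optional[str]]:
    """Return (last point that saw seq, first point downstream that did not).

    A second value of None means the packet appeared in every trace,
    including the receiver's, so it was lost after the last capture point.
    """
    last_seen: Optional[str] = None
    for name, seen in path:
        if seq in seen:
            last_seen = name
        else:
            # The packet disappears between last_seen and this point.
            return last_seen, name
    return last_seen, None


if __name__ == "__main__":
    path = [
        ("client", {1000, 2460}),
        ("router-a", {1000, 2460}),
        ("router-b", {1000}),
        ("server", {1000}),
    ]
    print(locate_loss(2460, path))  # lost between router-a and router-b
```

If the packet shows up in the receiver's own trace but the receiver still never acknowledges it, the loss is happening inside the receiving host, which points toward the filter-driver case discussed below.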

The next step is to examine the pattern of the packet loss. In my experience, two patterns show up most often: retransmissions scattered fairly randomly through the entire conversation, or retransmissions of only certain packets (usually leading up to the end of the connection, as the packet is retransmitted until the retransmission limit, TcpMaxDataRetransmissions on Windows, is reached).
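As a rough illustration of telling the two patterns apart, the sketch below counts how often each sequence number appears in one direction of a trace. It is a simplified heuristic, not how any trace tool actually detects retransmissions: the function name `classify_retransmissions`, the input format (sequence numbers of data segments in capture order) and the `min_repeats` threshold of 3 are all assumptions chosen for the example.

```python
from collections import Counter
from typing import List


def classify_retransmissions(segment_seqs: List[int], min_repeats: int = 3) -> str:
    """Roughly classify the loss pattern in one direction of a TCP trace.

    segment_seqs holds the sequence number of every data segment in capture
    order; a number that appears more than once was retransmitted.
    min_repeats is an arbitrary, hypothetical threshold.
    """
    counts = Counter(segment_seqs)
    # Map each retransmitted sequence number to its number of extra copies.
    retransmitted = {seq: n - 1 for seq, n in counts.items() if n > 1}
    if not retransmitted:
        return "no retransmissions"
    if len(retransmitted) == 1:
        seq, repeats = next(iter(retransmitted.items()))
        # One segment resent repeatedly at the very end of the capture.
        if repeats >= min_repeats and segment_seqs[-1] == seq:
            return "specific packet"
    return "random"
```

For example, `[1, 2, 2, 3, 4, 4, 5]` classifies as `"random"`, while `[1, 2, 3, 3, 3, 3, 3]` classifies as `"specific packet"`.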

In the first case, the most common cause is a network segment or device that is either over capacity or dropping packets for some other reason (a noisy dial-up link, for example). Finding the right link or device then becomes a matter of either taking traces at each possible hop or, alternatively, working through a process of elimination by using different endpoints to test each hop. Depending on the situation, the pathping command may also show the hop at which the loss begins.
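When reading per-hop loss figures, such as pathping's "Source to Here" percentages, a hop with loss that does not carry through to later hops is often just a router deprioritizing or rate-limiting its own ICMP replies rather than dropping forwarded traffic. The sketch below applies that heuristic to a list of per-hop loss percentages typed in by hand; it does not parse pathping output, and the function name `first_lossy_hop` and the 1% threshold are hypothetical choices.

```python
from typing import List, Optional


def first_lossy_hop(loss_pct: List[float], threshold: float = 1.0) -> Optional[int]:
    """Return the 1-based hop where sustained loss begins, or None.

    loss_pct[i] is the loss percentage reported for hop i + 1. A hop only
    counts if it and every later hop show at least `threshold` percent loss,
    so isolated loss at a middle hop is ignored.
    """
    start: Optional[int] = None
    for i, pct in enumerate(loss_pct):
        if pct >= threshold:
            if start is None:
                start = i + 1  # loss begins here, pending later hops
        else:
            start = None  # loss did not persist, so discard this candidate
    return start
```

With `[0, 40, 0, 10, 12]`, the 40% at hop 2 is ignored because hop 3 shows no loss, and the function returns hop 4.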

In the second case, you will usually see a session that runs fine until a certain packet is transmitted.  At this point, that packet is never received by the other endpoint and, after a number of retransmissions, the connection errors out.  The things to look for here are:

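  1. Size of the packet. If all of the preceding packets in the connection were smaller and the first full-sized packet is dropped, you are probably looking at a black-hole router: a router that cannot forward a packet that large, drops it because the Don't Fragment bit is set, and never returns (or has filtered) the ICMP "fragmentation needed" message that Path MTU Discovery relies on.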
  2. Data in the packet. In a number of cases, I've seen heuristic scanners (usually anti-virus software implemented as filter drivers) that believe they recognize an objectionable pattern in a packet and silently discard it. Try removing the anti-virus software, as well as any other third-party filter drivers.
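For the packet-size case, the header arithmetic is worth keeping handy. On a 1500-byte Ethernet MTU, a TCP segment carries at most 1500 - 20 (IPv4) - 20 (TCP) = 1460 bytes of data, and the largest ICMP echo payload that fits unfragmented is 1500 - 20 - 8 = 1472 bytes, which is the size you would pass to Windows `ping -f -l` (Don't Fragment set) to probe the path. The sketch below encodes that arithmetic plus the "everything before was smaller" check; the function names are hypothetical, and it assumes IPv4 and TCP headers without options.

```python
from typing import List

IP_HEADER = 20    # IPv4 header, no options (assumption)
TCP_HEADER = 20   # TCP header, no options (assumption)
ICMP_HEADER = 8   # ICMP echo header


def mss_for_mtu(mtu: int) -> int:
    """TCP maximum segment size for a given MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def looks_like_black_hole(earlier_payloads: List[int], dropped_payload: int, mss: int) -> bool:
    """True if the dropped segment is full-sized and every earlier one was smaller."""
    return dropped_payload == mss and all(p < mss for p in earlier_payloads)
```

If full-sized pings with Don't Fragment set simply time out (rather than returning a "needs to be fragmented" error) while smaller ones succeed, that is consistent with a black-hole router somewhere on the path.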

This posting is provided "AS IS" with no warranties, and confers no rights.