"The internet is down!" This is how I am greeted after a long day at work. My teenage son, who plays virtually every game known to mankind, could not join the majority of game servers for his favorite game. There were a handful that connected with no problems but naturally those were not the ones that his friends were on. As you can imagine, a travesty of justice of this magnitude does not sit well with a teenager who is being deprived of the "highly educational" banter of his friends as they defend the world from zombies or mindlessly run goats around a city. Don't get me wrong, I am thankful for their service and realize that we would be up to our eyeballs in zombie goats without them.
After checking all the usual suspects in troubleshooting (permissions issues, UAC, local and network firewall, proxy), I was drawing a blank. I was not surprised by that, since most issues at those levels are binary: either blocking or not, with no in-between. I finally decided to run a network capture to see if I could get any information that I was not seeing in any of the computer's logs.
We captured both a successful connection to an online server and a failed connection attempt. Keep in mind that in most versions of Windows (Windows 7, Windows Server 2008 R2, and later), you can run a capture from the command line without installing any additional software by running:
netsh trace start capture=yes tracefile=c:\temp\YourTraceFile.etl
To stop the trace after you have captured any failures or activity of interest, run the following command:
netsh trace stop
Gratuitous (Free) Product Plug
To view the network captures, I loaded them into Microsoft Message Analyzer. To this day, I am amazed at how few people know about and use this program. It is the next generation of analysis tool, replacing NetMon, and it is much more than a network protocol analyzer. I highly recommend checking it out and getting to know its capabilities. It's free, it's awesome, download it!
By comparing the two captures side by side, I found an ICMP packet that looked out of place and was present only in the bad capture. It turned out to be the key to solving this problem. As with any network capture, firewall/proxy log analysis, or event log analysis, the real skill is in paring down the information to eliminate the "noise". Take some time and learn the filtering syntax. In this case, the filter that shows packets indicating a fragmentation issue is:
icmp.type==3 and icmp.code==4
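If you are curious what that filter is actually matching, here is a minimal Python sketch (not something I ran during the original troubleshooting). It looks at the first 8 bytes of an ICMP message and checks for type 3 (Destination Unreachable) with code 4 (fragmentation needed but DF set). The function name next_hop_mtu and the hand-built sample bytes are made up for illustration. The header layout is the one described in RFC 1191, where the last two bytes of the header carry the next-hop MTU; some older routers leave that field as zero.

```python
import struct

ICMP_DEST_UNREACHABLE = 3  # ICMP type: Destination Unreachable
ICMP_FRAG_NEEDED = 4       # ICMP code: fragmentation needed but DF set


def next_hop_mtu(icmp_message: bytes):
    """Return the next-hop MTU for a type 3 / code 4 message, else None."""
    if len(icmp_message) < 8:
        return None  # too short to contain a full ICMP header
    # Header layout: type (1 byte), code (1), checksum (2), unused (2), next-hop MTU (2)
    icmp_type, icmp_code, _checksum, _unused, mtu = struct.unpack(
        "!BBHHH", icmp_message[:8]
    )
    if icmp_type == ICMP_DEST_UNREACHABLE and icmp_code == ICMP_FRAG_NEEDED:
        return mtu
    return None


# Hand-built sample header (checksum left as zero, for illustration only)
sample = struct.pack("!BBHHH", 3, 4, 0, 0, 576)
print(next_hop_mtu(sample))  # 576
```

Anything that is not a type 3 / code 4 message comes back as None, which is the same "noise" the capture filter throws away.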
As shown in the image below, the summary field contained a warning "The datagram is too big". Also note that the packet length was only 576, which seemed excessively small. Normally, on a broadband connection or LAN, you would see something in the neighborhood of 1500 (1472 when you factor in the IP & ICMP headers of 20 & 8 bytes respectively).
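The header arithmetic is easy to check for yourself. This is a small Python sketch, assuming a plain 20-byte IPv4 header with no options and the 8-byte ICMP echo header; the function name max_ping_payload is just something I picked for the example.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ping payload that fits in one unfragmented packet for this MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


for mtu in (1500, 576):
    print(mtu, max_ping_payload(mtu))
# 1500 1472
# 576 548
```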
MTU, or Maximum Transmission Unit, is a setting that basically tells your computer how much data to put in a packet to send over the network. Think about it like the difference between taxi cabs and buses in a crowded city. You can put a small number of people in a taxi (and have more of them running around) or you can have fewer buses that carry a larger number of passengers. The downside of a smaller MTU is that a certain amount of data is consumed by packet headers, and there is processing overhead in dealing with more small packets. A larger MTU size results in fewer, larger packets hitting the network, so it's more efficient. The downside is that it can tie up a slow link while transmitting the larger packets and, if you have communication issues that require packets to be re-transmitted, a larger MTU size is costly since the entire larger packet has to be re-sent. In most cases you don't have to mess with these settings, especially in the high-bandwidth world we live in. You would barely notice the occasional re-transmissions or the added overhead of processing many smaller packets. Networks for the most part have a mechanism called Path MTU Discovery that will automatically examine the path that a packet will traverse and settle on the largest MTU that the path will support.
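To put rough numbers on the taxi-versus-bus trade-off, here is a back-of-the-envelope Python sketch. It assumes every packet carries 40 bytes of headers (a 20-byte IPv4 header plus a 20-byte TCP header, no options) and ignores re-transmissions and link-layer framing, so treat it as an illustration rather than a measurement. The function name packets_and_overhead is hypothetical.

```python
HEADER_BYTES = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options


def packets_and_overhead(data_bytes: int, mtu: int):
    """Return (packet count, total header bytes) needed to send data_bytes."""
    payload_per_packet = mtu - HEADER_BYTES
    # Integer ceiling division: a partial last packet still counts as a packet
    packets = (data_bytes + payload_per_packet - 1) // payload_per_packet
    return packets, packets * HEADER_BYTES


for mtu in (1500, 576):
    packets, overhead = packets_and_overhead(1_000_000, mtu)
    print(f"MTU {mtu}: {packets} packets, {overhead} header bytes")
# MTU 1500: 685 packets, 27400 header bytes
# MTU 576: 1866 packets, 74640 header bytes
```

Under those assumptions, moving a megabyte at an MTU of 576 takes almost three times as many packets as at 1500.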
The Diagnosis and Fix
In this particular case, there was a perfect storm situation where my ISP was pushing the MTU size at 576 in the DHCP negotiation and my firewall was preferring that setting over the specified MTU size for the connection. I ended up manually configuring the firewall to ignore the MTU setting from the DHCP negotiation.
A quick reset of the firewall to ensure that it went through a full re-negotiation of DHCP, and we were up and running with the larger MTU size. And guess what, my son's PC can now connect to the game servers! And he is not here to see this glorious moment!
My configuration at home may be more complicated than most since I have a firewall and proxy server in the mix. If you need to set the MTU size on the local computer that you are working on, you can use the commands below to determine your MTU size and set the MTU, once you know the maximum size.
===============================================
PROCESS TO DETERMINE MAXIMUM MTU IN WINDOWS:
===============================================

netsh interface ipv4 show interface     <---- THIS GETS THE INDEX # FOR THE INTERFACE

Idx     Met         MTU          State                Name
---  ----------  ----------  ------------  ---------------------------
  1          50  4294967295  connected     Loopback Pseudo-Interface 1
  4          10        1500  connected     Ethernet

Now we're going to issue a ping command with a "-f" parameter to tell it not to allow fragmentation and a "-l xxxx" parameter to tell it how large of a packet to send. This allows us to find the largest packet that we can get through without fragmentation.

ping server.contoso.com -f -l 1472     <---- THIS IS 1500 MINUS 28, THERE IS ALWAYS 28 BYTES OF "OVERHEAD"
Packet needs to be fragmented but DF set.     <---- STILL TOO LARGE
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.

ping server.contoso.com -f -l 1400     <---- KEEP TRYING LOWER NUMBERS UNTIL SUCCESS
Packet needs to be fragmented but DF set.     <---- STILL TOO LARGE
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.

ping server.contoso.com -f -l 1350     <---- KEEP TRYING LOWER NUMBERS UNTIL SUCCESS
Pinging X.X.X.X with 1350 bytes of data:
Reply from X.X.X.X: bytes=1350 time=1ms TTL=255     <---- SUCCESS! NOW TRY A SIZE BETWEEN THIS ONE AND THE LAST FAILURE
Reply from X.X.X.X: bytes=1350 time=1ms TTL=255
Reply from X.X.X.X: bytes=1350 time=1ms TTL=255

ping server.contoso.com -f -l 1372     <---- KEEP NARROWING IT DOWN TO THE LARGEST SUCCESSFUL PING, IN OUR CASE 1372
Pinging X.X.X.X with 1372 bytes of data:
Reply from X.X.X.X: bytes=1372 time=1ms TTL=255
Reply from X.X.X.X: bytes=1372 time=1ms TTL=255
Reply from X.X.X.X: bytes=1372 time=1ms TTL=255

ping server.contoso.com -f -l 1373     <---- AS A FINAL TEST, INCREMENT THE SIZE BY 1 AND IT SHOULD FAIL
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.

Now that we know the exact size, we can set the MTU on the NIC using the index # that we grabbed earlier. The MTU is the largest ping size that worked plus the 28 bytes of overhead, so 1372 + 28 = 1400:

netsh interface ipv4 set interface "4" mtu=1400 store=persistent
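The trial-and-error narrowing above is really a binary search, and it is easy to script. Below is a minimal Python sketch, not a polished tool. The function names ping_fits and largest_payload are my own inventions. ping_fits assumes the Windows ping flags used in this article (-f and -l) plus -n 1 to send a single echo, and it treats "TTL=" in the output as a sign of a reply, which is a heuristic. The demo at the bottom uses a simulated path with a 1400-byte MTU so it runs without touching the network.

```python
import subprocess

ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP header


def ping_fits(host: str, payload: int) -> bool:
    """Send one Windows ping with Don't Fragment set; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-n", "1", "-f", "-l", str(payload), host],
        capture_output=True,
        text=True,
    )
    # Heuristic: a real echo reply line contains "TTL="
    return result.returncode == 0 and "TTL=" in result.stdout


def largest_payload(fits, low: int = 0, high: int = 1472) -> int:
    """Binary search for the largest payload where fits(payload) is True.

    Assumes fits(low) is True and that anything larger than a failing
    size also fails.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the range always shrinks
        if fits(mid):
            low = mid       # mid got through, so try something larger
        else:
            high = mid - 1  # mid was too big, so try something smaller
    return low


# Simulated path with a 1400-byte MTU (no real pings are sent here)
def simulated(payload: int) -> bool:
    return payload + ICMP_OVERHEAD <= 1400


best = largest_payload(simulated)
print(best, best + ICMP_OVERHEAD)  # 1372 1400
```

To try it against a real host on Windows, pass the real probe instead of the simulated one, for example largest_payload(lambda size: ping_fits("server.contoso.com", size)). Adding the 28 bytes of overhead back to the result gives the path MTU.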
The Sappy Part
As an IT guy, I live to conquer these problems and I know my reward is the silence of the content. After doing this type of work for over 23 years, I can offer one bit of advice to anyone getting started in this field: don't expect the users to bust down your door to thank you for a functioning PC/server/domain/network. Your reward is the lack of a visit from the users. Learn to appreciate that silence.
By now, I am finally eating dinner. My son walks in, gets on his PC, connects to his game, and announces to his friends that he can finally connect to the servers. As I pass by, I can't accept the "silence of the content" this time. My son needs to appreciate the amount of time and effort that went into fixing his game.
Me: "Did you notice anything?"
My son: (peeling off his headset) "Thank you, Dad!"
One last note: I had a different MTU issue over a year ago that was reported by a user who could not get to "some sites". Their web browser would connect to a site but would just quit at a blank browser screen after partially downloading site content. That incident turned out to be due to a piece of network equipment being configured not to allow fragmentation. The same filter in the above network capture was critical in pinpointing that issue.
The Rescued Animal Part
Watching this from the distance is our ferocious house cat Smokey, who doesn't care a bit about any MTU size issues. We are big animal fans in our household so I feel a need to point them out whenever I can. Smokey was rescued as a kitten from under the hood of a car where he had ridden over 15 miles. How he survived that trip, I have no idea. There are lots of animals at your local humane society like Smokey, go give them a good home.