Why does it take too long to retrieve data from SQL server over a WAN connection?

Hi there,

In today’s blog post, I’m going to talk about how to troubleshoot a WAN performance problem by using simultaneous network traces.

One of customers reported that they were having performance problems when retrieving data from a SQL server running in a remote subnet. Interestingly the problem wasn’t in place when the client and SQL server were running in the same subnet J. Even though it was obvious from that testing that the WAN connectivity had something to do with the problem, our customer wanted to make sure that there were no issues at the client side and server side.

In such problem scenarios, the first thing we ask for is simultaneous network traces collected while reproducing the problem. Generally just looking at a one side network trace might yield incorrect results and may result in looking at the wrong direction. From that perspective, simultaneous network traces are a must.

Provided that simultaneous network traces are collected, how are you going to troubleshoot this problem? Well, there could be a number of approaches but I generally use the following method:

a) Find the same session (this will be most likely a TCP or UDP session) at both side traces. This may be a bit harder than expected especially if there’s NAT between the client and server side. Otherwise it should be an easy one.

b) After correctly identifying the same session at both sides, your next task will be to find the packet delays in the traces. The most appropriate way to do so is to order packets in the network trace based on time delta (time between two packets that are displayed back to back in a network trace).

c) After that, you can identify the highest time deltas within the trace (for example you can re-order packets based on time delta). If the problem is a WAN delay issue, this will generally look like that:

=> At the client side trace, most of the delays are seen in packets that are sent by the server side

=> At the server side trace, most of the delays are seen in packets that are sent by the client side.

In other words, from client perspective, it’s Server that is causing the delays and hence the performance problem. From server perspective, it’s Client that’s causing the delays and hence the performance problem. Actually both sides are WRONG! This stems from the WAN delay J. I think this may also underline the importance of collecting simultaneous network traces in such scenarios otherwise you will get the wrong results..

Well, now let’s take a look at an example to better understand this:

EXAMPLE:

========

Client is having performance problems in retrieving data from the SQL server and this is only visible when the client and server are running at different subnets. There’s also NAT (network address translation between the client and server). We have the IP address information and simultaneous network traces collected while reproducing the problem:

A. IP address information:

=> From client side perspective:

Client IP: 192.168.1.66

Server IP: 10.1.1.7

=> From server side perspective:

Client IP: 172.16.5.14

Server IP: 192.168.200.148

Note: IP address information is changed deliberately for data privacy

B. The session in which we see the performance problem:

Since this was a SQL server connectivity problem, it was obvious that the client should be contacting the SQL server at TCP port 1433 (if the default configuration wasn’t changed). After a few checks I found the following session in the simultaneous network traces:

Wireshark filter for Client side session:

(ip.addr eq 192.168.1.66 and ip.addr eq 10.1.1.7) and (tcp.port eq 46739 and tcp.port eq 1433)

Wireshark filter for the same session seen at Server side trace:

(ip.addr eq 172.16.5.14 and ip.addr eq 192.168.200.148) and (tcp.port eq 63584 and tcp.port eq 1433)

Now let’s look at how the communication is seen from the client perspective and server perspective:

C. From client perspective:

 

No. Time Source Destination Protocol Info

    406 20.541677 192.168.1.66 10.1.1.7 TCP 46739 > 1433 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=11322755 TSER=0 WS=6

    407 0.089037 10.1.1.7 192.168.1.66 TCP 1433 > 46739 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1380 WS=8 TSV=359429749 TSER=11322755

    408 0.000036 192.168.1.66 10.1.1.7 TCP 46739 > 1433 [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=11322777 TSER=359429749

    409 0.000142 192.168.1.66 10.1.1.7 TDS TDS7/8 0x12 Packet

    410 0.089013 10.1.1.7 192.168.1.66 TDS Response Packet

    411 0.000058 192.168.1.66 10.1.1.7 TCP 46739 > 1433 [ACK] Seq=27 Ack=27 Win=5888 Len=0 TSV=11322799 TSER=359429758

    412 0.000861 192.168.1.66 10.1.1.7 TDS TDS7/8 0x12 Packet

    413 0.091710 10.1.1.7 192.168.1.66 TDS TDS7/8 0x12 Packet

    414 0.006916 192.168.1.66 10.1.1.7 TDS TDS7/8 0x12 Packet

    415 0.093413 10.1.1.7 192.168.1.66 TDS TDS7/8 0x12 Packet

    416 0.002583 192.168.1.66 10.1.1.7 TDS Unknown Packet Type: 23[Unreassembled Packet]

    425 0.091253 10.1.1.7 192.168.1.66 TDS Response Packet[Unreassembled Packet]

    426 0.000434 192.168.1.66 10.1.1.7 TDS Query Packet

    427 0.089305 10.1.1.7 192.168.1.66 TDS Response Packet

    428 0.000330 192.168.1.66 10.1.1.7 TDS Query Packet

    429 0.088216 10.1.1.7 192.168.1.66 TDS Response Packet[Unreassembled Packet]

    431 0.040908 192.168.1.66 10.1.1.7 TCP 46739 > 1433 [ACK] Seq=1040 Ack=1372 Win=9280 Len=0 TSV=11322926 TSER=359429804

    432 0.174717 192.168.1.66 10.1.1.7 TDS Remote Procedure Call Packet

   433 0.103898 10.1.1.7 192.168.1.66 TDS Response Packet[Unreassembled Packet]

    434 0.000056 192.168.1.66 10.1.1.7 TCP 46739 > 1433 [ACK] Seq=1947 Ack=1875 Win=10944 Len=0 TSV=11322995 TSER=359429835

    467 0.062827 192.168.1.66 10.1.1.7 TDS Remote Procedure Call Packet[Unreassembled Packet]

    473 0.091735 10.1.1.7 192.168.1.66 TDS Response Packet[Unreassembled Packet]

    474 0.000061 192.168.1.66 10.1.1.7 TCP 46739 > 1433 [ACK] Seq=1992 Ack=1901 Win=10944 Len=0 TSV=11323034 TSER=359429852

    475 0.008514 192.168.1.66 10.1.1.7 TDS Remote Procedure Call Packet

    476 0.093648 10.1.1.7 192.168.1.66 TDS Response Packet[Unreassembled Packet]

    477 0.001301 192.168.1.66 10.1.1.7 TDS Remote Procedure Call Packet[Unreassembled Packet]

    478 0.087454 10.1.1.7 192.168.1.66 TDS Response Packet[Unreassembled Packet]

    479 0.039780 192.168.1.66 10.1.1.7 TCP 46739 > 1433 [ACK] Seq=2944 Ack=2399 Win=12608 Len=0 TSV=11323092 TSER=359429871

...

From client’s perspective, responses from SQL server side come with a delay approximately 0.08 seconds or so. The client sends the next query without waiting...

D. From server perspective:

 

No. Time Source Destination Protocol Info

     44 21.123385 172.16.5.14 192.168.200.148 TCP 63584 > 1433 [SYN] Seq=0 Win=5840 Len=0 MSS=1380 TSV=11322755 TSER=0 WS=6

     45 0.000039 192.168.200.148 172.16.5.14TCP 1433 > 63584 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=8 TSV=359429749 TSER=11322755

     46 0.089015 172.16.5.14 192.168.200.148 TCP 63584 > 1433 [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=11322777 TSER=359429749

     47 0.000461 172.16.5.14 192.168.200.148 TDS TDS7/8 0x12 Packet

     48 0.000117 192.168.200.148 172.16.5.14TDS Response Packet

     49 0.087810 172.16.5.14 192.168.200.148 TCP 63584 > 1433 [ACK] Seq=27 Ack=27 Win=5888 Len=0 TSV=11322799 TSER=359429758

     50 0.001553 172.16.5.14 192.168.200.148 TDS TDS7/8 0x12 Packet

     51 0.000427 192.168.200.148 172.16.5.14TDS TDS7/8 0x12 Packet

     52 0.099177 172.16.5.14 192.168.200.148 TDS TDS7/8 0x12 Packet

     53 0.004255 192.168.200.148 172.16.5.14TDS TDS7/8 0x12 Packet

     54 0.091772 172.16.5.14 192.168.200.148 TDS Unknown Packet Type: 23[Unreassembled Packet]

     55 0.001104 192.168.200.148 172.16.5.14TDS Response Packet[Unreassembled Packet]

     56 0.089833 172.16.5.14 192.168.200.148 TDS Query Packet

     57 0.000183 192.168.200.148 172.16.5.14TDS Response Packet

     58 0.089154 172.16.5.14 192.168.200.148 TDS Query Packet

     59 0.000150 192.168.200.148 172.16.5.14TDS Response Packet[Unreassembled Packet]

     60 0.128460 172.16.5.14 192.168.200.148 TCP 63584 > 1433 [ACK] Seq=1040 Ack=1372 Win=9280 Len=0 TSV=11322926 TSER=359429804

     61 0.178927 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet

     62 0.001507 192.168.200.148 172.16.5.14TDS Response Packet[Unreassembled Packet]

     63 0.098217 172.16.5.14 192.168.200.148 TCP 63584 > 1433 [ACK] Seq=1947 Ack=1875 Win=10944 Len=0 TSV=11322995 TSER=359429835

     64 0.066525 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]

     65 0.000168 192.168.200.148 172.16.5.14TDS Response Packet[Unreassembled Packet]

     66 0.089069 172.16.5.14 192.168.200.148 TCP 63584 > 1433 [ACK] Seq=1992 Ack=1901 Win=10944 Len=0 TSV=11323034 TSER=359429852

     67 0.011578 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet

     68 0.001386 192.168.200.148 172.16.5.14TDS Response Packet[Unreassembled Packet]

     70 0.089248 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]

     71 0.000147 192.168.200.148 172.16.5.14TDS Response Packet[Unreassembled Packet]

     72 0.129948 172.16.5.14 192.168.200.148 TCP 63584 > 1433 [ACK] Seq=2944 Ack=2399 Win=12608 Len=0 TSV=11323092 TSER=359429871

     73 0.002327 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet

     74 0.001011 192.168.200.148 172.16.5.14TDS Response Packet

     75 0.088519 172.16.5.14 192.168.200.148 TCP 63584 > 1433 [ACK] Seq=3323 Ack=2497 Win=12608 Len=0 TSV=11323115 TSER=359429884

     76 0.003928 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]

     77 0.000144 192.168.200.148 172.16.5.14TDS Response Packet[Unreassembled Packet]

     78 0.097206 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet

     79 0.001297 192.168.200.148 172.16.5.14TDS Response Packet[Unreassembled Packet]

     80 0.089896 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]

     81 0.000147 192.168.200.148 172.16.5.14TDS Response Packet[Unreassembled Packet]

     82 0.099239 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]

     83 0.002980 172.16.5.14 192.168.200.148 TDS Unknown Packet Type: 0 (Not last buffer)[Unreassembled Packet]

     84 0.000025 192.168.200.148 172.16.5.14TCP 1433 > 63584 [ACK] Seq=2869 Ack=6121 Win=65536 Len=0 TSV=359429923 TSER=11323187

     85 0.000272 192.168.200.148 172.16.5.14TDS Response Packet

     86 0.091009 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]

     87 0.000145 192.168.200.148 172.16.5.14TDS Response Packet[Unreassembled Packet]

     88 0.125878 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]

     89 0.002955 172.16.5.14 192.168.200.148 TDS Unknown Packet Type: 0 (Not last buffer)[Unreassembled Packet]

     90 0.000022 192.168.200.148 172.16.5.14TCP 1433 > 63584 [ACK] Seq=3783 Ack=8147 Win=65536 Len=0 TSV=359429945 TSER=11323242

     91 0.000233 192.168.200.148 172.16.5.14TDS Response Packet

     92 0.095498 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]

     93 0.000187 192.168.200.148 172.16.5.14TDS Response Packet[Unreassembled Packet]

     94 0.111501 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]

     95 0.005919 172.16.5.14 192.168.200.148 TDS Unknown Packet Type: 0 (Not last buffer)[Unreassembled Packet]

     96 0.000019 192.168.200.148 172.16.5.14TCP 1433 > 63584 [ACK] Seq=4760 Ack=10928 Win=65536

When checking the trace from server perspective, SQL server returns TDS responses almost always immediately. But the TDS requests from the client side come to SQL server with an average delay of 0.08 seconds or so.

RESULTS:

========

1) From client’s perspective, SQL server is slow in sending TDS responses

2) From SQL server’s perspective, client is slow in sending TDS requests.

3) The real problem is that TDS performance issue was stemming from WAN connectivity delay (like link delay/network device delays etc). In this instance the average WAN delay was about 80 milliseconds (0.08 seconds). Even though it looks like a small number at first look, considering that you may be exposed to that small delay let’s say 100 times while making a transaction, this could easily become an issue (like 8 seconds). So never underestimate such small values J

Hope this helps

Thanks,

Murat