Side Effects of Incorrect DNS configuration on ISA Server: 10060 Connection Timeout Scenario

1. Introduction

 

Although this topic might sounds “basic stuff”, we still see many cases where DNS is the culprit. There are cases where DNS is not answering properly, not correctly configured on ISA Server and in some cases the DNS Server is actually down (easy one to fix it J). The fact that matters is that if your DNS gets sick, ISA will be ill too. ISA Server rely as a core component on name resolution, although ISA Server has its own DNS cache that runs under the firewall service process context (wsprsrv.exe), ISA is still using the OS DNS client as well.

 

The goal of this article is to go through a scenario where clients were complaining that they were unable to access some web sites when passing through ISA Server. However, before we start the scenario that explains one common side effect of having an incorrect DNS configuration on ISA Server take a look in the following article as the main recommendation for DNS Configuration on ISA Server:

http://technet.microsoft.com/en-us/library/cc302590.aspx

 

2. Sample Scenario

 

To make it easier to understand the root cause of this problem, let’s take a look in the topology used by this environment:

 

Figure 1 – Topology used on this scenario.

 

In this scenario the error message that the client workstation was receiving was 10060 (timeout) when trying to browse www.fourthcoffee.com and the error message below was showing up on ISA Server Logging as:

 

 

Figure 2 – Error while browsing www.fourthcoffee.com.

 

I bet that if you look closely on figure 1 and 2 you already identify the possible cause of this issue, didn’t you? Probably you did and don’t realize yet. Notice that the client workstation is part of the fourthcoffee.com domain, it is trying to browse an intranet site called www.fourthcoffee.com and ISA Server instead of sending the packet to the internal network is actually thinking that the destination belongs to the external network. If you look it to Figure to you will see that the destination on ISA perspective is 192.168.0.189, which is an external host (based on the topology showed on figure 1). Why this happens in first place?

 

For this particular scenario, when looking to the netmon trace of the internal network I couldn’t see any DNS Query to resolve www.fourthcoffee.com, there are three assumptions at that point: ISA Server is using cache information, ISA Server is sending the DNS Query to another interface or client already resolved the name and is sending directly to the destination IP. Looking the netmon trace on the external interface it was possible to see the following request

 

192.168.0.1 192.168.0.10 DNS DNS:QueryId = 0xF5C5, QUERY (Standard query), Query for www.fourthcoffee.com of type Host Addr on class Internet

 

192.168.0.10 192.168.0.1 DNS DNS:QueryId = 0xF5C5, QUERY (Standard query), Response - Success, 192.168.0.189

 

After ISA Server resolves the name it tries to establish the TCP Handshake on port 80 on the destination host and guess what? This host doesn’t answer as shown below:

 

192.168.0.1 192.168.0.189 TCP TCP:Flags=......S., SrcPort=9496, DstPort=HTTP(80), PayloadLen=0, Seq=3490482351, Ack=0, Win=65535 ( ) = 65535

192.168.0.1 192.168.0.189 TCP TCP:[SynReTransmit #5]Flags=......S., SrcPort=9496, DstPort=HTTP(80), PayloadLen=0, Seq=3490482351, Ack=0, Win=65535 ( ) = 65535

192.168.0.1 192.168.0.189 TCP TCP:[SynReTransmit #5]Flags=......S., SrcPort=9496, DstPort=HTTP(80), PayloadLen=0, Seq=3490482351, Ack=0, Win=65535 ( ) = 65535

192.168.0.1 192.168.0.189 TCP TCP:[SynReTransmit #8]Flags=......S., SrcPort=9498, DstPort=HTTP(80), PayloadLen=0, Seq=1596124610, Ack=0, Win=65535 ( ) = 65535

192.168.0.1 192.168.0.189 TCP TCP:[SynReTransmit #12]Flags=......S., SrcPort=9499, DstPort=HTTP(80), PayloadLen=0, Seq=3838730273, Ack=0, Win=65535 ( ) = 65535

192.168.0.1 192.168.0.189 TCP TCP:[SynReTransmit #12]Flags=......S., SrcPort=9499, DstPort=HTTP(80), PayloadLen=0, Seq=3838730273, Ack=0, Win=65535 ( ) = 65535

 

Lots of retransmissions without answer and after that ISA Server sends the HTTP 504 Time out to the client workstation.

 

3. Root Cause

 

The problem here was that ISA Server was using DNS on both adapters, internal and external. According to the ISA Server DNS Best Practices this shouldn’t be done because can cause behaviors like that. Here again the statement from the article that was mentioned in the beginning of this post:

 

Always point DNS to either internal servers or external servers, never to both.

From http://technet.microsoft.com/en-us/library/cc302590.aspx

Clearly using DNS on both adapters can cause you many problems, here is another example of a problem that can happen (from the same TechNet article):

 

Q: Why not point the external ISA NIC to the ISP for DNS?

A:   The problem here is that ISA doesn’t know what is internal or external when trying to resolve names. This means ISA can end up trying to resolve internal names to the external ISP. Once it receives “name not found”, the ISA Server computer won’t look for the internal name again and you will fail to participate in the domain.

 

4. Under the Hood

 

If you understood the above scenario and is aware that you shouldn’t do that then the goal of this article was achieved. However if you want to go under the hood and see how wspsrv.exe behaves in such circumstance, take a look in this kernel mode dump while this problem was happening:

 

- First thing to do is to type !process 0 0 to identify the process if of wspsrv.exe. Since the output for this command is long in a kernel mode dump (due the amount of process running) I will go to the next command which is to change the context to wspsrv.exe process:

 

0: kd> .process /p / r 89468b70

Implicit process is now 89468b70

- At this point we can dump all the threads running within this process by using the command !process 89868b70. The output for this can be also huge, depending how busy wsprsv.exe is. The output of this command revealed also that there are some threads were waiting for an LPC call to complete, such as this one below:

 

        THREAD 88903020 Cid 071c.07bc Teb: 7ffa5000 Win32Thread: e2059078 WAIT: (WrLpcReply) UserMode Non-Alertable

            8890320c Semaphore Limit 0x1

        Waiting for reply to LPC MessageId 00083100:

        Current LPC port e203d118

        IRP List:

            87e12aa8: (0006,0190) Flags: 00000000 Mdl: 87dd05c0

            88953008: (0006,0190) Flags: 00000000 Mdl: 88196218

            87a4ade0: (0006,0190) Flags: 00000000 Mdl: 878a4660

         87827008: (0006,0190) Flags: 00000000 Mdl: 8819a590

            879f74f8: (0006,0190) Flags: 00000000 Mdl: 881ec498

            8772ba40: (0006,0190) Flags: 00000000 Mdl: 8819bf88

            881b6b88: (0006,0190) Flags: 00000000 Mdl: 87e5c7c8

     880a2cf8: (0006,0190) Flags: 00000000 Mdl: 88282b88

            87a5c008: (0006,0190) Flags: 00000000 Mdl: 8933ad00

            87e34938: (0006,0190) Flags: 00000000 Mdl: 881f11c8

            8890b318: (0006,0094) Flags: 00000000 Mdl: 00000000

  885d5008: (0006,0190) Flags: 00000000 Mdl: 89969740

            885e8290: (0006,0190) Flags: 00000000 Mdl: 883aa4a0

            89465bf0: (0006,0094) Flags: 00000800 Mdl: 00000000

            8931b730: (0006,0190) Flags: 00000000 Mdl: 89e39e48

            89505008: (0006,0190) Flags: 00000000 Mdl: 898b1108

        Not impersonating

        DeviceMap e1630f98

        Owning Process 89468b70 Image: wspsrv.exe

        Wait Start TickCount 70669 Ticks: 56 (0:00:00:00.875)

        Context Switch Count 22391 LargeStack

        UserTime 00:00:05.218

        KernelTime 00:00:01.578

        Win32 Start Address wspsrv!CtrlThrdRtn (0x00456baa)

   Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

        Stack Init f7557000 Current f7556c08 Base f7557000 Limit f7553000 Call 0

        Priority 13 BasePriority 8 PriorityDecrement 0

        ChildEBP RetAddr

        f7556c20 80833485 nt!KiSwapContext+0x26
f7556c4c 80829a82 nt!KiSwapThread+0x2e5

        f7556c94 8091ec18 nt!KeWaitForSingleObject+0x346

        f7556d50 808897bc nt!NtRequestWaitReplyPort+0x776 <== Waiting for a reply (RPC over LPC)

        f7556d50 7c8285ec nt!KiFastCallEntry+0xfc

        01c3eb78 7c82787b ntdll!KiFastSystemCallRet

        01c3eb7c 77c80a6e ntdll!NtRequestWaitReplyPort+0xc

        01c3ebc8 77c7fcf0 RPCRT4!LRPC_CCALL::SendReceive+0x230

        01c3ebd4 77c80673 RPCRT4!I_RpcSendReceive+0x24 <= RPC Call

        01c3ebe8 77ce315a RPCRT4!NdrSendReceive+0x2b

        01c3efd0 76ed5049 RPCRT4!NdrClientCall2+0x22e

        01c3efe8 76ed4f69 DNSAPI!R_ResolverQuery+0x1c

        01c3f044 76ed6f5d DNSAPI!Query_PrivateExW+0x187

        01c3f078 76ee9d0c DNSAPI!Query_Shim+0x48

        01c3f0a0 62ea7442 DNSAPI!DnsQuery_A+0x20 <= DNS Query

        01c3f0ec 62ea789e MSPHLPR!QueryDnsCache+0x3a

        01c3f110 62ea6583 MSPHLPR!gethosttlbyname+0x6e

        01c3f458 647484a2 MSPHLPR!ProxyGetHostByName+0x32c

        01c3f498 64748e4e w3filter!+0x2cc

        01c3f4e0 6476676f w3filter!+0x306

        01c3f530 64773614 w3filter!+0x19a

        01c3f56c 647746d4 w3filter!+0x112

        01c3f580 64729101 w3filter!+0x18

        01c3f58c 64727a4e w3filter!+0x3b

        01c3fe1c 6472829a w3filter!+0xc23

       01c3fe90 0046d838 w3filter!+0x3dd

        01c3fefc 0046e598 wspsrv!+0x15f

        01c3ff14 004c5c7e wspsrv!+0x8a

        01c3ff20 0046e69f wspsrv!+0x37

        01c3ff30 0046d5f1 wspsrv!+0x31

        01c3ff50 00455fed wspsrv!+0x23

        01c3ff7c 00456d12 wspsrv!+0x44

        01c3ffb8 77e64829 wspsrv!+0x168

        01c3ffec 00000000 kernel32!BaseThreadStart+0x34

 

- According to this we are lead to believe that wspsrv.exe is waiting for a LPC call to complete in order to continue.

- Some additional notes about this output:

 

· WAIT: (WrLpcReply) : This tells us that the thread is waiting for a LPC reply. This information called as the “Wait Reason” is passed to KeWaitForSingleObject( ) as a parameter. This is retrieved from the _KTHREAD.WaitReason field.

· Waiting for reply to LPC MessageId 00083100: This line gives the message ID of the LPC message that was sent to the server thread.

· Current LPC port e203d118: The port number listed is the client communication port.

· 8890320c Semaphore Limit 0x1 : The semaphore that the thread is waiting for is signaled when a reply message is posted to the client communication ports message queue. NtRequestWaitReplyPort( ) waits on this semaphore.

 

Note: Reference for those definitions - http://blogs.msdn.com/ntdebugging/archive/2007/09/10/lpc-part-3-debugging-a-real-world-lpc-scenario.aspx

 

- Let’s switch the context to the thread 88903020 now by using the command .thread 88903020 and after that we can dump the stack with the command kv 25 (in this case 25 will give us the complete call stack since the BaseThreadStart function).

 

0: kd> .thread 88903020

Implicit thread is now 88903020

0: kd> kv 25

ChildEBP RetAddr Args to Child

f7556c20 80833485 88903020 889030c8 00000001 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])

f7556c4c 80829a82 8890320c 889031e0 88903020 nt!KiSwapThread+0x2e5 (FPO: [Non-Fpo])

f7556c94 8091ec18 8890320c 00000011 00000001 nt!KeWaitForSingleObject+0x346

f7556d50 808897bc 00000ca8 18611128 18611128 nt!NtRequestWaitReplyPort+0x776

f7556d50 7c8285ec 00000ca8 18611128 18611128 nt!KiFastCallEntry+0xfc

01c3eb78 7c82787b 77c80a6e 00000ca8 18611128 ntdll!KiFastSystemCallRet

01c3eb7c 77c80a6e 00000ca8 18611128 18611128 ntdll!NtRequestWaitReplyPort+0xc

01c3ebc8 77c7fcf0 01c3ec04 01c3ebe8 77c80673 RPCRT4!LRPC_CCALL::SendReceive+0x230

01c3ebd4 77c80673 01c3ec04 76ed42d8 01c3eff0 RPCRT4!I_RpcSendReceive+0x24

01c3ebe8 77ce315a 01c3ec30 186111a4 00000000 RPCRT4!NdrSendReceive+0x2b

01c3efd0 76ed5049 76ed42d8 76ed421c 01c3eff0 RPCRT4!NdrClientCall2+0x22e

01c3efe8 76ed4f69 00000000 1ed800e8 00000001 DNSAPI!R_ResolverQuery+0x1c

01c3f044 76ed6f5d 1ed800e8 00000001 00000000 DNSAPI!Query_PrivateExW+0x187

01c3f078 76ee9d0c 00000003 01c3f354 00000001 DNSAPI!Query_Shim+0x48

01c3f0a0 62ea7442 01c3f354 00000001 00000000 DNSAPI!DnsQuery_A+0x20

01c3f0ec 62ea789e 01c3f354 00000001 01c3f130 MSPHLPR!QueryDnsCache+0x3a

01c3f110 62ea6583 01c3f354 01c3f130 62ea6257 MSPHLPR!gethosttlbyname+0x6e

01c3f458 647484a2 1c6380a0 1c638120 00000000 MSPHLPR!ProxyGetHostByName+0x32c

01c3f498 64748e4e 1c638120 00000000 00000000 w3filter!+0x2cc (FPO: [Non-Fpo])

... <= Hiding all the wspsrv calls since this is not relevant at this point

01c3ffec 00000000 00456baa 00000001 00000000 kernel32!BaseThreadStart+0x34

 

- The DNSQuery_A function has the following parameters:

 

DNS_STATUS WINAPI DnsQuery(

  __in PCTSTR lpstrName,

  __in WORD wType,

  __in DWORD Options,

  __inout_opt PVOID pExtra,

  __out_opt PDNS_RECORD *ppQueryResultsSet,

  __out_opt PVOID *pReserved

);

From: http://msdn.microsoft.com/en-us/library/ms682016(VS.85).aspx

 

The first parameter is the one that is interesting for us at this point, since we want to know which name is trying to resolve. You can find this parameter in the third column in the stack above, which is 01c3f354. We can use the command da to display the value of this parameter as shown below:

 

0: kd> da 01c3f354

01c3f354 "fourthcoffee.com"

 

- As you can notice we are performing the name resolution for the internal domain and the thread that is performing that is actually waiting a LPC reply. Let's take a look at this LPC message:

 

0: kd> !lpc message 00083100

Searching message 826fe in threads ...

    Server thread 882ee590 is working on message 826fe

Client thread 8846c470 waiting a reply from 826fe

Searching thread 8846c470 in port rundown queues ...

Server communication port 0xe1fe5020

    Handles: 1 References: 1

    The LpcDataInfoChainHead queue is empty

        Connected port: 0xe203d118 Server connection port: 0xe1444ac8

Client communication port 0xe203d118

    Handles: 1 References: 11

    The LpcDataInfoChainHead queue is empty

Server connection port e1444ac8 Name: DNSResolver

    Handles: 1 References: 24

    Server process : 894e8020 (svchost.exe)

    Queue semaphore : 898fe978

    Semaphore state 0 (0x0)

  The message queue is empty

    The LpcDataInfoChainHead queue is empty

Done.

 

- We are waiting for a reply from the DNSResolver which belongs to the svchost.exe (server process id = 894e8020).

- You can still going on to see what the DNSResolve is doing by switching to this process with the command .process 894e8020.

 

0: kd> .process 894e8020

Implicit process is now 894e8020

 

- Next thing to do is to dump all the threads within this process with the command !process 894e8020 f, which will give you also a huge output (since most of the time svchost.exe has lots of threads).

 

- Focus on getting to the point, in other words from this huge output you need to look at the LCP message id that will be exposed in one of the threads. A tip here is to run .cls before run the !process command to clean all your screen, then run the command, copy all the content to clipboard (CTRL+A and CTRL+C), paste on a new notepad file, do a CTRL+F and search for this message id. In this case the message id that we are looking for is 0083100 which you need to search by removing the 00, just search for 83100. Here it is the thread that we are looking for:

 

THREAD 882bf020 Cid 031c.0ab0 Teb: 7ffa7000 Win32Thread: 00000000 WAIT: (Executive) UserMode Non-Alertable

8790287c NotificationEvent

Not impersonating

DeviceMap e1630f98

Owning Process 894e8020 Image: svchost.exe

Wait Start TickCount 70704 Ticks: 21 (0:00:00:00.328)

Context Switch Count 713

UserTime 00:00:00.000

KernelTime 00:00:00.000

Win32 Start Address 0x00083100

LPC Server thread working on message Id 83100

Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

Stack Init b95a4000 Current b95a3bb0 Base b95a4000 Limit b95a1000 Call 0

- From this point on the approach is the same: switch to 882bf020 process and dump the stack and see what is going on. To summarize this particular exercise by the following conclusion:

· ISA was waiting to resolve the name fourthcoffee.com. It was waiting for Windows DNSResolver to perform his job trying to resolve this name.

· Switching context from wsprsrv.exe process to svchost.exe (where dnsresolver runs) we see that the name resolution is taking some time to happen but eventually will happen (at least for this case) but it will return a wrong IP address.

 

At this point is important to emphasize that if wspsrv.exe keep waiting for the DNS for more long time (because maybe DNS is waiting for something else to happen) and more threads within wspsrv.exe process are having the same issue we might have a scenario where ISA stops completely to answer, causing the side effect described in the article below:

ISA Server 2006 stops answering requests
http://blogs.technet.com/isablog/archive/2009/01/12/isa-server-2006-stops-answering-requests.aspx

 

In this case we had a delay to answer but it did answer, unfortunately with the wrong IP address since it was doing an internal name resolution sending to the external interface.

 

5. Conclusion

 

As explained throughout this article name resolution is a MUST HAVE WORKING item when doing a health check on ISA. Many weird behaviors can happen when ISA doesn’t obtain a good answer from the DNS Server leading you to troubleshoot many other things before realize that name resolution is the root cause for the problem.

 

 

Author

Yuri Diogenes

Security Support Engineer

Microsoft CSS Forefront Edge Team

Technical Reviewer

Thomas Detzner

Escalation Engineer

Microsoft CSS Forefront Edge Team