ISA Server Losing Secure Channel with the DC - The 5783 Nightmare

I first blogged about this in October 2006 on the ISA Server Team Blog. We were getting so hammered by this issue that it motivated us to create an official article about it. Last year we published an article in the Microsoft TechNet Library explaining how to improve web browsing performance when using IE7 with Kerberos through ISA Server 2006.

The TechNet article explains in detail why ISA Server 2006 can log event 5783 when it loses the secure channel with the Domain Controller (DC). Many aspects need to be considered when this happens, such as:

· System Performance

o ISA performance as well as DC performance.

§ Is ISA Server too busy during that time?

§ Is the DC too busy to answer?

· Network

o Is the DC in the same LAN as ISA Server?

o Is there any communication problem between them during that time?

· Third Party

o Do you have any application filter installed on ISA Server?

o Does the issue happen without this application filter?

· Core OS Files

o If the servers (ISA and DC) run Windows Server 2003 SP1, have you at least updated netlogon.dll and tcpip.sys to a version higher than 5.2.3790.1830?

o If Windows Server 2003 SP2 is in use, have you already disabled the registry keys that can cause problems?

· Client Side

o Is your client using IE6 or IE7?

o Is IE7 configured to use the FQDN of the ISA Server as the proxy, or its IP address?

As you can see, these five main bullets alone lead to a lot of tests to determine the potential point of failure. The real challenge in this scenario is gathering the right data at the right time. The right data also includes Network Monitor (netmon) traces on both ends: the DC and ISA.
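The file version check from the Core OS Files item can be sketched in a few lines. This is only an illustration: the `installed` inventory and the function names are hypothetical, and the real version numbers would have to be read from the properties of netlogon.dll and tcpip.sys on each server. The point is that dotted versions should be compared numerically, field by field, not as plain strings.

```python
def parse_version(text):
    """Turn a dotted file version such as '5.2.3790.1830' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Threshold quoted above for Windows Server 2003 SP1 (must be strictly exceeded).
THRESHOLD = parse_version("5.2.3790.1830")


def is_newer_than_threshold(file_version):
    """Return True when file_version is strictly higher than the threshold."""
    return parse_version(file_version) > THRESHOLD


# Hypothetical inventory, for illustration only; these are not real readings.
installed = {
    "netlogon.dll": "5.2.3790.1830",
    "tcpip.sys": "5.2.3790.2000",
}

for name, version in installed.items():
    status = "OK" if is_newer_than_threshold(version) else "needs update"
    print(f"{name}: {version} -> {status}")
```

Comparing tuples of integers avoids the trap where a string comparison would rank "5.2.3790.10000" below "5.2.3790.1830".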

The problem that most customers run into is that they start the netmon trace only when they see event 5783 in the System log. This doesn’t help, because by the time you see the event the issue has already happened, so that netmon trace is useless. The good news is that a friend of mine from the Netmon team (Paul Long) has blogged about how to stop a capture based on an event. This lets you keep the netmon trace running and stop the capture when the 5783 happens.
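The idea of keeping a capture running and stopping it only once the event shows up can be sketched generically. This is not Paul Long's method and it does not call any Netmon API; `event_seen` and `stop_capture` are hypothetical callables that you would have to supply yourself (for example, one that checks the System log for event 5783 and one that ends the running capture).

```python
import time


def wait_for_event_then_stop(event_seen, stop_capture, poll_seconds=1.0, max_polls=None):
    """Poll event_seen() until it returns True, then call stop_capture().

    event_seen and stop_capture are placeholders supplied by the caller.
    Returns True if the capture was stopped because the event appeared,
    or False if max_polls checks passed without seeing the event.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if event_seen():
            # The trace up to this moment contains the traffic that led to the event.
            stop_capture()
            return True
        polls += 1
        time.sleep(poll_seconds)
    return False
```

A short polling interval matters here: the longer the capture keeps running after the event, the more unrelated traffic ends up in the trace.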

If you read the Microsoft TechNet Library article about this behavior, you will see a scenario where the client is in a child domain located across the WAN. This scenario shows one potential risk when there is a bottleneck on the WAN and also on the DC for the child domain. To make it easier to see what happens behind the scenes, I simulated the environment on my VMs and used a Microsoft internal tool to emulate a high-latency network. The simulation shows a client in a child domain, on network 10.30.30.0/24, trying to access the Internet through an ISA Server on network 10.20.20.0/24. During normal business hours the issue doesn’t happen, but during the rush period, when the network is really busy... then you will see what happens.

Check out the result here and see the 5783 happening.

Note about the simulation: it was created using a virtual environment and a tool that emulates network conditions. Results in a real environment could vary. The values shown in the simulation don’t guarantee that the issue will also happen in a real environment, and there is no commitment that you will get the same result if you try to simulate the issue on your own.