Problem with Bidirectional Affinity When Web Publishing with TMG SP1 Rollup 4 and Below

 

Hello all!  It’s me, Brett Crane, from the Security teams.  I worked a really interesting issue with one of my customers recently and wanted to point out our findings and how to fix or work around the issue…

Problem:

We found that on a two-node TMG array (running Service Pack 1 Update 1 Rollup 4 or earlier) with NLB enabled on both the internal and external interfaces, session affinity doesn’t work properly. The peculiarity here is that the load balancing affinity issue is only seen with Web Publishing rules. With Server Publishing rules, everything works fine.

When running NLB on both the internal and external interfaces of TMG, we use what is termed bidirectional affinity. Basically, bidirectional affinity enables NLB-enabled firewall arrays to support complex protocols requiring secondary connections, and ensures that secondary connections are routed through the same firewall through which the initial client request was made. This prevents connections from being routed to an array member that has no knowledge of the initial connection. This was the piece we suspected from the start.

Data Analysis:

TMG uses NLB hook rules to decide whether the source IP address or the destination IP address is used for affinity. Based on this, we immediately checked the NLB Hook\Hash rules that TMG creates (or should create) when we created the Web Publishing rule pointing at the internal web server.

* To check your Hook rules on your TMG server you will run the following command from an elevated command prompt: netsh tmg show nlb

** For more information on troubleshooting NLB, review the following TechNet article: https://technet.microsoft.com/en-us/library/ff849728.aspx

In our test we created a Web Publishing rule pointing to a single web server named Web.domain-01.com (the external name of the site is different, though: it is External.com). The server's LAN IP address was 10.61.221.99. This is a clipping of the rule we tested with:

 

clip_image002

 

This is what we saw when we ran the netsh command and piped its output through “find” to show only the lines containing that server's IP address (for example: netsh tmg show nlb | find "10.61.221.99"):

 

clip_image003

 

It can’t find any hook rules created for that IP address!

We deleted the Web Publishing rule and created a Server Publishing rule for the HTTP protocol and pointed the “To” tab to the same IP address:

 

clip_image004

 

We then ran the same Netsh command. This time we saw a much better return:

 

clip_image006

 

So at this point we see what looks to be a problem! We also tested the Web Publishing rule with an internal web farm (instead of the single internal server) and saw a similar issue: hook rules were not being created.
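When a rule publishes several servers, checking each IP address by hand gets tedious. Here is a minimal Python sketch of the same check for a list of addresses. It assumes Python is available on the TMG node and that a hook rule for a server shows that server's IP address as plain text in the "netsh tmg show nlb" output, which is the same assumption the "find" filter makes. The function names are hypothetical and not part of TMG.

```python
import re
import subprocess


def run_show_nlb():
    """Return the text printed by `netsh tmg show nlb`.

    Must be run on a TMG node from an elevated prompt.
    """
    result = subprocess.run(
        ["netsh", "tmg", "show", "nlb"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def ips_missing_from_output(output, ips):
    """Return the addresses in `ips` that never appear in `output`.

    Assumption: a hook rule for a server mentions that server's IP
    address as plain text. The pattern refuses matches that are part
    of a longer address (e.g. 110.61.221.99 or 10.61.221.990).
    """
    missing = []
    for ip in ips:
        pattern = r"(?<![\d.])" + re.escape(ip) + r"(?!\d)"
        if not re.search(pattern, output):
            missing.append(ip)
    return missing
```

On a TMG node you could call ips_missing_from_output(run_show_nlb(), ["10.61.221.99"]). Any address that comes back in the list has no line mentioning it, which is what we saw for the Web Publishing rule.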

Conclusion:

Through further testing and internal debugging of the product, we were able to see that we actually have multiple issues going on here. It turns out that the code behind Web Publishing to a single server and Web Publishing to a backend server farm behaves differently, and the two cases fall under separate issues. The single-server case is fixed by a hotfix that was included in Service Pack 2 for TMG. So if you see this problem when publishing a single server, all you need to do is update to Service Pack 2.

“Well what was the problem?” you ask…

 

What is actually happening is all based off the highlighted area below:

 

clip_image008

 

The hook rules are created based on name resolution. The name in the Published site area cannot be resolved internally via DNS, so no hook rule is created. When the name cannot be resolved, TMG should fall back to the administratively entered IP address just below it and create the hook rule from that. This is not done properly prior to Service Pack 2.
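Since the hook rules depend on name resolution, it can help to confirm what the published name resolves to before creating the rule. Here is a minimal Python sketch of that check, assuming Python is available on the machine and that the system resolver it uses sees the same DNS servers and hosts file as TMG does. The function names are hypothetical. This only checks resolution; it does not inspect TMG itself.

```python
import socket


def resolved_ipv4(name, resolver=socket.gethostbyname_ex):
    """Return the set of IPv4 addresses `name` resolves to.

    Returns an empty set when the name does not resolve.
    """
    try:
        _hostname, _aliases, addresses = resolver(name)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return set()
    return set(addresses)


def name_matches_server(name, expected_ip, resolver=socket.gethostbyname_ex):
    """True if `name` resolves and `expected_ip` is among its addresses."""
    return expected_ip in resolved_ipv4(name, resolver)
```

For the rule in this post, name_matches_server("External.com", "10.61.221.99") returning False would be consistent with the missing hook rule described above.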

* For more information on the single server issue you can refer to the following KB article:

FIX: Network Load Balancing hook rules are not created correctly when the name on the To tab of the Web Publishing rule does not resolve to a published web server in Forefront Threat Management Gateway 2010

https://support.microsoft.com/kb/2591269

“Can we work around this? We can’t install SP2 at this time.”

 

The answer to this question is yes, but it is preferred that you update to SP2. The workaround is to add an entry to the hosts file on your TMG server that resolves External.com to 10.61.221.99. You would then have to delete and recreate the rule.

I mentioned above that publishing to a backend server farm, and letting TMG manage your farm instead of a third-party load balancing device, is different from single-server publishing. Because of those differences, installing SP2 will not resolve your issue. You will know you are configured in this manner if you look in your rule and see the following tab:

 

clip_image009

 

This has been determined to be a problem and will be fixed in future versions of the product. At this time, the only method of getting the bidirectional affinity hook rules in place is the following workaround:

Open the “hosts” file (in C:\Windows\System32\drivers\etc) in Notepad from an elevated command prompt on each of the TMG array nodes. Create an entry in the hosts file for the IP address of each server you listed in your web farm in TMG. Each of those IP addresses should resolve to the name listed in the rule's “Internal site name:” setting (this looks very similar to a DNS round robin entry). Here is an example based on our rule above, pointing to 2 internal servers:

 

clip_image011
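If the farm has more than a couple of members, the entries can be generated instead of typed. Here is a minimal Python sketch that builds one hosts-file line per farm member. The site name and the second IP address below are hypothetical placeholders (only 10.61.221.99 comes from the test above), so substitute the value from your rule's “Internal site name:” setting and your own farm addresses.

```python
def hosts_entries(site_name, server_ips):
    """Build one hosts-file line per farm member, all mapping to one name."""
    return [f"{ip}\t{site_name}" for ip in server_ips]


# Hypothetical placeholder: use the "Internal site name:" value from your rule.
SITE_NAME = "farm.internal.example"

# 10.61.221.99 is the server from the test above; the second address
# is a made-up placeholder for another farm member.
FARM_IPS = ["10.61.221.99", "10.61.221.100"]

for line in hosts_entries(SITE_NAME, FARM_IPS):
    print(line)
```

The printed lines can be pasted into the hosts file on each array node.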

 

After saving the hosts file, delete and recreate your Web Publishing rule. This causes TMG to create the NLB hook rules needed for bidirectional affinity to work properly, resolving your issue!

 

I hope this information helps you out!