Unable to Access Resource behind TMG after enabling NLB

Consider a scenario where the TMG administrator is publishing servers that sit behind TMG and, after NLB is enabled on the External interface, users can no longer access those resources. Publishing the resource through a DIP (Dedicated IP) instead works. The basic diagram is shown in the figure below:

[Figure: basic network diagram of the scenario]

Traffic from inside (VLAN E) to outside resources (for example, VLAN A) using the internal VIP (10.10.10.1) as the default gateway was working; however, incoming traffic (from VLAN A, for instance) to the external VIP (192.168.4.10) was failing.

Troubleshooting

There is no clever troubleshooting here. In scenarios like this, always start from the basics: enable live logging and check whether the traffic is arriving at TMG in the first place. In this case it was not, but to be on the safe side we installed Netmon on all three nodes to see whether the traffic was hitting the box at all. It was not: no traffic was arriving at TMG.
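Before reaching for a packet capture, the "works on the DIP, fails on the VIP" symptom can be confirmed from a client machine with a quick TCP probe. The following is only a minimal sketch: the function names are hypothetical, and it checks whether a TCP handshake completes, which complements (but does not replace) live logging and Netmon on the TMG nodes.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def compare_vip_and_dip(vip: str, dip: str, port: int) -> dict:
    """Probe the same published port on the cluster VIP and on one node's DIP."""
    return {"vip": can_connect(vip, port), "dip": can_connect(dip, port)}
```

From a client in VLAN A, a call such as compare_vip_and_dip("192.168.4.10", "192.168.4.11", 443) would be expected to report the VIP as unreachable and the DIP as reachable in this scenario. The DIP 192.168.4.11 and port 443 are placeholders for illustration; the article does not state the real values.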

Resolution

After reviewing the core switch, we saw that the NLB MAC address on it was wrong, hence the traffic was never arriving at TMG. This was a Cisco Catalyst 6500 Series switch, and we followed Cisco’s recommendation below to address the issue:

Cisco Catalyst 6500 Series Switches - Catalyst Switches for Microsoft Network Load Balancing Configuration Example
https://www.cisco.com/en/US/products/hw/switches/ps708/products_configuration_example09186a0080a07203.shtml
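When comparing the switch entry against what NLB actually uses, it helps to know which cluster MAC address to expect. The sketch below assumes the default Windows NLB derivation of the cluster MAC from the virtual IP (02-BF plus the four IP octets in unicast mode, 03-BF plus the four IP octets in multicast mode, and 01-00-5E-7F plus the last two octets in IGMP multicast mode). The function name is hypothetical, and the article does not say which mode this cluster used or what the wrong entry looked like, so always confirm the value against what NLB Manager reports.

```python
import ipaddress


def nlb_cluster_mac(vip: str, mode: str = "unicast") -> str:
    """Derive the default NLB cluster MAC for a virtual IP (assumed defaults)."""
    octets = ipaddress.IPv4Address(vip).packed  # the four IP octets as bytes
    if mode == "unicast":
        mac = bytes([0x02, 0xBF]) + octets
    elif mode == "multicast":
        mac = bytes([0x03, 0xBF]) + octets
    elif mode == "igmp":
        # IGMP multicast uses only the last two octets of the VIP.
        mac = bytes([0x01, 0x00, 0x5E, 0x7F]) + octets[2:]
    else:
        raise ValueError(f"unknown NLB mode: {mode!r}")
    return "-".join(f"{b:02X}" for b in mac)


# External VIP from this scenario:
print(nlb_cluster_mac("192.168.4.10", "unicast"))    # 02-BF-C0-A8-04-0A
print(nlb_cluster_mac("192.168.4.10", "multicast"))  # 03-BF-C0-A8-04-0A
```

If the MAC address the switch associates with the VIP does not match the expected value for the cluster's mode, the switch configuration is the first place to look.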

Additional Info

TMG NLB leverages Windows NLB, and all the recommendations for Windows NLB apply to TMG NLB. In other words, if Windows NLB has a restriction, TMG is subject to it as well. Here are some important points to remember when implementing NLB:

Switch is operating in Layer-3 mode

NLB is not supported when the hosts are homed to a switch operating at Layer-3. Instead, create a VLAN for all the nodes in the NLB cluster, and configure that VLAN to operate in Layer-2 mode.

An unusual number of TCP connections to the cluster are being reset.

Possible Causes:

  • The switch to which the cluster hosts are connected may have learned the cluster MAC address, though this is rare. This can cause TCP traffic to be delivered to the wrong NLB host, resulting in a connection reset (unicast mode only). See the section “Switch is learning the MAC Address” of the NLB troubleshooting guide for details.
  • The switch to which the cluster hosts are connected is a Layer-3 switch. NLB requires Layer-2 switching.

See more “gotchas” at https://download.microsoft.com/download/3/2/3/32386822-8fc5-4cf1-b81d-4ee136cca2c5/NLB_Troubleshooting_Guide.htm

…and always remember the 5 commandments when troubleshooting NLB on TMG:

1. Never assume that the problem is on TMG in the first place.
2. Never assume that because your network infrastructure has worked fine for years, it is free of issues.
3. Do not take it personally if someone says: you’ve got a problem on your switch.
4. Always have a hub available for testing purposes; sometimes a dumb device to validate NLB functionality saves far more time than wrestling with smart devices for hours.
5. NLB unicast causes switch flooding. Don’t be surprised by flooding after enabling NLB on TMG, and don’t blame TMG for it; this is how Windows NLB works.
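
The fifth commandment can be illustrated with a toy model of a Layer-2 learning switch. This is a simplified sketch, not a model of any real switch: the ToySwitch class, the port numbers, and the client MAC are invented for illustration. It assumes the default unicast behavior in which each NLB host masks its source MAC (02-h-w-x-y-z, where h is the host priority), so the switch never sees the cluster MAC as a source address and has to flood every frame addressed to it.

```python
class ToySwitch:
    """A minimal learning switch: learn source MACs, flood unknown destinations."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Process one frame and return the list of ports it is sent out on."""
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        return [] if out_port == in_port else [out_port]


CLUSTER_MAC = "02-BF-C0-A8-04-0A"   # unicast cluster MAC for 192.168.4.10
CLIENT_MAC = "00-11-22-33-44-55"    # hypothetical client on port 4
BROADCAST = "FF-FF-FF-FF-FF-FF"

switch = ToySwitch(ports=[1, 2, 3, 4])

# The three NLB nodes (ports 1-3) transmit with masked source MACs,
# so the cluster MAC itself never enters the switch's table.
for priority, port in enumerate([1, 2, 3], start=1):
    switch.receive(port, f"02-{priority:02X}-C0-A8-04-0A", BROADCAST)

flooded_to = switch.receive(4, CLIENT_MAC, CLUSTER_MAC)
print(flooded_to)                   # [1, 2, 3]

reply_to = switch.receive(1, "02-01-C0-A8-04-0A", CLIENT_MAC)
print(reply_to)                     # [4]
```

Flooding is what lets every node see the inbound traffic, which is why it is expected rather than a fault. If the switch did learn the cluster MAC on a single port, only one node would receive the traffic, which matches the connection-reset cause described above.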