How to patch a TMG array: some thoughts on NLB high availability

One of the reasons for using an array is the availability of NLB (Network Load Balancing), which provides fault tolerance and load balancing.

NLB relies on heartbeats to determine whether the cluster nodes are alive. The nodes divide the potential client IP addresses among themselves (strictly speaking, the hashes of those addresses) and send each other heartbeats to signal that they are up and running.
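To make the idea of dividing client IP hashes concrete, here is a minimal Python sketch. It does not reproduce NLB's actual hashing algorithm: the bucket count, the use of SHA-256, the round-robin split, and the names `bucket_for`, `assign_buckets` and `owner_of` are all assumptions chosen for illustration.

```python
import hashlib
import ipaddress

BUCKETS = 60  # illustrative bucket count, an assumption of this sketch


def bucket_for(client_ip: str) -> int:
    """Map a client IP to a bucket using a stable hash of its bytes."""
    packed = ipaddress.ip_address(client_ip).packed
    digest = hashlib.sha256(packed).digest()
    return int.from_bytes(digest[:4], "big") % BUCKETS


def assign_buckets(nodes: list[str]) -> dict[int, str]:
    """Split the buckets round-robin among the given nodes."""
    ordered = sorted(nodes)
    return {b: ordered[b % len(ordered)] for b in range(BUCKETS)}


def owner_of(client_ip: str, ownership: dict[int, str]) -> str:
    """Return the node that owns the bucket this client hashes into."""
    return ownership[bucket_for(client_ip)]


ownership = assign_buckets(["tmg1", "tmg2", "tmg3"])
print(owner_of("10.0.0.5", ownership))
```

Because every node computes the same hash and holds the same ownership map, each one can decide on its own whether an incoming client is its responsibility.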

As soon as a node goes down (that is, fails to send heartbeats), the remaining nodes take ownership of the failed node’s IP hashes, providing coverage for all the clients normally served by that node.
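The takeover described above can be sketched roughly as follows. This is a simplified model, not NLB's real convergence logic: the timeout value, the timestamps, and the helper names `live_nodes` and `reassign` are hypothetical.

```python
def live_nodes(last_heartbeat: dict[str, float], now: float, timeout: float) -> list[str]:
    """Nodes whose most recent heartbeat is no older than `timeout` seconds."""
    return sorted(n for n, seen in last_heartbeat.items() if now - seen <= timeout)


def reassign(ownership: dict[int, str], alive: list[str]) -> dict[int, str]:
    """Hand buckets owned by dead nodes to the survivors; keep the rest unchanged."""
    if not alive:
        raise ValueError("no surviving nodes to take over")
    result = dict(ownership)
    orphaned = sorted(b for b, node in ownership.items() if node not in alive)
    for i, bucket in enumerate(orphaned):
        result[bucket] = alive[i % len(alive)]
    return result


# tmg3 sent its last heartbeat 7 seconds ago, beyond the assumed 5 second timeout.
ownership = {0: "tmg1", 1: "tmg2", 2: "tmg3", 3: "tmg1", 4: "tmg2", 5: "tmg3"}
heartbeats = {"tmg1": 100.0, "tmg2": 100.0, "tmg3": 94.0}
alive = live_nodes(heartbeats, now=101.0, timeout=5.0)
print(reassign(ownership, alive))
```

Note that only the ownership map moves to the survivors. Nothing in this model transfers per-connection state, which is the point the rest of this post builds on.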

How does this all relate to patching?

For new connections, the above behavior is straightforward, but what if we just “unplug” one of our TMG machines? What happens to existing connections served by this box? No other node will be aware of the state of these connections, so essentially they will simply fail.

This is exactly what happens if you patch and reboot one of your TMG nodes without warning.

How can we circumvent this? Is there any workaround?

In general, NLB supports what is called “drain mode”.

When you “drainstop” a node, NLB still serves the existing connections owned by that node, but the node does not accept new connections. New connections are handled by the other available nodes in the array.
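As a rough illustration of the drainstop behavior, the following Python sketch models a single node. The `Node` class and its methods are hypothetical and only mimic the rule described above: existing connections keep being served, new ones are refused.

```python
class Node:
    """Toy model of one array member; not an actual NLB or TMG API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.draining = False
        self.connections: set[str] = set()

    def drainstop(self) -> None:
        """Stop accepting new connections but keep serving existing ones."""
        self.draining = True

    def handle(self, conn_id: str) -> bool:
        """Return True if this node serves traffic for the given connection."""
        if conn_id in self.connections:
            return True          # existing connection: always served
        if self.draining:
            return False         # new connection while draining: refused
        self.connections.add(conn_id)
        return True

    def close(self, conn_id: str) -> None:
        self.connections.discard(conn_id)

    def is_drained(self) -> bool:
        """True once draining has started and no connections remain."""
        return self.draining and not self.connections


node = Node("tmg1")
node.handle("client-a")           # accepted before draining
node.drainstop()
print(node.handle("client-a"))    # still served
print(node.handle("client-b"))    # refused, another node must take it
```

Once `is_drained()` reports true in this model, taking the node offline no longer breaks any client session.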

So if you are intentionally taking a node offline, you can use drainstopping to let all active connections finish before you take the node offline for patching.

Therefore, when patching a particular TMG node, ideally you will:

1. Drain the node and wait until the session count drops to zero (Sessions tab).

2. Suspend it (so that NLB won’t be started automatically on the next reboot).

3. Patch the node.

4. Make sure the system operates properly with the patch.

5. Start NLB again so that the node rejoins the array.
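The five steps above can be sketched as a small orchestration routine. This is only an outline under stated assumptions: `FakeNode`, `patch_node`, and the callbacks `apply_patch` and `health_check` are hypothetical stand-ins, and in a real array these actions would be performed through the TMG Management console rather than through any API shown here.

```python
import time


class FakeNode:
    """Stand-in for a TMG array member; every method here is hypothetical."""

    def __init__(self, sessions: int) -> None:
        self.sessions = sessions
        self.state = "started"
        self.log: list[str] = []

    def drain(self) -> None:
        self.state = "draining"
        self.log.append("drain")

    def session_count(self) -> int:
        # Pretend one session ends between consecutive polls.
        current = self.sessions
        self.sessions = max(0, self.sessions - 1)
        return current

    def suspend(self) -> None:
        self.state = "suspended"
        self.log.append("suspend")

    def start(self) -> None:
        self.state = "started"
        self.log.append("start")


def patch_node(node, apply_patch, health_check, poll_seconds=0.0, max_polls=1000):
    """Drain, suspend, patch, verify, then restart NLB on one node."""
    node.drain()                                  # step 1: drain
    for _ in range(max_polls):
        if node.session_count() == 0:
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("sessions did not drain in time")
    node.suspend()                                # step 2: suspend
    apply_patch(node)                             # step 3: patch
    if not health_check(node):                    # step 4: verify
        raise RuntimeError("post-patch check failed; NLB left suspended")
    node.start()                                  # step 5: rejoin the array


node = FakeNode(sessions=3)
patch_node(node, lambda n: n.log.append("patch"), lambda n: True)
print(node.log)
```

The design choice worth noting is that a failed health check leaves the node suspended, so a misbehaving node never starts taking client traffic again on its own.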

Here is a screenshot of the NLB options available in the TMG Management console:

[Screenshot: NLB options in the TMG Management console]

Reference:

http://technet.microsoft.com/en-us/library/cc725691.aspx

Author
Balint Toth
Support Escalation Engineer
Microsoft CSS Forefront Edge Team

Technical Reviewer
Eric Detoc
Escalation Engineer
Microsoft CSS Forefront Edge Team