An Optimal Network Load Balancing (NLB) Configuration

When I go on-site with customers who are using Microsoft Network Load Balancing (NLB), most of the time they have it in a *working* condition, but they may be having intermittent networking issues with it. In this blog posting, I'm going to talk about the pros and cons of using NLB versus hardware solutions, how it differs from Microsoft Cluster Services (MSCS), and how to configure your network topology for optimal performance and reliability.

First, let's compare NLB to a hardware load balancer such as an F5 BIG-IP or a Cisco Content Services Switch (CSS). In many ways, hardware load balancers are superior to NLB, but they are very expensive devices that you might not need. Plain vanilla NLB works best when the client IP addresses are exposed to the NLB-enabled servers - this means it generally works best for intranet applications or internet applications that don't require server affinity/stickiness. A while ago, Microsoft Application Center 2000 solved this problem by adding cookie-based server affinity/stickiness and full monitoring, which brought the NLB servers online/offline when certain conditions were met, such as the IIS services running and HTTP requests being answered, but now we just have NLB again. The hardware load balancers typically come with cookie-based server affinity/stickiness and monitoring in one package. Finally, the two great advantages of NLB are that it's cheap (it comes with the operating system) and there is no single point of failure - each server is a peer and there is no central load distributor, giving it maximum redundancy. In summary, NLB is best used for intranet applications or internet applications that do not require server affinity/stickiness; otherwise, consider a hardware load balancing solution.
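
To make the affinity point concrete, here is a deliberately simplified Python sketch. It is *not* NLB's real distribution algorithm - it only illustrates why source-IP affinity needs the real client IPs to be exposed, while cookie-based affinity (what Application Center 2000 and the hardware boxes add) does not. The server names and addresses are made up.

```python
# A deliberately simplified sketch of the affinity trade-off described above.
# This is NOT NLB's real hashing algorithm; it only illustrates why source-IP
# affinity needs real client IPs, while cookie affinity does not.

from typing import Optional

SERVERS = ["web01", "web02", "web03"]  # made-up server names

def pick_by_source_ip(client_ip: str) -> str:
    """Source-IP affinity: the same client IP always lands on the same server."""
    return SERVERS[hash(client_ip) % len(SERVERS)]

def pick_by_cookie(session_cookie: Optional[str]) -> str:
    """Cookie affinity: the cookie names the server, so stickiness survives
    clients whose IP is hidden behind a shared proxy or changes mid-session."""
    if session_cookie in SERVERS:
        return session_cookie
    return SERVERS[0]  # first request: assign a server and set the cookie

# Behind a proxy, many different users share one source IP, so source-IP
# affinity piles them all onto the same server:
print(pick_by_source_ip("203.0.113.7"))  # every proxied user -> one server
print(pick_by_cookie("web02"))           # the cookie routes this session to web02
```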

Regarding NLB compared to Microsoft Cluster Services (MSCS), they are different technologies, yet similar enough in terminology and topology that we inevitably mix the two up. MSCS is where two or more servers share a hardware resource so that if one of them goes down, the other will pick it up. MSCS uses a backend "heartbeat" network to determine whether its partner server is alive or down. NLB also has a "heartbeat", but its heartbeat is really just broadcast packets on the NLB enabled network... let me repeat... on the NLB enabled network. The reason I repeated that statement is that it is common and recommended to add a second physical network adapter to servers using NLB. Unfortunately, this makes it *look* like an MSCS solution and causes confusion. The purpose of the second network adapter is in fact so the NLB enabled servers can communicate with each other, just like with MSCS, but this is over a normal, everyday network (the MSCS heartbeat network is almost always an isolated network or a cross-over cable), which leads me to my last topic regarding network topology.

NLB should be configured with 2 physical network adapters in each server: the NLB enabled or frontend/public network adapter, and the normal, everyday, backend network adapter. The backend/normal/non-NLB network adapter is needed because, when NLB is enabled on a network adapter, it masks the MAC address of that adapter with a virtual MAC address that all of the NLB servers will use, typically starting with "02-BF". Imagine for a second that you only have one network adapter in each server and you enabled NLB. NLB would work just fine and be very happy (again, its heartbeat is a broadcast packet on the NLB enabled adapter's network), but the NLB servers would not be able to communicate with each other because their MAC addresses are the same. If server A wanted to talk to server B, server A would send a TCP/IP packet addressed to server B, but since the MAC addresses are the same, it would never leave server A. Some might now suggest using the multicast mode of NLB, which allows the real MAC address of the network adapter to be revealed again, but many network switches don't like the idea of having 2 MAC addresses on the same port, let alone multiple ports with the same MAC address - an issue I will address further down.
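
If you want to see the MAC masking for yourself, here is a minimal Python sketch that flags adapters whose MAC address looks like an NLB unicast virtual MAC. The "02-BF" prefix comes straight from the behavior described above; the sample adapter list is made up, and on a real server you would read the addresses from ipconfig /all instead.

```python
# Illustrative sketch: flag adapters whose MAC looks like an NLB unicast
# virtual MAC ("02-BF-..."), as described above. The adapter list below is
# a made-up example; on a real server you would pull it from ipconfig /all.

SAMPLE_ADAPTERS = {
    "Frontend (NLB enabled)": "02-BF-0A-00-00-64",
    "Backend (normal)":       "00-1A-2B-3C-4D-5E",
}

def looks_like_nlb_virtual_mac(mac: str) -> bool:
    """Return True if the MAC address starts with the NLB unicast prefix 02-BF."""
    return mac.upper().replace(":", "-").startswith("02-BF")

if __name__ == "__main__":
    for name, mac in SAMPLE_ADAPTERS.items():
        tag = "NLB virtual MAC" if looks_like_nlb_virtual_mac(mac) else "real MAC"
        print(f"{name}: {mac} -> {tag}")
```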

The reason I call the backend network adapter the normal, everyday network adapter is because *that* is what it should be. This means that the server's NetBIOS name in WINS, DNS, and whatever other network naming service you have should point to its backend network adapter. The reason I stress this is that if you have all of your incoming and outgoing traffic flowing only through the frontend (NLB enabled) network adapter, then what happens when your server needs to authenticate to Active Directory or get data from your database? The return response from those servers will in some cases come back to the virtual IP address, which subsequently gets load balanced to one of the servers and lost - not a good thing. This is why you should do all of your internal network communications over the backend (non-NLB enabled) network adapter. Again, some may suggest using a dedicated IP address (DIP). Regardless of whether you use a DIP, the MAC address is still virtualized, so the traffic could still be inadvertently load balanced to the wrong server. In addition, do not put a default gateway on the NLB enabled network adapter and do not register the server's NetBIOS name to the NLB adapter. Only add a static entry in DNS to point your end users to the virtual IP address(es) on the NLB enabled network adapter. Now, you are probably thinking that if I don't put a default gateway on my NLB enabled network adapter, then how will my end users receive the server responses... read on.
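
As a quick sanity check on that naming advice, here is a small Python sketch that verifies the server's own host name resolves to the backend adapter's address rather than to the virtual IP. The IP addresses are assumptions chosen for illustration, not values from this posting; substitute your own backend address and VIP.

```python
# Sanity-check sketch: confirm the server's own name resolves to the backend
# adapter, not to the NLB virtual IP. The IP addresses here are made-up
# examples; substitute your own backend IP and VIP.

import socket

BACKEND_IP = "192.168.10.21"   # assumed backend (non-NLB) adapter address
VIRTUAL_IP = "10.0.0.100"      # assumed NLB virtual IP (VIP)

def check_name_resolution() -> None:
    hostname = socket.gethostname()
    resolved = {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    if VIRTUAL_IP in resolved:
        print(f"WARNING: {hostname} resolves to the VIP {VIRTUAL_IP}; "
              "internal traffic could be load balanced to the wrong server.")
    elif BACKEND_IP in resolved:
        print(f"OK: {hostname} resolves to the backend address {BACKEND_IP}.")
    else:
        print(f"{hostname} resolves to {resolved}; verify against your adapters.")

if __name__ == "__main__":
    check_name_resolution()
```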

If you have followed my advice so far, you should have just the virtual IP (VIP) address(es) and the Dedicated IP (DIP) address on the NLB enabled network adapter. By the way, don't confuse the DIP with the backend network adapter - the DIP refers to a special IP address that NLB uses *only* on the NLB enabled network adapter - not the backend adapter. All other networking settings are set on your backend (non-NLB enabled) network adapter (WINS, DNS, NetBIOS, default gateway, etc.). If configured properly, your user requests should come in to the virtual IP address on the NLB enabled network adapter and flow out through the backend (non-NLB enabled) network adapter. Yes, this is possible and is recommended, especially because of the switch incompatibility issues we will talk about next. Finally, all of your internal network communication is now being conducted over a normal, everyday, backend network adapter, so there is no possibility of it being misrouted.
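
To summarize the split, here is a short Python sketch that expresses the per-adapter layout described above as a checklist and validates a made-up example configuration against it. The adapter names and addresses are assumptions for illustration; the rules simply mirror the advice in this posting.

```python
# Checklist sketch for the two-adapter layout described above. The example
# configuration is made up; the rules mirror the article's advice: VIP and DIP
# only on the NLB adapter, gateway/DNS/WINS/NetBIOS only on the backend adapter.

EXAMPLE_CONFIG = {
    "frontend_nlb": {
        "addresses": ["10.0.0.100", "10.0.0.21"],  # VIP + DIP (assumed values)
        "default_gateway": None,                   # no gateway on the NLB adapter
        "registers_name_in_dns": False,
    },
    "backend_normal": {
        "addresses": ["192.168.10.21"],            # assumed backend address
        "default_gateway": "192.168.10.1",         # assumed gateway
        "registers_name_in_dns": True,
    },
}

def validate(config: dict) -> list:
    """Return a list of deviations from the recommended layout."""
    problems = []
    if config["frontend_nlb"]["default_gateway"] is not None:
        problems.append("Remove the default gateway from the NLB adapter.")
    if config["frontend_nlb"]["registers_name_in_dns"]:
        problems.append("Do not register the server name on the NLB adapter.")
    if config["backend_normal"]["default_gateway"] is None:
        problems.append("The backend adapter should carry the default gateway.")
    if not config["backend_normal"]["registers_name_in_dns"]:
        problems.append("The server name should resolve to the backend adapter.")
    return problems

if __name__ == "__main__":
    issues = validate(EXAMPLE_CONFIG)
    print("Configuration matches the recommendation." if not issues
          else "\n".join(issues))
```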

Earlier I mentioned how some network switches don't do well with NLB. This is because of the way NLB uses MAC addresses. NLB was originally designed to be used with a network hub, which doesn't care about MAC addresses. I'm not saying that NLB doesn't work with switches, I'm just saying that you need to plug NLB enabled network adapters into network devices that 1) allow multiple ports to have the same MAC address, 2) allow all of the ports to receive all of the traffic, and 3) allow the NLB heartbeat (broadcast packets) to actually broadcast to all of the ports. If your device meets these criteria, then great. Otherwise, plug all of the NLB network adapters into a hub and uplink the hub to your network switch. Doing the hub test is especially important if you are having problems with NLB not converging.
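
If you want to check whether your switch is actually flooding the heartbeat broadcasts to every port, a network capture tool such as Network Monitor will tell you, but here is a rough Python sketch of a passive listener as well. It uses a Linux-style raw socket (so it would run on a separate Linux box plugged into the same hub or switch, with root privileges), and it assumes the heartbeat frames carry EtherType 0x886F and a source MAC starting with "02-BF" - treat both values as assumptions to confirm in your own captures.

```python
# Rough sketch of a passive listener for NLB heartbeat broadcasts. Linux-only
# (AF_PACKET raw socket, needs root); run it on a machine plugged into the same
# hub/switch as the NLB servers. EtherType 0x886F and the 02-BF source-MAC
# prefix are assumptions about the heartbeat frames, not values from this post.

import socket

ETH_P_ALL = 0x0003                 # capture every EtherType
NLB_HEARTBEAT_ETHERTYPE = 0x886F   # assumed NLB heartbeat EtherType

def watch(interface: str = "eth0", frames_to_inspect: int = 1000) -> None:
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    s.bind((interface, 0))
    seen = 0
    for _ in range(frames_to_inspect):
        frame = s.recv(65535)
        ethertype = int.from_bytes(frame[12:14], "big")  # Ethernet type field
        src_mac = frame[6:12].hex("-")                   # Ethernet source MAC
        if ethertype == NLB_HEARTBEAT_ETHERTYPE or src_mac.startswith("02-bf"):
            seen += 1
            print(f"heartbeat-like frame from {src_mac}")
    print(f"{seen} heartbeat-like frames out of {frames_to_inspect} inspected")

if __name__ == "__main__":
    watch()
```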

So let's say you did the hub test and NLB is finally working for you. You are left with a problem... there aren't very many 100Mb or 1Gb hubs these days, so now we have a throughput problem, right? Well, if you followed my advice so far, then it shouldn't be a problem even if you are using a 10Mb hub. At this point, your server requests are flowing in through the NLB enabled network adapters and the responses are flowing out the backend (non-NLB enabled) network adapter. Server requests are typically very small, and since your responses are flowing out of the backend network adapter, all your NLB adapters have to handle is the incoming requests. Since this incoming network traffic is small, a 10Mb hub should be able to handle it, with the heavier outbound traffic going out the faster 100Mb or 1Gb network adapter.
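
To put rough numbers on that claim, here is a back-of-envelope calculation in Python. The request size, response size, and request rate are assumptions chosen only to illustrate the asymmetry, not measurements from any real deployment.

```python
# Back-of-envelope sketch of the inbound/outbound asymmetry described above.
# The request/response sizes and request rate are made-up illustrative numbers.

AVG_REQUEST_BYTES = 800        # assumed small HTTP request
AVG_RESPONSE_BYTES = 40_000    # assumed much larger response
REQUESTS_PER_SECOND = 200      # assumed aggregate load across the cluster

HUB_CAPACITY_BPS = 10_000_000          # 10 Mb hub on the NLB side
BACKEND_CAPACITY_BPS = 100_000_000     # 100 Mb adapter on the backend side

inbound_bps = AVG_REQUEST_BYTES * 8 * REQUESTS_PER_SECOND
outbound_bps = AVG_RESPONSE_BYTES * 8 * REQUESTS_PER_SECOND

print(f"Inbound (requests)  : {inbound_bps / 1e6:.1f} Mb/s "
      f"({inbound_bps / HUB_CAPACITY_BPS:.0%} of the 10 Mb hub)")
print(f"Outbound (responses): {outbound_bps / 1e6:.1f} Mb/s "
      f"({outbound_bps / BACKEND_CAPACITY_BPS:.0%} of the 100 Mb backend)")
```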

I hope this is helpful. NLB is still a great and cost-effective solution when it is well understood.

One last note... after posting this blog entry, I was made aware that we (Microsoft) may be releasing a patch that will add IGMP support for NLB in multicast mode, which may help with the switch issues.