One of the most common calls we get from customers who have implemented Network Load Balancing (NLB) comes when they notice that the servers aren’t 100% balanced. They’ll see one node getting hammered with traffic while the other sits idle. I’m hoping that by digging a little into the way NLB works, we can give you the information you need to head things off at the pass and avoid a call to support.
Some of the positives of NLB are that it works really well (with the right application), is easy to set up and, best of all, it’s FREE! Really? Free? That’s right! Just enable the service on your servers, configure the cluster and you are balanced. But are you really “balanced”? Maybe in the way that a see-saw is sometimes balanced, but usually it’s going to lean one way or the other. To make it clearer, let’s talk about the negatives of NLB: it’s easy to set up and it’s FREE. I repeat the positives because part of the reason it is so simple to configure is that it has neither the bells and whistles of a hardware load balancer nor the cost.
The quick explanation for the way NLB works is that since every node in the cluster shares the same cluster IP address, every packet destined for that address reaches every server in the cluster. However, only the server responsible for that client will reply. The others simply drop the packet. Which node is responsible is determined by affinity, which I’ll discuss later.
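To make the "everyone sees it, one node answers" idea concrete, here is a minimal Python sketch. The node names and the `responsible_node` helper are made up for illustration, and the SHA-256 hash is only a stand-in: it is not the algorithm NLB actually uses.

```python
import hashlib

NODES = ["node1", "node2"]  # hypothetical host names

def responsible_node(client_ip: str) -> str:
    """Stand-in mapping from client to node; NLB's real hash is different."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]

def deliver(client_ip: str) -> dict:
    """Every node sees the packet; each one decides locally to reply or drop."""
    owner = responsible_node(client_ip)
    return {node: ("reply" if node == owner else "drop") for node in NODES}

decisions = deliver("192.0.2.10")
print(decisions)
```

Whatever the client address, exactly one node in the sketch replies and the rest drop the packet, with no central dispatcher involved.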
Can you hear me now?
One of the first troubleshooting steps we follow when a customer says the servers aren’t “balanced” is to have them run “wlbs display” at the command prompt on every node in the cluster and check whether the cluster has even converged. The end of the output will give you that answer:
From Host 1:
Host 1 has entered a converging state 3 time(s) since joining the cluster and the last convergence completed at approximately: 9/9/2008 9:06:17 AM
Host 1 converged as DEFAULT with the following host(s) as part of the cluster:
From Host 2:
Host 2 has entered a converging state 1 time(s) since joining the cluster and the last convergence completed at approximately: 9/9/2008 9:06:17 AM
Host 2 converged with the following host(s) as part of the cluster:
You can see from the above output that both nodes are properly converged and are ready to receive traffic. With that confirmed, let’s discuss how NLB balances.
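If you collect that output from several nodes, a small script can do the eyeballing for you. This is only a sketch: it assumes you have already saved the “wlbs display” text to a string, and that the wording matches the sample lines in this post (other versions may phrase it differently). The `is_converged` helper is a made-up name.

```python
import re

def is_converged(display_output: str) -> bool:
    """Return True if the text contains a 'Host N converged' line."""
    pattern = r"^Host \d+ converged\b"
    return re.search(pattern, display_output, re.MULTILINE) is not None

# Sample text copied from the Host 1 output shown in this post.
sample = (
    "Host 1 has entered a converging state 3 time(s) since joining the cluster "
    "and the last convergence completed at approximately: 9/9/2008 9:06:17 AM\n"
    "Host 1 converged as DEFAULT with the following host(s) as part of the cluster:\n"
)

print(is_converged(sample))
```

The pattern deliberately skips the “has entered a converging state” line, since that line appears whether or not convergence has completed.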
“One for you, one for me. Two for you, two for me.”
The way NLB balances is based on the Affinity that you have selected. What exactly is Affinity? It is the mode that determines how the servers are going to balance the load. The options are None, Single or Class C, and it’s specified in the port rule settings.
When the nodes in a cluster converge, they decide which node will respond to each request. The simplest example is when the affinity is set to Single. In that scenario, the servers divide the source IP addresses among the nodes in the cluster. When a packet arrives, each node checks the source IP address against the addresses it is responsible for. If there’s a match, the node replies. If there isn’t, the node drops the packet.
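A rough way to picture Single affinity is a function that maps the source IP, and nothing else, to a node number. The sketch below uses SHA-256 purely as a stand-in; NLB’s actual hashing is different, and `single_affinity_node` is a hypothetical name.

```python
import hashlib

def single_affinity_node(src_ip: str, node_count: int) -> int:
    """Map a source IP to a 1-based node number. The port is ignored on purpose."""
    digest = hashlib.sha256(src_ip.encode()).digest()
    return int.from_bytes(digest[:4], "big") % node_count + 1

for ip in ["192.0.2.10", "192.0.2.11", "198.51.100.7"]:
    print(ip, "-> node", single_affinity_node(ip, node_count=2))
```

Because only the address feeds the mapping, every connection from a given client lands on the same node for as long as the cluster membership stays the same.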
When Affinity is set to “None,” the decision is based on both the source IP and the source port. Therefore, each connection from a single client may reach a different node. As long as the application being hosted doesn’t require the client to remain on the same node, this mode is great because it provides the most even load balancing. SSL connections will not work well with No Affinity, but regular port 80 web traffic will.
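For comparison, here is the same kind of sketch with the source port added to the key, which is the idea behind None. Again, the hash is a stand-in and `no_affinity_node` is a made-up helper, not anything NLB exposes.

```python
import hashlib

def no_affinity_node(src_ip: str, src_port: int, node_count: int) -> int:
    """Map (source IP, source port) to a 1-based node number."""
    key = f"{src_ip}:{src_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % node_count + 1

# One client opening many connections, each from a different source port.
nodes_hit = {no_affinity_node("192.0.2.10", port, 2) for port in range(50000, 50200)}
print(sorted(nodes_hit))
```

With a couple of hundred connections from one client you would expect both nodes to show up in the result, which is exactly why a session that must stay on one server is a poor fit for this mode.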
The final mode is “Class C.” As you could guess, all clients in the same Class C address range (that is, sharing the same first three octets) are handled by the same node. This is useful when clients may pass through different proxy servers and you need to make sure they keep hitting the same cluster node.
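In sketch form, Class C just means throwing away the last octet before choosing a node. The example below uses Python’s `ipaddress` module to get the /24 network and a stand-in hash; `class_c_node` is a hypothetical name and this is not NLB’s real algorithm.

```python
import hashlib
import ipaddress

def class_c_node(src_ip: str, node_count: int) -> int:
    """Map the /24 network of a source IP to a 1-based node number."""
    network = ipaddress.ip_network(f"{src_ip}/24", strict=False)
    digest = hashlib.sha256(str(network.network_address).encode()).digest()
    return int.from_bytes(digest[:4], "big") % node_count + 1

# Two proxies in the same Class C range always land on the same node.
print(class_c_node("203.0.113.5", 2) == class_c_node("203.0.113.200", 2))
```

Since both addresses reduce to 203.0.113.0 before hashing, the comparison is true no matter which hash you plug in.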
So why are they all on the same node?
I gave you all of that information so you can better understand what could cause all of your clients to land on the same node. Here are some examples:
- If all clients pass through a NAT device, so that the source address for every connection is the same, and you’ve chosen Single affinity, one node will service all of the clients.
- If you don’t have many connections and notice that all connections are going to a single node, it could just be a coincidence that the one node is responsible for those source addresses. This could especially be true in a two-node cluster. Establish more connections, make sure the source address is different, and you should see the other nodes eventually respond.
- If you’ve chosen Class C, you’ll need to make sure that the connection attempts come from different Class C ranges or they’ll all go to the same node.
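The NAT case is easy to reproduce on paper. The sketch below counts where 1,000 clients land under a Single-affinity-style mapping, first with their own addresses and then with one shared NAT address. The addresses, the two-node cluster and the SHA-256 stand-in hash are all assumptions for illustration.

```python
import hashlib
import ipaddress
from collections import Counter

def single_affinity_node(src_ip: str, node_count: int = 2) -> int:
    """Stand-in for Single affinity: only the source IP picks the node."""
    digest = hashlib.sha256(src_ip.encode()).digest()
    return int.from_bytes(digest[:4], "big") % node_count + 1

base = ipaddress.IPv4Address("10.0.0.1")
clients = [str(base + i) for i in range(1000)]

# Each client arrives with its own source address.
direct = Counter(single_affinity_node(ip) for ip in clients)
# Every client arrives with the NAT device's address instead.
behind_nat = Counter(single_affinity_node("203.0.113.1") for _ in clients)

print("direct:", dict(direct))
print("behind NAT:", dict(behind_nat))
```

The direct case spreads across both nodes, though rarely in a perfect split, while the NAT case puts all 1,000 connections on a single node.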
In making the affinity decision, you need to weigh what is important in your implementation: better balancing (No affinity) or “sticky” connections (Single or Class C). Remember that the balancing will never be perfect, and that no decisions are based on server load, on whether a service is running, or on the number of connections. With this information, you should be able to configure your cluster optimally and not be surprised when one server has more connections than another.
Hopefully this information will save you a call to tech support. NLB is easy to configure and in the right scenario, can save you thousands on a hardware load balancer. As always, we recommend testing it in a lab prior to deployment.
- Network Load Balancing parameters – http://technet.microsoft.com/en-us/library/cc778263.aspx
- Specifying the Affinity and Load-Balancing Behavior of the Custom Port Rule – http://technet.microsoft.com/en-us/library/cc759039.aspx
- Upgrading the Network Load Balancing Cluster (to 2008) – http://technet.microsoft.com/en-us/library/cc755161.aspx
- Network Load Balancing: Configuration Best Practices for Windows 2000 and Windows Server 2003 – http://www.microsoft.com/downloadS/details.aspx?FamilyID=d24c373e-bafc-4e31-b1b2-d86584a12ca4&displaylang=en
– Michael Rendino