Why can't we access NLB Clusters from remote subnets?

Hi there,

In today's post, I would like to talk about the NLB cluster access problems that our customers run into most often.

When a Microsoft NLB cluster operates in multicast mode, in certain scenarios you may not be able to access the NLB cluster IP address from remote subnets, whereas access from the same subnet keeps working fine. The two most common scenarios are described below:

 

Problem 1:

When an NLB cluster on Windows 2008/SP2 operates in multicast mode, remote subnets cannot access the NLB cluster IP address because of a problem in the NLB implementation on Windows 2008.

Solution 1:

- This problem stemmed from the NLB implementation.

- Microsoft has fixed it with hotfix KB960916.

- KB960916 is already included in Windows 2008 SP2.

Problem 2:

When an NLB cluster on Windows 2003 or Windows 2008 operates in multicast mode, remote subnets cannot access the NLB cluster IP address. This second problem stems from the fact that some vendors' devices (Cisco, for example) do not accept a mapping from an L3 unicast IP address to an L2 multicast MAC address. That is exactly the mapping an NLB cluster in multicast mode produces: the L3 unicast IP address is the NLB cluster IP address, and the L2 MAC address is the multicast MAC address chosen by NLB. To avoid the problem, you have to create a static mapping on the router. You can find more information about this problem at the following link:

https://www.cisco.com/en/US/products/hw/switches/ps708/products_configuration_example09186a0080a07203.shtml
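To make the unicast-IP-to-multicast-MAC mapping more concrete, here is a minimal Python sketch of how the default cluster MAC address is derived from the cluster IP address in each NLB mode. The function name `nlb_cluster_mac` is made up for this illustration, and the prefixes (02-BF, 03-BF and 01-00-5E-7F) are the documented NLB defaults as I understand them. Please treat the MAC address that your cluster actually reports as authoritative, since it can differ from these defaults.

```python
import ipaddress


def nlb_cluster_mac(cluster_ip: str, mode: str = "multicast") -> str:
    """Return the default NLB cluster MAC for an IPv4 cluster address.

    Hypothetical helper for illustration only; the prefixes below are
    assumed NLB defaults, so verify against the real cluster settings.
    """
    octets = ipaddress.IPv4Address(cluster_ip).packed  # 4 raw bytes W.X.Y.Z
    if mode == "unicast":
        mac = bytes([0x02, 0xBF]) + octets              # 02-BF-W-X-Y-Z
    elif mode == "multicast":
        mac = bytes([0x03, 0xBF]) + octets              # 03-BF-W-X-Y-Z
    elif mode == "igmp":
        mac = bytes([0x01, 0x00, 0x5E, 0x7F]) + octets[2:]  # 01-00-5E-7F-Y-Z
    else:
        raise ValueError(f"unknown NLB mode: {mode!r}")
    return "-".join(f"{b:02X}" for b in mac)


if __name__ == "__main__":
    for mode in ("unicast", "multicast", "igmp"):
        print(mode, nlb_cluster_mac("172.16.63.241", mode))
```

Note that in the multicast and IGMP cases the first octet is odd, which is what marks the address as an L2 multicast address, while the cluster IP address itself stays an ordinary unicast address. That combination is what some routers refuse to learn through ARP.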

(Taken from the above link)

Multicast Mode

Another solution is to use multicast mode in MS NLB configuration GUI instead of Unicast mode. In Multicast Mode, the system admin clicks the IGMP Multicast button in the MS NLB configuration GUI. This choice instructs the cluster members to respond to ARPs for their virtual address using a multicast MAC address for example 0300.5e11.1111 and to send IGMP Membership Report packets. If IGMP snooping is enabled on the local switch, it snoops the IGMP packets that pass through it. In this way, when a client ARPs for the cluster’s virtual IP address, the cluster responds with multicast MAC for example 0300.5e11.1111. When the client sends the packet to 0300.5e11.1111, the local switch forwards the packet out each of the ports connected to the cluster members. In this case, there is no chance of flooding the ARP packet out of all the ports. The issue with the multicast mode is virtual IP address becomes unreachable when accessed from outside the local subnet because Cisco devices do not accept an arp reply for a unicast IP address that contains a multicast MAC address. So the MAC portion of the ARP entry shows as incomplete. (Issue the command show arp to view the output.) As there is no MAC portion in the arp reply, the ARP entry never appeared in the ARP table. It eventually quit ARPing and returned an ICMP Host unreachable to the clients. In order to override this, use static ARP entry to populate the ARP table as given below. In theory, this allows the Cisco device to populate its mac-address-table. For example, if the virtual ip address is 172.16.63.241 and multicast mac address is 0300.5e11.1111, use this command in order to populate the ARP table statically:

Solution 2:

To resolve this problem, you have two choices:

a) Add a static ARP entry for the NLB cluster IP address on the router

b) Change the NLB cluster mode to unicast
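As a rough illustration of choice (a), the Python sketch below builds the static ARP line for a router from a cluster IP address and cluster MAC address. It assumes the Cisco IOS global configuration syntax `arp <ip> <mac> ARPA` and the dotted MAC notation that IOS uses. The helper names `cisco_mac` and `static_arp_command` are hypothetical, and the sample values are the ones used in the Cisco excerpt above. Please check the exact syntax against the documentation for your platform before applying anything.

```python
import ipaddress
import re


def cisco_mac(mac: str) -> str:
    """Convert a MAC such as 03-BF-AC-10-3F-F1 to dotted form 03bf.ac10.3ff1."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()  # drop separators
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ".".join(digits[i:i + 4] for i in (0, 4, 8))


def static_arp_command(cluster_ip: str, cluster_mac: str) -> str:
    """Build a static ARP line (assumed Cisco IOS syntax) for the cluster IP."""
    ip = ipaddress.IPv4Address(cluster_ip)  # raises on a malformed address
    return f"arp {ip} {cisco_mac(cluster_mac)} ARPA"


if __name__ == "__main__":
    # Sample values from the Cisco excerpt; replace with your own cluster's.
    print(static_arp_command("172.16.63.241", "0300.5e11.1111"))
```

Generating the line from the cluster's real values rather than typing it by hand helps avoid a mistyped MAC address, which would leave remote subnets just as unreachable as before.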

Also, please keep the following questions in mind when troubleshooting NLB problems:

 

 1) Am I running the latest NLB driver available from Microsoft? We have released several NLB driver updates for Windows 2003, Windows 2008 and Windows 2008 R2 to address known problems.

 2) Am I running the latest NIC driver and teaming driver? We generally prefer not to run teaming on NLB clusters and may ask you to dissolve the team if needed, even though we don't have a strict "not supported" statement.

 3) Are the NLB port rules configured correctly? The most common mistake here is setting affinity to "None" for stateful protocols, which causes many NLB cluster access problems.

 4) Am I running the latest TCPIP driver? (Preferably the latest security update that updates the TCPIP driver.)

 5) Am I running the latest versions of any 3rd party filter drivers that run at the NDIS layer? (For example, security drivers.)

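 6) If the NLB cluster runs in virtual machines on Windows 2008 R2 Hyper-V, have you checked the "Enable spoofing of MAC addresses" setting on the virtual network adapters? NLB in unicast mode replaces the adapter's MAC address, so this setting generally needs to be enabled rather than disabled.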
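 To illustrate item 3, here is a toy Python model of how the affinity setting changes which cluster host handles a connection. It is not the real NLB hashing algorithm: SHA-256 is only a stand-in, and the function name `owning_host` and the sample addresses are made up for this sketch. What it does reflect is that "None" distributes on client IP address and port, "Single" on client IP address only, and "Network" on the client's Class C network.

```python
import hashlib


def owning_host(client_ip: str, client_port: int, affinity: str, host_count: int) -> int:
    """Pick a host index for a connection (toy model, not NLB's real hash)."""
    if affinity == "none":
        key = f"{client_ip}:{client_port}"            # IP and source port
    elif affinity == "single":
        key = client_ip                               # IP only
    elif affinity == "network":
        key = ".".join(client_ip.split(".")[:3])      # Class C (/24) network
    else:
        raise ValueError(f"unknown affinity: {affinity!r}")
    digest = hashlib.sha256(key.encode()).digest()    # stand-in hash
    return int.from_bytes(digest[:4], "big") % host_count


if __name__ == "__main__":
    ports = range(50000, 50050)  # 50 connections from one client
    for affinity in ("none", "single", "network"):
        hosts = {owning_host("10.1.2.3", p, affinity, 3) for p in ports}
        print(affinity, "-> hosts used:", sorted(hosts))
```

 With "Single" or "Network", every connection from the same client lands on the same host, so session state kept on that host stays reachable. With "None", each new source port may be sent to a different host, which is why stateful protocols tend to break under that setting.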

 I'm going to talk about troubleshooting approaches in another blog post.

 Hope this helps

 Thanks,
Murat