Nodes being removed from Failover Cluster membership on VMware ESX?

Welcome to the AskCore blog. Today, we are going to talk about nodes being removed from active Failover Cluster membership when the nodes are hosted on VMware ESX. I documented node membership problems in a previous blog post:

Having a problem with nodes being removed from active Failover Cluster membership?
https://blogs.technet.com/b/askcore/archive/2012/02/08/having-a-problem-with-nodes-being-removed-from-active-failover-cluster-membership.aspx

This is a sample of the event you will see in the System Event Log in Event Viewer:

[Image: sample event from the System Event Log]

One specific problem that I have seen a few times lately is VMXNET3 adapters dropping inbound network packets because the receive buffer is too small to handle large amounts of traffic. If cluster heartbeat packets are among those dropped, the node can be removed from active membership. You can check whether this is happening by using Performance Monitor to look at the “Network Interface\Packets Received Discarded” counter.

[Image: the “Network Interface\Packets Received Discarded” counter in Performance Monitor]

Once you have added this counter, look at the Average, Minimum, and Maximum values. If any of them is greater than zero, the adapter is discarding inbound packets and its receive buffer needs to be increased. This problem is documented in VMware’s Knowledge Base:

Large packet loss at the guest OS level on the VMXNET3 vNIC in ESXi 5.x / 4.x
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2039495
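If you would rather script the Performance Monitor check than click through the UI, the following is a minimal Python sketch of one way to do it. It assumes the guest has Python installed and uses the built-in Windows `typeperf` command to sample the same counter. The function names (`collect_samples`, `summarize_discards`, `adapters_dropping_packets`) are made up for this illustration, and the parser assumes the usual `typeperf` CSV layout, with a header row that starts with "(PDH-CSV 4.0)".

```python
import csv
import io
import subprocess

# Same counter as in Performance Monitor; (*) selects every network interface.
COUNTER = r"\Network Interface(*)\Packets Received Discarded"


def collect_samples(count=5):
    """Run typeperf on the Windows guest and return its raw CSV output."""
    result = subprocess.run(
        ["typeperf", COUNTER, "-sc", str(count)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize_discards(csv_text):
    """Return {counter_path: (minimum, average, maximum)} from typeperf CSV text."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    # Assumption: the header row begins with the "(PDH-CSV 4.0)" column label.
    start = next(i for i, row in enumerate(rows) if row[0].startswith("(PDH-CSV"))
    names = rows[start][1:]
    values = {name: [] for name in names}
    for row in rows[start + 1:]:
        if len(row) != len(names) + 1:
            continue  # not a data row (column count does not match the header)
        for name, cell in zip(names, row[1:]):
            try:
                values[name].append(float(cell))
            except ValueError:
                pass  # blank cell or a trailing status line, not a number
    return {
        name: (min(v), sum(v) / len(v), max(v))
        for name, v in values.items() if v
    }


def adapters_dropping_packets(summary):
    """List the counter paths whose maximum value is greater than zero."""
    return sorted(name for name, (_, _, high) in summary.items() if high > 0)
```

On a cluster node, `adapters_dropping_packets(summarize_discards(collect_samples()))` would return the counter paths of any adapters that reported discarded inbound packets during the sample window. An empty list means none were seen in that window.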

I hope that this post helps you!

Thanks,

James Burrage
Senior Support Escalation Engineer
Windows High Availability Group