Kevin Remde, my friend and peer from the Central DPE Region, brings us the next post in our series. Here is an excerpt…
Disclaimer: facts and figures in this article are based on the state of the technology as it exists at the date of its publication.
Our article today in our “VMware or Microsoft?” series is about availability.
When I say “availability”, I mean “high availability”.
And when I say “robust high availability”, I mean a solution such as Windows Failover Clustering that provides high availability and scalability of server workloads.
I argue that Microsoft’s solution is robust and solid, but VMware has argued differently. In a currently available document comparing vSphere 5 to the then-beta release of what is now Hyper-V in Windows Server 2012, VMware claims that they have “robust high availability” with a “single click, [that] withstands multiple host failures”, whereas Microsoft’s Failover Clustering is “based on legacy quorum model, complex and brittle”.
Really? They haven’t been watching how far clustering has come in Windows Server lately. In fact, at best, VMware’s document might be referring to how failover clustering used to work back in 2008. More specifically, they are referring to the quorum model, in which a cluster needs a majority vote to determine whether a node is actually unavailable, so that the resources it was managing can fail over to other nodes. To always have a solid majority, the number of voting members needs to be odd. All nodes get a vote, so if you have an even number of nodes, you need something else to break the tie. To make that work, you need a “cluster witness”, which is either a “witness disk” or a “witness file share”.
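To make the majority rule concrete, here is a minimal sketch in Python. It is only an illustration of the voting arithmetic described above, not anything taken from Windows Failover Clustering itself; the function name `has_quorum` is my own invention for this example.

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    # Strict majority: more than half of all configured votes must be present.
    return votes_online * 2 > total_votes

# Four nodes, no witness: a 2/2 split leaves neither side with a majority.
print(has_quorum(2, 4))  # False

# Four nodes plus one witness vote: the side holding the witness has 3 of 5.
print(has_quorum(3, 5))  # True
print(has_quorum(2, 5))  # False
```

The witness matters only because it makes the total odd, so a tie is impossible.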
From this document on Windows Server 2008 failover clustering:
In a cluster with an even number of nodes and a quorum configuration that includes a witness, when the witness remains online, the cluster can continue to sustain failures of half the nodes. If the witness goes offline, the same cluster can sustain failures of half the nodes minus one.
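The numbers in that quote fall straight out of the majority rule. The sketch below assumes a fixed vote count (every node plus one witness) that never changes as members fail; `max_node_failures` is a hypothetical helper written for this illustration, not a real cluster API.

```python
def max_node_failures(nodes: int, witness_online: bool) -> int:
    # Assumption: static quorum, one vote per node plus one witness vote.
    total_votes = nodes + 1
    votes_needed = total_votes // 2 + 1      # strict majority of all votes
    witness_vote = 1 if witness_online else 0
    nodes_needed = votes_needed - witness_vote
    return nodes - nodes_needed

print(max_node_failures(4, witness_online=True))   # 2 (half the nodes)
print(max_node_failures(4, witness_online=False))  # 1 (half the nodes minus one)
```

For a 4-node cluster there are 5 votes and 3 are needed, so with the witness up two nodes can be lost, and with the witness down only one.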
Well then, please allow me to introduce you to…
The Dynamic Quorum
“Batman and Robin?”
No, that was the “dynamic duo”. I’m talking about the ability of all nodes in a Windows Failover Cluster to have a vote, and for the number of voting members to adjust dynamically as nodes fail, so that there is never any confusion (lack of a quorum) caused by having an even number of voting members.
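Here is a toy model of that idea, to show why adjusting the vote count matters. It is a simplification built on my own assumptions, not the actual Windows algorithm: nodes fail one at a time, the witness stays online, the witness only counts when the number of surviving nodes is even, and the vote count is recalculated after every failure the cluster survives. The function name `simulate_sequential_failures` is hypothetical.

```python
def simulate_sequential_failures(nodes: int) -> list[int]:
    # Returns the number of nodes still up after each failure the cluster survives.
    history = []
    alive = nodes
    while alive > 1:
        # Assumption: the witness votes only when the node count is even,
        # so the total number of votes is always odd.
        total_votes = alive + (1 if alive % 2 == 0 else 0)
        votes_after_failure = total_votes - 1  # one node drops out
        if votes_after_failure * 2 <= total_votes:
            break  # survivors no longer hold a majority
        alive -= 1
        history.append(alive)  # votes are recalculated on the next pass
    return history

print(simulate_sequential_failures(4))  # [3, 2, 1]
```

Under these assumptions a 4-node cluster keeps quorum all the way down to a single node, whereas the fixed vote count from 2008 would stop tolerating failures after two nodes are lost.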
In this diagram…
…we see a healthy 4-node cluster, with each node running 2 VMs or any other clustered roles. (Windows Failover Clustering is not just for virtualization, you know.) The quorum is maintained because we have a disk witness to break the tie in case two nodes say “one node is down!” and the other two say “no, he’s not!”.
For the rest of the blog post, click here –