Disclaimer: facts and figures in this article are based on the state of the technology as it exists at the date of its publication.
Our article today in our “VMware or Microsoft?” series is about availability.
When I say “availability”, I mean “high availability”.
And when I say “robust high availability”, I mean a solution such as Windows Failover Clustering that provides high availability and scalability of server workloads.
I argue that Microsoft’s solution is robust and solid, but VMware has argued differently. In a document that VMware currently provides, comparing vSphere 5 to the then-beta of what is now Hyper-V in Windows Server 2012, VMware claims that vSphere has “robust high availability” with a “single click, [that] withstands multiple host failures”, whereas Microsoft’s Failover Clustering is “based on legacy quorum model, complex and brittle”.
Really? They haven’t been watching how far clustering has come in Windows Server lately. At best, VMware’s document might be describing how failover clustering used to work back in 2008. More specifically, they are referring to the quorum model, in which a cluster needs a majority vote to determine whether or not a node is actually unavailable, so that the resources it was managing can fail over to other nodes. To ever have a solid majority, the number of voting members needs to be odd. All nodes get a vote, so if you have an even number of nodes, you need something else to break the tie: a “cluster witness”, which is either a “witness disk” or a “witness file share”.
From this document on Windows Server 2008 failover clustering:
In a cluster with an even number of nodes and a quorum configuration that includes a witness, when the witness remains online, the cluster can continue to sustain failures of half the nodes. If the witness goes offline, the same cluster can sustain failures of half the nodes minus one.
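The arithmetic behind that quote is simple enough to sketch. Here is a minimal, illustrative Python model of the old *static* quorum: every node always votes, the witness (if configured) adds one permanent vote, and the cluster survives only while the surviving votes are a majority of the total. The function names are mine, for illustration only; they are not part of any Windows API.

```python
def majority(total_votes: int) -> int:
    """Votes needed for a majority: strictly more than half."""
    return total_votes // 2 + 1

def survives(total_nodes: int, failed_nodes: int, has_witness: bool) -> bool:
    """Static (2008-era) quorum model: every configured node always holds a
    vote, and a configured witness adds one more permanent vote."""
    witness_vote = 1 if has_witness else 0
    total_votes = total_nodes + witness_vote
    surviving_votes = (total_nodes - failed_nodes) + witness_vote
    return surviving_votes >= majority(total_votes)

# A 4-node cluster with a witness (5 votes, majority = 3)
# can sustain the loss of half its nodes:
print(survives(4, 2, has_witness=True))   # True
# Without the witness vote (4 votes, majority = 3) it cannot;
# it only sustains half minus one:
print(survives(4, 2, has_witness=False))  # False
print(survives(4, 1, has_witness=False))  # True
```

Note how rigid this is: the totals are fixed at configuration time, which is exactly the limitation dynamic quorum removes.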
Well then, please allow me to introduce you to…
The Dynamic Quorum
“Batman and Robin?”
No… that was the “dynamic duo”. I’m talking about the ability of all nodes in a Windows Failover Cluster to have a vote, and for the number of voting members to adjust dynamically as nodes fail, so that an even number of voting members never leaves the cluster without a clear majority (a quorum).
In this diagram…
…we see a healthy 4-node cluster, each node running 2 VMs, or any other clustered roles. (Windows Failover Clustering is not just for virtualization, you know.) Quorum is maintained because we have a disk witness to break the tie in case two nodes say “one node is down!” and the other two say “no, it’s not!”.
If one of the nodes in our cluster goes away…
…depending upon whether that removal was planned or a complete surprise, the clustered roles are able to fail over or restart on other nodes. AND, because the cluster now has only three active nodes, that odd number of voting members in itself forms a quorum.
“When a node shuts down or crashes, the node loses its quorum vote. When a node successfully rejoins the cluster, it regains its quorum vote. By dynamically adjusting the assignment of quorum votes, the cluster can increase or decrease the number of quorum votes that are required to keep running. This enables the cluster to maintain availability during sequential node failures or shutdowns.”
Later, if the node is re-added, it regains its vote.
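To see why this matters, here is a simplified, illustrative Python sketch of *sequential* failures under dynamic quorum: each time a node fails and the survivors still hold a majority of the current votes, the cluster re-forms with the failed node’s vote removed, and the required majority shrinks with it. (This is my own simplified model, not Microsoft’s implementation; the real cluster service can go even further, down to a last surviving node, with help from the dynamic witness described below.)

```python
def simulate_sequential_failures(nodes: int) -> int:
    """Return how many one-at-a-time node failures a cluster survives when
    each surviving membership re-forms and drops the failed node's vote
    (a simplified model of dynamic quorum, with no witness)."""
    votes = nodes
    survived = 0
    while True:
        remaining = votes - 1              # one more node goes down
        if remaining < votes // 2 + 1:     # survivors lack a majority of current votes
            return survived
        survived += 1
        votes = remaining                  # the failed node's vote is removed

# Static model: a 4-node cluster with no witness needs 3 of 4 votes,
# so it tolerates only 1 failure, even sequentially.
# Dynamic model: after the first failure it becomes a 3-vote cluster,
# so it can absorb a second failure:
print(simulate_sequential_failures(4))  # 2
```

The key point is the last line of the loop: because the vote count shrinks after each surviving membership change, availability is preserved through a *sequence* of failures that would have killed a static-quorum cluster.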
Robust. But wait… there’s more…
The Dynamic Witness
“If the cluster is configured to use dynamic quorum (the default), the witness vote is also dynamically adjusted based on the number of voting nodes in current cluster membership. If there are an odd number of votes, the quorum witness does not have a vote. If there is an even number of votes, the quorum witness has a vote.
The quorum witness vote is also dynamically adjusted based on the state of the witness resource. If the witness resource is offline or failed, the cluster sets the witness vote to ‘0’.”
The benefit of this is for the rare case of a witness failure. If that happens, the witness vote simply goes away and is treated as if it were never there. The bigger benefit of all of this is that you never really have to count your nodes to decide whether or not to configure a quorum witness. Just configure one (as recommended), and let the dynamic nature of failover clustering take care of it.
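The dynamic witness rule quoted above is easy to capture in a few lines. This is an illustrative Python sketch (the function name is mine, not a Windows API): the witness votes only when it is online *and* the node votes would otherwise be even, so the total vote count stays odd whenever the witness is healthy.

```python
def witness_vote(voting_nodes: int, witness_online: bool) -> int:
    """Dynamic witness rule: the witness contributes a vote only when it is
    online AND the number of voting nodes is even; otherwise its vote is 0."""
    if not witness_online:
        return 0
    return 1 if voting_nodes % 2 == 0 else 0

# The total vote count always comes out odd while the witness is healthy:
for nodes in (4, 3, 2):
    total = nodes + witness_vote(nodes, witness_online=True)
    print(nodes, "nodes ->", total, "votes")  # 5, 3, 3 votes: always odd
```

That is the whole trick: the cluster keeps its own voter roll odd, so you never have to do this bookkeeping yourself.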
Guest Clustering Without Limits
Microsoft has a distinct advantage over VMware when it comes to guest clustering. With Hyper-V and virtual machines running Windows Server 2012 or 2012 R2, you can create clusters of virtual machines that use iSCSI, Fibre Channel, or even shared .VHDX files (in R2) as their shared storage, placed on either a Cluster Shared Volume (CSV) or a server file share (SMB share, file-based storage).
So here are a couple of the new, flexible choices you have for guest clustered VM shared storage in Windows Server 2012 R2…
Try doing that on NFS.
While we’re on the subject of scale…
Does Size Matter?
VMware requires Essentials Plus or better for HA, and unless something changed in vSphere 5.5 that they haven’t yet said much about, I believe they still support only up to 4,000 VMs in a 32-node cluster. (Correct me in the comments and point me to documentation that proves me wrong, please. I sincerely thought they would up their game here.)
“Holy robust high availability, Batman!”
I’m glad you like it. But if not, or if you have any questions, let me know in the comments.
And for more details on what’s really new in the world of robust high availability, as opposed to what VMware would have you believe, check out these two TechNet documents: