We’re two dates into our roadshow and I’ve twice been asked to do a comparison of VMware and Microsoft in the high availability area.
So let's go back to basics for a second. Microsoft is involved in lots of areas of software, covering operating systems, several different kinds of virtualization, server applications and management software (a lot of customers are keen to manage VMware with System Center Virtual Machine Manager, which is probably worth its own post). We've got a history with high availability. Back in the OS/2 LAN Manager days we had domains where any one of several machines could validate a logon. When we introduced WINS and DHCP in NT 3.5, we supported multiple servers being able to deliver the same service to the same client. We have Network Load Balancing: Office Communications Server is designed to leverage it, and IIS in Server 2008 is designed to play better with it. We introduced failover clustering 10 years or so ago, and with Server 2008 we're up to its 4th generation. Exchange, SQL Server, file shares and virtual machines can all be clustered. Clustering at the application level is THE only way to provide high availability over a wide range of problems. If the hardware fails, if the OS running the server application fails, if the application itself fails… application-level clustering saves the day. If an application is critical in its own right and can be clustered, there is no excuse for not clustering it.
We see the main task of Hyper-V servers as running a reasonably static collection of server workloads. That's not to say workloads never move between servers, but they tend to stay put. Nor is it to say we never run client workloads using virtualization, but usually Terminal Services is a better way to run many identical "virtual desktops". Running many clients as VMs has a much bigger disk, memory and CPU overhead, but in some cases it is still the best way to go. Companies that can sell you the same solution based on either Terminal Services or client OS virtualization (ourselves or Citrix) will tend to go the TS route: patching and application deployment are simpler that way too. VMware don't offer that choice.
I talked about applications which are critical in their own right. Over on the virtualization blog, Jeff talked about consolidating applications which aren't critical individually: move 5, 10 or 20 such apps onto one server and that server becomes critical. If it fails unexpectedly, your job's on the line. So, to allow VMs to live on shared storage and be failed over to another machine, VMware have their "HA" option and we use the clustering in the Enterprise and Datacenter editions of Windows. A by-product of clustering is the ability to migrate VMs from one box to another. This is quick but not "live": it involves a brief interruption of service.
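To make the failover idea concrete, here is a minimal sketch of what happens when one host in a cluster dies. It assumes, as described above, that every VM's disks sit on shared storage so any surviving node can start them. The function name `fail_over`, the node and VM names, and the "least-loaded surviving node" placement rule are hypothetical illustrations, not how Windows failover clustering or VMware HA actually pick a target. What it shows is that the VMs are restarted elsewhere rather than moved live, so anything that only existed in the failed host's memory is lost.

```python
from typing import Dict, List


def fail_over(placement: Dict[str, str], nodes: List[str], failed_node: str) -> Dict[str, str]:
    """Hypothetical failover: restart every VM owned by failed_node on the
    least-loaded surviving node. Disks are assumed to be on shared storage,
    so any survivor can start the VM; guests are cold-restarted."""
    survivors = [n for n in nodes if n != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node to fail over to")

    # Count how many VMs each surviving node is already running.
    load = {n: 0 for n in survivors}
    for node in placement.values():
        if node in load:
            load[node] += 1

    new_placement = dict(placement)
    # Place orphaned VMs one at a time; ties broken by node name for determinism.
    for vm in sorted(vm for vm, node in placement.items() if node == failed_node):
        target = min(survivors, key=lambda n: (load[n], n))
        new_placement[vm] = target
        load[target] += 1
    return new_placement


if __name__ == "__main__":
    placement = {"app1": "A", "app2": "A", "app3": "A", "app4": "B"}
    print(fail_over(placement, ["A", "B", "C"], "A"))
    # {'app1': 'C', 'app2': 'B', 'app3': 'C', 'app4': 'B'}
```

Three non-critical apps were consolidated on host A; when A fails they are spread across B and C, each suffering a restart rather than a seamless move.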
This is the area where VMware have their major differentiator: VMotion. We know that some customers want to be able to move machines around with no downtime, and we've talked about it for a future version. I want to avoid getting into any criticism of the feature itself; with Microsoft not having it today, that would have the tang of sour grapes to it. I don't think it is controversial to say VMware's software costs substantially more than Microsoft's nearest equivalent; to stay in business they need to offer features which justify that cost. VMotion is just such a feature. The problem is that VMotion is touted as the cure for all ills, which it isn't. It lets you say "move this machine": it copies the machine's memory to another host and switches over in under a second. But VMotion doesn't help with unplanned downtime (Jeff gives chapter and verse from VMware's HA documentation). So VMotion helps with planned downtime, such as patching or upgrading the host. As Jeff points out in a third post, we think most customers, even the ones who have a live migration solution, still warn people the system will go down and do the upgrade during off hours. If both host and guest are running Windows, you can patch the guests and take them down, patch the host, and then bring everything back up together.
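The "copy the memory, then switch over" step can be illustrated with the generic iterative pre-copy technique used for live migration: copy every page while the guest keeps running, re-copy the pages it dirtied in the meantime, and pause the guest only for the last small set. This is a simplified simulation under stated assumptions (the `VM` class, `live_migrate` function, dirty-page schedule and `threshold` are all hypothetical), not VMware's implementation. It also shows why live migration does nothing for unplanned downtime: every round reads from the source host's memory, so the source has to be up and running.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple


@dataclass
class VM:
    """Hypothetical running guest: memory maps page number -> contents."""
    memory: Dict[int, bytes]
    dirty: Set[int] = field(default_factory=set)

    def write(self, page: int, data: bytes) -> None:
        # The guest writes a page; the hypervisor tracks it as dirty.
        self.memory[page] = data
        self.dirty.add(page)


def live_migrate(
    vm: VM,
    dirtied_during_round: Callable[[int], Set[int]],
    threshold: int = 2,
    max_rounds: int = 10,
) -> Tuple[Dict[int, bytes], int, int]:
    """Iterative pre-copy sketch. Returns (target memory, pre-copy rounds,
    pages copied while the guest was paused)."""
    target: Dict[int, bytes] = {}
    to_copy = set(vm.memory)  # first round: every page
    rounds = 0
    # Pre-copy phase: the guest keeps running while pages are copied.
    while len(to_copy) > threshold and rounds < max_rounds:
        vm.dirty.clear()
        for page in to_copy:
            target[page] = vm.memory[page]
        # Simulate the guest dirtying pages while this round was copying.
        for page in dirtied_during_round(rounds):
            vm.write(page, b"v%d" % (rounds + 1))
        to_copy = set(vm.dirty)
        rounds += 1
    # Stop-and-copy phase: pause the guest, copy what is left, switch hosts.
    for page in to_copy:
        target[page] = vm.memory[page]
    return target, rounds, len(to_copy)


if __name__ == "__main__":
    vm = VM(memory={p: b"v0" for p in range(8)})
    schedule = {0: {1, 2, 3, 4}, 1: {2, 3}}
    target, rounds, paused_pages = live_migrate(vm, lambda r: schedule.get(r, set()))
    print(target == vm.memory, rounds, paused_pages)  # True 2 2
```

In this run the guest dirties four pages during the first copy and two during the second, so only two of its eight pages are copied while it is paused; that shrinking final set is what keeps the switchover short.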
One other thing about VMware's approach is that they make a feature of "sweating" the hardware harder than we do, whether the workloads are client or server ones (see the argument about over-committing memory). This means dynamically allocating resources and being able to move VMs from an overloaded box to an underloaded one. It's really a kind of "grid" computing where the workloads (VMs) float from host to host: cost makes it necessary and VMotion makes it possible. In the Microsoft world we tend to say: spend the money you save on cheaper software on more hardware, so you don't have to sweat it as much and workloads don't need to hop from box to box as frequently.