For those of you who have been following the “VMware FUD Fiasco” blog thread (Part 1, Part 2, Part 3 & Part 4), it’s been an unfortunate incident that has gone on far too long. However, I’m pleased to report that we’ve finally reached resolution and it’s time to close this chapter. What resolution am I talking about? Specifically,
VMware has admitted their mistake, apologized and removed the video.
Let’s start with VMware’s apology and then I’ll discuss our testing and results.
VMware Apology From Scott Drummonds
“I made a bad call.
About a month and a half ago, I anonymously posted a YouTube video depicting a controversial test of Microsoft’s Hyper-V. The video was a bit hyperbolic in its dramatization of Hyper-V’s reliability.
Unfortunately, my intention to stir the pot with eye-poking banter has put my credibility and by association VMware’s credibility in question among some of you. For this I apologize.
I’ve removed the video from YouTube. I’ve also sent a note of apology to Jeff Woosley at Microsoft.
My focus, and clearly VMware’s focus, is to help our 140,000 plus customers get the most from their technology investments. This is our commitment. We will absolutely work our best to live up to the high standard you’ve come to expect from us. And when we mess up, we’ll be the first to address the mistake head on.”
Scott: Looking forward to your email. Please send it to Jeff Woolsey, not Woosley. Tx.
IT 101: System Requirements
Before we dive into our testing and results, now’s a good time to mention that whenever you deploy any software, you should always ensure that you meet its minimum system requirements. I know this sounds like IT 101, but whether it’s Windows, ESX, Exchange, SQL, Oracle, <insert application/operating system here>, those minimum requirements are there for a reason. Ensuring you meet system requirements should be Step 1 in any deployment. If you don’t meet minimum requirements, you’re running in an unsupported and likely untested configuration.
- Windows Server 2008 has a minimum requirement of 512 MB of memory. However, the VMware test was configured with 256 MB. (BTW, Windows Server 2008 Setup includes a block to prevent you from installing on systems that don’t meet the minimum memory requirement. However, once you’ve installed Windows Server 2008, you can subsequently reduce the memory size. Thus, you have to go out of your way to get into this state.)
- Windows Server 2008 (x64) requires a minimum of 32 GB of disk space. The VMware test was configured with 12 GB.
- Windows Server 2008 (x86) requires a minimum of 20 GB of disk space. The VMware test was configured with 12 GB.
In short, VMware’s performance benchmark team didn’t even get Step 1 correct. I’m going to leave it at that.
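For anyone who wants to automate that Step 1 check before a deployment, here is a minimal sketch in Python. The `VmConfig` class and `check_minimums` function are hypothetical names made up for illustration, and the minimums are simply the figures quoted in the list above, so confirm them against the official system requirements before relying on them.

```python
from dataclasses import dataclass
from typing import List

# Minimums as quoted in this post for Windows Server 2008. These are
# illustrative values; verify them against the official requirements.
MINIMUMS = {
    "x64": {"memory_mb": 512, "disk_gb": 32},
    "x86": {"memory_mb": 512, "disk_gb": 20},
}


@dataclass
class VmConfig:
    """Hypothetical description of a guest's configured resources."""
    arch: str       # "x64" or "x86"
    memory_mb: int  # memory assigned to the guest, in MB
    disk_gb: int    # virtual disk size, in GB


def check_minimums(vm: VmConfig) -> List[str]:
    """Return a list of shortfalls; an empty list means the guest passes."""
    required = MINIMUMS[vm.arch]
    problems = []
    if vm.memory_mb < required["memory_mb"]:
        problems.append(
            f"memory: {vm.memory_mb} MB is below the {required['memory_mb']} MB minimum"
        )
    if vm.disk_gb < required["disk_gb"]:
        problems.append(
            f"disk: {vm.disk_gb} GB is below the {required['disk_gb']} GB minimum"
        )
    return problems


if __name__ == "__main__":
    # The configuration described for the test in the video: 256 MB, 12 GB.
    for problem in check_minimums(VmConfig(arch="x64", memory_mb=256, disk_gb=12)):
        print(problem)
```

Run against the configuration described above, the check flags both the memory and the disk shortfall. A guest that meets the quoted minimums returns an empty list.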
Despite these tests not meeting the minimum system requirements, we configured our tests to match these settings.
After setting up and configuring the test, we let it run. For days the test ran without issue. We let it run over the weekend.
When we came back in the office Monday morning, lo and behold, we encountered a guest crash.
We immediately performed a root cause analysis and traced this back to an issue that was found and fixed (KB article and fix posted) in Windows (specifically PatchGuard) over a year ago in April 2008. Let’s dig in.
The issue (and details) I’m referring to is here. I reviewed our support logs to see how common this issue is. In fact, it’s quite rare: we’ve had fewer than two dozen reports since Windows Server 2008 went gold, against millions of Windows Server 2008 deployments. For that reason, the fix was not made available via Windows Update. However, a public solution has been in place since April 2008, so a customer who searched our public Knowledge Base or contacted our Customer Service & Support (CSS) would easily find a resolution. In addition, the fix has already been included with both Windows Server 2008 Service Pack 2 and Windows Server 2008 R2.
To be crystal clear, here’s how this relates to Windows Server 2008 Gold, Windows Server 2008 SP2, and Windows Server 2008 R2 with links.
- Windows Server 2008 Gold: In the rare occurrence that someone ran into this issue, we posted a Knowledge Base Article which includes a fix. That link is here: http://support.microsoft.com/kb/950772/
- Windows Server 2008 Service Pack 2: The fix has already been incorporated into Windows Server 2008 Service Pack 2 which is publicly available here: http://technet.microsoft.com/en-us/windows/dd262148.aspx.
- Windows Server 2008 R2 (and future): The fix has already been incorporated into Windows Server 2008 R2 (and future releases). Windows Server 2008 R2 is currently at the Release Candidate milestone and will ship for the holidays.
Finally, I’d like to point out that this issue could occur within any virtual machine environment such as Hyper-V, Xen or VMware ESX/vSphere. In fact, we know it occurs on ESX because, in researching this issue, I found that VMware reported this issue on ESX.
No Parent Crash Here
With the guest issue identified, we started up the test again so we could look for any parent crashes.
- Days pass. No issues/no crash.
- We set up additional, larger systems from multiple vendors and run the same tests with even higher consolidation ratios: 64+ VMs running for days.
- Many days pass. No issues/no crash.
- We decide to add some of our own private stress tests into the mix that stress every part of the system. No issues/no crash.
To sum it up, we haven’t been able to reproduce any host crash at all.
I’d also like to reiterate that of the 750,000+ downloads of Hyper-V RTM, we’ve had 3 reports of crashes under extreme stress with the same error code as seen in the video bugcheck (0x00020001). In all three cases, upgrading the server BIOS solved the problem. This can happen because hypervisors interact very closely with the hardware, and BIOS updates generally include updated processor microcode, often to address errata.
At this point, without the physical system, we’re unable to reproduce the crash. If Scott/VMware would like to send the crash dump, we’d be interested in reviewing it, but otherwise we’re out of avenues to explore.
Virtualization & High Availability
This seems like an opportune time to remind folks about the importance of virtualization and high availability (HA). Since day one, Windows Server 2008 Hyper-V integrates with Failover Clustering to provide high availability. This means in the case of something very bad occurring, like someone yanking out the power cable or hardware failure, Hyper-V virtual machines will automatically restart on another server without user intervention. If you’re running virtual machines in a production environment, high availability is a must. If you’re running virtual machines without HA, I strongly urge you to rethink that decision.
Our customers understand the importance of virtualization and High Availability. In fact, the top feature request from our customers for our free Hyper-V offering, Microsoft Hyper-V Server, was to provide High Availability.
We thought we’d do better and provide both High Availability and Live Migration with Microsoft Hyper-V Server 2008 R2.
That’s customer focus.
To Our Customers
We are pleased and humbled by the incredible response you’ve given Hyper-V.
Specifically, I mean:
- In the first 7 months of Hyper-V RTM availability, we’ve had over 750,000 downloads of Hyper-V gold bits.
- Hyper-V is the fastest growing x86/x64 hypervisor in history. We are laser focused on our customers and providing high performance, high quality virtualization for everyone from small business to Fortune 500 customers.
- We have hundreds of Hyper-V case studies from customers worldwide and we’re winning new customers daily.
We thank you for your support and look forward to helping you reduce your costs, optimize your infrastructure and provide the best solutions that span your desktops to the datacenter to the cloud.
Principal Group Program Manager
Windows Server, Hyper-V