Virtualizing Tier 1 SQL Server

A while back I published a couple of posts on virtualizing SQL Server, and in the light of developments in both the virtualization platform out there and SQL Server itself I feel the need to do a complete rewrite. 

The traditional approach to implementing high availability (HA) in SQL Server has been to create a cluster and for this to be a more resilient HA means three nodes or more are needed to maintain HA with two nodes while one node is offline for planned maintenance, for example to either patch the OS or the SQL Server node itself.   What does this mean for virtualization? If you are using Hyper-V  it doesn’t really matter; the VM’s comprising this cluster (aka a guest cluster) are kept on separate physical nodes on a (physical) cluster you can patch the hosts the guest OS, and SQL Server all using Cluster Aware Updating (CAU) in Windows Server 2012.  However it’s not quite so easy in VMware, you‘ll have use VMware Update Manager to patch the hosts and then use CAU to patch the guest OS and SQL Server. Moreover as far as I know you can only have a two node guest cluster in VSphere so while you are patching SQL Server you are down to one node.   So what if you have to use VMware and you want more in the way of HA like you have on Hyper-V? 

One option would be to use Availability Groups in SQL Server 2012 Enterprise edition. This combines the best of mirroring/ log shipping with Clustering:

  • There’s no shared storage so I don’t see why you would be limited to a three node guest Windows Cluster.
  • Failover is very quick as there’s no shared storage each node has it’s own copy of the database being protected
  • Unlike mirroring and log shipping you are protecting a group of databases as though they were one and you can use the secondaries for reporting and as a source for backups (only full backups though). Plus you can have multiple secondaries for example a synchronous secondary in your local data centre with an asynchronous copy at another location , so a bit like replication in Hyper-V & VMware but at the database rather than VM level.  That’s an important point you should use this techniques over VM replication as all you are synching is the actual SQL Server

Your next consideration is going to be making sure you get a predictable level of performance or your users might be phoning if there’s issues with speed as well.  Tuning in a physical world has occupied many a mind and there’s tons of advice out there from MVPs, TechNet etc.   Things get tricky in a virtual world as resources are shared.  However if you are running a tier 1 on database then best practice would be:

  • CPU Don’t over commit and use all the NUMA capabilities in your hypervisor to pass through maximum performance to the database.  Bear in mind that for HA you might well want this capacity reserved on other nodes.
  • RAM can’t be over committed in Hyper-V, but shouldn’t be over committed on VMware as performance suffers.
  • IO use the latest SR-IOV cards which can recieve and mange virtual network traffic straight fro the VMs if you can.  However you can’t team SR-IOV cards so you might want to pass through multiple SR-IOV NICs to a VMand then team inside the VM (while you can do this in Hyper-V but I am not sure if you can do that on VMware where the guest OS is Windows Server 2012).  If not use NIC teaming at the host level and the appropriate  teaming policies for access to the database (VMware advice on this is under networking policies here).
  • Storage Access questions usually revolve around whether to use RDMA/pass through disks so that the database itself is stored directly on a LUN referenced by the VMs. There’s actually very little difference these days and in both platforms you could use a share if you have a Windows Server 2012 fileserver running storage spaces. 

The definitive white paper for virtualizing SQL Server 2012 is here. However the latest version of a best practices guide for running SQL Server on VMware I could find is here but it’s three years old and so applies to older versions of SQL Server (typically 2005/2008) and Windows Server 2008.  Hopefully this will change as Windows Server 2012  & SQL Server 2012 are now supported and of course there’s going to be even more new stuff with SQL Server 2014 running on Windows Server 2012 R2. Whatever you decide to do you’ll want your HA design to be supported and the definitive word on that check KB956893.

Finally if you are a DBA reading this, one way to get to know your data centre admins is to help them with their SQL Server, as whether they are using System Center or VSphere it’s likely that the database underpinning these is SQL Server and it could probably do with a but of TLC, and a general discussion about protecting those databases too as they are vital components of your data centre.

Note:

  • My definition of tier 1 isn’t necessarily big, it’s more about the impact of a tier 1 service not being there. If it isn’t there you can’t operate trade, function etc.  Of course systems like this tend to be heavily used and predictable performance is also important too.
  • I haven’t mentioned VMware fault tolerance here because it has so many limitations that render it impractical for all but the smallest databases and I generally find that if it’s tier 1 it’s generally very big and used by lots of people and so only having one CPU doesn’t really work.