Failover Clustering of SAP Systems with Hyper-V – Deployment Options

Over the last few months, customers have repeatedly asked us how to deploy and run SAP systems in a virtualized environment with clustering. Before we discuss the possible configurations, we need to clarify the terminology.

Host Clustering:

This is when you create a Microsoft Failover Cluster across the physical Hyper-V hosts, so the cluster nodes are physical machines and the virtual machines are the clustered resources. Here the failover takes place between two physical computers.

(Windows Server 2008 R2 LiveMigration)

Guest Clustering:

This is when you create a Microsoft Failover Cluster inside the virtual machines, so the cluster nodes are virtual machines. Here the failover takes place between the guest operating systems.

(Windows Server 2008 R2 Failover Cluster)

While Host Clustering can monitor the state of the virtual OS, it does not monitor the health of the applications inside the VMs; that can only be accomplished with Guest Clustering.

Failover Guest Clustering also monitors the health of the VMs, so if the virtual OS crashes, hangs, or blue screens, it can be restarted automatically. The end user, however, has to wait a little longer while the VM state is loaded and started.

 

How does this work with SAP applications, where we need to protect the single points of failure (SPOFs) such as the database (SQL Server) and the SAP central services (SCS)?

The easiest and most straightforward configuration is Host Clustering, as shown in the first picture.

Here all the central SAP components (SAP central instance and SQL Server) are installed in one VM. In case of an unplanned downtime of the physical server, the whole VM, including the SAP CI and SQL Server, is moved to another physical host and restarted there. The downtime for this procedure depends on the memory allocated to the virtual machine. All open transactions in the database and the SAP system are lost and need to be rolled back.

A second option is to combine two virtual machines into a Windows Server Failover Cluster (WSFC).

The SAP SCS runs on one node and the SQL Server service on the other. In case of an unplanned downtime of a physical server, the affected service fails over to the remaining cluster node.

For SQL Server, the impact of an unplanned failover is that open transactions are lost. For the SAP service, the SAP Enqueue replication function replicates all Enqueue entries to the VM on the other host server, so a failover has no impact.
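The idea behind Enqueue replication can be illustrated with a toy lock table. This is only a minimal sketch, not SAP's implementation: the class `EnqueueServer`, its methods, and the lock key are hypothetical names chosen for illustration. It shows why a lock granted before the failover is still held afterwards when every entry is copied to the other node.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EnqueueServer:
    """Toy lock table (hypothetical model, not the real SAP enqueue server)."""
    locks: Dict[str, str] = field(default_factory=dict)  # lock key -> owner
    replica: Optional["EnqueueServer"] = None             # copy on the other node

    def acquire(self, key: str, owner: str) -> bool:
        # Refuse the lock if a different owner already holds it.
        if self.locks.get(key, owner) != owner:
            return False
        self.locks[key] = owner
        if self.replica is not None:
            # Copy the entry to the other node before reporting success.
            self.replica.locks[key] = owner
        return True


# Two cluster nodes: the active server replicates to the standby.
standby = EnqueueServer()
active = EnqueueServer(replica=standby)
active.acquire("MARA/000123", "user_a")

# The node hosting `active` fails; the standby takes over with its copy.
survivor = standby
lock_still_held = not survivor.acquire("MARA/000123", "user_b")
```

After the simulated failover, `lock_still_held` is true because the surviving node already has the entry for `user_a`. Without the replica, the lock table would start empty and `user_b` would be granted a conflicting lock.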

SAP end users are normally logged on to dialog instances outside the two VMs, so they are not affected by a failover between the two nodes.

Planned downtime and maintenance on the host servers can also be covered with this cluster configuration.

A last possible configuration would be a combination of both solutions, host and guest clustering, BUT this configuration is NOT supported by SAP.

SAP OSS message 1374671:

SAP supports either Guest Failover Clustering or Host Failover Clustering, but not a combination of the two.
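The support rule can be written as a simple check, for example as part of a pre-deployment validation script. This is a sketch only; the function name `sap_supported` and its parameters are hypothetical and do not come from any SAP or Microsoft tool.

```python
def sap_supported(host_clustering: bool, guest_clustering: bool) -> bool:
    """Return False only when host and guest clustering are combined.

    Encodes the rule stated above: either clustering type alone is
    supported by SAP, the combination of both is not.
    """
    return not (host_clustering and guest_clustering)


# The three clustered layouts discussed in this post.
host_only = sap_supported(host_clustering=True, guest_clustering=False)
guest_only = sap_supported(host_clustering=False, guest_clustering=True)
combined = sap_supported(host_clustering=True, guest_clustering=True)
```

Here `host_only` and `guest_only` evaluate to true and `combined` to false.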

High Availability and Patching considerations

The table below lists the downtime impact of planned and unplanned outages for each configuration.

| Failover Clustering | Guest | Host: MSCS Quick Migration (one LUN per VHD) | Host: Hyper-V Live Migration (Windows Server 2008 R2 with CSV, Cluster Shared Volumes) | Guest/Host |
| --- | --- | --- | --- | --- |
| Support by MS | Yes | Yes | Yes | Yes (except Exchange Server) |
| Support by SAP | Yes | Yes | Yes | No |
| Manageability | Medium | Medium | Medium | High |
| Complexity | Medium | Low | Low | High |
| Failover time (server outage, planned) | Up to 5 min (time for SQL Server failover in MSCS) | ~5 min (depends on RAM of the VM) | Zero | - |
| Data loss (server outage, planned) | Zero | Zero | Zero | - |
| Failover time (server outage, unplanned) | Up to 5 min (time for SQL Server failover in MSCS) | ~1 min (time to start VM on second server) | ~1 min (time to start VM on second server) | - |
| Data loss (server outage, unplanned) | Uncommitted SQL Server transactions will be rolled back | All open transactions and data will be lost | All open transactions and data will be lost | - |
| Patching of the guest OS (impact for SAP) | Minimal downtime (time for SQL Server failover) | Downtime (standard maintenance for patching) | Downtime (standard maintenance for patching) | - |
| Patching of the host OS (impact for SAP) | Minimal downtime (time for SQL Server failover) | ~5 min (depends on RAM of the VM) | Zero downtime | - |
| Usage of application HA functionality (e.g. ERS) | Yes | No | No | - |
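To compare the options programmatically, a few rows of the table can be encoded as data. The following is a sketch under assumptions: the `Option` class, its field names, and the `candidates` helper are hypothetical. The values are copied from the table, and cells shown as "-" are stored as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Option:
    name: str
    sap_supported: bool
    planned_failover: Optional[str]    # failover time, planned server outage
    unplanned_failover: Optional[str]  # failover time, unplanned server outage
    app_level_ha: Optional[bool]       # application HA functionality (e.g. ERS)


OPTIONS = [
    Option("Guest", True, "up to 5 min", "up to 5 min", True),
    Option("Host: MSCS Quick Migration", True, "~5 min", "~1 min", False),
    Option("Host: Hyper-V Live Migration", True, "zero", "~1 min", False),
    Option("Guest/Host", False, None, None, None),
]


def candidates(need_app_ha: bool) -> List[str]:
    """List the SAP-supported options, optionally requiring application-level HA."""
    return [
        o.name
        for o in OPTIONS
        if o.sap_supported and (o.app_level_ha or not need_app_ha)
    ]


with_app_ha = candidates(need_app_ha=True)
all_supported = candidates(need_app_ha=False)
```

With these values, `with_app_ha` contains only the Guest option, while `all_supported` contains the three configurations that SAP supports.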