Taking a closer look at the Virtual Machine Manager cluster overcommit algorithm

~ Hilton Lange | Software Engineer

One common question we get on the support team for System Center 2012 Virtual Machine Manager (VMM) concerns the ‘over-committed’ status and why it might be displayed. For example, you might see it when attempting to migrate a VM to a particular host, but the UI doesn’t elaborate on why the status was triggered or what you should do about it. In this article we explain the algorithm VMM 2012 uses, in the hope that you’ll have a better understanding of how this status is determined and what you can do if you see it.

Overview of the Approach

The SCVMM 2012 cluster overcommit check attempts to ascertain whether there is any possibility that VMs could not be restarted after a simultaneous failure of R nodes, where R is the cluster reserve. The cluster is assumed to be overcommitted until proven otherwise. Four different approaches are tried, and if any one of them can show that the cluster is not overcommitted, the cluster state is set to “OK”. Otherwise the cluster state is set to “Overcommitted”.

The four approaches can be visualized in a table like this:

image

Proof Method

This method works by measuring whether there are enough VMs to fill up all the hosts to a point where the largest VM will just barely fail to start on any of them. It considers the worst case where the largest VM is the last to be failed over, and again the worst case where every host has 1 byte too little memory to start that VM.

Slot Method

This method works by assigning each VM on a failed host to a single standard size slot equal to the size of the largest VM on all failed hosts. It then counts the number of available slots on each of the other hosts and checks that there are enough free slots to place all the VMs currently on failed hosts.

Simple Check

This approach does not consider a specific set of failing hosts, but rather makes worst-case, cluster-wide assumptions. The largest VM size is the largest VM in the entire cluster. The total size of the failing-over VMs is not taken from a specific set of hosts; it is simply the theoretical highest sum we can get from any R failing hosts. Likewise, the amount of memory or slots available on the other hosts is the sum across the lowest N-R hosts (where N = cluster size).

Full Complexity Check

This approach iterates over every possible set of R failing hosts. It recalculates the slot size, largest VM size, target host memory sizes and slot count for each possible combination of failing hosts. The number of sets that has to be considered is Choose(N,R), which can become prohibitively large for big values of N and R. Because this is roughly proportional to N^R, the check is only run if N^R < 5000. In practical terms, this means the full complexity check is only done in the following cases:

image

Note that the full complexity check is only a marginal refinement over the simple check; falling back on the simple proof check offers very similar results.
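To get a feel for where the N^R < 5000 cut-off lands, here is a small Python sketch. The function names are ours and purely illustrative; only the threshold and the Choose(N,R) count come from the description above.

```python
from math import comb

FULL_CHECK_LIMIT = 5000  # threshold quoted in the article


def full_check_enabled(n_hosts: int, reserve: int) -> bool:
    """True when N^R is below the limit, i.e. the full complexity check would run."""
    return n_hosts ** reserve < FULL_CHECK_LIMIT


def failing_set_count(n_hosts: int, reserve: int) -> int:
    """Number of distinct sets of R failing hosts, Choose(N, R)."""
    return comb(n_hosts, reserve)


def largest_cluster_for(reserve: int) -> int:
    """Largest N for which the full check still runs at this cluster reserve."""
    n = 0
    while full_check_enabled(n + 1, reserve):
        n += 1
    return n
```

Under this reading, a cluster reserve of 2 keeps the full check enabled up to 70 hosts (70^2 = 4900), and a reserve of 3 up to 17 hosts (17^3 = 4913).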

Precalculations and Definitions

Cluster Values

image

Host Values

The following values are precalculated for each host. When a value is calculated with respect to LargestClusterVMMB or SlotSizeMB, it is recalculated in each iteration of full complexity checks.

image

NOTES:

1. A 64MB buffer is added to each VM’s memory to account for Hypervisor overhead.

2. Stopped, saved state, paused and running VMs are all counted. A tenant user could start a stopped VM at any time, so its memory has to be accounted for when calculating overcommit.

3. If dynamic memory VMs are present in the cluster, their current memory demand is used.
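The three notes can be read as a rule for how much memory each VM is charged. The sketch below is only our illustration of that rule: the VM class and its field names are hypothetical and do not reflect VMM's object model.

```python
from dataclasses import dataclass

HYPERVISOR_BUFFER_MB = 64  # per-VM overhead from note 1


@dataclass
class VM:
    # Hypothetical fields, chosen for this example only.
    name: str
    state: str           # "Running", "Stopped", "Saved" or "Paused"
    assigned_mb: int     # configured memory of a static-memory VM
    dynamic: bool = False
    demand_mb: int = 0   # current demand, used only when dynamic is True


def counted_memory_mb(vm: VM) -> int:
    """Memory charged to one VM: demand or assigned memory, plus the buffer."""
    base = vm.demand_mb if vm.dynamic else vm.assigned_mb
    return base + HYPERVISOR_BUFFER_MB


def host_vm_memory_mb(vms: list[VM]) -> int:
    # Every state is counted (note 2), so vm.state is deliberately not filtered.
    return sum(counted_memory_mb(vm) for vm in vms)
```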

Algorithms

Slot Simple

– SlotSize = Largest HA VM in the cluster.
– Calculate AvailableSlots, UsedSlots and TotalSlots for each host.
– TotalSlotsRemaining = Sum of smallest N – R values of TotalSlots.
– If Sum(UsedSlots) <= TotalSlotsRemaining, cluster is NOT overcommitted.
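As a rough sketch, the Slot Simple test could look like this in Python. It assumes the per-host UsedSlots and TotalSlots values have already been calculated, that TotalSlotsRemaining is the sum of the N – R smallest TotalSlots values (as in the worked example later in this article), and that Sum(UsedSlots) runs over every host in the cluster. The function name is ours.

```python
def slot_simple_ok(used_slots, total_slots, reserve):
    """Return True if the Slot Simple test shows the cluster is NOT overcommitted.

    used_slots / total_slots: one precalculated value per host.
    reserve: the cluster reserve R.
    """
    n_surviving = max(0, len(total_slots) - reserve)
    # Worst case: the hosts that survive are the ones with the fewest slots.
    total_slots_remaining = sum(sorted(total_slots)[:n_surviving])
    return sum(used_slots) <= total_slots_remaining
```

For instance, with invented per-host values total_slots = [1, 3, 4, 4] and used_slots = [1, 2, 2, 2] and R = 2, the two smallest TotalSlots values add up to 4 while 7 slots are in use, so the test fails to show that the cluster is “OK”.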

Slot Full

Iterate over each set of R failing hosts.

– SlotSize = Largest HA VM on the R failing hosts.
– Calculate AvailableSlots, UsedSlots and TotalSlots for each host.
– TotalSlotsRemaining = Sum of TotalSlots on all non-failing hosts.
– If Sum(UsedSlots) > TotalSlotsRemaining, cluster may be overcommitted.
– If Sum(UsedSlots) <= TotalSlotsRemaining for every set of failing hosts, cluster is NOT overcommitted.
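The iteration can be sketched in Python with itertools.combinations. Because the slot size changes with each failing set, the per-host slot counts are supplied by a callback; slots_for is a hypothetical helper of our own that stands in for the precalculation step, and Sum(UsedSlots) is again assumed to run over every host.

```python
from itertools import combinations


def slot_full_ok(hosts, reserve, slots_for):
    """Return True if Slot Full shows the cluster is NOT overcommitted.

    hosts: list of host names.
    slots_for(failing): hypothetical callback returning (used_slots, total_slots),
        two dicts keyed by host name, recalculated for this failing set.
    """
    for failing in combinations(hosts, reserve):
        used, total = slots_for(failing)
        total_slots_remaining = sum(total[h] for h in hosts if h not in failing)
        if sum(used.values()) > total_slots_remaining:
            # One bad set is enough: the method has failed to prove "OK".
            return False
    return True
```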

Proof Simple

– LargestClusterVM = Largest HA VM in the cluster.
– Calculate AdditionalMemory, HAVMs for all hosts.
– TotalAdditionalSpace = Sum of smallest H values of AdditionalMemory, where H = N – R is the number of hosts that remain.
– TotalOrphanedVMs = (Sum of largest R values of HAVMs) – LargestClusterVM.
– If TotalOrphanedVMs <= TotalAdditionalSpace, cluster is NOT overcommitted.

Special case: If TotalOrphanedVMs is 0, LargestClusterVM > 0 and TotalAdditionalSpace = 0, then cluster may be overcommitted.
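Here is a minimal Python sketch of the Proof Simple steps, including the special case. It assumes AdditionalMemory and HAVMs have already been calculated per host, and that the number of surviving hosts is N – R. The function and parameter names are ours.

```python
def proof_simple_ok(additional_memory, ha_vms, largest_cluster_vm, reserve):
    """Return True if Proof Simple shows the cluster is NOT overcommitted.

    additional_memory / ha_vms: one precalculated value per host, same units.
    largest_cluster_vm: size of the largest HA VM in the cluster.
    """
    n_surviving = max(0, len(additional_memory) - reserve)
    # Worst case: the least spare memory survives, the most HA VM memory fails over.
    total_additional_space = sum(sorted(additional_memory)[:n_surviving])
    total_orphaned = sum(sorted(ha_vms, reverse=True)[:reserve]) - largest_cluster_vm
    if total_orphaned == 0 and largest_cluster_vm > 0 and total_additional_space == 0:
        return False  # special case: cluster may be overcommitted
    return total_orphaned <= total_additional_space
```

For instance, with values in GB of additional_memory = [0, 5, 7, 9], ha_vms = [8, 8, 6, 4], largest_cluster_vm = 8 and R = 2 (the totals match the worked example later in this article, the remaining per-host numbers are invented), TotalAdditionalSpace is 5 and TotalOrphanedVMs is 8, so the test fails.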

Proof Full

Iterate over each set of R failing hosts.

– LargestClusterVM = Largest HA VM on the R failing hosts.
– Calculate AdditionalMemory, HAVMs for all hosts.
– TotalAdditionalSpace = Sum of AdditionalMemory on non-failing hosts.
– TotalOrphanedVMs = (Sum of HAVMs on the R failing hosts) – LargestClusterVM.
– If TotalOrphanedVMs > TotalAdditionalSpace, cluster may be overcommitted.
– If TotalOrphanedVMs = 0, LargestClusterVM > 0 and TotalAdditionalSpace = 0, cluster may be overcommitted.

If TotalOrphanedVMs <= TotalAdditionalSpace for every set of failing hosts, cluster is NOT overcommitted.
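A Python sketch of the Proof Full loop follows. As LargestClusterVM and AdditionalMemory depend on the failing set, they come from a callback; values_for is a hypothetical helper of ours, not part of VMM. A set passes when TotalOrphanedVMs <= TotalAdditionalSpace, the complement of the > test in the per-set steps.

```python
from itertools import combinations


def proof_full_ok(hosts, reserve, values_for):
    """Return True if Proof Full shows the cluster is NOT overcommitted.

    values_for(failing): hypothetical callback returning
        (additional_memory, ha_vms, largest_vm) for this failing set, where the
        first two are dicts keyed by host name.
    """
    for failing in combinations(hosts, reserve):
        additional_memory, ha_vms, largest_vm = values_for(failing)
        space = sum(additional_memory[h] for h in hosts if h not in failing)
        orphaned = sum(ha_vms[h] for h in failing) - largest_vm
        if orphaned > space:
            return False
        if orphaned == 0 and largest_vm > 0 and space == 0:
            return False  # special case
    return True
```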

Combining the Methods

Note that none of the methods attempt to show overcommitment. They can only show the reverse, that the cluster is not overcommitted. If none of the methods we use can show that we are not overcommitted, we are forced to flag the cluster as overcommitted. If even a single method shows that we are not overcommitted, we can flag the cluster as “OK” and cease calculations immediately.

The full complexity checks work the opposite way internally. If even a single set of R failing hosts shows that the cluster may be overcommitted, that method is immediately done, having failed to show that the cluster is “OK”.
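The way the methods combine can be sketched as a simple short-circuit loop in Python. The order in which VMM tries the methods is not stated in this article, so the order in the usage example is an assumption, and cluster_state is our own illustrative name.

```python
def cluster_state(methods):
    """methods: iterable of (name, check) pairs, where check() returns True
    only if that method proves the cluster is NOT overcommitted."""
    for name, check in methods:
        if check():
            # A single success is enough; later methods are never evaluated.
            return "OK", name
    return "Overcommitted", None


# Usage mirroring a borderline cluster where only the last method succeeds.
methods = [
    ("Slot Simple", lambda: False),
    ("Proof Simple", lambda: False),
    ("Slot Full", lambda: False),
    ("Proof Full", lambda: True),
]
print(cluster_state(methods))  # prints ('OK', 'Proof Full')
```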

Example

This example is specifically designed to be a borderline case. Only one method (Proof Full) manages to show that the cluster is not overcommitted.

The cluster has four 32GB hosts. The host memory reserve is set to 9GB. The 64MB buffer is not added to VM sizes in this example, just to keep the numbers simpler. The cluster reserve (R) is set to 2.

image

Slot Simple Example

– Slot size = 8GB

image

– TotalSlotsRemaining = 2 smallest values of TotalSlots = (1+3) = 4
– TotalUsedSlots = 7

Since TotalUsedSlots > TotalSlotsRemaining, the method has failed.

Slot Full Example

– TotalUsedSlots = 7, regardless of which hosts fail

image

Since some sets of failing hosts led to TotalUsedSlots > TotalSlotsRemaining, the method has failed.

Proof Simple Example

– LargestClusterVM = 8GB

image

– TotalAdditionalSpace = 2 smallest values of AdditionalMemory = 0GB + 5GB = 5GB.
– TotalOrphanedVMs = (8GB + 8GB) – 8GB = 8GB.

Since TotalOrphanedVMs > TotalAdditionalSpace, the method has failed.

Proof Full Example

image

Since every set of failing hosts led to Orphaned – LargestVM <= AdditionalMemory, the method has succeeded, and the entire cluster can be marked as “OK”.

Hilton Lange | Software Engineer | Microsoft
