In the last couple of months I have seen an upward trend in support cases related to performance problems where the root cause was an overcommitted virtualization host system.
What is an oversubscribed virtualization host system?
Virtualization allows you to map physical CPU cores installed in the virtualization host to virtual (guest) machines running on this system. For example, one virtual machine can have 4 cores assigned, another can have 2 assigned, and so on.
You can end up in a situation where the total number of cores assigned to all running virtual machines is larger than the number of physical cores installed in the virtualization host system. In this case the CPU of the virtualization host is oversubscribed (or overcommitted).
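The check described above is simple arithmetic: add up the virtual processors of all running VMs and divide by the number of physical cores. The following is a minimal sketch; the VM names, their core counts and the 12-core host are hypothetical values chosen for illustration, not taken from a real system.

```python
# Hypothetical inventory: VM name -> number of virtual processors assigned.
vms = {"vm-app1": 4, "vm-app2": 2, "vm-sql": 8, "vm-web": 4}

# Hypothetical host with 2 sockets x 6 physical cores (Hyper-Threading not counted).
physical_cores = 12


def vcpu_ratio(vcpus_per_vm: dict, physical_cores: int) -> float:
    """Total virtual processors assigned to all VMs divided by physical cores."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vcpus_per_vm.values()) / physical_cores


ratio = vcpu_ratio(vms, physical_cores)
print(f"{sum(vms.values())} vCPUs on {physical_cores} physical cores -> {ratio:.2f}:1")
if ratio > 1:
    # More virtual processors than physical cores: the host CPU is oversubscribed.
    print("The host CPU is oversubscribed.")
```

With these example numbers, 18 virtual processors compete for 12 physical cores, a ratio of 1.50:1.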
What is the problem if we oversubscribe our virtualization host system?
In this configuration, physical CPU cores have to be shared by two or more virtual machines running as guests. A physical core can only execute instructions for one of these guest systems at any given time, so while it is busy with one guest, the virtual core in the other guest cannot execute instructions. If two virtual machines want to perform an operation at the same time on virtual cores that are mapped to the same physical core, the operation of one virtual machine is executed first and the other afterwards. This can significantly slow down both virtual machines.
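The slowdown can be illustrated with a deliberately simplified model: assume the hypervisor splits a physical core's time evenly between all virtual processors that are busy on it, and ignore scheduling overhead, cache effects and idle periods. The function names below are hypothetical and the model is only a rough sketch of the effect, not how any specific hypervisor schedules.

```python
def share_per_vcpu(busy_vcpus_on_core: int) -> float:
    """Fraction of one physical core each busy virtual processor receives,
    assuming the core's time is split evenly between them (simplified model)."""
    if busy_vcpus_on_core < 1:
        raise ValueError("at least one busy virtual processor is required")
    return 1.0 / busy_vcpus_on_core


def wall_clock_seconds(cpu_seconds_needed: float, busy_vcpus_on_core: int) -> float:
    """Elapsed time for a CPU-bound task when the physical core is shared."""
    return cpu_seconds_needed / share_per_vcpu(busy_vcpus_on_core)


# A task needing 10 s of CPU time on a dedicated core...
print(wall_clock_seconds(10, 1))  # 10.0
# ...takes roughly twice as long when another VM keeps the same core busy.
print(wall_clock_seconds(10, 2))  # 20.0
```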
Another aspect to take into consideration is Hyper-Threading. Hyper-Threading presents each physical core as two logical cores. For the system as a whole you can expect a performance benefit from this (up to 30% more performance than without Hyper-Threading), but it also means that each of these logical cores has only approx. 65% of the performance of a physical core without Hyper-Threading. Hyper-Threading on the host system can therefore have a similar effect to oversubscribing the host's physical cores with virtual guest machines.
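The two figures above are consistent with each other: two logical cores at approx. 65% each add up to about 130% of one physical core, i.e. up to 30% more in total, while each individual logical core is noticeably slower. The sketch below uses only those approximate figures; the function names are hypothetical, and `vm_capacity` assumes that the sibling logical core of every vCPU is busy as well.

```python
LOGICAL_PER_PHYSICAL = 2   # Hyper-Threading: two logical cores per physical core
PER_LOGICAL_SHARE = 0.65   # approx. per-logical-core performance quoted in the text


def host_throughput(physical_cores: int, hyperthreading: bool) -> float:
    """Total throughput in units of 'one physical core without Hyper-Threading'."""
    if not hyperthreading:
        return float(physical_cores)
    return physical_cores * LOGICAL_PER_PHYSICAL * PER_LOGICAL_SHARE


def vm_capacity(vcpus: int, hyperthreading: bool) -> float:
    """Capacity of a VM whose vCPUs each run on a logical core whose sibling is busy."""
    return vcpus * (PER_LOGICAL_SHARE if hyperthreading else 1.0)


print(round(host_throughput(8, False), 2))  # 8.0 core-equivalents
print(round(host_throughput(8, True), 2))   # about 10.4, roughly 30% more in total
print(round(vm_capacity(4, True), 2))       # about 2.6, instead of 4.0 for one VM
```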
One difficulty when troubleshooting this type of issue is that it is nearly impossible to identify by collecting logs on the virtual guest machine. Performance Monitor will not really show a problem, and neither will the ULS logs or any other type of logging inside the guest machine. The issue is only visible on the virtualization host system.
I have seen customers who kept adding more and more virtual machines to their virtualization host system, oversubscribing it without considering the effect on the other virtual machines running on the same host.
Guidance for SharePoint
For SharePoint we have clear guidance on overcommitting/oversubscribing CPUs in the following article:
- Best Practices for Virtualizing and Managing SharePoint 2013 (the same also applies to all other SharePoint versions)
"Do not oversubscribe the CPU on the virtualization host computer as it can decrease performance. For any virtual machine that you use in a SharePoint 2013 farm, use a virtual processor:logical processor ratio of 1:1 for optimum performance."
Be aware that this guidance refers to real physical cores, not to the number of logical cores available when Hyper-Threading is enabled.
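The difference between physical cores and logical processors matters when checking a host against the 1:1 ratio: a host with Hyper-Threading reports twice as many logical processors, so a configuration can look fine against the logical count while it is oversubscribed against the physical cores. The sketch below assumes the 1:1 ratio is applied to the total of all virtual processors on the host and counted against physical cores, as described above; the `Host` class, the helper and the VM names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Host:
    sockets: int
    cores_per_socket: int
    hyperthreading: bool

    @property
    def physical_cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def logical_processors(self) -> int:
        # With Hyper-Threading enabled, each physical core shows up as two logical processors.
        return self.physical_cores * (2 if self.hyperthreading else 1)


def meets_one_to_one(host: Host, vcpus_per_vm: Dict[str, int]) -> bool:
    """True if all virtual processors on the host fit onto physical cores 1:1.
    Logical (Hyper-Threading) processors are deliberately not counted."""
    return sum(vcpus_per_vm.values()) <= host.physical_cores


host = Host(sockets=2, cores_per_socket=8, hyperthreading=True)
farm = {"sp-wfe1": 8, "sp-app1": 8, "sql1": 8}  # hypothetical VMs on this host
print(host.physical_cores, host.logical_processors)  # 16 32
print(meets_one_to_one(host, farm))  # False: 24 vCPUs > 16 physical cores
```

Here the 24 virtual processors would fit within the 32 logical processors, but not within the 16 physical cores, so this host does not meet the guidance.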