NUMA - Understand it & its usefulness with Windows Server 2012

 

I have mostly got puzzled look whenever I spoke about NUMA support on Hyper-V so thought of penning the concept & benefits down. As the high end servers & blades are available to further consolidate datacenters, its critical to understand how NUMA can help you to take full advantage of those pricey hardware !

Before we get into the very topic, lets take a step back and understand what is NUMA. 

What is NUMA ?

Non-uniform memory access - of course you know what the acronym stands for :) Lets understand where & how it makes difference to performance of workloads.

Non-uniform memory access (NUMA) is used to increase processor speed without increasing the load on the processor bus. The architecture is non-uniform because each processor is close to some parts of memory and farther from other parts of memory. The processor quickly gains access to the memory it is close to, while it can take longer to gain access to memory that is farther away.

In a NUMA system, CPUs are arranged in smaller systems called nodes. Each node has its own processors and memory, and is connected to the larger system through a cache-coherent interconnect bus.

On NUMA hardware, some regions of memory are on physically different buses from other regions. Because NUMA uses local and foreign memory, it will take longer to access some regions of memory than others. Local memory and foreign memory are typically used in reference to a currently running thread. Local memory is the memory that is on the same node as the CPU currently running the thread. Any memory that does not belong to the node on which the thread is currently running is foreign. Foreign memory is also known as remote memory. The ratio of the cost to access foreign memory over that for local memory is called the NUMA ratio. If the NUMA ratio is 1, it is symmetric multiprocessing (SMP). The greater the ratio, the more it costs to access the memory of other nodes. Windows applications that are not NUMA aware (including SQL Server 2000 SP3 and earlier) sometimes perform poorly on NUMA hardware.

Note - The main benefit of NUMA is scalability. The NUMA architecture was designed to surpass the scalability limits of the SMP architecture. With SMP, all memory access is posted to the same shared memory bus. This works fine for a relatively small number of CPUs, but not when you have dozens, even hundreds, of CPUs competing for access to the shared memory bus. NUMA alleviates these bottlenecks by limiting the number of CPUs on any one memory bus and connecting the various nodes by means of a high speed interconnection.

NUMA node with 4 processors

The system attempts to improve performance by scheduling threads on processors that are in the same node as the memory being used. It attempts to satisfy memory-allocation requests from within the node, but will allocate memory from other nodes if necessary. It also provides an API to make the topology of the system available to applications. You can improve the performance of your applications  by using the NUMA functions to optimize scheduling and memory usage.

NUMA features in Hyper-V on Windows Server 2012

Hyper-V in Windows Server 2012 supports running on a host system with up to 320 logical processors. The number of virtual processors that can be configured in a virtual machine depends on the number of processors on the physical computers. For example, to configure a virtual machine with the maximum of 64 virtual processors, you must be running Hyper-V on a virtualization host that has 64 or more logical processors. In order to support this scalability, Hyper-V in Windows Server 2012 provides virtual NUMA, a synthetic NUMA-like environment for virtual machines. Virtual processors and guest memory are grouped into virtual NUMA nodes, and the virtual machine presents a topology to the guest operating system based on the underlying physical topology.

By default, when a virtual machine is created, Hyper-V examines the underlying physical topology and automatically configures the virtual NUMA topology with optimal settings, based on a number of factors, which include: the number of logical processors and the amount of memory per NUMA node.

Virtual NUMA enables the deployment of larger and more mission-critical workloads that can be run without significant performance degradation in a virtualized environment, when compared to running non-virtualized computers with physical NUMA hardware. When a new virtual machine is created, by default Hyper-V uses values for the guest settings that are in sync with the Hyper-V host NUMA topology. For example, if a host has 16 cores and 64 GB divided evenly between two NUMA nodes with two NUMA nodes per physical processor socket, then a virtual machine that is created on the host with 16 virtual processors will have the maximum number of processors per node setting set to eight, maximum nodes per socket set to two, and maximum memory per node set to 32 GB.

In addition, NUMA spanning can be enabled or disabled. With spanning enabled, individual virtual NUMA nodes can allocate non-local memory, and an administrator can deploy a virtual machine that has more virtual processors per virtual NUMA node than the number of processors that are available on the underlying hardware NUMA node on the Hyper-V host. NUMA spanning for a virtual machine does incur a performance cost because virtual machines access memory on non-local NUMA nodes.

For information about how to configure virtual NUMA, see How to Configure Virtual NUMA for VMM.

Do let me know if you find it useful or have any further questions.

Stay tuned !