Virtual Machine Queue (VMQ) CPU assignment tips and tricks

Hello

First, a small introduction: I am Marco Cancillo and I have been working at Microsoft since 2013 as a Sr. Support Engineer in the Western Europe CTS Network team.
In my work I have seen customers face challenges in how to use Set-NetAdapterVmq and achieve an optimal configuration.

The Set-NetAdapterVmq cmdlet sets the virtual machine queue (VMQ) properties of a network adapter. VMQ is a scaling networking technology for the Hyper-V switch that improves network throughput by distributing the processing of network traffic for multiple virtual machines (VMs) among multiple processors. A thorough familiarity with VMQ and dynamic VMQ is highly recommended before changing any default values with this cmdlet.

The goal of this article is to assist with using Set-NetAdapterVmq, and in my experience visualizing the configuration helps a lot.

One of the challenges is that there is no single solution for this. It heavily depends on the following variables:

  • Number of CPU sockets in the server
  • Number of physical cores per CPU socket
  • Number of NICs in the LBFO team(s), depending on the LBFO mode
  • Number of VMQ queues provided per interface
  • LBFO mode. In the examples I will use Switch Independent with Dynamic or Hyper-V Port load balancing, as these are the most common and provide the largest set of VMQ queues
  • What the LBFO team is servicing. Is the team servicing everything, or are there multiple teams with different roles, such as VM guest traffic and management tasks (Management / Live Migration / Cluster)?

When using Set-NetAdapterVmq there are some guidelines you have to take into consideration:

  • Do not use CPU 0
  • Only physical cores can be used
  • Do not span NUMA node
  • Stay below 64 Logical CPUs

Also be aware that:

  • a VMQ queue can only use 1 CPU core
  • multiple VMQ queues can use the same CPU core
  • the algorithm only starts using the next CPU in the assigned range when the current one exceeds 90% utilization

When you take these into account, you have a certain freedom in how you distribute the resources.
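To make the rules concrete, here is an illustrative sketch (not a Microsoft tool) that enumerates which logical processors are valid VMQ targets under the guidelines above. It assumes Hyper-Threading is enabled, so physical cores map to even logical processor numbers, and that each socket owns a consecutive block of logical processors.

```python
def eligible_vmq_processors(sockets, cores_per_socket, hyperthreading=True):
    """Return, per NUMA node, the logical processors usable for VMQ.

    Assumptions: with Hyper-Threading, only even logical processors
    correspond to physical cores; CPU 0 is excluded; logical processors
    must stay below 64.
    """
    step = 2 if hyperthreading else 1       # with HT, only even LPs are physical cores
    lps_per_node = cores_per_socket * step  # logical processors per NUMA node
    nodes = []
    for node in range(sockets):
        base = node * lps_per_node
        lps = [base + i * step for i in range(cores_per_socket)]
        lps = [lp for lp in lps if lp != 0]  # never use CPU 0
        lps = [lp for lp in lps if lp < 64]  # stay below 64 logical CPUs
        nodes.append(lps)
    return nodes

# 2 sockets x 6 physical cores with HT, as in the examples below:
print(eligible_vmq_processors(2, 6))
# -> [[2, 4, 6, 8, 10], [12, 14, 16, 18, 20, 22]]
```

Keeping each NIC's assignment inside one of these per-node lists automatically satisfies the "do not span NUMA node" rule.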

To make it more tangible I created 2 examples, using 6-core CPUs with Hyper-Threading enabled, but the same logic applies if you have more or fewer cores available.

Example 1

Customer has a Hyper-V cluster running Windows Server 2012 R2.

Each server has the following hardware

  • 2 CPUs, each with 6 physical cores (12 logical cores)
  • 2 10 GbE NICs from HP with 28 VMQ queues per interface

Teaming configuration (with a switch bound to it)

  • TEAM_HV
      • LBFO team configured in Switch Independent Dynamic Port mode
      • NICs CNA_C1 and CNA_C2 are used in this team

The following configuration was active:

Name        InterfaceDescription              Enabled BaseVmqProcessor MaxProcessors NumberOfReceiveQueues
----        --------------------              ------- ---------------- ------------- ---------------------
CNA_D2_CSV  HP FlexFabric 20Gb 2-port 65…#6   False   0:16             2             0
CNA_D1_LM   HP FlexFabric 20Gb 2-port 65…#5   False   0:12             2             0
CNA_C2      HP FlexFabric 20Gb 2-port 65…#4   True    0:8              8             28
CNA_C1      HP FlexFabric 20Gb 2-port 65…#3   True    0:0              8             28
CNA_A2      HP FlexFabric 20Gb 2-port 65…#2   False   0:20             2             0
CNA_A1      HP FlexFabric 20Gb 2-port 650F…   False   0:20             2             0
TEAM_HV     Microsoft Network Adapter Mu…#2   True    0:0                            56
TEAM_Mgmt   Microsoft Network Adapter Mult…   False   0:0                            0

When plotting this configuration in a table, it looks like this:

[Figure 1: current VMQ processor assignments plotted per logical CPU]

This shows overlapping processor assignments and NUMA spanning, which will result in Event ID 106.

What you want is to spread the load over the CPU cores and offload CPU 0:0, for example like this:

[Figure 2: proposed VMQ processor assignments plotted per logical CPU]

The commands to achieve this are the following:

Set-NetAdapterVmq -Name "CNA_C1" -BaseProcessorNumber 2 -MaxProcessors 5 -MaxProcessorNumber 10
Set-NetAdapterVmq -Name "CNA_C2" -BaseProcessorNumber 14 -MaxProcessors 5 -MaxProcessorNumber 22
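As a sanity check, you can expand what each assignment covers. The sketch below (an illustration, not part of any Microsoft tooling) assumes Hyper-Threading is on, so only even logical processors count and the NUMA boundary sits at logical processor 12 on this 2 x 6-core box.

```python
def vmq_span(base, max_procs, max_proc_number):
    """Logical processors a VMQ assignment can use (even LPs only, HT on)."""
    span = list(range(base, max_proc_number + 1, 2))
    return span[:max_procs]

cna_c1 = vmq_span(base=2,  max_procs=5, max_proc_number=10)
cna_c2 = vmq_span(base=14, max_procs=5, max_proc_number=22)

print(cna_c1)  # [2, 4, 6, 8, 10]     -> NUMA node 0, CPU 0 left free
print(cna_c2)  # [14, 16, 18, 20, 22] -> NUMA node 1

# No overlap between the two NICs, and neither range crosses
# the NUMA boundary at logical processor 12.
assert not set(cna_c1) & set(cna_c2)
assert all(lp < 12 for lp in cna_c1) and all(lp >= 12 for lp in cna_c2)
```

Each NIC gets 5 physical cores on its own NUMA node, and CPU 0:0 is no longer used for VMQ.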

Example 2

Customer has a Hyper-V cluster running Windows Server 2012 R2.

Each server has the following hardware

  • 2 CPUs, each with 6 physical cores (12 logical cores)
  • 2 10 GbE NICs from vendor XYZ with 32 VMQ queues per interface
  • 2 10 GbE NICs with 8 VMQ queues per interface

Teaming configuration (with a switch bound to it)

  • GuestTrafficLBFO
      • LBFO team configured in Switch Independent Dynamic Port mode
      • The 2 XYZ NICs (32 queues) are used in this team, named CNA1 & CNA2
  • ManagementLBFO
      • LBFO team configured in Switch Independent Dynamic Port mode
      • The 2 other NICs (8 queues) are used in this team, named CNA3 & CNA4

[Figure 3: VMQ processor assignments for example 2 plotted per logical CPU]

The commands to achieve this are the following:

Set-NetAdapterVmq -Name "CNA1" -BaseProcessorNumber 2 -MaxProcessors 4 -MaxProcessorNumber 8
Set-NetAdapterVmq -Name "CNA2" -BaseProcessorNumber 14 -MaxProcessors 4 -MaxProcessorNumber 20
Set-NetAdapterVmq -Name "CNA3" -BaseProcessorNumber 10 -MaxProcessors 1 -MaxProcessorNumber 10
Set-NetAdapterVmq -Name "CNA4" -BaseProcessorNumber 22 -MaxProcessors 1 -MaxProcessorNumber 22
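Expanding these four assignments with the same illustrative sketch as before (assumptions: Hyper-Threading on, only even logical processors, NUMA boundary at logical processor 12) confirms they stay disjoint:

```python
def vmq_span(base, max_procs, max_proc_number):
    """Logical processors a VMQ assignment can use (even LPs only, HT on)."""
    span = list(range(base, max_proc_number + 1, 2))
    return span[:max_procs]

teams = {
    "CNA1": vmq_span(2, 4, 8),    # guest traffic, NUMA node 0
    "CNA2": vmq_span(14, 4, 20),  # guest traffic, NUMA node 1
    "CNA3": vmq_span(10, 1, 10),  # management, one core on node 0
    "CNA4": vmq_span(22, 1, 22),  # management, one core on node 1
}
print(teams)
# CNA1 -> [2, 4, 6, 8], CNA2 -> [14, 16, 18, 20],
# CNA3 -> [10], CNA4 -> [22]

# Every logical processor is used by at most one NIC.
used = [lp for span in teams.values() for lp in span]
assert len(used) == len(set(used)), "ranges overlap"
```

The guest-traffic team gets 4 physical cores per NIC, balanced across both NUMA nodes, while each management NIC gets the one remaining core on its node.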

Keep in mind that this is just an example. If you expect more network load on the management switch/LBFO team, you could assign more CPUs to it.

Other links:

VMQ Deepdive series

Event ID 106 when a Hyper-V virtual switch is bound to an LBFO team
https://support.microsoft.com/en-us/kb/2974384

Until next time!

-Marco