In my previous posts about VMQ (virtual machine queues), starting with VMQ Deep Dive, 1 of 3, I tried to give readers a thorough understanding of the technology and why it is needed. Unfortunately, I still feel there’s a piece missing around troubleshooting suspected VMQ issues. In this blog post I’d like to address that gap and help the community debug issues with step-by-step guidance.
The questions I’ve seen vary, but they almost always boil down to a variation of three questions:
1. Is VMQ configured correctly?
2. Why is throughput lower when we use VMQ?
3. What network performance should I expect to see in a guest VM, and how should I validate/test that?
So without further ado, let’s jump right in.
The first step to take when debugging these issues is to make sure your NIC is VMQ capable and that VMQs are allocated. This is easy to do with two PowerShell cmdlets, Get-NetAdapterVmq and Get-NetAdapterVMQQueue. Examples of each of these cmdlets and output are below.
With Get-NetAdapterVmq, you can easily tell whether your NIC is VMQ capable: a capable NIC will show up in the output. If it does not appear, the NIC is not surfacing VMQ capability.
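Since the original screenshots may not render here, below is an illustrative example of what this looks like on a VMQ-capable adapter (the adapter name, description, and values are placeholders, not captured from a real system):

```powershell
PS C:\> Get-NetAdapterVmq

Name        InterfaceDescription        Enabled  BaseVmqProcessor  MaxProcessors  NumberOfReceiveQueues
----        --------------------        -------  ----------------  -------------  ---------------------
Ethernet 2  Contoso 10GbE Adapter #2    True     0:0               8              31
```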
Next, the Get-NetAdapterVmqQueue cmdlet lists the VMQs that are allocated on the NIC. In the picture above there are three queues allocated: one for the default queue, one for the host OS, and one for a VM on the system. VMQ does its filtering on the destination MAC address and VLAN ID, so these are the two most important fields to note. The default queue catches all traffic that does not match any VMQ filter, which makes it easy to spot: its MAC field is left empty. The host OS queue carries the host’s friendly name, and every other queue is named after its corresponding VM.
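For readers without the screenshot, here is an illustrative example of the queue listing just described (MAC addresses and the host name are placeholders; the VM name matches the one used later in this post):

```powershell
PS C:\> Get-NetAdapterVmqQueue

Name        QueueID  MacAddress         VlanID  Processor  VmFriendlyName
----        -------  ----------         ------  ---------  --------------
Ethernet 2  0                                   0:0        Default Queue
Ethernet 2  1        00-15-5D-0A-12-01          0:0        HyperVHost01
Ethernet 2  2        00-15-5D-0A-12-02          0:8        vm513
```

Note the empty MacAddress field on the default queue, and the Processor column in Group:Processor form.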
If your NIC is VMQ capable but a VMQ is not allocated for a VM or your host, you can check the System Event log. A few common reasons we find are:
- The OID failed because of a lack of available resources on the NIC
- NIC teaming is in Min-Mode and the processor sets are not identical
- NIC teaming is in Sum-Mode and the processor sets are overlapping
- Policy settings prevent VMQ from allocating (for example, the NIC is in promiscuous mode)
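One way to dig these events out of the System log is with Get-WinEvent. The filter below is a heuristic of my own (exact event IDs and provider names vary by NIC vendor and Windows version), but it usually narrows things down:

```powershell
# Look for recent warning/error entries that mention VMQ or come from the
# virtual switch. The message match is a heuristic; exact events vary by
# NIC driver and Windows version.
Get-WinEvent -FilterHashtable @{ LogName = 'System'; Level = 2,3 } -MaxEvents 200 |
    Where-Object { $_.Message -like '*VMQ*' -or $_.ProviderName -like '*VmSwitch*' } |
    Format-List TimeCreated, ProviderName, Id, Message
```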
If you run into issue 1, it is most likely because you have exhausted the number of VMQs on the NIC. In this case, you will need to do one of two things:
- Disable VMQ from idle or lower priority VMs to make queues available for new VMs
- Set a higher weight on the VMs you consider most important using the Set-VMNetworkAdapter -VMName &lt;VMName&gt; -VmqWeight &lt;0-100&gt; cmdlet. VMQ allocation preference is given to the VMs with the highest weight.
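Both remedies can be applied with the same cmdlet; the VM names below are placeholders:

```powershell
# Free a queue by turning VMQ off for a low-priority VM (a weight of 0 disables VMQ).
Set-VMNetworkAdapter -VMName 'LowPriorityVM' -VmqWeight 0

# Give a critical VM allocation preference with a higher weight (range 0-100).
Set-VMNetworkAdapter -VMName 'ImportantVM' -VmqWeight 100
```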
Issue 2 has to do with the coexistence of VMQ and NIC teaming, specifically with how the processors are configured for each NIC in the team. I will not go into detail on this error here because it is already well covered in our NIC teaming guide. The guide can be found here:
The third issue has to do with the mode your NIC is in. If promiscuous mode is enabled, VMQ is disabled, because promiscuous mode bypasses the destination MAC filtering that VMQ relies on. Disable promiscuous mode for VMQ to work properly.
We can move on once we’ve verified that VMQ is correctly configured.
Lower than expected performance
The next issue I want to address manifests itself when VMQ seems to be configured correctly, but the throughput on your system is much lower than you expected. This makes up roughly 80% of the VMQ problems we see. Before we tackle causes of this issue, I want to point out that VMQ is limited to the processing power of one CPU for each vmNIC, and it scales linearly as you add VMs. I’ve found that many reported issues turn out to be by-design behavior, because people don’t quite understand this limit when first using VMQ.
Moving on, if you find that throughput to a VM is not what you expected, the first thing to do is check your host’s task manager to get an idea of the CPU utilization on your system. Below is an example:
In this case we can clearly see that CPU8 is bottlenecked. I’ve re-attached the output of our cmdlet below so we can troubleshoot this behavior and make sure this is in fact the intended behavior. The Processor parameter can be read as Group:Processor. In the output below, 0:8 translates to, “vm513 is being processed on processor 8 in group 0,” and 0:0 translates to, “the default queue is being processed on processor 0 in group 0.”
Looking at this output from our Get-NetAdapterVmqQueue cmdlet we can see that this is the correct processor for our VM, vm513, and is in fact the expected behavior. You can expect anywhere from 3.5 to 4.5Gbps from a single processor but it will vary based on the workload and CPU.
In the case of two VMQs sharing a processor, Dynamic VMQ will do its best to move network-intensive VMQs away from each other and place them on their own processors. Using the same Get-NetAdapterVmqQueue output from above, you can see that both the host OS and the VM are affinitized to processor 8. The trigger for a VMQ move is CPU utilization over 90%: if the algorithm finds that two VMQs are located on the same processor and its utilization exceeds that threshold, it changes one VMQ’s affinity to another processor so that each gets its own processor for network traffic processing. The one caveat is when all processors are already at 90%. In that case the system is CPU bottlenecked, the algorithm has no open processor to move a queue to, and the only option is to leave the system as is.
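To make the heuristic concrete, here is an illustrative sketch of the move decision. This is my own simplification for explanation purposes, not the actual hypervisor algorithm:

```powershell
# Simplified sketch of the Dynamic VMQ move decision described above.
# Illustrative only; the real algorithm lives inside the OS/hypervisor.
function Select-VmqMoveTarget {
    param(
        [hashtable]$CpuUtilization,   # processor number -> % utilization
        [int]$BusyProcessor           # processor currently hosting two VMQs
    )
    # Below the 90% trigger, no move happens.
    if ($CpuUtilization[$BusyProcessor] -le 90) { return $null }
    # Otherwise pick the least-utilized processor still under the threshold.
    $candidate = $CpuUtilization.GetEnumerator() |
        Where-Object { $_.Key -ne $BusyProcessor -and $_.Value -lt 90 } |
        Sort-Object Value |
        Select-Object -First 1
    if ($candidate) { return $candidate.Key }
    return $null   # every processor is at 90%+: CPU bottleneck, leave as is
}

# Example: processor 8 is saturated; processors 0 and 1 are idle enough.
Select-VmqMoveTarget -CpuUtilization @{ 8 = 95; 0 = 20; 1 = 15 } -BusyProcessor 8  # returns 1
```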
Low throughput can also be a sign of bugs in the NIC drivers or the implementation of VMQ itself. Once you confirm that there are no bottlenecks in the system, the next step is to confirm that the right processors are being interrupted. To do this you can use Performance Monitor (perfmon). First, run the Get-NetAdapterVmqQueue cmdlet and check the Processor parameter. I circled the parameters you want to pay attention to.
Now that you know the processors you should be expecting traffic on, you can open perfmon and add the following counter:
Hyper-V Hypervisor Logical Processor – Hardware Interrupts/sec
After adding the counter, as a personal preference, I change my graph type to report so that I can see the number of interrupts in numerical form instead of as a graph. My output is below:
You’ll see a few random interrupts on various processors; that’s normal, but you will want to verify that the majority of interrupts are being processed on your VMQ processors. If you see the majority of interrupts being processed on non-VMQ processors, this could mean you’re running into a driver or OS bug. If this is the case, make sure you are running the latest Windows Server updates on your Hyper-V host and the latest NIC driver (and possibly an accompanying firmware update) before contacting the manufacturer or us directly.
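If you prefer to capture the same counter from PowerShell rather than the perfmon UI, Get-Counter can sample it directly:

```powershell
# Sample hardware interrupts per logical processor for 5 seconds, then
# compare the busiest instances against the processors reported by
# Get-NetAdapterVmqQueue.
Get-Counter -Counter '\Hyper-V Hypervisor Logical Processor(*)\Hardware Interrupts/sec' `
            -SampleInterval 1 -MaxSamples 5
```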
One last check for low throughput is the miniport itself. The miniport might run out of resources, that is, free NBLs (NET_BUFFER_LISTs) in its buffer pool, and in that case indicated NBLs are tagged with the low-resources flag. This means the miniport must wait for NBLs to be freed before indicating more packets to the stack, slowing overall system performance. These conditions can also be surfaced through perfmon:
TCPIP Performance Diagnostics: IPv4/6 NBLs indicated with low resources flag
TCPIP Performance Diagnostics: IPv4/6 NBLs/sec indicated with low resources flag
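These can also be sampled from PowerShell. Note that the exact counter path below is written from memory and may differ slightly on your build, so verify the name in perfmon first:

```powershell
# Check how often the stack is seeing low-resource NBL indications.
# The counter set and counter names may vary slightly between Windows versions.
Get-Counter -Counter '\TCPIP Performance Diagnostics\IPv4 NBLs indicated with low resource flag' `
            -SampleInterval 1 -MaxSamples 5
```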
This problem can be addressed by increasing the send and receive buffer sizes in the advanced properties of your network adapter. You can reach this interface by opening your control panel and navigating to ‘Network and Internet’ then ‘Network Connections.’ On this page find the adapter you’re using for VMQ and right-click then click on ‘Properties.’ Finally, click on the Configure box and the GUI shown below should display.
Find the Receive Buffers and Send Buffers field and manually increase them to their max values. This will increase the amount of resources available to your network adapter.
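The same change can be scripted with the NetAdapter cmdlets. The adapter name and the value below are placeholders, so list the driver’s valid values first:

```powershell
# See what the driver actually allows before setting anything.
Get-NetAdapterAdvancedProperty -Name 'Ethernet 2' -DisplayName 'Receive Buffers' |
    Format-List DisplayValue, ValidDisplayValues

# Raise both buffers; 4096 is a placeholder -- use your driver's maximum.
Set-NetAdapterAdvancedProperty -Name 'Ethernet 2' -DisplayName 'Receive Buffers' -DisplayValue '4096'
Set-NetAdapterAdvancedProperty -Name 'Ethernet 2' -DisplayName 'Send Buffers'    -DisplayValue '4096'
```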
When using VMQ, you must always keep in mind that your VM is always going to be limited to at most one core. As I mentioned earlier in this blog, that means you can realistically expect performance of 3.5-4.5Gbps depending on the speed of a core.
To test this in house, we use an application called NTttcp. The application, as well as detailed instructions on how to use it, can be found here: NTttcp Version 5.28. Our setup consists of two machines, a sender and a receiver, both running Windows Server 2012 R2 and connected back-to-back with 10Gbps NICs. Although this may not be a realistic real-world setup, it helps us isolate problems early and eliminates any issues that intermediaries may cause.
On the receiver we install Hyper-V and create a VM. Once the VM is created, we configure 192.168.1.3 as the IP address on the receiving interface. Next, we install the NTttcp receiver inside of the VM and ready NTttcp to receive traffic with the following command:
NTttcp.exe -r -m 16,*,192.168.1.3 -t 30
On the sender, we set the IP address to 192.168.1.2 and use the following command in a command prompt to send traffic:
NTttcp.exe -s -m 16,*,192.168.1.3 -t 30
*By default, Windows Firewall may block NTttcp traffic. You may have to add or modify rules to allow the NTttcp traffic to pass through.
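One way to open the firewall for the test is shown below. 5001 is NTttcp’s default base port; since the -m 16 option above uses multiple connections, the sketch opens a small range rather than a single port. Adjust the range if you changed the port:

```powershell
# Allow inbound NTttcp connections on the receiver.
# 5001 is NTttcp's default base port; multiple threads may use a range,
# so a small range is opened here as a precaution.
New-NetFirewallRule -DisplayName 'NTttcp inbound' -Direction Inbound `
    -Protocol TCP -LocalPort 5001-5016 -Action Allow
```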
Once a connection is made from the sender to the receiver, you will see traffic start to be indicated in the task manager of the receiving machine. When running high-level performance tests like this, the task manager is usually good enough to spot CPU bottlenecks and achieved throughput, but when running detailed performance tests, you will want to look into using a more precise tool like Performance Monitor.
In summary, VMQ issues can be tough to find if you don’t know where to look, and they typically manifest as incorrect configuration or degraded throughput. I hope this guide helps you find issues that may previously have been difficult to see.
Gabriel Silva, Program Manager, Datacenter Networking Platform Team