This article updates the previous benchmark of name resolution performance on Windows Server 2012 R2, published here. It details the name resolution performance of the Windows Server 2012 R2 DNS server when deployed in a strictly authoritative mode, with recursion disabled and root hints removed. The new benchmarks were recorded with dnsperf, a DNS measurement tool widely used in the industry. This article also describes the new optimizations configured on the underlying operating system to further improve performance.
The tests were performed on both a physical machine and a virtual machine.
The physical machine configurations:
Hyper-V was enabled on the physical machine.
The virtual machine configurations:
The virtual machines were similar to Azure A& class VMs. Details are below.
The test setup consisted of the target DNS server and a single Linux client running the dnsperf tool. Dnsperf simulates multiple clients, all using the same IP address but different ports, and sends queries to the DNS server. It increases the QPS as long as the DNS server keeps responding without drops.
The tests were done by varying the number of zones and records on the DNS server under two different scenarios:
- Positive responses: The DNS server hosts only A records, and every query results in a positive response.
- 40% error: The DNS server receives queries such that 40% of them result in NXDOMAIN or SERVFAIL. The following QTYPE and RCODE distribution was used in this scenario:
MX (4%), TXT (3%), NS (3%): the rest
Positive responses: 60%
Queries were sent for 10 minutes in both scenarios.
Performance Results on Physical machine
- The Authoritative Windows DNS Server, under these tests, is able to respond successfully to 99.99% of queries sent to it, up to a rate of 260K QPS.
- The tests used a single client. Receive Side Scaling (RSS) on Windows Server hashes incoming packets, and when they all originate from a single source IP address, as with dnsperf, it does not distribute them across cores. If the source IP addresses are randomized, the server cores are better utilized because packet processing is scheduled on multiple cores instead of one, and the DNS server performs even better (~300K QPS). This was confirmed by running the performance test with a different tool that randomized the source IP address.
- Beyond this rate of QPS, the DNS server continues to respond at higher rates, but there is a drop in the percentage of queries that are responded to by the DNS server.
- The number of records and zones on the authoritative DNS server does not have a considerable impact on the QPS. More records do, however, increase the memory requirements of the DNS server. On average, an authoritative server with 2M records needs about 2 GB of memory.
Performance Results on Virtual machine
The name resolution performance on the virtual machine matches that of the physical machine, and at times the results are even better, as there are no RSS limitations in this case.
The following server and network parameters were tuned to achieve the best performance. Note that these settings are optimized for the server and network configuration described above; the appropriate values vary with the deployment.
- Recursion is disabled on the server and the root hints have been removed. You can remove root hints by stopping the DNS service, deleting the system32/dns/cache.dns file, and restarting the service.
The following firewall rule was enabled explicitly. It narrows the match conditions to UDP as the protocol and to the local IP/port. Without this rule, the firewall service can cause high CPU usage at higher QPS. The existing firewall rules already open port 53 for DNS traffic, and this rule does not disable any firewall feature.
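The two steps above can be sketched in PowerShell. This is a minimal sketch, not the exact procedure used in the tests: it assumes the DnsServer module is available on the server, that it is run from an elevated session, and that the file named above lives under %SystemRoot%\System32\dns.

```powershell
# Disable recursion on the local DNS server.
Set-DnsServerRecursion -Enable $false

# Remove root hints as described above: stop the service,
# delete cache.dns, then start the service again.
# Assumption: cache.dns is located under %SystemRoot%\System32\dns.
Stop-Service -Name DNS
Remove-Item -Path "$env:SystemRoot\System32\dns\cache.dns"
Start-Service -Name DNS

# Review the resulting recursion setting and any remaining root hints.
Get-DnsServerRecursion
Get-DnsServerRootHint
```

Running the last two commands before and after the change makes it easy to confirm that the server is operating in a strictly authoritative mode.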
New-NetFirewallRule -DisplayName <String> -Direction Inbound -Action Allow -Protocol UDP -LocalPort 53 -LocalOnlyMapping $true -Enabled True
- CPU Cores:
The DNS service creates UDP receive threads based on the total number of logical cores in the system. For example, on a system with 64 logical cores, the DNS service creates 64 UDP receive threads. When the Windows DNS server is deployed on machines with more than 12 total cores (logical / physical), the UDP thread count should be set to 8. This gives the best QPS performance with the most efficient CPU utilization. The following registry key was set for this:
*This setting takes effect only after the DNS service is restarted.
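To decide whether the thread-count guidance applies to a given machine, the core counts can be read with standard CIM queries. The sketch below is illustrative only; the threshold of 12 comes from the text above, and the variable names are made up for this example.

```powershell
# Sum physical and logical core counts across all processor sockets.
$cpus     = Get-CimInstance -ClassName Win32_Processor
$physical = ($cpus | Measure-Object -Property NumberOfCores -Sum).Sum
$logical  = ($cpus | Measure-Object -Property NumberOfLogicalProcessors -Sum).Sum

Write-Output "Physical cores: $physical, logical cores: $logical"

# The article recommends limiting the DNS UDP thread count to 8
# on machines with more than 12 total cores.
if ($logical -gt 12) {
    Write-Output "More than 12 cores: consider limiting DNS UDP receive threads to 8."
}

# After the registry value has been changed, restart the DNS service
# so that the new thread count takes effect:
# Restart-Service -Name DNS
```

The restart is left commented out so that the check can be run safely on a production server.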
- Network adapter receive buffers were set to maximum.
Set-NetAdapterAdvancedProperty -Name <NIC Name> -DisplayName "Receive Buffers" -DisplayValue "Maximum"
- RSS settings optimizations:
RSS optimization and proper NIC configuration are required to reach high QPS. The following guidelines describe how to configure RSS for best performance:
- Find the number of physical cores in the NUMA node. (DNS server performance drops if RSS and the DNS server run on different NUMA nodes.)
- Configure the RSS queues according to the total number of physical cores available in the NUMA node. For example, if there are 10 physical cores, reserve 4 processors for 4 RSS queues. If 6 physical cores are available, configure 2 RSS queues.
- Assign a physical CPU core to each RSS queue. For example, if there are 4 RSS queues, assign 4 CPUs to them.
- Make sure the DNS server is running with 8 threads.
- Assign the DNS server threads to the remaining logical CPUs of the same NUMA node where RSS is configured.
RSS applies a hash function to incoming packets, so for performance tests, generate the load from several machines rather than from a single one. Ensure that no CPU core assigned to RSS is 100% utilized; if one of the CPUs starts choking, the machine becomes unstable. Also ensure that RSS and DNS run on the same NUMA node. The configuration of threads and RSS queues depends on the number of CPU cores in one NUMA node.
The following are some RSS settings (based on the guidance above) made on the DNS servers under test:
Pin RSS to the same NUMA node as DNS. 4 CPUs for RSS are enough to handle a high (>150k QPS) load.
Set-NetAdapterRss -Name $nicName -Profile $rssProfile -MaxProcessors $numberRssProcessors -BaseProcessorNumber $baseRssProcessor
Set the variables according to the table below, which lists recommended settings for different configurations.
The configuration depends on how many physical CPUs are available in one NUMA node, as RSS can work only on physical cores.
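As an illustration of how the variables in the Set-NetAdapterRss command might be filled in, the sketch below uses hypothetical values. Only the processor count of 4 comes from the guidance above; the adapter name, the profile, and the base processor number are assumptions that should be replaced with values suited to the actual NUMA layout.

```powershell
# Hypothetical example values; adjust to the actual hardware.
$nicName             = "Admin"       # assumption: adapter name, as used elsewhere in this article
$rssProfile          = "NUMAStatic"  # assumption: one of the profiles Set-NetAdapterRss accepts
$numberRssProcessors = 4             # from the guidance: 4 CPUs for RSS handle >150k QPS
$baseRssProcessor    = 0             # assumption: first processor that RSS may use

# Show the current RSS configuration before changing anything.
Get-NetAdapterRss -Name $nicName

# Apply the settings (this may briefly restart the adapter).
Set-NetAdapterRss -Name $nicName -Profile $rssProfile `
    -MaxProcessors $numberRssProcessors -BaseProcessorNumber $baseRssProcessor

# Show the configuration again to confirm the change.
Get-NetAdapterRss -Name $nicName
```

Comparing the output before and after shows which processors RSS is allowed to use, which helps when assigning the DNS server threads to the remaining CPUs of the same NUMA node.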
Set-NetAdapterAdvancedProperty Admin -RegistryKeyword *NumRssQueues -RegistryValue $maxRssQueues
8 is a good value for most mid-range systems. Higher end systems may need greater values.
NIC Receive Buffers
Set-NetAdapterAdvancedProperty Admin -RegistryKeyword *ReceiveBuffers -RegistryValue $receiveBuffers
2048 is a good value for most mid-range systems. Higher end systems may need greater values.
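After applying the queue and buffer settings, the values the driver actually accepted can be read back. This is a small sketch that assumes the adapter is named Admin, as in the commands above, and that the NIC driver exposes both registry keywords; not every driver does.

```powershell
# Assumption: the adapter name matches the one used in the commands above.
$nicName = "Admin"

# Read back the RSS queue count and receive buffer settings.
Get-NetAdapterAdvancedProperty -Name $nicName `
    -RegistryKeyword "*NumRssQueues", "*ReceiveBuffers" |
    Format-Table Name, DisplayName, RegistryKeyword, RegistryValue, DisplayValue
```

If a keyword is not listed, the driver likely does not support that setting under the standardized name, and the vendor's documentation should be consulted.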
* These results also apply to Windows Server 2016 Technical Preview.