Vista Multimedia Playback and Network Throughput

A few weeks ago a poster with the handle dloneranger reported in the 2CPU forums that he experienced reduced network throughput on his Vista system when he played audio or video. Other posters chimed in with similar results, and in the last week the behavior has drawn attention elsewhere, including on Slashdot and from ZDNet blogger Adrian Kingsley-Hughes.

Many people have correctly surmised that the degradation in network performance during multimedia playback is directly connected with mechanisms employed by the Multimedia Class Scheduler Service (MMCSS), a feature new to Windows Vista that I covered in my three-part TechNet Magazine article series on Windows Vista kernel changes. Multimedia playback requires a constant rate of media streaming, and playback will glitch or sputter if its requirements aren't met. The MMCSS service runs in the generic service-hosting process Svchost.exe, where it automatically prioritizes audio and video playback to prevent other tasks from interfering with the CPU usage of the playback software.
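To see for yourself that MMCSS is hosted in Svchost.exe, a minimal sketch like the one below asks the Service Control Manager for the service's configured image path. The short service name, MMCSS, is the only specific it relies on, and the error handling is pared down for brevity:

```cpp
// Minimal sketch: query the MMCSS service configuration to confirm it is
// hosted by Svchost.exe. Link against Advapi32.lib; error handling is minimal.
#include <windows.h>
#include <stdio.h>
#include <vector>

int main()
{
    SC_HANDLE scm = OpenSCManagerW(nullptr, nullptr, SC_MANAGER_CONNECT);
    if (!scm) return 1;

    // "MMCSS" is the short name of the Multimedia Class Scheduler service.
    SC_HANDLE svc = OpenServiceW(scm, L"MMCSS", SERVICE_QUERY_CONFIG);
    if (svc) {
        std::vector<BYTE> buffer(8192);
        DWORD needed = 0;
        QUERY_SERVICE_CONFIGW *cfg =
            reinterpret_cast<QUERY_SERVICE_CONFIGW*>(buffer.data());
        if (QueryServiceConfigW(svc, cfg, (DWORD)buffer.size(), &needed)) {
            // On Vista this prints an svchost.exe command line, showing that
            // MMCSS runs inside a generic service-hosting process.
            wprintf(L"MMCSS image path: %s\n", cfg->lpBinaryPathName);
        }
        CloseServiceHandle(svc);
    }
    CloseServiceHandle(scm);
    return 0;
}
```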

When a multimedia application begins playback, the multimedia APIs it uses call the MMCSS service to boost the priority of the playback thread into the realtime range, which covers priorities 16-31, for up to 8ms of every 10ms interval, depending on how much CPU the playback thread requires. Because other threads run at priorities in the dynamic priority range of 15 and below, even very CPU-intensive applications won't interfere with playback.
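The registration that the multimedia APIs perform on the player's behalf is exposed through the Avrt API, so a playback thread can also opt in directly. Here's a minimal sketch using the "Audio" task profile; a real player would wrap its streaming loop in these calls rather than a placeholder:

```cpp
// Sketch of a playback thread registering itself with MMCSS via the Avrt API.
// Link against Avrt.lib.
#include <windows.h>
#include <avrt.h>
#include <stdio.h>

int main()
{
    DWORD taskIndex = 0;

    // Ask MMCSS to manage this thread under the "Audio" task profile. While
    // the thread is registered, MMCSS boosts it into the realtime range.
    HANDLE mmcssHandle = AvSetMmThreadCharacteristicsW(L"Audio", &taskIndex);
    if (!mmcssHandle) {
        printf("AvSetMmThreadCharacteristics failed: %lu\n", GetLastError());
        return 1;
    }
    printf("Registered with MMCSS, task index %lu\n", taskIndex);

    // ... the streaming/decoding loop would run here ...
    Sleep(5000);

    // Give up the boost when playback ends.
    AvRevertMmThreadCharacteristics(mmcssHandle);
    return 0;
}
```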

You can see the boost by playing an audio or video clip in Windows Media Player (WMP), running the Reliability and Performance Monitor (Start->Run->Perfmon), selecting the Performance Monitor item, and adding the Priority Current counter for all the Wmplayer threads in the Thread object. Set the graph scale to 31 (the highest priority value on Windows) and you'll easily spot the boosted thread, shown here running at priority 21.
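If you'd rather script the measurement than click through Perfmon, a small PDH sketch along these lines samples the same counter. The wmplayer/0 instance is just an assumption about which thread gets boosted, which is why the Perfmon steps add all of the Wmplayer thread instances:

```cpp
// Minimal sketch: sample the Priority Current counter of a Wmplayer thread
// with the PDH API. Link against Pdh.lib.
#include <windows.h>
#include <pdh.h>
#include <stdio.h>

int main()
{
    PDH_HQUERY query;
    PDH_HCOUNTER counter;

    if (PdhOpenQueryW(nullptr, 0, &query) != ERROR_SUCCESS) return 1;
    if (PdhAddEnglishCounterW(query, L"\\Thread(wmplayer/0)\\Priority Current",
                              0, &counter) != ERROR_SUCCESS) return 1;

    // Take a few one-second samples. If this happens to be the boosted thread,
    // the value lands in the 16-31 realtime range; otherwise it stays in the
    // dynamic range.
    for (int i = 0; i < 5; i++) {
        PdhCollectQueryData(query);
        PDH_FMT_COUNTERVALUE value;
        if (PdhGetFormattedCounterValue(counter, PDH_FMT_LONG, nullptr,
                                        &value) == ERROR_SUCCESS) {
            printf("Priority Current: %ld\n", value.longValue);
        }
        Sleep(1000);
    }

    PdhCloseQuery(query);
    return 0;
}
```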

Besides activity by other threads, media playback can also be affected by network activity. When a network packet arrives at a system, it triggers a CPU interrupt, which causes the device driver for the adapter that received the packet to execute an Interrupt Service Routine (ISR). Other device interrupts are blocked while ISRs run, so ISRs typically do some device bookkeeping and then perform the lengthier transfer of data to or from their device in a Deferred Procedure Call (DPC), which runs with device interrupts enabled. Although DPCs execute with device interrupts enabled, they take precedence over all thread execution, regardless of priority, on the processor on which they run, and can therefore impede media playback threads.
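In WDM driver terms the ISR/DPC split looks roughly like the sketch below. The routine and structure names are placeholders, and a real network driver is an NDIS miniport that uses NDIS wrappers around the same mechanism:

```cpp
// Rough WDM-style sketch of the ISR/DPC split. MyIsr, MyDpc and
// DEVICE_EXTENSION are placeholder names, not real driver code.
#include <ntddk.h>

typedef struct _DEVICE_EXTENSION {
    KDPC ReceiveDpc;
    // ... device state ...
} DEVICE_EXTENSION, *PDEVICE_EXTENSION;

// Runs at the device's interrupt level with other device interrupts blocked,
// so it only acknowledges the hardware and defers the real work.
BOOLEAN MyIsr(PKINTERRUPT Interrupt, PVOID Context)
{
    UNREFERENCED_PARAMETER(Interrupt);
    PDEVICE_EXTENSION ext = (PDEVICE_EXTENSION)Context;

    // ... read and clear the device's interrupt status here ...

    KeInsertQueueDpc(&ext->ReceiveDpc, NULL, NULL);
    return TRUE;
}

// Runs later at DISPATCH_LEVEL with device interrupts enabled, but still ahead
// of every thread on that processor, which is what can crowd out playback.
VOID MyDpc(PKDPC Dpc, PVOID Context, PVOID Arg1, PVOID Arg2)
{
    UNREFERENCED_PARAMETER(Dpc);
    UNREFERENCED_PARAMETER(Arg1);
    UNREFERENCED_PARAMETER(Arg2);
    PDEVICE_EXTENSION ext = (PDEVICE_EXTENSION)Context;

    // ... pull the received data off the device and hand it up the stack ...
    UNREFERENCED_PARAMETER(ext);
}

// During initialization the driver ties the DPC routine to its extension:
//   KeInitializeDpc(&ext->ReceiveDpc, MyDpc, ext);
```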

Network DPC receive processing is among the most expensive, because it includes handing packets to the TCP/IP driver, which can result in lengthy computation. The TCP/IP driver verifies each packet, determines the packet’s protocol, updates the connection state, finds the receiving application, and copies the received data into the application’s buffers. This Process Explorer screenshot shows how CPU usage for DPCs rose dramatically when I copied a large file from another system:

Tests of MMCSS during Vista development showed that, even with thread-priority boosting, heavy network traffic can cause enough long-running DPCs to prevent playback threads from keeping up with their media streaming requirements, resulting in glitching. MMCSS's glitch-resistant mechanisms were therefore extended to include throttling of network activity: MMCSS issues a command to the NDIS driver, the driver that hands packets received by network adapter drivers to the TCP/IP driver, that causes NDIS to "indicate", or pass along, at most 10 packets per millisecond (10,000 packets per second).
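For the curious, the limit surfaces in the registry as the NetworkThrottlingIndex value under HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Multimedia\SystemProfile, whose default data of 10 lines up with the 10 packets-per-millisecond figure. The minimal sketch below only reads that value; treating its data as the packets-per-millisecond limit is an assumption here rather than something spelled out above:

```cpp
// Hedged sketch: read the NetworkThrottlingIndex registry value. The
// interpretation of its data (10 == 10 packets per millisecond) is an
// assumption; this code only reads the setting, it doesn't change it.
#include <windows.h>
#include <stdio.h>

int main()
{
    DWORD index = 0;
    DWORD size = sizeof(index);
    LSTATUS status = RegGetValueW(
        HKEY_LOCAL_MACHINE,
        L"SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Multimedia\\SystemProfile",
        L"NetworkThrottlingIndex",
        RRF_RT_REG_DWORD, nullptr, &index, &size);

    if (status == ERROR_SUCCESS)
        printf("NetworkThrottlingIndex: %lu\n", index);
    else
        printf("Value not present (status %ld)\n", status);
    return 0;
}
```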

Because the standard Ethernet frame size is about 1,500 bytes, a limit of 10,000 packets per second equals a maximum throughput of roughly 15MB/s. 100Mb networks can handle at most about 12MB/s, so if your system is on a 100Mb network, you typically won't see any slowdown. However, if you have a 1Gb network infrastructure and both the sending system and your Vista receiving system have 1Gb network adapters, you'll see throughput drop to roughly 15% of the link's capacity.

Further, there’s an unfortunate bug in the NDIS throttling code that magnifies throttling if you have multiple NICs. If you have a system with both wireless and wired adapters, for instance, NDIS will process at most 8,000 packets per second, and with three adapters it will process a maximum of 6,000 packets per second. 6,000 packets per second equals 9MB/s, a limit that's visible even on 100Mb networks.
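To make the arithmetic concrete, here's a small sketch that turns the packet limits into throughput figures, using the roughly 1,500-byte frame size from above and the reduction of 2 packets per millisecond per additional adapter implied by the numbers:

```cpp
// Worked arithmetic for the throttling limits described above: 10 packets/ms
// with one adapter, minus 2 packets/ms for each additional adapter, at a
// standard Ethernet frame size of about 1,500 bytes.
#include <stdio.h>

int main()
{
    const double frameBytes = 1500.0;

    for (int adapters = 1; adapters <= 3; adapters++) {
        double packetsPerSec = (10 - 2 * (adapters - 1)) * 1000.0;
        double mbPerSec = packetsPerSec * frameBytes / (1000.0 * 1000.0);
        printf("%d adapter(s): %.0f packets/s, about %.0f MB/s\n",
               adapters, packetsPerSec, mbPerSec);
    }
    // Prints roughly 15 MB/s, 12 MB/s and 9 MB/s. A 100Mb link tops out
    // around 12 MB/s, so only the 9 MB/s three-adapter limit is low enough
    // to show up on 100Mb networks; the others matter on gigabit links.
    return 0;
}
```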

I made the throttling visible on my laptop, which has three adapters, by copying a large file to it from another system and then starting WMP and playing a song. The Task Manager screenshot below shows how the copy achieves about 20% utilization of my 1Gb network, but drops to around 6% after I start playing a song:

You can monitor the rate at which NDIS processes received packets by adding the Packets Received/sec counter of the Network Interface object to the Performance Monitor view. Below, you can see the packet receive rate change as I ran the experiment. The number of packets NDIS processed didn't reach the theoretical throttling maximum of 6,000 per second, probably due to handshaking with the remote system.
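The same counter can be sampled programmatically; this PDH sketch uses a wildcard over the Network Interface instances and prints a per-adapter receive rate once a second:

```cpp
// Sketch: sample Packets Received/sec for every network adapter instance via
// PDH, mirroring the Performance Monitor view. Link against Pdh.lib. Rate
// counters need two samples, hence the collect before the loop.
#include <windows.h>
#include <pdh.h>
#include <stdio.h>
#include <vector>

int main()
{
    PDH_HQUERY query;
    PDH_HCOUNTER counter;

    if (PdhOpenQueryW(nullptr, 0, &query) != ERROR_SUCCESS) return 1;
    if (PdhAddEnglishCounterW(query,
            L"\\Network Interface(*)\\Packets Received/sec",
            0, &counter) != ERROR_SUCCESS) return 1;

    PdhCollectQueryData(query);   // baseline sample

    for (int i = 0; i < 10; i++) {
        Sleep(1000);
        PdhCollectQueryData(query);

        DWORD bufferSize = 0, itemCount = 0;
        PdhGetFormattedCounterArrayW(counter, PDH_FMT_DOUBLE,
                                     &bufferSize, &itemCount, nullptr);
        std::vector<BYTE> buffer(bufferSize);
        PPDH_FMT_COUNTERVALUE_ITEM_W items =
            reinterpret_cast<PPDH_FMT_COUNTERVALUE_ITEM_W>(buffer.data());

        if (PdhGetFormattedCounterArrayW(counter, PDH_FMT_DOUBLE, &bufferSize,
                                         &itemCount, items) == ERROR_SUCCESS) {
            for (DWORD j = 0; j < itemCount; j++) {
                wprintf(L"%s: %.0f packets/s\n",
                        items[j].szName, items[j].FmtValue.doubleValue);
            }
        }
    }

    PdhCloseQuery(query);
    return 0;
}
```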

Even with this level of throttling, Internet traffic, even on the best broadband connection, won't be affected. That's because the multiplicity of intermediate connections between your system and another one on the Internet fragments packets and slows down packet travel, and therefore reduces the rate at which the two systems transfer data.

The throttling rate Vista uses was derived from experiments that reliably achieved glitch-resistant playback on single-CPU systems on 100Mb networks with high packet receive rates. The hard-coded limit was short-sighted with respect to today's systems, which have faster CPUs, multiple cores, and gigabit networks. In addition to fixing the bug that affects throttling on multi-adapter systems, the networking team is actively working with the MMCSS team on a fix that penalizes network traffic much less dramatically while still delivering a glitch-resistant playback experience.

Stay tuned to my blog for more information.