Network layer tweaks in Windows Server 2008

KB article 951037 describes some of the new features in the OS related to the network layer, some similar to the “Scalable Networking Pack” released for Windows Server 2003 (included in SP2).

Some environments (certain NICs, switches and routers) do not behave well with these new features, and unpredictable symptoms with no apparent pattern can crop up as a result.

The KB article mentioned above has a lot of detail on the options, and I leave it as an exercise for the reader to look up exactly what each feature does, but here is a summary of how to check your current settings and how to toggle each feature OFF.


NOTE: All netsh commands are executed from an elevated command prompt

To view your current settings:
netsh int tcp show global

This will display something similar to the following:
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State : enabled
Chimney Offload State : automatic
NetDMA State : enabled
Direct Cache Acess (DCA) : disabled
Receive Window Auto-Tuning Level : normal
Add-On Congestion Control Provider : none
ECN Capability : disabled
RFC 1323 Timestamps : disabled


To see the valid settings for each option:
netsh int tcp set global /?

To disable the TCP chimney offloading feature:
netsh int tcp set global chimney=disabled

To disable the Receive Side Scaling (RSS) feature:
netsh int tcp set global rss=disabled
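
To confirm that either change took effect, re-run the show command from above and check the relevant line of the output:
netsh int tcp show global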

To disable the NetDMA feature you need to edit the registry and reboot:
Path: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
Name: EnableTCPA
Type: REG_DWORD
Data: 0
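
If you prefer to script this change instead of using Regedit, the same value can be set from an elevated command prompt with reg.exe (a reboot is still required afterwards), for example:
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v EnableTCPA /t REG_DWORD /d 0 /f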


To disable offloading features on the NICs themselves:
You may find offloading features in the properties of the NIC drivers in Device Manager (typically on the Advanced tab); as these are determined by the manufacturers there is no standard for the naming or number of options, but look for and disable anything that references “offload”.
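
As a rough way to survey which offload-related settings the installed NIC drivers expose, you can search the network adapter class key in the registry; this is only a sketch, as the keyword names (for example *TCPChecksumOffloadIPv4) vary by driver and may not be present at all, and any actual changes should still be made through the driver properties in Device Manager:
reg query "HKLM\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002BE10318}" /s /f Offload /v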

TCP checksum offloading is a strange beast in itself. I have had customers report better performance when this setting is turned off in the NIC properties, and have even seen servers stop bugchecking once it was disabled, due to the NIC driver’s implementation of it.

The other side effect is that network traces will show “TCP checksum invalid” for all outbound packets, because the NIC calculates and adds the checksum after the filter driver has “captured” the packet on its way out; this often makes people nervous that they have a hardware problem.

The theory is that handing the checksum calculation to the NIC saves the CPUs some work, but personally I have never seen anything but problems from this feature, and never a degradation in performance from turning it off.


Now – which to disable, and how to know if it helps?

This is where you have to do some constructive testing. If you are experiencing connectivity problems with a server, you should determine the frequency of the problem, or better still, whether there is something that makes it predictable and reproducible.

If the symptom is just “slower than expected network throughput”, then looking at a network trace for dropped/munged/duplicate packets would be a place to start, as well as seeing what the problem machines have in common (e.g. physical location on the network, or whether there is a router or firewall in common).

I would strongly recommend noting the current settings before making any changes, and also taking a baseline by running a documented test procedure several times (to allow for variance and caching). Then make ONE change from the options above and repeat the test to see whether there is a consistent, noticeable impact.
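
A simple way to record the current state before touching anything is to redirect the output of the show command to a file; the path used here is just an example:
netsh int tcp show global > C:\Temp\tcp-global-before.txt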

Different environments may have different “sweet spots”, so a combination of enabled and disabled features might need to be tested; there isn’t a silver bullet here, unfortunately.

Also make sure to test from multiple clients; sometimes improving performance or resolving issues for one set of clients can have a negative impact on others, especially in heterogeneous environments.

If a change is made and has no effect, I would recommend returning it to its default setting.
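
To put things back, restore each setting to the value you recorded earlier. Based on the defaults shown in the sample output above (your own defaults may differ), that would look like this for the netsh options:
netsh int tcp set global chimney=automatic
netsh int tcp set global rss=enabled
For NetDMA, set the EnableTCPA registry value back to 1 (or delete the value if it did not exist before) and reboot.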