Announcing: New Transport Advancements in the Anniversary Update for Windows 10 and Windows Server 2016

Christian Huitema, Daniel Havey, Matt Olson, Osman Ertugay, Praveen Balasubramanian

TCP-based communication is used ubiquitously in devices from IoT to cloud servers. Performance improvements in TCP benefit almost every networking workload. The Data Transports and Security (DTS) team in the Windows and Devices Group is committed to making Windows TCP best in class. This document describes the first wave of features in the pipeline of upcoming Windows Redstone releases.

Windows is introducing new TCP features in the Anniversary Update for Windows 10 and Windows Server 2016, releasing in summer 2016. This document describes five key features designed to reduce latency, improve loss resiliency, and promote better network citizenship. The goals when starting out were to decrease TCP connection setup time, increase TCP startup speed, and decrease the time to recover from packet loss.

Here is a summary of the feature list:

  1. TCP Fast Open (TFO) for zero RTT TCP connection setup. IETF RFC 7413 [1]
  2. Initial Congestion Window 10 (ICW10) by default for faster TCP slow start [5]
  3. TCP Recent ACKnowledgment (RACK) for better loss recovery (experimental IETF draft) [4]
  4. Tail Loss Probe (TLP) for better Retransmit TimeOut response (experimental IETF draft) [3]
  5. TCP LEDBAT for background connections IETF RFC 6817 [2]

 

TCP Fast Open: TCP Fast Open (TFO) achieves zero-RTT connection setup by generating a TFO cookie during the first three-way handshake (3WH). Subsequent connections to the same server can use the TFO cookie to connect in zero RTT. In practice, TFO means that TCP can carry data in the SYN and SYN-ACK, and the receiving host can consume this data during the initial connection handshake. TFO is one full Round Trip Time (RTT) faster than standard TCP setup, which requires a three-way handshake. This latency saving is most relevant to short web transfers over the Internet, where the average latency is on the order of 40 msec.
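To make the one-RTT saving concrete, here is a minimal back-of-the-envelope sketch. It is an idealized model and not a description of the Windows implementation: it assumes the client already holds a valid TFO cookie, and it ignores server processing time, packet loss, and TLS. The function name and its parameters are hypothetical and exist only for illustration.

```python
def time_to_first_response_ms(rtt_ms: float, fast_open: bool) -> float:
    """Idealized time from connection start until the first response byte arrives.

    Assumption: with TFO the request rides in the SYN, so the handshake
    adds no extra round trip before the request is delivered.
    """
    handshake_rtts = 0 if fast_open else 1  # standard TCP completes the 3WH first
    request_response_rtts = 1               # request out, response back
    return (handshake_rtts + request_response_rtts) * rtt_ms


rtt = 40.0  # msec, the average Internet latency quoted above
standard = time_to_first_response_ms(rtt, fast_open=False)
tfo = time_to_first_response_ms(rtt, fast_open=True)
print(f"standard 3WH: {standard:.0f} ms, TFO: {tfo:.0f} ms, saved: {standard - tfo:.0f} ms")
```

Under these assumptions a short request over a 40 msec path completes in roughly one RTT instead of two, which is why the benefit is concentrated in short transfers.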

Transport Layer Security (TLS) over TCP using Fast Open is typically two Round Trip Times faster than a standard TLS over TCP connection setup because a client_hello can be included in the SYN packet, saving an additional RTT in the TLS handshake. These savings can add up to a substantial increase in resource efficiency on busy servers that deliver many small Internet objects to the same clients (standard web pages, mobile app data, etc.). TLS 1.3 is an ongoing effort at the IETF, and it will help us achieve zero-RTT connection setup for HTTP workloads in a subsequent release.

Because we are changing the 3WH behavior of TCP, there are several issues that we must address and mitigate. Windows recommends that TLS be used over TCP when employing TCP Fast Open to remove the chance that a man in the middle could manipulate TFO cookies for use in amplified DDoS attacks. TLS connections are immune to attacks from behind shared public IPs (NATs). However, it is still possible for a compromised host to flood spoofed SYN packets with valid cookies. To address the problem of compromised hosts, Windows TFO sets a dynamically adjusted maximum limit on the number of pending TFO connection requests, preventing resource exhaustion attacks from compromised hosts running malicious code. Finally, it is possible for the SYN packet to be duplicated in the network. TLS precludes such duplicate delivery, but other services need to ensure that TFO is used only for idempotent requests. Windows TFO is safe when used as recommended (with TLS) and can provide a substantial increase in resource efficiency.

The Anniversary Update for Windows 10 will ship with a fully compliant client side implementation enabled by default. The Microsoft Edge browser will ship with an about:flags setting for TCP Fast Open, which will be disabled by default. The eventual goal is to have it enabled by default in IE and Edge browsers in a subsequent release. In a subsequent release we also plan to support early accept and to fully integrate the server side implementation with http.sys/IIS. The server side implementation will be disabled by default.

Configuration: In the Edge browser, navigate to “about:flags” or “about:config” and use the checkbox for “Enable TCP Fast Open”. From the command line, use: netsh int tcp set global fastopen=<enabled | disabled>

Action Items: If you operate infrastructure or own software components like middleboxes or packet processing engines that make use of a TCP state machine, please begin looking into supporting RFC 7413. By next year the combination of TLS 1.3 and TFO is expected to be more widespread. Read more at: “Building a faster and more secure web with TCP Fast Open, TLS False Start, and TLS 1.3”

 

Initial Congestion Window (IW10): The Initial Congestion Window (IW or ICW) default value in Windows 10 and Server 2012 R2 is 4 MSS. With the new releases the default value will be 10 MSS. The IW10 default improves slow start speed over the previous default of IW4. This change in Windows TCP’s startup behavior is designed to keep pace with the increased emission rates of network routing equipment used on the Internet today. The IW determines the limit on how much data can be sent in the first RTT. Like Windows TFO, IW10 mostly affects small object transfers over the Internet. Windows IW10 can transfer small Internet objects up to twice as quickly as IW4.
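The “up to twice as quickly” figure can be illustrated with a simplified slow start model. The sketch below is only an approximation under stated assumptions: no packet loss, no receive window limit, a congestion window that doubles every round trip, and an MSS of 1460 bytes (an illustrative value, not one taken from this document). The function name is hypothetical.

```python
import math


def rtts_to_send(object_bytes: int, initial_window: int, mss: int = 1460) -> int:
    """Round trips of idealized slow start needed to send object_bytes.

    Assumptions: no loss, no receive-window limit, window doubles each RTT.
    """
    segments_left = math.ceil(object_bytes / mss)
    cwnd = initial_window  # congestion window in segments
    rtts = 0
    while segments_left > 0:
        segments_left -= cwnd  # send a full window this round trip
        cwnd *= 2              # slow start doubles the window
        rtts += 1
    return rtts


for size in (5_000, 14_600, 40_000):
    print(f"{size} bytes: IW4 = {rtts_to_send(size, 4)} RTTs, "
          f"IW10 = {rtts_to_send(size, 10)} RTTs")
```

In this model a 14,600 byte object (ten full segments) needs two round trips with IW4 but only one with IW10, while a very small object that already fits in four segments sees no difference.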

There are some concerns around burst losses with switches and routers that have shallow buffers. We have telemetered such episodes to help us improve the reliability in subsequent releases. In the next Windows Client release, we plan to flight IW 4, IW 10 and IW 16 to have a better performance comparison across device types.

Configuration: This is currently configured through templates (netsh) or Set-NetTCPSetting (PowerShell). On the client SKU the only options for changing the IW are to switch to the compat template (IW = 4) or to use the SIO_TCP_SET_ICW option, which restricts the value to 2, 4, or 10. On the server SKU the IW can be configured up to a maximum of 64.

Action Items: Please notify us if you see increased loss rates or timeouts with Windows clients and servers.

 

Tail Loss Probe (TLP): Tail Loss Probe is intended to improve Windows TCP’s behavior when recovering from packet loss. TLP improves TCP recovery behavior by converting Retransmit TimeOuts (RTOs) into Fast Retransmits for much faster recovery.

TLP transmits one packet in two round-trips when a connection has outstanding data and is not receiving any ACKs. The transmitted packet (the loss probe) can be either new data or a retransmission. When there is tail loss, the ACK from a loss probe triggers SACK/FACK based fast recovery, thus avoiding a retransmission timeout, which is costly both because of its long duration and because it reduces the congestion window and repeats slow start.

TLP is enabled only for connections that have an RTT of at least 10 msec in both Windows Client and Server 2016. This is to avoid spurious retransmissions for low latency connections. The most beneficial scenario for TLP is short web transfers over WAN.
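The timer logic described above can be sketched roughly as follows. This is a simplified illustration, not the Windows implementation: the probe timeout is modeled as exactly two smoothed RTTs, the 10 msec eligibility threshold is taken from the text, and the RTO value passed in the usage lines is an arbitrary illustrative number. All function and constant names are hypothetical.

```python
MIN_RTT_FOR_TLP_MS = 10.0  # eligibility threshold stated in the text


def tlp_eligible(srtt_ms: float) -> bool:
    """TLP is only used on connections with an RTT of at least 10 msec."""
    return srtt_ms >= MIN_RTT_FOR_TLP_MS


def probe_timeout_ms(srtt_ms: float) -> float:
    """Assumed probe timeout: two round trips without any ACK."""
    return 2.0 * srtt_ms


def next_timer(srtt_ms: float, rto_ms: float) -> tuple:
    """Return which timer would fire first: ("probe", ms) or ("rto", ms)."""
    if tlp_eligible(srtt_ms):
        pto = probe_timeout_ms(srtt_ms)
        if pto < rto_ms:
            return ("probe", pto)
    return ("rto", rto_ms)


# Illustrative values only: a 40 msec WAN path and a 5 msec low latency path,
# both with an assumed 300 msec retransmission timeout.
print(next_timer(srtt_ms=40.0, rto_ms=300.0))
print(next_timer(srtt_ms=5.0, rto_ms=300.0))
```

On the 40 msec path the probe fires well before the assumed RTO, giving the receiver a chance to generate the ACK that triggers fast recovery; on the 5 msec path the connection is not eligible and falls back to the RTO.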

Configuration: The TCP templates have an additional setting called “taillossprobe”. On the client SKU, switching to the compat template turns TLP off. On both client and server SKUs, the Internet template has it enabled by default. The InternetCustom and DatacenterCustom templates can be used for more fine-grained control over specific connections.

 

Recent ACKnowledgement (RACK): RACK uses the notion of time instead of counting duplicate ACKnowledgements to detect missing packets for TCP Fast Recovery. RACK provides improved loss detection over standard TCP Fast Recovery techniques.

Rather than relying on traditional loss detection approaches such as packet or sequence number counting, RACK deems a packet lost if a packet that was sent “sufficiently later” has been cumulatively or selectively acknowledged. To do this, the TCP sender records packet transmission times and infers losses from cumulative or selective acknowledgements.
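The time-based rule can be sketched in a few lines. This is a minimal illustration of the idea and not the Windows code: the reordering window that defines “sufficiently later” is passed in as a hypothetical parameter, and the `SentPacket` record and `rack_detect_losses` function are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class SentPacket:
    seq: int
    sent_at_ms: float
    acked: bool = False  # cumulatively or selectively acknowledged


def rack_detect_losses(packets, reorder_window_ms: float):
    """Return sequence numbers of packets deemed lost.

    A packet is deemed lost if it is still unacknowledged and some packet
    sent more than reorder_window_ms after it has already been acknowledged.
    """
    acked_send_times = [p.sent_at_ms for p in packets if p.acked]
    if not acked_send_times:
        return []  # nothing acknowledged yet, so nothing can be inferred
    latest_acked_send = max(acked_send_times)
    return [
        p.seq
        for p in packets
        if not p.acked and p.sent_at_ms + reorder_window_ms < latest_acked_send
    ]


packets = [
    SentPacket(1, 0.0, acked=True),
    SentPacket(2, 1.0),               # never acknowledged
    SentPacket(3, 2.0, acked=True),   # selectively acknowledged
    SentPacket(4, 15.0, acked=True),  # sent much later, also acknowledged
    SentPacket(5, 16.0),              # too recent to judge
]
print(rack_detect_losses(packets, reorder_window_ms=10.0))
```

Packet 2 is flagged because packet 4, sent 14 msec after it, has been acknowledged. If only packets 1 to 3 had been sent, packet 2 would not be flagged yet, since packet 3 left just 1 msec later and could simply have been reordered.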

RACK is enabled only for connections that have an RTT of at least 10 msec in both Windows Client and Server 2016. This is to avoid spurious retransmissions for low latency connections. RACK is also only enabled for connections that successfully negotiate SACK.

Configuration: The TCP templates have an additional setting called “rack”. On the client SKU, switching to the compat template turns RACK off. On both client and server SKUs, the Internet template has it enabled by default. The InternetCustom and DatacenterCustom templates can be used for more fine-grained control over specific connections.

 

Windows Low Extra Delay BAckground Transport (LEDBAT): The fifth feature responds to a large number of customer requests for a background transport that does not interfere with other TCP connections. To meet these requests we used Windows TCP’s modular congestion control structure and added a new Congestion Control Module called LEDBAT to manage background flows.

Windows LEDBAT is implemented as an experimental Windows TCP Congestion Control Module (CCM). Windows LEDBAT transfers data in the background and does not interfere with other TCP connections. LEDBAT does this by consuming only unused bandwidth. When LEDBAT detects increased latency, which indicates that other TCP connections are consuming bandwidth, it reduces its own consumption to prevent interference. When the latency decreases again, LEDBAT ramps up and consumes the unused bandwidth.
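The delay-driven behavior can be illustrated with the window update rule from RFC 6817 [2]. This is a sketch of the RFC’s generic algorithm, not of the Windows congestion control module, whose internals are not described here. The constants (a 100 msec target delay, a gain of 1, a 1460 byte MSS, and a two segment minimum window) are the RFC’s suggested values or illustrative assumptions, and the function name is hypothetical.

```python
TARGET_MS = 100.0   # target queuing delay (RFC 6817 upper bound)
GAIN = 1.0          # window gain (RFC 6817 upper bound)
MSS = 1460          # bytes, illustrative assumption
MIN_CWND = 2 * MSS  # floor so the flow can keep probing


def ledbat_on_ack(cwnd: float, bytes_acked: int,
                  current_delay_ms: float, base_delay_ms: float) -> float:
    """Return the new congestion window (bytes) after an ACK.

    The window grows while measured queuing delay is below the target
    and shrinks once the delay exceeds it.
    """
    queuing_delay = current_delay_ms - base_delay_ms
    off_target = (TARGET_MS - queuing_delay) / TARGET_MS
    cwnd += GAIN * off_target * bytes_acked * MSS / cwnd
    return max(cwnd, MIN_CWND)


cwnd = 10 * MSS
# Idle path: delay equals the base delay, so the window grows.
print(ledbat_on_ack(cwnd, MSS, current_delay_ms=20.0, base_delay_ms=20.0))
# Competing traffic adds 200 msec of queuing, so the window shrinks.
print(ledbat_on_ack(cwnd, MSS, current_delay_ms=220.0, base_delay_ms=20.0))
```

Because the adjustment is proportional to how far the measured delay is from the target, the flow backs off as soon as queues start to build and yields to competing traffic.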

Configuration: LEDBAT is only exposed through an undocumented socket option at the moment. Please contact us if you would like to enable experimentation for a background workload.

 

Works Cited:

[1] Y. Cheng et al, “RFC 7413: TCP Fast Open,” December 2014. [Online]. Available: https://tools.ietf.org/html/rfc7413

[2] S. Shalunov et al, “RFC 6817: Low Extra Delay Background Transport (LEDBAT),” December 2012. [Online]. Available: https://tools.ietf.org/html/rfc6817

[3] N. Dukkipati et al, “Tail Loss Probe (TLP): An Algorithm for Fast Recovery of Tail Losses,” February 2013. [Online]. Available: https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01

[4] Y. Cheng et al, “RACK: a time-based fast loss detection algorithm for TCP,” October 2015. [Online]. Available: https://www.ietf.org/archive/id/draft-cheng-tcpm-rack-00.txt

[5] J. Chu et al, “RFC 6928: Increasing TCP’s Initial Window,” April 2013. [Online]. Available: https://tools.ietf.org/html/rfc6928