By Jason Messer, Senior Program Manager
Using VXLAN for Encapsulation and OVSDB for Policy Distribution
Windows Server 2016 is the perfect platform for building your Software-Defined Data Center (SDDC) with new layers of security and Azure-inspired innovation for hosting business applications and infrastructure. A critical piece of this SDDC is the new Software-Defined Network (SDN) stack, which provides agility, dynamic security, and hybrid flexibility by enforcing network policy in the Hyper-V Virtual Switch using the Azure Virtual Filtering Platform (VFP) switch extension. Rather than programming network configurations into a physical switch using CLI, NETCONF, or OpenFlow, the new Microsoft Network Controller delivers network policy to the Hyper-V hosts using the OVSDB protocol. On each host, a Host Agent programs that policy into the VFP extension of the vSwitch, where it is enforced. By creating overlay virtual networks (VXLAN tunnels / logical switches) and endpoints which terminate in the vSwitch, each Hyper-V host becomes a software VXLAN Tunnel End Point (VTEP).
Note: This will be a technical post focusing on networking protocols and some implementation details.
Overlays, VXLAN, virtual networking, HNV, encapsulation, NVGRE, logical switch… why should you care about all these esoteric networking terms? Maybe you have heard hard-core networking types mention them in passing, or have customers asking how Microsoft’s network virtualization solution compares with other solutions. The answer is that, just as compute and storage have been virtualized, traditional networking devices and services are also being virtualized for greater flexibility.
Server hardware is now virtualized through software to mimic CPUs, memory, and disks to create virtual machines. Network hardware is also being virtualized to mimic switches, routers, firewalls, gateways, and load balancers to create virtual networks. Not only do virtual networks provide isolation between workloads, tenants, and business units but they also allow IT and network administrators to configure networks and define policy with agility while realizing increased flexibility in where VMs are deployed and workloads run.
Virtual networks still require physical hardware and IP networks to connect servers and VMs together. However, the packets transmitted between VMs across these physical links are encapsulated within physical network IP packets to create an overlay network (see Figure 1). This means that the original packet from the VM, with its MAC and IP addresses, TCP/UDP ports, and data, remains unchanged and is simply placed inside an IP packet on the physical network. The physical network underneath is then known as the underlay or transport network; traditionally, Microsoft has called this the HNV Provider (or PA) network.
Figure 1 – VXLAN Encapsulation
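To make the encapsulation concrete, here is a minimal Python sketch of the 8-byte VXLAN header defined in RFC 7348 (one flags byte, 24 reserved bits, a 24-bit VNI, and a final reserved byte). The function names are illustrative only and are not part of any Windows API; the outer Ethernet, IP, and UDP headers are assumed to be added by the host's network stack.

```python
import struct

VXLAN_UDP_PORT = 4789         # IANA-assigned UDP destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag: the VNI field carries a valid value


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags byte followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return vni_word >> 8


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the VXLAN header; the inner frame is carried unchanged."""
    return build_vxlan_header(vni) + inner_frame
```

Note that the inner frame is never modified: decapsulation on the remote host simply strips the outer headers and the 8 bytes above.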
The idea of network virtualization to guarantee isolation has been around for some time in the form of VLANs. VLANs allow network traffic to be “tagged” with an identifier to create logical network segments and segregate network traffic into broadcast and isolation domains. However, VLANs are largely static configurations programmed on individual switch ports and network adapters. Anytime a server or VM moves, the VLAN configuration must be updated (sometimes in multiple places) and the IP addresses of that workload or VM may need to be changed as well. Moreover, since VLANs use a 12-bit field for the network identifier, at most 4096 logical network segments can be created (4094 in practice, since IDs 0 and 4095 are reserved).
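The size of the identifier space is simple arithmetic. The short Python sketch below compares the 12-bit VLAN ID with the 24-bit VXLAN Network Identifier specified in RFC 7348; the constant names are our own.

```python
VLAN_ID_BITS = 12     # 802.1Q VLAN identifier width
VXLAN_VNI_BITS = 24   # VXLAN Network Identifier width (RFC 7348)

vlan_id_count = 2 ** VLAN_ID_BITS    # all possible VLAN ID values
usable_vlan_ids = vlan_id_count - 2  # IDs 0 and 4095 are reserved
vni_count = 2 ** VXLAN_VNI_BITS      # all possible VNI values

print(f"VLAN IDs: {vlan_id_count} ({usable_vlan_ids} usable)")
print(f"VXLAN VNIs: {vni_count}")
print(f"Ratio: {vni_count // vlan_id_count}x")
```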
Hyper-V Network Virtualization (HNV)
The network virtualization solution in Windows Server 2012 and 2012R2, Hyper-V Network Virtualization (HNV), used an encapsulation format known as NVGRE (RFC 7637) to create overlay networks based on network policy managed through SCVMM and programmed through WMI / PowerShell. Another popular industry encapsulation protocol is VXLAN (RFC 7348), which comes with guidance on how to distribute or exchange VXLAN Tunnel Endpoint (VTEP) information between virtual and physical devices (e.g. hardware VTEPs).
Note: The HNV solution in Windows Server 2012 and 2012R2 which used NVGRE for encapsulation and WMI/PowerShell for management is still available in Windows Server 2016. We strongly recommend customers move to Windows Server 2016 and the new SDN stack, as the bulk of development and innovation will occur on this stack as opposed to HNVv1.
In talking with customers, we observed some confusion around which encapsulation protocol to use (VXLAN or NVGRE), which was taking focus away from the higher-level value of network virtualization. Consequently, in Windows Server 2016 (WS2016), we support both NVGRE and VXLAN encapsulation protocols, with the default being VXLAN. We also built the Microsoft Network Controller as a programmable surface to create and manage these virtual networks and apply fine-grained policies for security, load balancing, and Quality of Service (QoS). A distinct management plane, whether PowerShell scripts, System Center Virtual Machine Manager (SCVMM), or the Microsoft Azure Stack (MAS) components, programs network policy through the RESTful API exposed by the Microsoft Network Controller. The Network Controller then distributes this policy to each of the Hyper-V hosts using the OVSDB protocol and a set of schemas to represent virtual networks, ACLs, user-defined routing, and other policy.
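For readers unfamiliar with OVSDB, the protocol (RFC 7047) is JSON-RPC: a client sends a "transact" request whose params are a database name followed by a list of operations. The Python sketch below builds such a request. The database name, table name, and column names are hypothetical placeholders chosen for illustration; they are not the schema the Network Controller actually uses.

```python
import json
import uuid


def build_transact(db_name, operations):
    """Build an OVSDB "transact" JSON-RPC request (message shape per RFC 7047)."""
    return {
        "method": "transact",
        "params": [db_name, *operations],
        "id": str(uuid.uuid4()),  # any non-null JSON value is allowed as the id
    }


def insert_mapping_op(vni, vm_mac, vtep_ip):
    """An "insert" operation against a HYPOTHETICAL table and columns."""
    return {
        "op": "insert",
        "table": "Example_Mac_To_Vtep",  # hypothetical table name
        "row": {"vni": vni, "mac": vm_mac, "vtep_ip": vtep_ip},
    }


request = build_transact(
    "example_vnet_policy",  # hypothetical database name
    [insert_mapping_op(5001, "00:11:22:33:44:55", "192.0.2.10")],
)
wire_bytes = json.dumps(request).encode("utf-8")
print(wire_bytes.decode("utf-8"))
```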
Like VLANs, both NVGRE and VXLAN provide isolation by including an identifier (e.g. the VXLAN Network Identifier, or VNI) to identify the logical network segment (virtual subnet). In WS2016, multiple VNIs (or virtual subnets) can be combined within a Routing Domain so that isolation between tenant virtual networks is maintained, which allows tenants to use overlapping IP address spaces. A network compartment is created on the Hyper-V host for each routing domain, with a Distributed Router used to route traffic between virtual subnets for a given tenant. Admins can also create User-Defined Routes to chain virtual appliances into the traffic path for increased security and functionality. Unlike physical networks with VLANs, where policy is closely tied to location and the physical port to which a server (hosting a VM) is attached, a network endpoint (VM) is free to move across the datacenter while all of its policy moves along with it.
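The relationship between routing domains, virtual subnets, and overlapping address spaces can be sketched as a small data model. This is an illustration only, under the assumption that lookups are always scoped to one routing domain; the class and function names are invented for the example and do not correspond to Network Controller objects.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class VirtualSubnet:
    vni: int      # identifier carried in the encapsulation header
    prefix: str   # tenant (customer) address prefix


class RoutingDomain:
    """One tenant virtual network: a set of virtual subnets routed together."""

    def __init__(self, name, subnets):
        self.name = name
        self.subnets = list(subnets)

    def subnet_for(self, ip):
        """Return the virtual subnet containing ip, or None if there is none."""
        addr = ip_address(ip)
        for subnet in self.subnets:
            if addr in ip_network(subnet.prefix):
                return subnet
        return None


def needs_routing(domain, src_ip, dst_ip):
    """True if traffic crosses virtual subnets and must go through the router."""
    src, dst = domain.subnet_for(src_ip), domain.subnet_for(dst_ip)
    if src is None or dst is None:
        raise ValueError("address is outside this routing domain")
    return src.vni != dst.vni


# Two tenants reuse 10.0.1.0/24; their routing domains keep them apart.
contoso = RoutingDomain("contoso", [VirtualSubnet(5001, "10.0.1.0/24"),
                                    VirtualSubnet(5002, "10.0.2.0/24")])
fabrikam = RoutingDomain("fabrikam", [VirtualSubnet(6001, "10.0.1.0/24")])
```

Because every lookup starts from a routing domain, the same tenant address resolves to a different VNI for each tenant.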
As network equipment manufacturers build support for VXLAN into their NICs, switches, and routers, they can support:
- Encapsulation Task Offloads to offload operations from the OS and host CPU onto the NIC
- ECMP Spreading using the UDP source port as a hash to distribute connections
Microsoft has worked with the major NIC vendors to ensure support exists for both NVGRE and VXLAN Task Offloads in Windows Server 2016 NIC drivers. These offloads take the processing burden off the host CPU and instead perform functions such as Large Send Offload (LSO) and inner checksum calculation on the physical NIC itself. Moreover, Microsoft conforms to the standard VXLAN practice of deriving the UDP source port from a hash over the inner packet, so ECMP spreading of different connections just works on ECMP-enabled routers.
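The source-port technique can be illustrated with a few lines of Python. RFC 7348 recommends computing the outer UDP source port from a hash of inner packet fields and keeping it in the dynamic port range 49152-65535. The sketch below assumes a CRC32 over the inner 5-tuple; the hash function and field selection that Windows actually uses are not described in this post, so treat both as assumptions.

```python
import struct
import zlib
from ipaddress import ip_address

PORT_MIN, PORT_MAX = 49152, 65535  # dynamic/private range recommended by RFC 7348


def vxlan_source_port(src_ip, dst_ip, protocol, src_port, dst_port):
    """Map an inner flow's 5-tuple to a stable outer UDP source port."""
    key = (
        ip_address(src_ip).packed
        + ip_address(dst_ip).packed
        + struct.pack("!BHH", protocol, src_port, dst_port)
    )
    span = PORT_MAX - PORT_MIN + 1
    # CRC32 is an assumed stand-in for whatever hash a real VTEP uses.
    return PORT_MIN + zlib.crc32(key) % span


# Every packet of one inner flow yields the same outer source port, so an
# ECMP router that hashes the outer headers keeps the flow on a single path.
port = vxlan_source_port("10.0.1.5", "10.0.1.6", 6, 50000, 443)
print(port)
```

Different inner connections between the same pair of hosts generally hash to different source ports, which is what lets ECMP routers spread them even though the outer IP addresses are identical.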
VXLAN Implementation in Windows Server 2016
TCP/IP stacks rely on the Address Resolution Protocol (ARP) and port learning performed by traditional layer-2 switches to determine the MAC address of the remote hosts and the ports on a switch to which they are connected. Overlay Virtual Networking encapsulates the VM traffic’s packet headers and data (inner packet) inside of a Layer-3 IP (outer) packet transmitted on the underlay (physical) network. Therefore, before a VM attached to a virtual network can send a unicast packet destined for another VM in the same virtual subnet, it must first learn or be told the remote VM MAC address as well as the VTEP IP address of the host on which the VM is running. This allows the VM to place the correct Destination MAC address in the inner packet’s Ethernet header and for the host to build the encapsulated packet with the correct Destination IP address to deliver the packet to the remote host.
The VXLAN RFC talks about different approaches for distributing the VTEP IP to VM MAC mapping information:
- Learning-based control plane
- Central authority- or directory-based lookup
- Distribution of this mapping information to the VTEPs by the central authority
In a learning-based control-plane, encapsulated packets having an unknown destination (VTEP IP) are sent out via broadcast or to an IP multicast group. This requires a mapping between a multicast group and each virtual subnet (identified by a VXLAN Network Identifier (VNI)) such that any VTEPs which host VMs attached to this VNI register for the group through IGMP. The VM MAC addresses and remote host’s IP address (VTEP IP) are then discovered via source-address learning. A clear disadvantage of this approach is that it places a lot of unnecessary traffic on the wire which most network administrators try to avoid.
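A learning-based control plane can be sketched in a few lines of Python. The class below is a simplified illustration with invented names: it learns (VNI, MAC) to VTEP mappings from the outer source address of received packets and falls back to the VNI's multicast group whenever the destination is unknown.

```python
class LearningVtep:
    """Simplified model of source-address learning on a VTEP."""

    def __init__(self, vni_to_group):
        self.vni_to_group = dict(vni_to_group)  # VNI -> IP multicast group
        self.mac_table = {}                     # (VNI, inner MAC) -> VTEP IP

    def on_receive(self, vni, outer_src_ip, inner_src_mac):
        """Learn which VTEP a VM MAC lives behind from a received packet."""
        self.mac_table[(vni, inner_src_mac.lower())] = outer_src_ip

    def outer_destination(self, vni, inner_dst_mac):
        """Pick the outer destination IP: learned VTEP, else the multicast group."""
        key = (vni, inner_dst_mac.lower())
        return self.mac_table.get(key, self.vni_to_group[vni])


vtep = LearningVtep({5001: "239.1.1.1"})
first = vtep.outer_destination(5001, "00:11:22:33:44:55")   # unknown: flooded
vtep.on_receive(5001, "192.0.2.20", "00:11:22:33:44:55")    # a reply teaches us
second = vtep.outer_destination(5001, "00:11:22:33:44:55")  # now unicast
```

The first packet to every unknown destination goes to all VTEPs in the group, which is the extra traffic the paragraph above refers to.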
Based on what we learned in Azure, Microsoft chose the third approach, distribution by a central authority (the Microsoft Network Controller), to send out the VM MAC : VTEP IP mapping information and avoid the unnecessary broadcast/multicast network traffic. The Microsoft Network Controller (OVSDB client) communicates with the Hyper-V hosts (VTEPs) using the OVSDB protocol, with policy represented in schemas persisted to a Host Agent’s database (OVSDB server). A local ARP responder on the host is then able to catch and respond to all ARP requests from the VMs to provide the destination MAC address of the remote VM. The Host Agent database also contains the VTEP IP address of all hosts attached to the virtual subnet. The Host Agent programs mapping rules into the VFP extension of the Hyper-V Virtual Switch to correctly encapsulate and send the VM packet based on the destination VM.
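By contrast with a learning approach, a centrally distributed mapping needs no flooding. The Python sketch below models the idea under stated assumptions: policy pushed by a controller is stored as (VNI, VM IP) to (MAC, VTEP IP) entries, a local ARP responder answers from that table, and the encapsulation step looks up the VTEP by destination MAC. All class and method names are hypothetical and do not reflect the actual Host Agent or VFP interfaces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    mac: str      # VM MAC address (goes in the inner Ethernet header)
    vtep_ip: str  # IP of the host running the VM (goes in the outer IP header)


class HostMappingTable:
    """Hypothetical per-host store of controller-distributed mappings."""

    def __init__(self):
        self._by_ip = {}   # (VNI, VM IP) -> Endpoint
        self._by_mac = {}  # (VNI, VM MAC) -> Endpoint

    def apply_policy(self, vni, vm_ip, mac, vtep_ip):
        """Record one mapping entry pushed down by the central authority."""
        endpoint = Endpoint(mac.lower(), vtep_ip)
        self._by_ip[(vni, vm_ip)] = endpoint
        self._by_mac[(vni, endpoint.mac)] = endpoint

    def answer_arp(self, vni, target_ip) -> Optional[str]:
        """Local ARP responder: return the remote VM's MAC without flooding."""
        endpoint = self._by_ip.get((vni, target_ip))
        return endpoint.mac if endpoint else None

    def vtep_for(self, vni, dst_mac) -> Optional[str]:
        """Encapsulation lookup: which VTEP IP to use as the outer destination."""
        endpoint = self._by_mac.get((vni, dst_mac.lower()))
        return endpoint.vtep_ip if endpoint else None


table = HostMappingTable()
table.apply_policy(5001, "10.0.1.6", "00:11:22:33:44:66", "192.0.2.20")
remote_mac = table.answer_arp(5001, "10.0.1.6")
remote_vtep = table.vtep_for(5001, remote_mac)
```

An unknown destination simply returns None here; no broadcast or multicast is generated because the table is expected to be complete for the virtual subnet.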
Figure 2 – Network Controller – Host Agent Communication
At present, Microsoft’s implementation of VXLAN does not support interoperability with third-party hardware VTEPs due to a difference in OVSDB schemas. We created a custom OVSDB schema to convey additional policy information, such as ACLs and service chaining, which was not available in the initial hardware_VTEP schema. However, support for the core protocols (VXLAN, OVSDB) is in place in the platform for us to bring in support for hardware VTEPs in the future. Our current thinking on implementation is that we will support the hardware_VTEP schema in our Network Controller and distribute mapping information to the hardware VTEPs. We do not think that a learning-based control plane is the right solution due to the increased amount of multicast/broadcast traffic required on the network; network admins are already trying to limit this L2 traffic, which by some accounts consumes 50% of network capacity.
If this is something of interest, please do reply in the comments field below and let us know. We’d love to speak with you.
SDN Network Virtualization Key Features
- Create thousands of logical network segments
- User-Defined Routing (UDR) for Virtual Appliances
- Multi-tenancy support through individual routing domains
- VXLAN and NVGRE encapsulation
- Distributed Router on each Hyper-V host
- Integration with Software Load Balancer (SLB), Gateways, QoS, and Distributed Firewall
- Network virtualization policy programmed through the Microsoft Network Controller using OVSDB