Deploying Windows Server 2012 with SMB Direct (SMB over RDMA) and the Mellanox ConnectX-3 using 10GbE/40GbE RoCE – Step by Step

1) Introduction

We have covered the basics of SMB Direct and some of the use cases in previous blog posts and TechNet articles. You can find them at https://smb3.info.

However, I get a lot of questions about which specific cards work with this new feature and exactly how to set them up. This is one in a series of blog posts that cover specific instructions for RDMA NICs. In this post, we’ll cover all the details to deploy the Mellanox ConnectX-3 adapters using the RoCE (RDMA over Converged Ethernet) “flavor” of RDMA.

2) Hardware and Software

To implement and test this technology, you will need:

  • Two or more computers running Windows Server 2012 Release Candidate
  • One or more Mellanox ConnectX-3 adapters for each server
  • One or more 10GbE or 40GbE Ethernet switches with the Priority Flow Control (PFC) capability
  • Two or more cables required for the ConnectX-3 card (typically using SFP+ connectors for 10GbE or QSFP connectors for 40GbE)

Note 1: The older Mellanox InfiniBand adapters (including the first generation of ConnectX adapters and the InfiniHost III adapters) won't work with SMB Direct in Windows Server 2012.

Note 2: Although the Mellanox ConnectX-2 adapters are supported for InfiniBand, they are not recommended for RoCE because they don’t fully support Priority Flow Control (PFC).

There are many options in terms of adapters, cables and switches. At the Mellanox web site you can find more information about these RoCE adapters (https://www.mellanox.com/content/pages.php?pg=ethernet_cards_overview&menu_section=28) and Ethernet switches (https://www.mellanox.com/content/pages.php?pg=ethernet_switch_overview&menu_section=71). Here are some examples of configurations you can use to try SMB Direct on Windows Server 2012:

2.1) Two computers using 10GbE RoCE

If you want to set up a simple pair of computers to test SMB Direct, you simply need two adapters and a back-to-back cable. This could be used for simple testing, such as one file server and one Hyper-V server.
For 10GbE, you can use adapters with SFP+ connectors. Here are the parts you will need:

  • 2 x ConnectX-3 adapter, dual port, 10GbE, SFP+ connector (part # MCX312A-XCBT)
  • 1 x SFP+ to SFP+ cable, 10GbE, 1m (part # MC3309130-001)

2.2) Eight computers using dual 10GbE RoCE

If you want to try a more realistic configuration with RoCE, you could set up a two-node file server cluster connected to a six-node Hyper-V cluster. In this setup, you will need 8 computers, each with a dual port RoCE adapter. You will also need a 10GbE switch with at least 16 ports. Using 10GbE and SFP+ connectors, you’ll need the following parts:

  • 8 x ConnectX-3 adapter, dual port, 10GbE, SFP+ connector (part # MCX312A-XCBT)
  • 16 x SFP+ to SFP+ cable, 10GbE, 1m (part # MC3309130-001)
  • 1 x 10GbE switch, 64 ports, SFP+ connectors, PFC capable (part # MSX1016X)

Note: You can also use a 10GbE switch from another vendor, as long as it provides support for Priority Flow Control (PFC). A common example is the Cisco Nexus 5000 series of switches.

2.3) Two computers using 40GbE RoCE

You may also try the faster 40GbE speed. The minimum setup in this case would again be two cards and a cable. Please note that you need a cable with a specific type of QSFP connector for 40GbE. Here’s what you will need:

  • 2 x ConnectX-3 adapter, dual port, 40GbE, QSFP connector  (part # MCX314A-BCBT)
  • 1 x QSFP to QSFP cable, 40GbE, 1m  (part # MC2206130-001)

Note: You will need a system with PCIe Gen3 slots to achieve the rated speed of this card. These slots are available on newer systems, like the ones equipped with an Intel Romley motherboard. If you use an older system, the card will be limited by the speed of the older PCIe Gen2 bus.
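
If you want to check which PCIe link speed and width the adapter actually negotiated, the NetAdapter module exposes that information. Here is a minimal sketch (the "Mellanox" filter on the interface description is an assumption, and the exact property names can vary by driver):

# Show the negotiated PCIe link speed and width for the Mellanox adapters
Get-NetAdapterHardwareInfo | ? { $_.InterfaceDescription -match "Mellanox" } |
    Format-List Name, InterfaceDescription, Slot, PcieLinkSpeed, PcieLinkWidth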

2.4) Ten computers using dual 40GbE RoCE

If you’re interested in experiencing great throughput in a private cloud setup, you could configure a two-node file server cluster plus an eight-node Hyper-V cluster. You could also use two 40GbE RoCE adapters for each system, for added performance and fault tolerance. In this setup, you would need 20 adapters and a 20-port switch. Here are the parts required:

  • 20 x ConnectX-3 adapter, dual port, 40GbE, QSFP connector (part # MCX314A-BCBT)
  • 20 x QSFP to QSFP cable, 40GbE, 1m (part # MC2206130-001)
  • 1 x 40GbE switch, 36 ports, QSFP connectors, PFC capable (part # MSX1036B)

Note: You will need a system with PCIe Gen3 slots to achieve the rated speed of this card. These slots are available on newer systems, like the ones equipped with an Intel Romley motherboard. If you use an older system, the card will be limited by the speed of the older PCIe Gen2 bus.

3) Download and update the drivers

Windows Server 2012 RC includes an inbox driver for the Mellanox ConnectX-3 cards. However, Mellanox provides updated firmware and drivers for download. You should be able to use the inbox driver to access the Internet to download the updated driver.

The latest Mellanox drivers for Windows Server 2012 RC can be downloaded from the Windows Server 2012 tab on this page on the Mellanox web site: https://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=32&menu_section=34.

The package is provided to you as a single executable file. Simply run the EXE file to update the firmware and driver. This package will also install Mellanox tools on the server. Please note that this package is different from the Windows Server 2012 Beta package. Make sure you grab the latest version.

After the download, simply run the executable file and choose one of the installation options (complete or custom). The installer will automatically detect if you have at least one card with an old firmware, offering to update it. You should always update to the latest firmware provided.
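
After the installation completes, you can confirm which driver version is actually loaded with a quick check like the sketch below (the "Mellanox" filter on the interface description is just an assumption about how the driver names the device):

# List the Mellanox adapters and the driver version currently bound to them
Get-NetAdapter | ? { $_.InterfaceDescription -match "Mellanox" } |
    Format-List Name, InterfaceDescription, DriverVersion, DriverDate, DriverProvider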

 

Note 1: This package does not update the firmware for OEM cards. If you are using this type of card, contact your OEM for an update.

Note 2: Certain Intel Romley systems won't boot Windows Server 2012 when an old Mellanox firmware is present. You might need to update the firmware of the Mellanox card on another system before you can use that card on the Intel Romley system. In certain cases, this issue can also be addressed by updating the firmware/BIOS of the Intel Romley system.

4) Configure the cards for RoCE

The ConnectX-3 cards can be used for both InfiniBand and Ethernet, so you need to make sure they are set to the right port protocol.

To do that using a GUI, follow the steps below:

  • Open the Device Manager
  • Right click on the "Mellanox ConnectX VPI" device under System Devices and click on Properties, then click on Port Protocol
  • Change the port types to be "ETH" instead of "Auto" or "IB" 

 

Using PowerShell, you can achieve the same results by running the following cmdlets:

Dir HKLM:'SYSTEM\CurrentControlSet\Control\Class\' -ErrorAction SilentlyContinue -Recurse | ? {
(Get-ItemProperty $_.PSPath -Name 'DriverDesc' -ErrorAction SilentlyContinue) -match 'Mellanox ConnectX VPI' } | % {
Set-ItemProperty ($_.PSPath + "\Parameters") -Name PortType -Value "eth,eth" }

Note: If the card you have supports only RoCE (this is true for specific cards with SFP+ connectors), Ethernet will be the only choice and the IB option will be greyed out.
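
If you want to confirm the setting without going back to Device Manager, here is a minimal sketch that reads the PortType value back from the same registry location the cmdlets above wrote to (the adapter may need to be disabled and re-enabled, or the server rebooted, before the new protocol takes effect):

# Read back the PortType setting for each Mellanox ConnectX VPI device
Dir HKLM:'SYSTEM\CurrentControlSet\Control\Class\' -ErrorAction SilentlyContinue -Recurse | ? {
(Get-ItemProperty $_.PSPath -Name 'DriverDesc' -ErrorAction SilentlyContinue) -match 'Mellanox ConnectX VPI' } | % {
Get-ItemProperty ($_.PSPath + "\Parameters") -Name PortType -ErrorAction SilentlyContinue }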

5) Configuring Priority Flow Control (PFC)

In order to function reliably, RoCE requires PFC (Priority Flow Control) to be configured on all nodes and on all switches in the flow path.

5.1) Configuring PFC on Windows

To configure PFC on the Windows Servers, you need to perform the following steps:

  • Clear previous configurations, if applicable
  • Enable the Data Center Bridging (DCB) feature on both client and server
  • Create a Quality of Service (QoS) policy to tag RoCE traffic on both client and server
  • Enable Priority Flow Control (PFC) on a specific priority (the example below uses priority 3)
  • Plumb down the DCB settings to the NICs (the example below assumes the NIC is called "Ethernet 4")
  • Optionally, you can limit the bandwidth used by the SMB traffic (the example below limits that to 60%)

Here are the cmdlets to perform all the steps above using PowerShell:

# Clear previous configurations
Remove-NetQosTrafficClass
Remove-NetQosPolicy -Confirm:$False

# Enable DCB
Install-WindowsFeature Data-Center-Bridging

# Disable the DCBx setting:
Set-NetQosDcbxSetting -Willing 0

# Create QoS policies and tag each type of traffic with the relevant priority
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
New-NetQosPolicy "DEFAULT" -Default -PriorityValue8021Action 3
New-NetQosPolicy "TCP" -IPProtocolMatchCondition TCP -PriorityValue8021Action 1
New-NetQosPolicy "UDP" -IPProtocolMatchCondition UDP -PriorityValue8021Action 1

# If VLANs are used, mark the egress traffic with the relevant VlanID:
Set-NetAdapterAdvancedProperty -Name <Network Adapter Name> -RegistryKeyword "VlanID" -RegistryValue <ID>

# Enable Priority Flow Control (PFC) on a specific priority. Disable for others
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl 0,1,2,4,5,6,7

# Enable QoS on the relevant interface
Enable-NetAdapterQos -InterfaceAlias "Ethernet 4"

# Optionally, limit the bandwidth used by the SMB traffic to 60%
New-NetQoSTrafficClass "SMB" -Priority 3 -Bandwidth 60 -Algorithm ETS
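
Before moving on, it’s worth reading the QoS configuration back to confirm that the policies, the PFC priorities and the traffic class were applied as intended. Here is a minimal verification sketch using the built-in DCB/QoS cmdlets (the interface name "Ethernet 4" is the same assumption used above):

# Verify the QoS policies, PFC settings and traffic class created above
Get-NetQosPolicy
Get-NetQosFlowControl
Get-NetQosTrafficClass
Get-NetAdapterQos -Name "Ethernet 4"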

 

Note: When you have a Kernel Debugger attached to the computer (this is only applicable for developers), flow control is always disabled. In that case, you need to run the following PowerShell cmdlet to disable this behavior:

Set-ItemProperty HKLM:"\SYSTEM\CurrentControlSet\Services\NDIS\Parameters" AllowFlowControlUnderDebugger -Type DWORD -Value 1 -Force

 

5.2) Configuring PFC on the Switch

You need to enable Priority Flow Control on the switch as well. This configuration will vary according to the switch you chose. Refer to your switch documentation for details.

For Mellanox switches, refer to chapter 3.6.3 of the WinOF4.40 User's Manual.

6) Configure IP Addresses

After you have the drivers in place, you should configure the IP address for your NIC. If you’re using DHCP, that should happen automatically, so just skip to the next step.

For those doing manual configuration, assign an IP address to your interface using either the GUI or something similar to the PowerShell below. This assumes that the interface is called RDMA1, that you’re assigning the IP address 192.168.1.10 to the interface and that your DNS server is at 192.168.1.2.

Set-NetIPInterface -InterfaceAlias RDMA1 -DHCP Disabled
Remove-NetIPAddress -InterfaceAlias RDMA1 -AddressFamily IPv4 -Confirm:$false
New-NetIPAddress -InterfaceAlias RDMA1 -IPAddress 192.168.1.10 -PrefixLength 24 -Type Unicast
Set-DnsClientServerAddress -InterfaceAlias RDMA1 -ServerAddresses 192.168.1.2
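
After assigning the address, a quick check of the interface and of basic connectivity to the other computer saves time later. In the sketch below, 192.168.1.11 is just an assumed address for the RDMA1 interface on the other computer:

# Confirm the address was applied and that the other computer (assumed to be at 192.168.1.11) responds
Get-NetIPAddress -InterfaceAlias RDMA1 -AddressFamily IPv4
Test-Connection 192.168.1.11 -Count 4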

7) Verify everything is working

Follow the steps below to confirm everything is working as expected:

7.1) Verify network adapter configuration

Use the following PowerShell cmdlets to verify Network Direct is globally enabled and that you have NICs with the RDMA capability. Run on both the SMB server and the SMB client.

Get-NetOffloadGlobalSetting | Select NetworkDirect
Get-NetAdapterRDMA
Get-NetAdapterHardwareInfo
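
Get-NetAdapterRDMA should report Enabled as True for the Mellanox interfaces. If it shows the capability as disabled, you can turn it on per adapter; the sketch below assumes the interface is called "Ethernet 4", as in the QoS example earlier:

# Enable RDMA on the interface if Get-NetAdapterRdma reports it as disabled, then check again
Enable-NetAdapterRdma -Name "Ethernet 4"
Get-NetAdapterRdma -Name "Ethernet 4"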

7.2) Verify SMB configuration

Use the following PowerShell cmdlets to make sure SMB Multichannel is enabled, confirm the NICs are being properly recognized by SMB and that their RDMA capability is being properly identified.

On the SMB client, run the following PowerShell cmdlets:

Get-SmbClientConfiguration | Select EnableMultichannel
Get-SmbClientNetworkInterface

On the SMB server, run the following PowerShell cmdlets:

Get-SmbServerConfiguration | Select EnableMultichannel
Get-SmbServerNetworkInterface
netstat.exe -xan | ? {$_ -match "445"}

Note: The NETSTAT command confirms if the File Server is listening on the RDMA interfaces.

7.3) Verify the SMB connection

On the SMB client, start a long-running file copy to create a lasting session with the SMB Server. While the copy is ongoing, open a PowerShell window and run the following cmdlets to verify the connection is using the right SMB dialect and that SMB Direct is working:

Get-SmbConnection
Get-SmbMultichannelConnection
netstat.exe -xan | ? {$_ -match "445"}

Note: If there is no activity while you run the commands above, it’s possible you will get an empty list. This is likely because your session has expired and there are no current connections.
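
If you need a convenient way to keep a session busy while you run the cmdlets above, you can create a large test file and copy it to the share. The local path and share name below are placeholders for your environment:

# Create a 10 GB test file and copy it to the file server to generate a long-running SMB transfer
# (C:\temp\testfile.dat and \\fileserver\share are placeholders)
New-Item -ItemType Directory -Path C:\temp -Force | Out-Null
fsutil.exe file createnew C:\temp\testfile.dat 10737418240
Copy-Item C:\temp\testfile.dat \\fileserver\share\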

8) Review Performance Counters

There are several performance counters that you can use to verify that the RDMA interfaces are being used and that the SMB Direct connections are being established. You can also use the regular SMB Server and SMB Client performance counters to verify the performance of SMB, including IOPS (data requests per second), latency (average seconds per request) and throughput (data bytes per second). Here's a short list of the relevant performance counters, followed by a PowerShell sketch for sampling them.

On the SMB Client, watch for the following performance counters:

  • RDMA Activity - One instance per RDMA interface
  • SMB Direct Connection - One instance per SMB Direct connection
  • SMB Client Shares - One instance per SMB share the client is currently using

On the SMB Server, watch for the following performance counters:

  • RDMA Activity - One instance per RDMA interface
  • SMB Direct Connection - One instance per SMB Direct connection
  • SMB Server Shares - One instance per SMB share the server is currently sharing
  • SMB Server Session - One instance per client SMB session established with the server

For both client and server, watch for the following Mellanox performance counters:

  • Mellanox Adapter Diagnostic
  • Mellanox Adapter QoS
  • Mellanox Adapter Traffic 
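
If you prefer the command line over Performance Monitor, Get-Counter can sample the same information. The counter set names below follow the list above; since the exact names can vary by build and driver, use Get-Counter -ListSet first to confirm what is available on your systems:

# List the counters available in the SMB Direct and RDMA counter sets, then sample them
# (counter set names are assumed from the list above; confirm with Get-Counter -ListSet * if they differ)
Get-Counter -ListSet "SMB Direct Connection", "RDMA Activity" | Select-Object -ExpandProperty Counter
Get-Counter -Counter "\SMB Direct Connection(*)\*", "\RDMA Activity(*)\*" -SampleInterval 2 -MaxSamples 5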

 

 

9) Review the connection log details (optional)

SMB 3.0 now offers an “Object State Diagnostic” event log that can be used to troubleshoot Multichannel (and therefore RDMA) connections. Keep in mind that this is a debug log, so it’s very verbose and requires a special procedure for gathering the events. You can follow the steps below:

First, enable the log in Event Viewer:

  • Open Event Viewer
  • On the menu, select “View” then “Show Analytic and Debug Logs”
  • Expand the tree on the left: Applications and Services Logs, Microsoft, Windows, SMB Client, ObjectStateDiagnostic
  • On the “Actions” pane on the right, select “Enable Log”
  • Click OK to confirm the action.

After the log is enabled, perform the operation that requires an RDMA connection. For instance, copy a file or run a specific operation.
If you’re using mapped drives, be sure to map them after you enable the log, or else the connection events won’t be properly captured.

Next, disable the log in Event Viewer:

  • In Event Viewer, make sure you select Applications and Services Logs, Microsoft, Windows, SMB Client, ObjectStateDiagnostic
  • On the “Actions” pane on the right, select “Disable Log”

Finally, review the events on the log in Event Viewer. You can filter the log to include only the SMB events that confirm that you have an SMB Direct connection or only error events.

The “Smb_MultiChannel” keyword will filter for connection, disconnection and error events related to SMB. You can also filter by event numbers 30700 to 30706.

  • Click on the “ObjectStateDiagnostic” item on the tree on the left.
  • On the “Actions” pane on the right, select “Filter Current Log…”
  • Select the appropriate filters

You can also use a PowerShell window to view the events. For instance, to find any RDMA-related events in the log, run the following cmdlet:

Get-WinEvent -LogName Microsoft-Windows-SMBClient/ObjectStateDiagnostic -Oldest |? Message -match "RDMA"
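
Since the relevant events fall into the 30700 to 30706 range mentioned above, you can also filter by event ID instead of by message text:

# Filter the ObjectStateDiagnostic log by the SMB Multichannel event IDs (30700 to 30706)
Get-WinEvent -LogName Microsoft-Windows-SMBClient/ObjectStateDiagnostic -Oldest |
    Where-Object { $_.Id -ge 30700 -and $_.Id -le 30706 }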

10) Conclusion

I hope this helps you with your testing of the Mellanox RoCE adapters. I wanted to cover all the different angles to make sure you don’t miss any relevant steps, and to provide enough troubleshooting guidance to cover the known issues. Let us know how your experience went by posting a comment.