Hyper-V: How many network cards do I need?


[Edit 19/04/2011 - Include more info on Live Migration NIC]

[Edit 19/06/2011 - More info on NICs for cluster communications]

This topic comes up all the time, especially when dealing with Hyper-V clusters.  The short answer to the question is: it depends.

For this post I will talk about networking requirements for a SAME SITE CLUSTER. Multisite clusters, or stretch clusters, have different networking requirements. I will cover this topic in a different post.

Network requirements for same site Cluster

There are some great TechNet articles and blog posts on this topic.  However, I believe that most of these posts focus on the logical networking and don't take the physical hardware and risk profiles into account.  What do I mean?  I've seen people get really hooked on "avoid the single point of failure" when designing.  The problem I find is that the design stops at the logical level and doesn't take the physical hardware into account.  Having fault tolerant NICs for the parent partition isn't much good if both ports are on the same multiport card or backplane.  Sometimes you want the cluster to fail over its workload.  Keeping the workload on a node at all costs could just be a case of over-engineering.  Enterprise Architecture 101: keep it simple!

First things first, let's just look at the logical minimum requirements for network cards in a Hyper-V cluster. 

| Usage | Description | Traffic Requirements | Recommended Connection Type |
|---|---|---|---|
| Parent Partition | Used for the management of the Hyper-V host. Also used by System Center Virtual Machine Manager (SCVMM). | Typically low bandwidth. Can increase when deploying VMs from SCVMM. | Public. |
| Storage | iSCSI network connection to the SAN. | High bandwidth and low latency required. | Refer to your storage vendor. Normally private. |
| VM Network | Used to provide network access for your VMs. | Can vary depending on the workload. | Public. |
| Cluster Heartbeat | Used for cluster communication to determine the status of other cluster nodes. | Low bandwidth and low latency required. | Private. |
| Cluster Shared Volume (CSV) | Used in scenarios when redirected I/O is required. | Idle until redirected I/O kicks in, in which case high bandwidth and low latency are required. | Private. |
| Live Migration | Used to transfer running VMs from one cluster node to another. | Idle until Live Migration occurs, in which case high bandwidth and low latency are required. | Private. |

Comments:

  • When I say public for the recommended connection type, I mean that it's OK to have other traffic on the same subnet/network.  In other words, network congestion should not impair performance or cause a failover.
  • When I say private for the recommended connection type, I mean that ideally you should have a dedicated subnet/network for this type of cluster communication.  Congestion could cause performance issues or even trigger a failover.
  • The Cluster Heartbeat/Communication can be configured to use any NIC presented to the OS.  So in theory, you don't need to dedicate a NIC to cluster communications anymore.  You could let it use pretty much any interface except the iSCSI one.  Old habits have not died away for me, so I prefer to go with a dedicated NIC.  The more NICs you have in the server the better ... it's more flexible.

The table above just deals with the logical requirements for a Hyper-V cluster.  I'll deal with single points of failure, combining usage and teaming shortly.  So, based on the above, that's 6 different logical networks.  Your solution may require one or more NICs per logical network.  There are options for combining logical networks as well as options for teaming.  But before we get into that ...  Do you have the network hardware required to warrant NIC teaming or multipath connections?  When you have the answer to this question, you can start to define your networking design in earnest.

I have a couple of observations/comments I'd like to make:

  • What is the point of having multipath iSCSI connections if both are patched into a single switch or if your SAN only has ports on a single card?
  • What's the point in NIC teaming (fail on fault) if your physical machine has a single point of failure in a multiport network card? The same also holds true for blade enclosures. If there is going to be a hardware failure, it's very unlikely to be limited to a single port. The whole card/unit is going to fail.
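
The second point can be checked mechanically if you keep an inventory of which physical card each port sits on. Here is a minimal Python sketch under that assumption. The port names, card names, team names and the `teams_on_single_card` helper are all made up for illustration; none of them come from Hyper-V or any vendor tooling.

```python
from typing import Dict, List


def teams_on_single_card(port_to_card: Dict[str, str],
                         teams: Dict[str, List[str]]) -> List[str]:
    """Return the names of teams whose member ports all sit on one physical card."""
    flagged = []
    for team_name, ports in teams.items():
        # Collect the distinct physical cards behind this team's ports.
        cards = {port_to_card[port] for port in ports}
        if len(cards) == 1:
            flagged.append(team_name)
    return flagged


# Hypothetical inventory: two on-board ports and one dual-port add-in card.
port_to_card = {
    "NIC1": "onboard",
    "NIC2": "onboard",
    "NIC3": "pci-slot-1",
    "NIC4": "pci-slot-1",
}

# Hypothetical teams / multipath pairs.
teams = {
    "iscsi-paths": ["NIC1", "NIC3"],  # spans two physical cards
    "vm-team": ["NIC3", "NIC4"],      # both ports on the same card
}

print(teams_on_single_card(port_to_card, teams))
```

Any team it reports gives you little real fault tolerance, because a card failure takes out every member at once.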

The whole reason behind clustering your Hyper-V solution is to cope with serious component failure.  With this in mind, don't over-engineer things.  Don't get too hung up on single points of failure in the logical world.  Just make sure you have enough capacity in your cluster for your workloads (the N+1 rule applies).
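
As a rough illustration of the N+1 rule, the sketch below checks whether the cluster could still hold all of its VMs after losing its largest node. It is a simplification under stated assumptions: memory is used as the only capacity measure, the figures are invented, and it compares totals only, so it ignores how individual VMs would actually be placed on the surviving nodes.

```python
from typing import List


def survives_single_node_failure(node_capacity_gb: List[float],
                                 vm_memory_gb: List[float]) -> bool:
    """True if the total VM memory still fits after losing the largest node."""
    if len(node_capacity_gb) < 2:
        # A single node has nowhere to fail over to.
        return False
    remaining = sum(node_capacity_gb) - max(node_capacity_gb)
    return sum(vm_memory_gb) <= remaining


# Hypothetical three-node cluster with 64 GB usable per node.
nodes = [64, 64, 64]

print(survives_single_node_failure(nodes, [32, 32, 36]))  # 100 GB of VMs vs 128 GB left
print(survives_single_node_failure(nodes, [50, 50, 50]))  # 150 GB of VMs vs 128 GB left
```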

This is how I would set up network cards in a typical Hyper-V cluster.

Usage

Number of Network Cards

Comments

Parent Partition

1 Network Card

  • Since it's low bandwidth, and considering it's most likely going to be an on-board NIC port or a port on a multiport card, I wouldn't bother teaming it. If there is a major failure that causes the link to go down, chances are you WILL want the cluster node to fail over its workload.
  • Make sure this card is listed first in the Adapter and Bindings connection order.
  • In Failover Cluster Manager make sure that the NIC is configured to allow cluster network communication on this network. This will act as a secondary connection for the Heartbeat.

Storage

2 Network Cards - Not Teamed

  • Follow your SAN hardware vendor's guidelines on this. NIC teaming is not supported for iSCSI connections; use multipath I/O (MPIO) instead.
  • Disable NetBIOS on these interfaces
  • Do not configure a Gateway
  • Do not configure a DNS server
  • Make sure that each NIC is NOT set to register its connection in DNS
  • Remove File and Printer Sharing for Microsoft Networks and Client for Microsoft Networks
  • In Failover Cluster Manager select "Do not allow cluster network communication on this network"

VM Network

1+ Network cards depending on the workload.  Teaming is optional.  Normally at least 2 cards.

  • The number of cards required depends on the amount of throughput you need as well as the number of different networks required.
  • I would team these cards for throughput and not for fault tolerance reasons. If there is a failure, chances are you WILL want the cluster node to fail over its workload.
  • Disable NetBIOS on these interfaces
  • Do not configure a Gateway
  • Do not configure a DNS server
  • Make sure that each NIC is NOT set to register its connection in DNS
  • Remove File and Printer Sharing for Microsoft Networks and Client for Microsoft Networks
  • In Failover Cluster Manager select "Do not allow cluster network communication on this network"

Cluster Heartbeat

1 Network Card

  • Disable NetBIOS on this interface
  • Do not configure a Gateway
  • Do not configure a DNS server
  • Make sure that this NIC is NOT set to register its connection in DNS
  • Make sure that Client for Microsoft Networks and File and Printer Sharing for Microsoft Networks are enabled to support Server Message Block (SMB), which is required for CSV.
  • In Failover Cluster Manager make sure that the NIC is configured to allow cluster network communication on this network.
  • In Failover Cluster Manager clear the check box for "Allow clients to connect through this network". This setting has nothing to do with the host/parent partition. It controls which NICs can be used to access the Cluster Resources. This is more relevant for other workloads, e.g. a File Cluster. It has no impact on communication with the host partition or with the VMs themselves.

Cluster Shared Volume (CSV)

1 Network Card

  • Disable NetBIOS on this interface
  • Make sure that this NIC is NOT set to register its connection in DNS
  • Make sure that Client for Microsoft Networks and File and Printer Sharing for Microsoft Networks are enabled to support Server Message Block (SMB), which is required for CSV.
  • In Failover Cluster Manager clear the check box for "Allow clients to connect through this network". This setting has nothing to do with the host/parent partition. It controls which NICs can be used to access the Cluster Resources. This is more relevant for other workloads, e.g. a File Cluster. It has no impact on communication with the host partition or with the VMs themselves.
  • By default the cluster will automatically choose the NIC to be used for CSV communication. I prefer to manually set the preference. See https://technet.microsoft.com/en-us/library/ff182335(WS.10).aspx for details.
  • This traffic is not routable and has to be on the same subnet for all nodes.

Live Migration

1 Network Card

  • Disable NetBIOS on this interface
  • Make sure that this NIC is NOT set to register its connection in DNS.
  • In Failover Cluster Manager clear the check box for "Allow clients to connect through this network". This setting has nothing to do with the host/parent partition. It controls which NICs can be used to access the Cluster Resources. This is more relevant for other workloads, e.g. a File Cluster. It has no impact on communication with the host partition or with the VMs themselves.
  • By default the cluster will automatically choose the NIC to be used for Live Migration. You can select multiple networks for Live Migration and give them a preference.  Consider the workloads on the NICs before doubling up.  For more info, check out https://technet.microsoft.com/en-us/library/ff182335(WS.10).aspx

Notes on the above:

  • The settings I've outlined for the NICs assigned to VM networks will be overwritten as soon as you select them for use in a Hyper-V network. All protocol settings will be replaced with a Hyper-V virtual network protocol.  It's still best practice to configure the NICs regardless though.
  • Once the NICs are configured, check DNS to make sure there is only ONE entry for the hostname in both the Forward and Reverse DNS zones.
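
The forward-lookup half of that DNS check can be roughly scripted. The sketch below is only an approximation: `socket.gethostbyname_ex` asks the operating system's resolver (which may also consult the local hosts file) rather than querying the DNS zone directly, and the hostname and addresses are hypothetical. The reverse zone would need a similar check per NIC address.

```python
import socket
from typing import Callable, List


def forward_addresses(hostname: str) -> List[str]:
    """IPv4 addresses the system resolver returns for the hostname."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return addresses


def has_single_forward_entry(
        hostname: str,
        resolve: Callable[[str], List[str]] = forward_addresses) -> bool:
    """True if the hostname resolves to exactly one distinct address."""
    return len(set(resolve(hostname))) == 1


# Stub resolver standing in for a host whose storage NIC also registered in DNS.
def stub_resolver(hostname: str) -> List[str]:
    return ["10.0.0.11", "192.168.50.11"]


print(has_single_forward_entry("hv-node-01", resolve=stub_resolver))
```

Leave out the `resolve` argument to run it against the real resolver, for example `has_single_forward_entry("hv-node-01")`.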

So what's the answer?

Based on the above table, and assuming 2 network cards for your VMs' external network, I recommend a minimum of EIGHT (8) network connections/NICs for a PRODUCTION Hyper-V cluster.  Yes, you could double up on some of the NICs, such as combining the heartbeat with the CSV NIC, but I feel this is the best balance.
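
The arithmetic behind that figure, as a small Python tally of the per-network counts from the table. The VM Network value of 2 is the assumption stated above; change it to match your own design.

```python
# NICs per logical network, taken from the table in this post.
nics_per_network = {
    "Parent Partition": 1,
    "Storage": 2,                      # two iSCSI paths, not teamed
    "VM Network": 2,                   # assumed: 2 cards for the VM external network
    "Cluster Heartbeat": 1,
    "Cluster Shared Volume (CSV)": 1,
    "Live Migration": 1,
}

total = sum(nics_per_network.values())
print(total)  # 8
```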

I referenced a few TechNet articles when putting this together.  Here they are:

Hyper-V: Live Migration Network Configuration Guide

Requirements for Using Cluster Shared Volumes in a Failover Cluster in Windows Server 2008 R2

Hyper-V: Using Live Migration with Cluster Shared Volumes in Windows Server 2008 R2

Designating a Preferred Network for Cluster Shared Volumes Communication