Live Migration, Cluster Shared Volumes & Networks

The recommendation for people setting up live migration clusters is to isolate different kinds of traffic on their own networks:

  1. Public network to access the cluster and the virtual machines running on it
  2. “Private” cluster heartbeat network
  3. “live migration” network
  4. iSCSI network, if required to access shared storage

How do we determine what traffic goes where?

For public and private, the failover cluster manager tool is self-explanatory:

image

We select the appropriate options in the cluster network’s properties. If we want to limit a network to private traffic, we do not allow clients to connect through it.

If we don’t want the cluster to use a network at all, e.g. because it is dedicated to iSCSI, we select the “Do not allow…” option.

How about the live migration traffic, though? It can be quite heavy, as we are copying memory pages from one host to another. We can select the order in which cluster networks are used for such traffic through the failover cluster manager:

image

The property requires some digging: expand “services and applications”, select the virtual machine in question, then in the main panel right-click on “virtual machine <name>” and open its properties; you’ll see a tab called “network for live migration”. There you can select the networks that you want to use and sort them in order of priority. By default, live migration will select a network that is NOT used for CSV traffic. Note that this panel may list networks that were not selected for cluster use before. If you use iSCSI, de-select the relevant entry to make sure that the live migration traffic does not go through that network.

This brings me to cluster shared volumes. One of the great features of CSVs is that if the storage link (iSCSI, fibre) becomes unavailable for any reason on a node, storage traffic can be redirected over the cluster network to another node and hence to the storage device. But which cluster network?

Inter-node communications and CSV traffic will use the available network authorized for cluster use that has the lowest metric value. We can see the metrics with the old cluster.exe:

C:\Windows\system32>cluster net /prop
Listing properties for all networks:

T  Network              Name                     Value
-- -------------------- ------------------------ -----------------------
SR Cluster Network 1    Name                     Cluster Network 1
MR Cluster Network 1    IPv6Addresses            <cut on purpose>
MR Cluster Network 1    IPv6PrefixLengths        <..>
MR Cluster Network 1    IPv4Addresses            <cut on purpose>
MR Cluster Network 1    IPv4PrefixLengths        <..>
SR Cluster Network 1    Address                  <..>
SR Cluster Network 1    AddressMask              <..>
S  Cluster Network 1    Description
D  Cluster Network 1    Role                     3 (0x3)
D  Cluster Network 1    Metric                   10001 (0x2711)
D  Cluster Network 1    AutoMetric               0 (0x0)
SR Cluster Network 2    Name                     Cluster Network 2
MR Cluster Network 2    IPv6Addresses
MR Cluster Network 2    IPv6PrefixLengths
MR Cluster Network 2    IPv4Addresses            <..>
MR Cluster Network 2    IPv4PrefixLengths        <..>
SR Cluster Network 2    Address                  <..>
SR Cluster Network 2    AddressMask              <..>
S  Cluster Network 2    Description
D  Cluster Network 2    Role                     1 (0x1)
D  Cluster Network 2    Metric                   1000 (0x3e8)
D  Cluster Network 2    AutoMetric               1 (0x1)

Note the 3 values:

  • Role: 0 if the network is ignored by the cluster, 1 for a private (cluster-only) network, 3 for mixed cluster and client traffic
  • Metric: the “weight” of the connection, generally in the 10,000 range for public networks and the 1,000 range for private ones. If a network has a default gateway, it is considered public; if not, private. Should there be more than one private or public network, the metric is incremented by 100 in order of enumeration (e.g. private network 2 will have a default metric of 1,100)
  • AutoMetric: 1 if the metric is set automatically by the cluster, 0 if you have set it manually.
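
The default metric rule can be sketched in a few lines of Python. This only illustrates the rule as described here and is not how the cluster service is actually implemented; the function name `default_metrics` and the (name, has_default_gateway) tuples are made up for the example.

```python
def default_metrics(networks):
    """Assign default metrics following the rule described in the post.

    `networks` is a list of (name, has_default_gateway) tuples in
    enumeration order. Hypothetical helper, not a cluster API.
    """
    PUBLIC_BASE, PRIVATE_BASE, STEP = 10000, 1000, 100
    public_seen = private_seen = 0
    metrics = {}
    for name, has_gateway in networks:
        if has_gateway:
            # A default gateway makes the network "public".
            metrics[name] = PUBLIC_BASE + STEP * public_seen
            public_seen += 1
        else:
            # No default gateway: the network is treated as "private".
            metrics[name] = PRIVATE_BASE + STEP * private_seen
            private_seen += 1
    return metrics


example = default_metrics([
    ("Public", True),
    ("Heartbeat", False),
    ("Live migration", False),
])
print(example)
```

With one public and two private networks, the second private network ends up 100 above the first, which matches the 1,100 example in the list.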

So in my simple case the heartbeat network will also be used for CSV traffic. If you have more than 1 private network and you want to prioritize them, you can set the metric with cluster.exe, e.g.

C:\Windows\system32>cluster net "Cluster Network 2" /prop metric=1001

C:\Windows\system32>cluster net "Cluster Network 2" /prop

Listing properties for 'Cluster Network 2':

T  Network              Name                     Value
-- -------------------- ------------------------ -----------------
SR Cluster Network 2    Name                     Cluster Network 2
MR Cluster Network 2    IPv6Addresses
MR Cluster Network 2    IPv6PrefixLengths
MR Cluster Network 2    IPv4Addresses            <..>
MR Cluster Network 2    IPv4PrefixLengths        <..>
SR Cluster Network 2    Address                  <..>
SR Cluster Network 2    AddressMask              <..>
S  Cluster Network 2    Description
D  Cluster Network 2    Role                     1 (0x1)
D  Cluster Network 2    Metric                   1001 (0x3e9)
D  Cluster Network 2    AutoMetric               0 (0x0)

Redirection of the traffic is automatic: if a network becomes unavailable, the next-lowest-metric one will be used. If another network with a lower metric becomes available, it will be used from that point onwards.
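
To make the selection and fallback behaviour concrete, here is a small Python model of it. It is a sketch of the rule as stated in this post, not of the cluster's real code: the `Network` class, its `up` flag and the `csv_network` function are all invented for the illustration, and the metrics are the ones from my two networks.

```python
from dataclasses import dataclass


@dataclass
class Network:
    name: str
    role: int      # 0 = ignored by the cluster, 1 = private, 3 = mixed
    metric: int
    up: bool = True


def csv_network(networks):
    """Return the available cluster-enabled network with the lowest metric."""
    candidates = [n for n in networks if n.role != 0 and n.up]
    return min(candidates, key=lambda n: n.metric, default=None)


nets = [
    Network("Cluster Network 1", role=3, metric=10001),
    Network("Cluster Network 2", role=1, metric=1001),
]
print(csv_network(nets).name)   # Cluster Network 2

nets[1].up = False              # the private network becomes unavailable
print(csv_network(nets).name)   # Cluster Network 1

nets[1].up = True               # it comes back and is preferred again
print(csv_network(nets).name)   # Cluster Network 2
```

A network with role 0 (for example a dedicated iSCSI network) never becomes a candidate, whatever its metric.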

In Summary

By default, live migration traffic will be put on the network with the second-lowest metric. CSV traffic will be put on the network with the lowest metric. In this simple example, I just have a public and a private network, so the public one is used for live migration and the private one for CSV and cluster traffic.