Building Windows Server Failover Cluster on Azure IAAS VM – Part 2 (Network and Creation)

Hello, cluster fans. In my previous blog, Part 1, I talked about how to work around the storage blocker in order to implement Windows Server Failover Cluster on Azure IAAS VM. Now let’s discuss another important part – Networking in Cluster on Azure.

Before that, you should know some basic concepts of Azure networking. Here are a few Azure terms we need use to setup the Cluster.

VIP (Virtual IP address): A public IP address belongs to the cloud service. It also serves as an Azure Load Balancer which tells how network traffic should be directed before being routed to the VM.

DIP (Dynamic IP address): An internal IP assigned by Microsoft Azure DHCP to the VM.

Internal Load Balancer: It is configured to port-forward or load-balance traffic inside a VNET or cloud service to different VMs.

Endpoint: It associates a VIP/DIP + port combination on a VM with a port on either the Azure Load Balancer for public-facing traffic or the Internal Load Balancer for traffic inside a VNET (or cloud service).

You can refer to this blog for more details about those terms for Azure network:

VIPs, DIPs and PIPs in Microsoft Azure
http://blogs.msdn.com/b/cloud_solution_architect/archive/2014/11/08/vips-dips-and-pips-in-microsoft-azure.aspx

OK, enough reading, Storage is ready and we know the basics of Azure network, can we start to building the Cluster? Yes!

Instead of using Failover Cluster Manager, the preferred method is to use the New-Cluster PowerShell cmdlet and specify a static IP during Cluster creation. When doing it this way, you can add all the nodes and use the proper IP Address from the get go and not have to use the extra steps through Failover Cluster Manager.

Take the above environment as example:

New-Cluster -Name DEMOCLUSTER -Node node1,node2 -StaticAddress 10.0.0.7

Note:The Static IP Address that you appoint to the CNO is not for network communication. The only purpose is to bring the CNO online due to the dependency request. Therefore, you cannot ping that IP, cannot resolve DNS name, and cannot use the CNO for management since its IP is an unusable IP.

If for some reason you do not want to use PowerShell or you used Failover Cluster Manager instead, there are additional steps that you must take.  The difference with FCM versus PowerShell is that you need create the Cluster with one node and add the other nodes as the next step. This is because the Cluster Name Object (CNO) cannot be online since it cannot acquire a unique IP Address from the Azure DHCP service. Instead, the IP Address assigned to the CNO is a duplicate address of node who owns CNO. That IP fails as a duplicate and can never be brought online. This eventually causes the Cluster to lose quorum because the nodes cannot properly connect to each other. To prevent the Cluster from losing quorum, you start with a one node Cluster. Let the CNO’s IP Address fail and then manually set up the IP address.

Example:

The CNO DEMOCLUSTER is offline because the IP Address it is dependent on is failed. 10.0.0.4 is the VM’s DIP, which is where the CNO’s IP duplicates from.

image

In order to fix this, we will need go into the properties of the IP Address resource and change the address to another address in the same subnet that is not currently in use, for example, 10.0.0.7.

To change the IP address, right mouse click on the resource, choose the Properties of the IP Address, and specify the new 10.0.0.7 address.

image

Once the address is changed, right mouse click on the Cluster Name resource and tell it to come online.

image

Now that these two resources are online, you can add more nodes to the Cluster.

Now you’ve successfully created a Cluster. Let’s add a highly available role inside it. For the demo purpose, I’ll use the File Server role as an example since this is the most common role that lot of us can understand.

Note:In a production environment, we do not recommend File Server Cluster in Azure because of cost and performance. Take this example as a proof of concept.

Different than Cluster on-premises, I recommend you to pause all other nodes and keep only one node up. This is to prevent the new File Server role from moving among the nodes since the file server’s VCO (Virtual Computer Object) will have a duplicated IP Address automatically assigned as the IP on the node who owns this VCO. This IP Address fails and causes the VCO not to come online on any node. This is a similar scenario as for CNO we just talked about previously.

Screenshots are more intuitive.

The VCO DEMOFS won’t come online because of the failed status of IP Address. This is expected because the dynamic IP address duplicates the IP of owner node.

image

Manually editing the IP to a static unused 10.0.0.8, in this example, now the whole resource group is online.

image

But remember, that IP Address is the same unusable IP address as the CNO’s IP. You can use it to bring the resource online but that is not a real IP for network communication. If this is a File Server, none of the VMs except the owner node of this VCO can access the File Share.  The way Azure networking works is that it will loop the traffic back to the node it was originated from.

Show time starts. We need to utilize the Load Balancer in Azure so this IP Address is able to communicate with other machines in order to achieving the client-server traffic.

Load Balancer is an Azure IP resource that can route network traffic to different Azure VMs. The IP can be a public facing VIP, or internal only, like a DIP. Each VM needs have the endpoint(s) so the Load Balancer knows where the traffic should go. In the endpoint, there are two kinds of ports. The first is a Regular port and is used for normal client-server communications. For example, port 445 is for SMB file sharing, port 80 is HTTP, port 1433 is for MSSQL, etc. Another kind of port is a Probe port. The default port number for this is 59999. Probe port’s job is to find out which is the active node that hosts the VCO in the Cluster. Load Balancer sends the probe pings over TCP port 59999 to every node in the cluster, by default, every 10 seconds. When you configure a role in Cluster on an Azure VM, you need to know out what port(s) the application uses because you will need to add the port(s) to the endpoint. Then, you add the probe port to the same endpoint. After that, you need update the parameter of VCO’s IP address to have that probe port. Finally, Load Balancer will do the similar port forward task and route the traffic to the VM who owns the VCO. All the above settings need to be completed using PowerShell as the blog was written.

Note: At the time of this blog (written and posted), Microsoft only supports one resource group in cluster on Azure as an Active/Passive model only. This is because the VCO’s IP can only use the Cloud Service IP address (VIP) or the IP address of the Internal Load Balancer. This limitation is still in effect although Azure now supports the creation of multiple VIP addresses in a given Cloud Service.

Here is the diagram for Internal Load Balancer (ILB) in a Cluster which can explain the above theory better:

image

The application in this Cluster is a File Server. That’s why we have port 445 and the IP for VCO (10.0.0.8) the same as the ILB. There are three steps to configure this:

Step 1: Add the ILB to the Azure cloud service.

Run the following PowerShell commands on your on-premises machine which can manage your Azure subscription.

# Define variables.

$ServiceName = "demovm1-3va468p3" # the name of the cloud service that contains the VM nodes. Your cloud service name is unique. Use Azure portal to find out service name or use get-azurevm.

image

$ILBName = "DEMOILB" # newly chosen name for the new ILB

$SubnetName = "Subnet-1" # subnet name that the VMs use in the VNet

$ILBStaticIP = "10.0.0.8" # static IP address for the ILB in the subnet

# Add Azure ILB using the above variables.

Add-AzureInternalLoadBalancer -InternalLoadBalancerName $ILBName -SubnetName $SubnetName -ServiceName $ServiceName -StaticVNetIPAddress $ILBStaticIP

# Check the settings.

Get-AzureInternalLoadBalancer –servicename $ServiceName

image

Step 2: Configure the load balanced endpoint for each node using ILB.

Run the following PowerShell commands on your on-premises machine which can manage your Azure subscription.

# Define variables.

$VMNodes = "DEMOVM1", “DEMOVM2" # cluster nodes’ names, separated by commas. Your nodes’ names will be different.

$EndpointName = "SMB" # newly chosen name of the endpoint

$EndpointPort = "445" # public port to use for the endpoint for SMB file sharing. If the cluster is used for other purpose, i.e., HTTP, the port number needs change to 80.

# Add endpoint with port 445 and probe port 59999 to each node. It will take a few minutes to complete. Please pay attention to ProbeIntervalInSeconds parameter. This tells how often the probe port detects which node is active.

ForEach ($node in $VMNodes)

{

Get-AzureVM -ServiceName $ServiceName -Name $node | Add-AzureEndpoint -Name $EndpointName -LBSetName "$EndpointName-LB" -Protocol tcp -LocalPort $EndpointPort -PublicPort $EndpointPort -ProbePort 59999 -ProbeProtocol tcp -ProbeIntervalInSeconds 10 -InternalLoadBalancerName $ILBName -DirectServerReturn $true | Update-AzureVM

}

# Check the settings.

ForEach ($node in $VMNodes)

{

Get-AzureVM –ServiceName $ServiceName –Name $node | Get-AzureEndpoint | where-object {$_.name -eq "smb"}

}

Step 3: Update the parameters of VCO’s IP address with Probe Port.

Run the following PowerShell commands inside one of the cluster nodes if you are using Windows Server 2008 R2.

# Define variables

$ClusterNetworkName = "Cluster Network 1" # the cluster network name (Use Get-ClusterNetwork or GUI to find the name)

$IPResourceName = “IP Address 10.0.0.0" # the IP Address resource name (Use get-clusterresource | where-object {$_.resourcetype -eq "IP Address"} or GUI to find the name)

$ILBIP = “10.0.0.8” # the IP Address of the Internal Load Balancer (ILB)

# Update cluster resource parameters of VCO’s IP address to work with ILB.

cluster res $IPResourceName /priv enabledhcp=0 overrideaddressmatch=1 address=$ILBIP probeport=59999  subnetmask=255.255.255.255

Run the following PowerShell commands inside one of the cluster nodes if you are using Windows Server 2012/2012 R2.

# Define variables

$ClusterNetworkName = "Cluster Network 1" # the cluster network name (Use Get-ClusterNetwork or GUI to find the name)

$IPResourceName = “IP Address 10.0.0.0" # the IP Address resource name (Use get-clusterresource | where-object {$_.resourcetype -eq "IP Address"} or GUI to find the name)

$ILBIP = “10.0.0.8” # the IP Address of the Internal Load Balancer (ILB)


$params = @{"Address"="$ILBIP";
          "ProbePort"="59999";
          "SubnetMask"="255.255.255.255";
          "Network"="$ClusterNetworkName";
          "OverrideAddressMatch"=1; 
          "EnableDhcp"=0}

# Update cluster resource parameters of VCO’s IP address to work with ILB

Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple $params

You should see this window:

image

Take the IP Address resource offline and bring it online again. Start the clustered role.

Now you have an Internal Load Balancer working with the VCO’s IP. One last task you need do is with the Windows Firewall. You need to at least open port 59999 on all nodes for probe port detection; or turn the firewall off. Then you should be all set. It may take about 10 seconds to establish the connection to the VCO the first time or after you failover the resource group to another node because of the ProbeIntervalInSeconds we set up previously.

In this example, the VCO has an Internal IP of 10.0.0.8. If you want to make your VCO public-facing, you can use the Cloud Service’s IP Address (VIP). The steps are similar and easier because you can skip Step 1 since this VIP is already an Azure Load Balancer. You just need to add the endpoint with a regular port plus the probe port to each VM (Step 2). Then update the VCO’s IP in the Cluster (Step 3). Please be aware, your Clustered resource group will be exposed to the Internet since the VCO has a public IP. You may want to protect it by planning enhanced security methods.

Great! Now you’ve completed all the steps of building a Windows Server Failover Cluster on an Azure IAAS VM. It is a bit longer journey; however, you’ll find it useful and worthwhile. Please leave me comments if you have question.

Happy Clustering!

Mario Liu
Support Escalation Engineer
CSS Americas | WINDOWS | HIGH AVAILABILITY