Hi everyone, this article is aimed at providing a best practice guide for building a Windows Server 2008 R2 Failover Cluster. The article provides details on hardware and software pre-requisites and configuration best practices. This guide also provides best practices for different areas of Failover Clustering configuration i.e. storage and network.
Pre-requisites for Failover Clustering
It is recommended to use servers with the same or similar hardware components. You should only use hardware that is compatible with Windows Server 2008 R2. Depending on the cluster you are building you will need multiple network cards and/or HBAs. Be sure to finalise the architecture prior to purchasing hardware to ensure you have everything you need before you start building the cluster.
The hardware components must be compatible with Windows Server 2008 R2. If you use iSCSI, each network adapter must be dedicated to either network communications or iSCSI, not both.
Ensure the network infrastructure that connects your cluster to the network has redundancies built in. This could mean multiple network cards (teamed), multiple switches and multiple routers. The purpose of this is to ensure there is no single point of failure in the network infrastructure.
It is recommended to use identical host bus adapters (HBAs) to communicate with the storage. The drivers and firmware should also be identical. If you want to use different HBAs, verify this with your hardware vendor. Multi-Path I/O software also needs to be identical on the cluster nodes. Windows Server 2008 R2 only supports SCSI Primary Commands-3 (SPC-3), so the backend storage must also be compatible with this. Parallel SCSI is no longer supported, so check your storage specifications prior to building your cluster.
You must ensure that the HBA or network card used is dedicated to the storage. The network used for iSCSI cannot be used for network communications. You cannot use teamed network adapters because they are not supported with iSCSI.
You must use storage that is compatible with Windows Server 2008 R2. Failover clustering natively supports basic disks. To use dynamic disks you will need to speak to your hardware vendor and use any software provided by them for dynamic disks.
The partition style must be either master boot record (MBR) or GUID partition table (GPT). The file system is recommended to be NTFS. The witness disk must be formatted with NTFS.
Storage Area Network (SAN) Storage
Ensure the SAN is compatible with Windows Server 2008 R2. Ensure the storage drivers, firmware and software are compatible with Failover Clustering. The storage must support SCSI Primary Commands-3 (SPC-3). In particular, the storage must support Persistent Reservations as specified in the SPC-3 standard. The miniport driver used for the storage must work with the Microsoft Storport storage driver.
Storage should be isolated to a cluster: use zoning and masking so that a LUN presented to one cluster is not accessible from another cluster. If you can have a SAN dedicated to a cluster, that is even better.
Consider using Multipath I/O, which provides the highest level of redundancy and availability.
The servers participating in Failover Clustering must be running the same version of Windows Server 2008 R2 that supports the feature. Service packs, patches and hotfixes must be at the same level on all nodes. Mixing of full server installations and Core server installations is not supported.
Network adapters should ideally be identical for each network and it is recommended the configuration of the adapters be the same i.e. speed & duplex settings. This will ensure the network does not behave differently on different cluster nodes. Each private network not routed to the main network infrastructure should be assigned a unique subnet.
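The unique subnet rule can be checked mechanically before the cluster is built. The following is a minimal Python sketch, assuming a hand-maintained mapping of network names to CIDR ranges; the network names and addresses are illustrative and not taken from any real environment.

```python
import ipaddress
from itertools import combinations


def find_overlapping_subnets(networks):
    """Return pairs of network names whose subnets overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    clashes = []
    # Compare every pair of networks once, in alphabetical order of name.
    for (name_a, net_a), (name_b, net_b) in combinations(sorted(parsed.items()), 2):
        if net_a.overlaps(net_b):
            clashes.append((name_a, name_b))
    return clashes


# Hypothetical cluster networks; replace with your own design.
cluster_networks = {
    "Public": "192.168.10.0/24",
    "Private": "10.10.10.0/24",
    "Storage": "10.10.20.0/24",
    "LiveMigration": "10.10.10.128/25",  # deliberately clashes with Private
}

print(find_overlapping_subnets(cluster_networks))
```

An empty list means every network sits on its own subnet; any pair that is reported should be readdressed before the cluster is validated.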
DNS must be running on the network and the cluster nodes must be using DNS for name resolution. When objects are created in Active Directory and DHCP is in use on the cluster nodes the relevant A records will also be created in DNS.
Each cluster node should be in the same Active Directory domain and they should all be running using the same role. It is recommended that each node in the cluster is a member server of the same domain.
The account used to build and create the cluster must have full administrative access on the cluster nodes. A domain user account can be used but it must have Create Computer Objects and Read All Properties rights on the domain.
The account used to build the cluster will also be used to create the Computer Name Object and therefore needs the permissions mentioned above.
Server Setup & Configuration
The following sections talk about the general server configuration in terms of hardware and software. It is recommended to understand the pre-requisites before reading the following sections.
All nodes participating in a cluster should be identical. That is they should have the same configuration at the software level and at the hardware level. Although having identical hardware is not a pre-requisite it is recommended as it makes life easier for the server administrator when updating drivers as the same one can be used on all the nodes. It also provides consistency to the cluster.
Changes made to a node must be replicated to all other nodes of the same cluster. Any changes made should be tested in a test environment first before changes are made to production nodes.
Specific hardware is no longer required; however, the cluster must pass validation (i.e. no errors detected) to be officially supported. In a geographically dispersed cluster there is no shared storage, so you will see errors in the validation report, but this is expected and the cluster will still be supported.
It is highly recommended to remove all single points of failure in terms of hardware and infrastructure.
It is recommended that all cluster nodes have 2 or more power supplies to provide redundancy.
If a node has multiple power supplies each power supply should be connected to a different power source to help prevent any power issues from taking out the node. Each node in the cluster should use a different power source to ensure that a single power strip does not take out the whole cluster.
Data centre infrastructure should also allow for total power loss and should have measures in place to provide power from UPS and/or on site generators.
Having two switches for a network allows the cluster nodes to connect to the LAN and provides redundancy from a network point of view.
For each network available (especially the public – client facing network) two network switches should be used for the cluster nodes to ensure a switch failure does not disrupt network communications.
If you are using iSCSI for storage the cluster should be connected to this network using multiple redundant switches to ensure the backend storage is always accessible.
Host Bus Adapters (HBAs)
It is recommended to use multiple single HBAs to connect to the backend storage. Dual HBAs can be a single point of failure as although they provide a multi-path to the storage a failure of the HBA can disconnect the node from the storage. Each HBA should be using a separate network infrastructure to connect to the SAN(s).
Boot from SAN
Operating System Configuration
Setting up the operating system correctly can increase performance and reliability. Additionally in the event of an operating system error the system can record activities and performance information to help troubleshoot and aid in root cause analysis. In order to capture this information performance monitoring and kernel debugging must be set accordingly.
Page File Configuration
It is good practice to place the page file on another partition other than the boot partition; however, in previous versions of Windows this configuration would not allow you to capture a memory dump. In Windows Server 2008 R2 the page file can be located on another partition which is not the same as the boot partition and will still capture memory dumps.
The Min and Max sizes of the page file should be set to the same value in order to improve performance. Having the Min and Max sizes different can result in slow system performance if the page file has to grow. The page file can become fragmented due to not having contiguous disk space to use and would have to place the data elsewhere on the disk.
The page file can be located on another LUN in cases where the boot partition is small or where the node is booting from SAN.
Kernel Memory Dumps
It is recommended to configure each cluster node to capture a kernel memory dump. A kernel memory dump captures the memory being used by the kernel at the time when the system stops unexpectedly. This information can be useful for troubleshooting and determining the root cause.
By default memory dumps are overwritten i.e. if a stop error occurs again the previous dump file will no longer exist. It is advisable to either copy the dump file to another machine for analysis or configure the server to not overwrite existing dump files.
Service Packs, Hotfixes & Critical Updates
Service packs need to be installed on each node of the cluster so that each node is on the same service pack level. This ensures device drivers, services and executable files are the same version. Hotfixes & Critical updates should also be installed on each cluster node to ensure any device drivers, services and executable files are the same version. Ensure any update installed is tested in a test environment first before deploying it to production.
There are many device drivers on a system and it is highly recommended to ensure that these driver versions are consistent across all nodes of a cluster. Differences in driver versions can cause instability on a cluster and can impact the availability.
Software installed on a cluster node may also install a driver and this also needs to be mirrored across all nodes. If tools are used for troubleshooting which install drivers these should be removed once the troubleshooting has completed to ensure consistency between the cluster nodes.
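A consistency check of this kind is easy to automate once a driver inventory has been collected from each node. The sketch below is one possible approach in Python; it assumes the inventory has already been gathered by some other means, and the node names, driver names (other than storport.sys) and version strings are placeholders.

```python
def find_driver_drift(inventory):
    """Return {driver: {node: version}} for drivers that differ between nodes.

    A driver that is missing from a node is reported with a version of None.
    """
    drivers = set()
    for versions in inventory.values():
        drivers.update(versions)

    drift = {}
    for driver in sorted(drivers):
        per_node = {node: versions.get(driver) for node, versions in inventory.items()}
        if len(set(per_node.values())) > 1:
            drift[driver] = per_node
    return drift


# Placeholder inventory: node -> {driver file: version string}.
inventory = {
    "NODE1": {"storport.sys": "1.0", "hba_miniport.sys": "2.1", "trace_tool.sys": "0.9"},
    "NODE2": {"storport.sys": "1.0", "hba_miniport.sys": "2.0"},
}

for driver, per_node in find_driver_drift(inventory).items():
    print(driver, per_node)
```

In this example the HBA miniport driver is reported because the versions differ, and the troubleshooting driver is reported because it was left behind on one node only.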
Roles, Role Services & Features
Each role, role service and feature required in a cluster should be installed on each node in the cluster. If a role, role service or feature is removed from a cluster node it should also be removed from all other cluster nodes.
All services should be identical on each node of the cluster. Any services required for the cluster must be installed on all nodes of the cluster.
Once the cluster is in production it is good to run performance monitoring for a period of time to understand the server load. This can be used as a baseline, and you can compare future performance data to this baseline to see where the performance has changed.
Black Box Monitoring
It is recommended to have monitoring running on all your cluster nodes. This monitoring should be set up so all the relevant counters are logging information to a binary file. Circular logging should also be used to overwrite previous data. This also ensures the captured data file does not grow too large and therefore does not eat up disk space. This black box monitoring is useful when a performance issue occurs: rather than waiting for performance data to be gathered once an issue has been raised, the performance data is already to hand.
The captured data can also be compared to a server baseline to see what the impact on the server is over time i.e. adding more highly available services or adding more users which consume the services.
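The idea behind circular logging can be illustrated with a fixed-size buffer. This is a conceptual Python sketch only, not how the Windows performance logs are implemented; the class name and sample values are invented for the illustration.

```python
from collections import deque


class CircularLog:
    """Fixed-capacity sample log: the oldest sample is dropped when full."""

    def __init__(self, capacity):
        # A deque with maxlen discards items from the opposite end on append.
        self._samples = deque(maxlen=capacity)

    def record(self, sample):
        self._samples.append(sample)

    def snapshot(self):
        return list(self._samples)


log = CircularLog(capacity=3)
for cpu_percent in (12, 35, 80, 97):
    log.record(cpu_percent)

print(log.snapshot())  # the first sample (12) has been overwritten
```

The log never grows beyond its capacity, which is the property that keeps the black box data file from consuming disk space while still holding the most recent history.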
Anti-virus products can interfere with the performance of a server and can cause issues with the cluster. The anti-virus should be configured to exclude the cluster specific resources on each cluster node. The following should be excluded on each cluster node:
(1) A witness disk if used
(2) A file share witness if used
(3) The cluster folder %systemroot%\cluster on all cluster nodes
Network Setup & Configuration
There should be adequate networks configured for the cluster to ensure there is no bottleneck for users, cluster nodes and storage. Depending on the purpose of the cluster you may need an additional network for Hyper-V Live Migration.
Follow the pre-requisites to ensure you are following the best practices for network configuration. The binding orders of the network should be set accordingly. Typically you have the bindings in the following order: Public, Private, Storage and Live Migration.
The public network is the network which clients will connect to in order to gain access to the highly available resources on the cluster. This network will be connected to the main network infrastructure in your organization. This is automatically selected when the cluster is built by determining which network has a default gateway.
All networks in a cluster are used for intra cluster communication. You can fine tune this to ensure intra cluster communication uses a specific network and can fall back to another network if required. The private network should also be used for management tasks ensuring no additional overhead is being added to the public, storage and Live Migration networks.
If you are using an iSCSI network this network must be dedicated for storage. Because teamed network adapters are not supported with iSCSI, consider using Multipath I/O rather than NIC teaming for additional redundancy on this network.
The heartbeat for the cluster nodes is set by default to send a heartbeat every second. The heartbeat configuration also by default will allow five missed heartbeats before a cluster node is deemed as unavailable. This can be configured to increase the interval and increase/decrease the threshold. This is of particular interest when using a high latency network (WAN) when implementing a geographically dispersed cluster.
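The relationship between the heartbeat interval and the missed-heartbeat threshold can be worked through with simple arithmetic. The Python sketch below is an approximation under the assumption that a node is declared unavailable after the interval multiplied by the threshold; the WAN values are hypothetical examples, not recommendations.

```python
def failure_detection_seconds(interval_ms, missed_threshold):
    """Approximate time before a silent node is deemed unavailable."""
    return interval_ms * missed_threshold / 1000.0


# Defaults described above: one heartbeat per second, five missed heartbeats.
print(failure_detection_seconds(1000, 5))

# Hypothetical tuning for a high latency WAN link.
print(failure_detection_seconds(2000, 10))
```

Raising either value makes the cluster more tolerant of transient latency, at the cost of taking longer to react to a genuine node failure.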
When using Hyper-V in a Failover Cluster you may want to use the Live Migration feature. This requires a dedicated 1 Gbps network to copy the system state and memory pages from one cluster node to another when Live Migration is initiated. Live Migration is used for planned maintenance and cannot be used for unplanned outages.
The Hyper-V guests themselves should be located on a Cluster Shared Volume to assist with Live Migration and to ensure the VM is not taken offline to move a cluster disk to another node.
Ensure all HBAs on all nodes are the same hardware device and ensure the firmware/driver for the HBA is the same version. Speak to your storage vendor with regards to driver versions. Also read any documentation regarding the driver as it will probably highlight which version of the storport.sys driver is required.
Speak to your hardware vendor/engineer to implement LUN masking or zoning to ensure that LUNs are isolated from each other. Ideally each cluster should be backed by a dedicated SAN and it is recommended to have multiple independent paths to the storage.
If you are implementing an iSCSI solution for the storage it is recommended to use dedicated network interface cards so the storage traffic is independent of the public facing and intra cluster communication networks.
Cluster Setup & Configuration
Cluster Validation Wizard
The cluster validation wizard should be run before the initial cluster build takes place and it should also be run once you have completed building and configuring your cluster. It is also recommended to run the wizard every time you make a change to the cluster.
The storage tests will fail once the cluster is in production as the tests will not offline any disks that are in use unless explicitly told to do so. It is recommended to have a small LUN present on the cluster which is not in use so the storage tests can take place. This will ensure there are no issues with the storage and arbitration.
The quorum model is automatically selected when you initially build the cluster. You can make changes to the quorum model on the fly without impacting the cluster. Always check the current quorum model so you are aware of what it is and check the chosen model against the recommended model.
Depending on the architecture of the cluster you may want to implement a witness resource in the form of a witness disk or a file share witness. The witness disk should be 512MB in size (it can be larger for large file and print clusters).
A file share witness is recommended for geographically dispersed clusters where there is no shared storage. The witness disk will contain a backup of the cluster database and this will be loaded as a 0.cluster hive on the node which currently owns the witness disk.
The file share witness does not keep a copy of the cluster database. Instead it maintains a log of the Paxos tag to provide versioning.
A single file share witness can be used by multiple clusters. It is recommended to make the file share witness highly available in its own cluster to provide the additional vote for multiple clusters. Within the file share witness folder a new folder will be created for each cluster so the data is isolated.
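The value of the additional witness vote is easiest to see with a vote count. The following Python sketch assumes the simple majority rule (more than half of all votes must be present for the cluster to keep running) and treats each node and the witness as one vote each; it is a simplified model, not the cluster service's actual algorithm.

```python
def quorum_summary(node_count, has_witness):
    """Count votes, the majority needed, and how many votes can be lost."""
    total_votes = node_count + (1 if has_witness else 0)
    votes_needed = total_votes // 2 + 1  # strict majority
    return {
        "total_votes": total_votes,
        "votes_needed": votes_needed,
        "failures_tolerated": total_votes - votes_needed,
    }


# Four nodes without a witness, then the same cluster with a witness added.
print(quorum_summary(4, has_witness=False))
print(quorum_summary(4, has_witness=True))
```

Under this model a four-node cluster without a witness can lose only one vote, whereas adding a witness disk or file share witness lets it lose two, which is why a witness is normally paired with an even number of nodes.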
Core Cluster Group
The cluster group should only contain the Network Name, IP and witness disk resources (if required). No other resource should be in this group. If you are using software provided by the storage vendor you may have an additional resource within this group which is acceptable.
The dependencies for these resources should not be altered unless your storage vendor advises you to due to the software used to manage/provide LUNs to the cluster.
Most of the advanced settings should be left at their defaults unless specifically recommended by Microsoft or third party vendors.
The "Affect the group" option should only be cleared if the resource is non-critical, for example where a secondary network is used for backups and allows client access for backup purposes only. If you do not want the whole group to fail over when this network fails, uncheck "Affect the group".
All resources should have all nodes listed in the Possible Owners list. You can set anti affinity rules to ensure groups are kept on separate cluster nodes to ensure the services remain highly available and that the cluster nodes split the load evenly.
The "Run this resource in a separate Resource Monitor" option should not be selected unless recommended by Microsoft for troubleshooting or by a 3rd party vendor for a specific reason. Anti-Affinity can be used to ensure the services are dispersed among the cluster nodes.
Chkdsk on Cluster Disks
When the state of a cluster disk changes to online pending (i.e. when a failover is initiated or when a group is moved) the system will check whether the dirty bit is set on the disk. If the dirty bit has been set the disk will remain in an online pending state whilst chkdsk is run on the entire volume.
This can lead to a significant amount of downtime and therefore should be avoided. You should plan on a monthly or quarterly basis to have a maintenance window during which you can manually check the dirty bit and run chkdsk on the relevant volume(s). Below is a list of the options you can set using PowerShell or Cluster.exe.
|Value|Behaviour|
|---|---|
|0 (Default)|ShallowCheck: open files in the root of the volume and check the dirty bit.|
|1|FullCheck: check recursively on all files and check the dirty bit.|
|2|Run chkdsk every time the volume is mounted.|
|3|ShallowCheck. If corrupt, run chkdsk. If not corrupt, run chkdsk in read-only mode. Online will proceed while chkdsk is running in read-only mode.|
|4|Never run chkdsk.|
|5|FullCheck. If corrupt, do not bring online. User intervention required.|
To ensure groups do not end up on the same node set the AntiAffinityClassName of the group. You can set this to the same string value for all groups you do not want to appear on the same node.
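Whether groups that share an anti-affinity string have actually ended up apart can be verified from an export of the current group placement. The Python sketch below assumes such an export is available as a list of dictionaries; the group names, node names and class string are hypothetical.

```python
from collections import defaultdict


def find_anti_affinity_violations(groups):
    """Return {(class_name, node): [group names]} where 2+ groups share a node."""
    placement = defaultdict(list)
    for group in groups:
        class_name = group.get("anti_affinity_class")
        if class_name:  # groups without a class string are ignored
            placement[(class_name, group["owner_node"])].append(group["name"])
    return {key: names for key, names in placement.items() if len(names) > 1}


# Hypothetical export of the current group placement.
groups = [
    {"name": "SQL-A", "owner_node": "NODE1", "anti_affinity_class": "SQL"},
    {"name": "SQL-B", "owner_node": "NODE1", "anti_affinity_class": "SQL"},
    {"name": "Print", "owner_node": "NODE1", "anti_affinity_class": None},
    {"name": "SQL-C", "owner_node": "NODE2", "anti_affinity_class": "SQL"},
]

print(find_anti_affinity_violations(groups))
```

A non-empty result after a failover is worth reviewing, since it shows groups sharing a node that you intended to keep apart.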
The network priority should be set for each network participating in the cluster. The metric value on the network represents a cost which determines which network to use for what communication.
By default the network that has been assigned a default gateway will have a metric value of 10,000. Other networks without a gateway defined will be assigned a value of between 1000 and 10,000. The first network without a gateway will be given a metric value of 1000 and subsequent networks will be given values in increments of 100 i.e. 1100, 1200, 1300.
Intra cluster communication occurs on the network that has the lowest metric value i.e. cost. You need to set the metric values of the iSCSI and Live Migration networks higher than that of the private network. If the private network is down, the next available network with the lowest metric value will be used.
Set the network priorities according to the network design in your environment.
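The automatic metric assignment described above, and the effect of overriding it, can be modelled in a few lines. This Python sketch follows the values given in the text (10,000 for a network with a default gateway, 1000 then increments of 100 for the others); the order in which networks are evaluated and the network names are assumptions made for the illustration.

```python
def assign_auto_metrics(networks):
    """networks: list of (name, has_gateway) tuples in evaluation order."""
    metrics = {}
    next_metric = 1000  # first network without a gateway
    for name, has_gateway in networks:
        if has_gateway:
            metrics[name] = 10000
        else:
            metrics[name] = next_metric
            next_metric += 100
    return metrics


def preferred_order(metrics):
    """Networks sorted from lowest metric (most preferred) to highest."""
    return sorted(metrics, key=metrics.get)


networks = [("Public", True), ("Storage", False), ("Private", False), ("LiveMigration", False)]
auto = assign_auto_metrics(networks)
print(preferred_order(auto))

# Manual override so the private network carries intra cluster traffic first.
tuned = dict(auto, Private=1000, Storage=1100)
print(preferred_order(tuned))
```

With the automatic values in this example the storage network would be preferred for intra cluster communication, which is why the metrics are adjusted so that the private network has the lowest value.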
Every node must be able to run any workload that might fail over to it, so if you have printers installed ensure the drivers are on all cluster nodes so the printer can operate after a failover to any node. Each node should also have enough spare resources to take on additional load from other nodes.
Failback causes another outage, however small, and therefore it is recommended to set this to occur outside of normal working hours. In a 24x7 operation this may not be acceptable and therefore you should disable failback altogether.
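A rough capacity check can show whether the surviving nodes could absorb the load of any single failed node. The Python sketch below is a simplified model that assumes load can be split freely across the remaining nodes and is measured in one abstract unit (for example GB of memory); the node names and figures are invented.

```python
def survives_single_node_failure(node_capacity, node_load):
    """True if the other nodes have enough spare capacity for any one node's load."""
    for failed in node_load:
        spare = sum(
            node_capacity[node] - node_load[node]
            for node in node_load
            if node != failed
        )
        if node_load[failed] > spare:
            return False
    return True


capacity = {"NODE1": 64, "NODE2": 64, "NODE3": 64}

print(survives_single_node_failure(capacity, {"NODE1": 40, "NODE2": 40, "NODE3": 40}))
print(survives_single_node_failure(capacity, {"NODE1": 50, "NODE2": 50, "NODE3": 50}))
```

In the first case the two surviving nodes have 48 units spare for a failed node's 40, so the check passes; in the second they have only 28 spare for 50, so a failover would overload them.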
Always test failover and failback prior to production to ensure each node can host the services without issue.
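Restricting failback to a maintenance window comes down to an hour-of-day comparison, including the case where the window runs past midnight. The Python sketch below is illustrative only; it assumes the window is expressed as whole hours (0 to 23) with an exclusive end hour, and the function and parameter names are hypothetical.

```python
def in_failback_window(hour, window_start, window_end):
    """True if `hour` falls inside the window; the end hour is exclusive."""
    if window_start <= window_end:
        return window_start <= hour < window_end
    # The window wraps past midnight, e.g. 22:00 to 04:00.
    return hour >= window_start or hour < window_end


# Hypothetical window from 22:00 to 04:00.
print(in_failback_window(23, 22, 4))
print(in_failback_window(14, 22, 4))
```

A window of this kind keeps the second, brief outage caused by failback away from normal working hours.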
A dedicated account to run the cluster service is no longer required. Instead a cluster name object (CNO) is created in Active Directory when the cluster is built. All further security checks are performed using the context of the CNO.
A domain admin account is required to build the cluster as it is this account that creates the CNO. Once the CNO has been created all further virtual computer objects (VCOs) are created using the CNO. As with any Active Directory user there is a limit of 10 imposed on the number of objects an account can create. This will need to be changed if you need the CNO to create more than 10 VCOs.
The CNO can be pre-staged prior to building the cluster. The pre-staged CNO needs to be set to disabled in order for the newly built cluster to use it.
You can now also set permissions on the cluster to allow users to view the cluster configuration with read-only privileges.
Installing patches on a cluster can be challenging as there will be a period of time when the cluster nodes are out of sync. It is recommended to install the patches on a test cluster before deploying the updates to the production cluster. This ensures that any adverse effects are seen before impacting production.
Hotfixes are designed to address specific issues and should only be installed if you are facing the symptoms described in the relevant hotfix knowledge base article.
Security updates should be installed as they are released by Microsoft. Once released, they should be tested in a test environment and deployed to production once the testing has shown no issues.
Always ensure you have full backups which have been tested and verified prior to installing updates to a cluster.
Premier Field Engineer - Failover Clustering & Hyper-V