Windows Server 2008 greatly simplifies the installation, configuration and management of Failover Clusters. It’s hard to believe, but creating a cluster is now a simple wizard where you need to enter only the names of the servers (nodes) in the cluster and the cluster name (if you don’t use DHCP, you’ll also need to provide an IP address). Truly amazing, if you consider that the Windows Server 2003 procedure required following the instructions in a white paper dozens of pages long. There’s a trick there, though, and it’s called “Validate”.
Clustering is no picnic
Installing a cluster is a complex task. Not only are you typically working with a group of high-end multi-homed servers with a sophisticated storage back-end, but you are also running important workloads like SQL Server (more than half of clusters run databases), Exchange or vital file shares. Of course these are critical resources, or you wouldn’t be investing in making them highly available.
Planning and configuring a cluster is a challenge that is usually left to experienced professionals. In most places, there are only a few people (if not just one person) who are allowed to even touch the cluster configuration. Even these experts follow that white paper carefully and double-check every step.
Validation is the key change
With Windows Server 2008, that should change dramatically, and an average Windows Server administrator will finally be able to configure a cluster with no external help and without following steps in a white paper. You will still need to plan everything and understand how to install a group of servers, set up the network and manage the storage (or have people on your team who do), but the whole procedure will be a lot more forgiving.
One of the key changes is that you are now required to run a “Validate” process before you run the wizard to create a cluster. This tool was introduced as a download for Windows Server 2003 (look for "ClusPrep" at http://www.microsoft.com/windowsserver2003/enterprise/clustering.mspx), and version 2.0 of the tool is now simply part of the product in Windows Server 2008.
The validation process checks the many details and tells you if anything is missing or configured improperly, pointing to the exact source of the issue. It does this for every server (node) of your future cluster, every network interface, every host bus adapter and every disk. It’s a thorough examination that includes simulating common cluster operations, like contacting other nodes on every network interface or transferring control of a disk resource from one node to another.
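To get a feel for what “contacting other nodes on every network interface” involves, here is a minimal Python sketch that lists the node-to-node interface pairs such a check would need to cover. It is only an illustration, not how Validate is implemented: the `NODES` inventory, its addresses and the `connectivity_checks` function are all made up for this example.

```python
import ipaddress
from itertools import combinations

# Hypothetical inventory: node name -> interface addresses in CIDR form.
NODES = {
    "node1": ["192.168.1.11/24", "10.0.0.11/24"],
    "node2": ["192.168.1.12/24", "10.0.0.12/24"],
}

def connectivity_checks(nodes):
    """Return (src_node, src_ip, dst_node, dst_ip) for every pair of
    interfaces that sit on different nodes but share a subnet."""
    checks = []
    for (a, a_ifaces), (b, b_ifaces) in combinations(sorted(nodes.items()), 2):
        for a_cidr in a_ifaces:
            a_if = ipaddress.ip_interface(a_cidr)
            for b_cidr in b_ifaces:
                b_if = ipaddress.ip_interface(b_cidr)
                # Only interfaces on the same subnet are expected to talk directly.
                if a_if.network == b_if.network:
                    checks.append((a, str(a_if.ip), b, str(b_if.ip)))
    return checks

for check in connectivity_checks(NODES):
    print("%s (%s) -> %s (%s)" % check)
```

With two nodes and two networks that is only two pairs, but the list grows quickly as you add nodes and adapters, which is a good reason to let a tool walk through it.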
Because Validate is something you can run multiple times, most of the stress is gone. Validate does not really configure anything, but it has to succeed before you create the cluster. So, by the time you actually create the cluster, you already know it’s going to work.
The Validation Report
After you run Validate, you’re given a report (in HTML) that diagnoses your system, indicating if there’s any trouble and exactly what the issue is. As I mentioned, you can run Validate as many times as you need to get to the desired result: a message that says “Testing has completed successfully and the configuration is suitable for clustering.” If you don’t have a successful validation or if you deselect any of the tests, you’ll get a warning from the “Create a cluster” wizard saying your configuration is unsupported (and you wouldn’t want that).
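The rule described above (every test has to run, and every test has to pass) can be sketched in a few lines of Python. This is only an illustration of the logic: the `CheckOutcome` record, the `summarize` function and the wording of the “unsupported” message are assumptions made for this example, and only the success message is taken from the actual report.

```python
from dataclasses import dataclass

SUCCESS_MESSAGE = ("Testing has completed successfully and the "
                   "configuration is suitable for clustering.")

@dataclass
class CheckOutcome:
    name: str        # test title, e.g. "Validate Disk Failover"
    selected: bool   # False if the test was deselected before the run
    passed: bool     # only meaningful when selected is True

def summarize(outcomes):
    """Return (supported, message) for a list of CheckOutcome records."""
    skipped = [o.name for o in outcomes if not o.selected]
    failed = [o.name for o in outcomes if o.selected and not o.passed]
    if not skipped and not failed:
        return True, SUCCESS_MESSAGE
    # Hypothetical wording; the real wizard shows its own warning text.
    problems = ["failed: " + n for n in failed] + ["not run: " + n for n in skipped]
    return False, "Unsupported configuration (" + "; ".join(problems) + ")"
```

Note that a deselected test counts against you just like a failed one, which matches the behavior of the “Create a cluster” wizard described above.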
Speaking of support, the report you get from validate is also a great tool to communicate with other groups (large companies typically delegate networking and storage to someone other than the server administration team), with the support staff from your hardware manufacturer and with Microsoft’s CSS (Customer Support Services). It’s also a great file to keep around as the documentation of your environment’s configuration.
By now you’re probably wondering what steps are included in the validation and what that report looks like. An actual report is too big to include here, but I collected the title of each portion of it, along with the description of each test. Here they are:
Structure of a Validation Report
|1. Inventory|
|1A. List BIOS Information||List BIOS information from each node.|
|1B. List Environment Variables||List environment variables set on each node.|
|1C. List Fibre Channel Host Bus Adapters||List Fibre Channel host bus adapters on each node.|
|1D. List iSCSI Host Bus Adapters||List iSCSI host bus adapters on each node.|
|1E. List Memory Information||List memory information for each node.|
|1F. List Operating System Information||List information about the operating system on each node.|
|1G. List Plug and Play Devices||List Plug and Play devices on each node.|
|1H. List Running Processes||List the running processes on each node.|
|1I. List SAS Host Bus Adapters||List Serial Attached SCSI (SAS) host bus adapters on each node.|
|1J. List Services Information||List information about the services running on each node.|
|1K. List Software Updates||List software updates that have been applied on each node.|
|1L. List System Drivers||List the system drivers on each node.|
|1M. List System Information||List system information such as computer model and domain.|
|1N. List Unsigned Drivers||List the unsigned drivers on each node.|
|2. Network|
|2A. Validate Cluster Network Configuration||Validate the cluster networks that would be created for these servers. Verify that each cluster network interface within a cluster network is configured with the same IP subnets. Verify that, for each cluster network, all adapters are consistently configured with either DHCP or static IP addresses.|
|2B. Validate IP Configuration||Validate that IP addresses are unique and subnets configured correctly. Verify that a node does not have multiple adapters connected to the same subnet. Verify that each node has at least one adapter with a defined default gateway. Verify that there are no node adapters with the same EUI-48 physical address. Verify that there are no duplicate IP addresses between any pair of nodes. Check that nodes are consistently configured with IPv4 and/or IPv6 addresses.|
|2C. Validate Network Communication||Validate that servers can communicate, with acceptable latency, on all networks. Analyze connectivity results.|
|2D. Validate Windows Firewall Configuration||Validate that the Windows Firewall is properly configured to allow failover cluster network communication.|
|3. Storage|
|3A. List All Disks||List all disks visible to one or more nodes (including non-cluster disks).|
|3B. List Potential Cluster Disks||List disks visible to all nodes that will be validated for cluster compatibility. Online clustered disks will be excluded.|
|3C. Validate Disk Access Latency||Validate acceptable latency for disk read and write operations. Validating read latency of cluster disks. Validating write latency of cluster disks.|
|3D. Validate Disk Arbitration||Validate that a node that owns a disk retains ownership after disk arbitration.|
|3E. Validate Disk Failover||Validate that a disk can fail over successfully with data intact.|
|3F. Validate File System||Validate that the file system on disks in shared storage is supported by failover clusters.|
|3G. Validate Microsoft MPIO-based disks||Validate that Microsoft MPIO-based disks have been configured correctly.|
|3H. Validate Multiple Arbitration||Validate that in a multiple-node arbitration process, only one node obtains control.|
|3I. Validate SCSI device Vital Product Data (VPD)||Validate that storage supports necessary inquiry data (SCSI page 83h VPD descriptors) and that they are unique. Validate that for each cluster disk supporting SCSI page 83h VPD descriptors, all nodes return the same descriptors. Validate that for each cluster disk supporting SCSI page 83h VPD descriptors, the descriptors are globally unique.|
|3J. Validate SCSI-3 Persistent Reservation||Validate that storage supports the SCSI-3 Persistent Reservation commands.|
|3K. Validate Simultaneous Failover||Validate that disks can fail over simultaneously with data intact.|
|4. System Configuration|
|4A. Validate Active Directory Configuration||Validate that Active Directory is configured properly. Validate that all nodes have the same domain, domain role, and organizational unit. Validate that the user has the required permissions in Active Directory.|
|4B. Validate All Drivers Signed||Validate that tested servers contain only signed drivers.|
|4C. Validate Operating System Versions||Validate that the operating systems on the servers support clustering and will interoperate. Validate that all servers have the same operating system version. Validate that all servers have valid operating system product suite types.|
|4D. Validate Required Services||Validate that services required for failover clustering are running and configured properly: RpcSs (Remote Procedure Call), RemoteRegistry (Remote Registry), Lanmanserver (Server), WinMgmt (Windows Management Instrumentation).|
|4E. Validate Same Processor Architecture||Validate that all servers have the same processor architecture.|
|4F. Validate Service Pack Levels||Validate that all servers with same operating system also have same service packs.|
|4G. Validate Software Update Levels||Validate that all tested servers have the same software updates installed.|
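To make a few of these checks more concrete, here is a minimal Python sketch of three of the rules listed under 2B: no node with multiple adapters on the same subnet, at least one default gateway per node, and no duplicate IP addresses between nodes. It is a sketch under stated assumptions, not the tool’s actual logic: the `NODES` inventory, the `check_ip_configuration` function and the wording of the messages are all made up for this example.

```python
import ipaddress
from collections import Counter

# Hypothetical inventory: node -> list of (CIDR address, default gateway or None).
NODES = {
    "node1": [("192.168.1.11/24", "192.168.1.1"), ("10.0.0.11/24", None)],
    "node2": [("192.168.1.12/24", "192.168.1.1"), ("10.0.0.11/24", None)],
}

def check_ip_configuration(nodes):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    seen_ips = {}  # IP address -> first node seen using it
    for node, adapters in sorted(nodes.items()):
        interfaces = [ipaddress.ip_interface(cidr) for cidr, _ in adapters]
        # A node should not have several adapters on the same subnet.
        per_subnet = Counter(iface.network for iface in interfaces)
        for subnet, count in per_subnet.items():
            if count > 1:
                problems.append("%s: %d adapters on subnet %s" % (node, count, subnet))
        # Each node needs at least one adapter with a default gateway.
        if not any(gateway for _, gateway in adapters):
            problems.append("%s: no adapter has a default gateway" % node)
        # No IP address may appear on two different nodes.
        for iface in interfaces:
            owner = seen_ips.setdefault(iface.ip, node)
            if owner != node:
                problems.append("duplicate IP %s on %s and %s" % (iface.ip, owner, node))
    return problems

for problem in check_ip_configuration(NODES):
    print(problem)
```

In the sample inventory, both nodes claim 10.0.0.11 on the private network, so that is the one problem the sketch reports.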
If you work with clusters and their many moving parts, you will eventually run into hardware and/or configuration issues. If you’re using Windows Server 2008 to run them, you might at times hate Validate for pointing out all your missteps :-). For instance, I once copied some virtual machine configuration files, and this led to MAC address conflicts on my nodes. I just could not pass validation and got really mad at the tool. I eventually found a way to fix it...
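A MAC address conflict like that one is easy to spot once the addresses are collected in one place. Here is a minimal Python sketch that does so, assuming you have already gathered a list of (node, adapter name, MAC address) entries by whatever means you prefer. The `normalize_mac` and `find_mac_conflicts` functions are made up for this example and are not part of any Microsoft tool.

```python
import re
from collections import defaultdict

def normalize_mac(mac):
    """Reduce forms like 00-15-5D-01-02-03 or 00:15:5d:01:02:03
    to 12 lowercase hex digits."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError("not an EUI-48 address: %r" % mac)
    return digits

def find_mac_conflicts(adapters):
    """adapters: iterable of (node, adapter_name, mac).
    Returns {mac: [(node, adapter_name), ...]} for every address
    that is used by more than one adapter."""
    owners = defaultdict(list)
    for node, name, mac in adapters:
        owners[normalize_mac(mac)].append((node, name))
    return {mac: who for mac, who in owners.items() if len(who) > 1}

# Hypothetical data: node2 was cloned from node1 and kept its public MAC.
adapters = [
    ("node1", "Public", "00-15-5D-01-02-03"),
    ("node2", "Public", "00:15:5d:01:02:03"),
    ("node2", "Private", "00-15-5D-01-02-04"),
]
for mac, who in find_mac_conflicts(adapters).items():
    print(mac, who)
```

Normalizing first matters because different tools print the same address with different separators and letter case.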
In any case, you should never doubt that Validate is really your friend. After all, it will tell you about your mistakes and actually help you get through them. Isn’t that what real friends are supposed to do? And, best of all, Validate will be there to share the joy of confirming that everything is finally perfectly in place.
P.S.: If that got you interested in trying Failover Clustering for yourself, check this step-by-step guide at http://technet2.microsoft.com/windowsserver2008/en/library/adbf1eb3-a225-4344-9086-115a9389a2691033.mspx or use the Windows Server 2008 Failover Clustering Virtual Lab at http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?EventID=1032345932