New File Share Witness and Force Quorum Guidance for Exchange 2007 Clusters


For Exchange Server 2007 RTM and SP1, the Exchange team published guidance on using a CNAME record in DNS when provisioning the file share witness (FSW) component of a Majority Node Set (MNS) quorum on Windows Server 2003 or a Node and File Share Majority quorum on Windows Server 2008. Specifically, the documentation states the following:

"We also recommend that you create a CNAME record in the Domain Name System (DNS) for the server hosting the share, instead of the actual server name. When creating the share for the file share witness, use the fully qualified domain name (FQDN) for the CNAME record instead of the server name because this practice assists with site resilience."

On review, we have learned that the results of this guidance can be unpredictable in some environments. As a result, we revisited it and, after working closely with the Windows Cluster team on various site resilience scenarios, decided to revise our configuration guidance.

In summary, we no longer recommend using a CNAME record as part of the FSW provisioning process. Rather than repointing a CNAME record at a server that hosts a replacement FSW share when activating a backup site or reactivating a primary site, we now recommend using the Cluster service's built-in "force quorum" capability.

Background

To illustrate exactly how this guidance is changing, consider a topology in which a two-node cluster continuous replication (CCR) environment is deployed across two physical sites: a primary datacenter and a backup datacenter.

In this example, the Active Directory Site named Redmond-Prod is stretched across two datacenters: Datacenter A, the production datacenter, and Datacenter B, a warm backup datacenter that has dedicated resources (global catalog, Client Access, and Hub Transport servers) for the Redmond-Prod site. Datacenter B also contains a second Active Directory Site named Redmond-DR, whose dedicated resources can be moved from the Redmond-DR site to the Redmond-Prod site in Datacenter B.

Focusing just on the cluster implementation, we have a two-node CCR environment that during normal production hours has the clustered mailbox server (CMS) hosted on NodeA in Datacenter A, and NodeB is the passive node residing in Datacenter B.

The CCR environment is configured to use an MNS quorum with FSW. In this configuration, the file share used for the FSW can be located either (1) on a server in Datacenter A or (2) on a server in Datacenter B. The general recommendation is to use a share on a Hub Transport server in Datacenter A that is in the same Active Directory Site as the CMS. In addition, to save time when activating Datacenter B, we also recommend staging a replacement FSW share on a Hub Transport server in Datacenter B.
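Why the location of the FSW share matters comes down to vote counting. The following minimal Python sketch assumes the usual model for a two-node cluster with an MNS/FSW or Node and File Share Majority quorum: each node and the witness share carry one vote, and a strict majority (2 of 3) is needed for the cluster to run. The function names are hypothetical, and nothing here talks to the Cluster service.

```python
# Hypothetical model of quorum votes in a two-node cluster with a file share
# witness: one vote per node plus one vote for the witness share.

def has_quorum(reachable_votes: int, total_votes: int = 3) -> bool:
    """Return True when the reachable votes form a strict majority."""
    return reachable_votes > total_votes // 2


def surviving_votes(nodes_up: set[str], witness_reachable: bool) -> int:
    """Count the votes visible to the nodes that are still running."""
    return len(nodes_up) + (1 if witness_reachable else 0)


# Normal operation: NodeA, NodeB, and the FSW share on EXHUB1 are all reachable.
normal = surviving_votes({"NodeA", "NodeB"}, witness_reachable=True)

# Datacenter A lost: NodeA and EXHUB1 (which hosts the FSW share) are both gone.
site_a_down = surviving_votes({"NodeB"}, witness_reachable=False)

print(normal, has_quorum(normal))            # 3 True
print(site_a_down, has_quorum(site_a_down))  # 1 False
```

With the witness in Datacenter A, a failure of that datacenter leaves NodeB holding one vote out of three, so it cannot keep the cluster running by itself. Both the old CNAME approach and the new force quorum approach are ways of getting past this.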

If Datacenter A fails or is otherwise unavailable, and Datacenter B is to be activated, our original guidance called for quickly switching over to a replacement share for the FSW by changing the FQDN of the target host for the FSW CNAME record in DNS from that of the Hub Transport server in Datacenter A to the Hub Transport server in Datacenter B.

For example, say you have the following:

  • A CMS named EXMBX1 in a CCR environment in the Redmond-Prod Site located in Datacenter A.
  • EXHUB1 in the Redmond-Prod Site located in Datacenter A. It has been provisioned with a file share named FSW_EXMBX1, which will be used for the FSW for EXMBX1.
  • EXHUB2 in the Redmond-DR Site located in Datacenter B. It has been provisioned with a file share named FSW_EXMBX1, which will be used for the FSW for EXMBX1 when Datacenter B takes over for Datacenter A.

Instead of configuring the MNS quorum with FSW to use a path of \\EXHUB1\FSW_EXMBX1 for the witness share, our original guidance was to:

  1. Configure a CNAME record in DNS, using an alias of EXMBX1FSW.
  2. Configure the CNAME record to initially point to EXHUB1.
  3. Configure a file share on EXHUB1 and on EXHUB2 called FSW_EXMBX1.
  4. Configure the MNS quorum with FSW to point to \\EXMBX1FSW\FSW_EXMBX1.
  5. When Datacenter B is being activated, reconfigure the CNAME record to point to EXHUB2.
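The original CNAME approach amounts to one level of indirection between the path stored in the cluster and the server that actually holds the share. In this hypothetical Python sketch, a plain dictionary stands in for DNS (real lookups would use the FQDN and would be subject to caching), and the helper names are illustrative only.

```python
# Hypothetical model of the original CNAME-based guidance: the cluster stores
# a witness path built on an alias, and DNS decides which Hub Transport
# server actually answers for it.

def parse_unc(path: str) -> tuple[str, str]:
    """Split a UNC path of the form \\\\host\\share into (host, share)."""
    parts = path.lstrip("\\").split("\\")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"not a \\\\host\\share path: {path!r}")
    return parts[0], parts[1]


def resolve_witness(path: str, cnames: dict[str, str]) -> str:
    """Return the UNC path the cluster would really reach after alias lookup."""
    host, share = parse_unc(path)
    target = cnames.get(host, host)  # hosts without a CNAME resolve to themselves
    return f"\\\\{target}\\{share}"


witness_path = r"\\EXMBX1FSW\FSW_EXMBX1"  # step 4: quorum points at the alias
dns = {"EXMBX1FSW": "EXHUB1"}              # step 2: alias initially targets EXHUB1
print(resolve_witness(witness_path, dns))  # \\EXHUB1\FSW_EXMBX1

dns["EXMBX1FSW"] = "EXHUB2"                # step 5: repoint during activation
print(resolve_witness(witness_path, dns))  # \\EXHUB2\FSW_EXMBX1
```

The cluster configuration never changes in this model; only the DNS answer does, which is exactly what made the approach attractive and also what made it depend on DNS behaving consistently.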

Why Change this Guidance?

While the practice of changing the FQDN of the target host for a CNAME record in DNS in order to redirect a Windows failover cluster configured with an MNS/FSW quorum to a replacement witness share does work, we no longer believe it to be the best option in these environments.

Effectively, the original guidance was a bit of a trick designed to "out-smart" the FSW quorum using DNS sleight of hand. Although the trick works and does not appear to have caused any known issues, we know that in larger, more complex DNS topologies, particularly those in which DNS replication or convergence can take a while, it can behave unpredictably. And as you know, DNS issues can cause all sorts of problems for Active Directory, Exchange, and Windows.
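The weak point of the CNAME indirection is that a DNS change does not reach every DNS server at the same moment. This hypothetical Python sketch models two DNS servers (DNS1 and DNS2 are illustrative names), each holding its own copy of the zone; resolver caching and record TTLs, which can stretch the window further, are not modeled.

```python
# Hypothetical model: each DNS server holds its own copy of the CNAME record
# until replication catches up. Resolver caching and TTLs are not modeled.

def lookup(zones: dict[str, dict[str, str]], server: str, alias: str) -> str:
    """Return the target a client would get by asking one specific DNS server."""
    return zones[server][alias]


def converged(zones: dict[str, dict[str, str]], alias: str) -> bool:
    """True only when every DNS server returns the same target for the alias."""
    return len({records[alias] for records in zones.values()}) == 1


zones = {
    "DNS1": {"EXMBX1FSW": "EXHUB2"},  # CNAME changed here during activation
    "DNS2": {"EXMBX1FSW": "EXHUB1"},  # change has not replicated yet
}
print(lookup(zones, "DNS1", "EXMBX1FSW"))  # EXHUB2
print(lookup(zones, "DNS2", "EXMBX1FSW"))  # EXHUB1: the old, unreachable share
print(converged(zones, "EXMBX1FSW"))       # False

zones["DNS2"]["EXMBX1FSW"] = "EXHUB2"      # replication eventually catches up
print(converged(zones, "EXMBX1FSW"))       # True
```

Until `converged` is true, whether a node finds the replacement share depends on which DNS server it happens to ask, which is the kind of unpredictability described above.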

New Guidance

The new guidance for these configurations is to use the Cluster service's built-in "force quorum" capability to bring the cluster up in the datacenter being activated, and then switch it to a standby share on a Hub Transport server in that datacenter. This new guidance applies to both Windows Server 2003 and Windows Server 2008, and to both CCR and single copy clusters (SCC) that use a Majority Node Set with File Share Witness quorum (Windows Server 2003) or a Node and File Share Majority quorum (Windows Server 2008).

Using the example topology above, the new guidance is as follows:

Windows Server 2003 Clusters

  1. Provision a new share for FSW on EXHUB2 in Datacenter B (if this is not already done).
  2. Force quorum on NodeB in Datacenter B by adding the following registry value to the node:
    Key: HKLM\System\CurrentControlSet\Services\ClusSvc\Parameters
    Value name: ForceQuorum (REG_SZ)
    Value data: NodeB
  3. Start the Cluster service on NodeB in Datacenter B by running the following command: net start clussvc
  4. Reconfigure the cluster quorum to use the FSW share on EXHUB2.
  5. Take the clustered mailbox server offline.
  6. Remove the registry value, and then reboot NodeB.
  7. Start the clustered mailbox server.
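The registry value in step 2, and its removal in step 6, can be prepared in advance as .reg file text so the key and value name do not have to be retyped during an outage. This is a hypothetical Python helper, not an Exchange or Cluster tool; it assumes the standard "Windows Registry Editor Version 5.00" file format, in which `"Name"=-` deletes a value.

```python
# Hypothetical helper (not an Exchange or Cluster tool) that renders the
# ForceQuorum registry value from step 2, and its removal for step 6,
# as .reg file text in the "Windows Registry Editor Version 5.00" format.

REG_HEADER = "Windows Registry Editor Version 5.00"
REG_KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ClusSvc\Parameters"


def _reg_file(value_line: str) -> str:
    # .reg files use CRLF line endings; the trailing "" adds a final newline.
    return "\r\n".join([REG_HEADER, "", f"[{REG_KEY}]", value_line, ""])


def force_quorum_reg(node: str) -> str:
    """Set ForceQuorum (REG_SZ) to the node that should form quorum on its own."""
    if not node or any(ch in node for ch in '"\\'):
        raise ValueError(f"unexpected node name: {node!r}")
    return _reg_file(f'"ForceQuorum"="{node}"')


def remove_force_quorum_reg() -> str:
    """Delete the ForceQuorum value again (step 6); '=-' removes a value in .reg syntax."""
    return _reg_file('"ForceQuorum"=-')


print(force_quorum_reg("NodeB"))
```

On NodeB, the returned text could be saved with `open(path, "w", encoding="utf-16", newline="")` (UTF-16 is what regedit itself writes for this format, and `newline=""` keeps the CRLF endings intact) and imported with `reg import` before running `net start clussvc`.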

To re-activate NodeA in Datacenter A:

  1. Provision a new share on EXHUB1 in Datacenter A.
  2. Bring NodeA online.
  3. Reconfigure the cluster quorum to use the FSW share on EXHUB1.
  4. Stop the clustered mailbox server.
  5. Move the Cluster Group from NodeB to NodeA.
  6. Move the clustered mailbox server from NodeB to NodeA.
  7. Start the clustered mailbox server.
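The ordering of these steps follows from the vote arithmetic. This hypothetical Python sketch replays only the quorum-relevant steps of the Windows Server 2003 activation and re-activation procedures (starting and stopping the clustered mailbox server is omitted). It assumes one vote per node plus one for the witness share, and it simply treats ForceQuorum as bypassing the majority check; the class and function names are illustrative.

```python
# Hypothetical replay of the quorum-relevant steps in the Windows Server 2003
# procedures. Assumes one vote per node plus one for the witness share (3 in
# total, so 2 are needed) and treats ForceQuorum as bypassing that check.
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    nodes_up: set[str] = field(default_factory=set)
    shares_up: set[str] = field(default_factory=set)  # servers whose FSW share is reachable
    witness_host: str = "EXHUB1"                       # server named in the quorum configuration
    force_quorum: bool = False

    def votes(self) -> int:
        witness_vote = 1 if self.witness_host in self.shares_up else 0
        return len(self.nodes_up) + witness_vote

    def has_quorum(self) -> bool:
        return self.force_quorum or self.votes() >= 2


def replay(repoint_witness: bool = True) -> list[tuple[str, int, bool]]:
    # Datacenter A is down: NodeA and EXHUB1 are unavailable; EXHUB2 has a staged share.
    state = ClusterState(shares_up={"EXHUB2"})
    log: list[tuple[str, int, bool]] = []

    def record(step: str) -> None:
        log.append((step, state.votes(), state.has_quorum()))

    state.force_quorum = True            # activation steps 2-3
    state.nodes_up.add("NodeB")
    record("force quorum, start NodeB")
    if repoint_witness:                  # activation step 4
        state.witness_host = "EXHUB2"
        record("witness -> EXHUB2")
    state.force_quorum = False           # activation step 6
    record("remove ForceQuorum, reboot NodeB")
    state.shares_up.add("EXHUB1")        # re-activation steps 1-2
    state.nodes_up.add("NodeA")
    record("new share on EXHUB1, NodeA online")
    state.witness_host = "EXHUB1"        # re-activation step 3
    record("witness -> EXHUB1")
    return log


for step, votes, ok in replay():
    print(f"{step:<36} votes={votes} quorum={ok}")
```

Calling `replay(repoint_witness=False)` shows why activation step 4 comes before the reboot in step 6: once ForceQuorum is removed, NodeB sees only its own vote because the configured witness on EXHUB1 is still unreachable, so the cluster cannot form quorum.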

Windows Server 2008 Clusters

  1. Provision a new share for FSW on EXHUB2 in Datacenter B (if this is not already done).
  2. Force quorum on NodeB in Datacenter B by running the following command on NodeB: net start clussvc /forcequorum
  3. Use the Configure Cluster Quorum Settings wizard to configure the Node and File Share Majority quorum to use the FSW share on EXHUB2.
  4. Reboot NodeB.
  5. Start the clustered mailbox server.
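The main procedural difference between the two operating systems is how force quorum is requested. This small hypothetical Python helper only builds the command lines described in this post; it does not run anything.

```python
# Hypothetical helper that returns the command used to start the Cluster
# service on NodeB during activation, per the steps in this post.


def force_quorum_start_command(windows_version: str) -> list[str]:
    if windows_version == "2008":
        # Windows Server 2008: force quorum is a switch on the service start.
        return ["net", "start", "clussvc", "/forcequorum"]
    if windows_version == "2003":
        # Windows Server 2003: force quorum comes from the ForceQuorum registry
        # value set beforehand, so the service is started normally.
        return ["net", "start", "clussvc"]
    raise ValueError(f"unsupported Windows Server version: {windows_version!r}")


print(" ".join(force_quorum_start_command("2008")))  # net start clussvc /forcequorum
```

On NodeB the returned list could be passed to `subprocess.run(..., check=True)`. Note that the Windows Server 2008 procedure has no step for removing anything before the reboot: the switch is given only on that one `net start`, whereas the Windows Server 2003 registry value persists until it is deleted in step 6.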

To re-activate NodeA in Datacenter A:

  1. Provision a new share on EXHUB1 in Datacenter A.
  2. Bring NodeA online.
  3. Use the Configure Cluster Quorum Settings wizard to configure the Node and File Share Majority quorum to use the FSW share on EXHUB1.
  4. Stop the clustered mailbox server.
  5. Move the Cluster Group from NodeB to NodeA.
  6. Move the clustered mailbox server from NodeB to NodeA.
  7. Start the clustered mailbox server.

Do I need to change my existing configuration?

This updated guidance is for new deployments. There is no urgency or recommendation to reconfigure existing deployments that are using the original guidance of changing a CNAME record in DNS for switching to a replacement file share for the FSW.

However, if you do activate a backup datacenter, we recommend decommissioning the CNAME record at that point and configuring the UNC path for the file share to use actual server names.
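When decommissioning the CNAME record, it helps to confirm that the witness path no longer refers to the alias. This hypothetical Python check compares the host in a UNC path against an inventory of real server names; the inventory and the contoso.com domain in the usage note are illustrative assumptions.

```python
# Hypothetical check for decommissioning the CNAME: flag a witness path whose
# host is not the name of a real server in your inventory.

def witness_uses_alias(unc_path: str, server_names: set[str]) -> bool:
    """Return True if the UNC host is not one of the known server names."""
    host = unc_path.lstrip("\\").split("\\", 1)[0]
    short = host.split(".", 1)[0].upper()  # accept FQDNs as well as short names
    return short not in {name.upper() for name in server_names}


servers = {"EXHUB1", "EXHUB2"}  # hypothetical inventory of actual server names
print(witness_uses_alias(r"\\EXMBX1FSW\FSW_EXMBX1", servers))  # True: still the alias
print(witness_uses_alias(r"\\EXHUB2\FSW_EXMBX1", servers))     # False: real server name
```

A path such as `\\exhub2.contoso.com\FSW_EXMBX1` would also pass, since only the first label of an FQDN is compared.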

What About the TechNet Documentation?

We are in the process of updating several topics in the Exchange Server 2007 Library on TechNet (also known as the Exchange TechCenter). The documentation being updated with this new guidance is expected to be published with the May 08 documentation refresh, and it includes:

The following topics have been updated to remove the old guidance, and they are published with the April 08 documentation refresh:

- Scott Schnoll



Comments (11)
  1. iamme says:

    Either I’m an idiot (hopefully not) or those are some confusing directions:

    1. To re-activate NodeA in Datacenter A, do we follow the same steps to make the forcequorum registry change?

    2. It says to re-activate on Node A move cluster groups to Node A. But when we were working on Node B, we had to reboot which means they should already be on Node A.

    Confused! At what point are these steps performed? Is this a one-time thing, or is it done every time there’s a failover to the secondary datacenter? And which node holds the resources at any given time?

  2. bday says:

    Do you recommend this over having the FSW in a 3rd tiny inexpensive location?

  3. Remy says:

    Hello,

    is the MNS/FSW compatible with more than 2 nodes in w2008 clusters ?

    Thx

    Remy

  4. Scott says:

    Iamme, You do not need to forcequorum on NodeA.  You only need to forcequorum on NodeB to bring it up when it is the last man standing.  After NodeB is up, and a replacement share is provisioned and the cluster is using that share, you can then remove the registry value and reboot NodeB.  Then, bring up NodeA.

    Bday, You can choose to use a third datacenter for the FSW share; the real point here is that you no longer configure a CNAME record in DNS as part of FSW provisioning.

    Remy, the FSW model works only when there are precisely two nodes joined to the cluster.

  5. stephen says:

    Hi there,

    Firstly, sorry for a long post.  I think I’m getting confused (or confusing myself) and I’m pretty sure it’s because I’m not fully understanding how MNS+FSW works (especially when using ForceQuorum).  I have added comments on the steps I’m having troubles getting my head around.  I would greatly appreciate if you can confirm my thoughts, or point out where I’m wrong:

    Provisioning Site B:

    2) Force quorum on NodeB in Datacenter B by adding the following registry value to the node:

    HKLM\System\CurrentControlSet\Services\ClusSvc\Parameters

    REG_SZ: ForceQuorum

    Value: NodeB

    – Does this tell the cluster service to start with a new quorum, and to start in a configuration so that NodeB is the only node in the Cluster?

    4) Reconfigure the cluster quorum to use the FSW share on EXHUB2.

    – Why do we need to do this step?  If the Cluster service has started and the Clustered Mailbox Server is online (which I’m assuming it is, as the next step is to take it offline), couldn’t we just run without a FSW until we re-activate SiteA?

    5) Take the clustered mailbox server offline.

    6) Remove the registry value, and then reboot NodeB.

    7) Start the clustered mailbox server.

    Re-Activating siteA

    1) Provision a new share on EXHUB1 in Datacenter A.

    Is this to ensure that the old share (containing the old quorum data) is not available to NodeA when it starts (and therefore cannot form a majority with it)? Would it be possible to just delete the contents of the existing FSW share, or is it a requirement to create a new share (with a new share name)?

    2) Bring NodeA online.

    How does the cluster service start in this situation?  Is it because it can still form majority because it can communicate with both itself and NodeB – Basically it just thinks that the EXHUB1 is down (as the Share was recreated or the contents were deleted in step 1)?  

    The rest of the steps seem fairly self explanatory.  Any clarification or further explanation on the above would be most appreciated.

    3) Reconfigure the cluster quorum to use the FSW share on EXHUB1.

    4) Stop the clustered mailbox server.

    5) Move the Cluster Group from NodeB to NodeA.

    6) Move the clustered mailbox server from NodeB to NodeA.

    7) Start the clustered mailbox server.

    Many thanks,

    Stephen.

  6. systemnt says:

    Is this new guidance only for STRETCHED clusters, or would it apply to regular same-site clusters as well?

    ie: Active and Passive nodes on same network.

  7. iamme says:

    And one more thing: I just brought up a Server 2008 cluster. Exchange is not installed on it yet. I don’t see anywhere in the GUI that allows us to move the Cluster Group. I want to test failover to the second node without having to shut down the server or disable NICs. How can I do this from the GUI? I know how to do it via the CLI.

    I know that once Exchange is installed, we use move-clusteredmailboxserver.  But I’d like to know how to move the default cluster group via the GUI.

    Thanks

  8. Silis says:

    It´s confusing. Please be more explicit in the steps.

  9. Brian says:

    And what about using folder replication to host an FSW, such as what AD uses to replicate GPOs? Has anybody tried this?

    Another option would be to setup a cheap cluster outside of Exchange 2007 hosting a name, ip, and file share resource. The file share could be your FSW and could be redundant as well.

  10. Mike Crowley says:

    What is the quorum guidance in a multi-subnet Exchange 2007 cluster that has more than two nodes (SCC)?

  11. Exchange says:

    Mike, the guidance would be to use an MNS quorum (without a file share witness, as FSW requires precisely two nodes).
