Exchange Tools vs. Cluster Tools: What’s the Deal?


Note: Since this post was written, Microsoft Customer Support Services has identified a scenario in which moving a CMS using the cluster tools can leave databases in a dismounted or offline state. During regular operation, the Microsoft Exchange Replication service maintains a “mount allowed” value for each server within the CCR cluster via a private property of the clustered Information Store resource. When the Exchange tools are used to move resources between nodes, the Microsoft Exchange Replication service performs health checks to determine if the mount allowed value can be set to true. When resources are moved with the cluster management tools, the Microsoft Exchange Replication service might not immediately become aware that a resource move has occurred. This can leave some or all of the databases hosted by the CMS in an offline or failed state. The problem is intermittent and may not occur at all. Using the Exchange tools to move a CMS allows notifications to be sent directly to the Microsoft Exchange Replication service, which prevents this issue. If you encounter this situation, we recommend that you use the Exchange tools, and not the cluster management tools, to move your CMS between nodes.

There is some confusion, along with some misperceptions, about the correct tools to use to manage a clustered mailbox server (CMS) in a single copy cluster (SCC) or cluster continuous replication (CCR) environment. Specifically, folks are asking the following questions:

  1. Do I use the Windows Server failover cluster management tools (namely, Cluster Administrator and Cluster.exe) to manage a CMS, or do I use the Exchange tools (namely, the Exchange Management Console and the Exchange Management Shell)?
  2. Will anything bad happen if I use the wrong tool to perform a task on my CMS?

To clear up this confusion, I thought a blog post might be helpful.

The answer to the second question is generally “no,” because the answer to the first question is generally “there is no wrong tool.” Instead, use of the tools really falls into two categories:

  1. You have to use tool X because there’s no choice. This category refers to the tasks that must be performed with the cluster tools.
  2. You should use tool X instead of tool Y because it’s better. This category refers to tasks performed by Exchange tools that do extra things that are good for you that the cluster tools don’t do.

Let’s look at what we say in the documentation which speaks to the second category, where we recommend using Exchange tools instead of Cluster tools, and the reasons why we recommend them.

In the content related to CCR, we say the following:

Cluster Administrator and Cluster.exe should not be used because:

  • These methods do not validate the health or state of the passive copy. Thus, their use can result in an extended outage while the node performs the operations necessary to make the database mountable.
  • These methods may also leave a database offline indefinitely because the replication is in a broken condition.

In the content related to SCC, we say the following:

Cluster Administrator and Cluster.exe provide mechanisms for moving resource groups between nodes in a cluster. When performing a handoff of a clustered mailbox server in a single copy cluster, we recommend that you use the Move-ClusteredMailboxServer cmdlet or the new Manage Clustered Mailbox Server Wizard in Exchange 2007 Service Pack 1 instead of the cluster management tools, because it enables the administrator to specify a reason for the handoff.

To be super clear, Microsoft fully supports the use of Cluster Administrator and Cluster.exe to manage clustered mailbox servers in Exchange 2007 RTM and SP1. But there are certain tasks for which we strongly recommend using the Exchange tools instead of the Cluster tools.

But then there’s the first category. There are other scenarios in which the Exchange tools are not available, or cannot be used because continuous replication is not healthy, or because Exchange has not yet been installed on all nodes in the cluster. In those cases, because the Exchange tools cannot be used, you have no choice but to use the Cluster tools.

Going back to the documentation once again, there are several places where we provide instructions that include the use of Cluster Administrator or Cluster.exe because you have to use those tools, such as:

These are just some examples; there are others.

One of the reasons we recommend using Exchange tools to manage clustered mailbox servers is that, while Exchange is cluster-aware, the cluster tools are not Exchange-aware. In a CCR environment, the Exchange tasks perform additional checks that the Cluster tools do not and cannot perform, such as checking the health and status of continuous replication. There is intelligence in the Exchange tasks that blocks the move if the task determines that, based on the current state of replication, a handoff would result in one or more databases not mounting. Once those additional checks are done, the Exchange tasks use the Cluster API to do cluster-related operations such as stopping a CMS, starting a CMS, or moving a CMS. In other words, the Exchange tasks do everything that the Cluster tools would do (using the same Cluster API), as well as some additional checks that the cluster tools can’t and don’t do.
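
To make the "checks first, then the same Cluster API call" idea concrete, here is a minimal conceptual sketch in Python. It is not Exchange or cluster code: ToyCluster, StorageGroupCopy, and exchange_style_move are hypothetical names invented for illustration, and replication health is reduced to a single true/false flag per storage group, which is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class StorageGroupCopy:
    """Hypothetical summary of the passive copy of one storage group."""
    name: str
    healthy: bool  # assumption: one flag standing in for replication health


class ToyCluster:
    """Toy model of a two-node cluster that owns a single CMS."""

    def __init__(self, active_node):
        self.active_node = active_node

    def move_group(self, target_node):
        # Stand-in for the Cluster API move; it never looks at replication.
        self.active_node = target_node


def exchange_style_move(cluster, target_node, copies):
    """Run the replication checks first, then make the same cluster move.

    Returns the names of the storage groups that blocked the move
    (an empty list means the move was performed).
    """
    blockers = [copy.name for copy in copies if not copy.healthy]
    if not blockers:
        cluster.move_group(target_node)
    return blockers


cluster = ToyCluster(active_node="NODE1")
copies = [StorageGroupCopy("SG1", True), StorageGroupCopy("SG2", False)]

# Replication for SG2 is unhealthy, so the move is blocked and nothing changes.
print(exchange_style_move(cluster, "NODE2", copies), cluster.active_node)

# Once every copy is healthy, the wrapper makes the same call the cluster tools make.
copies[1].healthy = True
print(exchange_style_move(cluster, "NODE2", copies), cluster.active_node)
```

The point of the sketch is the shape of the wrapper: the move itself is identical in both paths, and the only difference is whether anything inspects replication before it runs.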

For example, besides checking the state of replication, in both a CCR environment and in an SCC, the Exchange tasks also allow you to specify an administrative reason for a move or a stop, and they record that reason in the event log for audit and tracking purposes. The cluster tools also cause a move or a stop operation to be logged in the event log, but they do not allow you to specify a reason for the action.

I think some of the confusion around which tool to use is the result of having different tool guidance for different tasks that are associated with a single CMS, or with a specific environment (CCR or SCC). It just so happens that the guidance is the right tool for the right task and the right tool varies depending on what task you’re performing.

I also think some of the confusion may stem from the fact that, in previous versions of Exchange (for example, Exchange 2003), there weren’t any Exchange interfaces that were specific to clustered mailbox servers (what we used to call Exchange Virtual Servers (EVS)). In fact, it was just the opposite. When you installed Exchange 2003 in a cluster, it was not Setup that created the EVS, but instead it was the administrator manually creating a group with IP address, Network Name, and Physical Disk resources, and then creating a System Attendant resource, which in turn hydrated and created an EVS. This made the Setup experience for an EVS very different from the Setup experience for standalone Mailbox servers. This was a problem we squarely addressed in Exchange 2007.

One of the many goals we had for Exchange 2007 was to provide a rich Exchange-based management experience for all Exchange servers, including clustered mailbox servers. We wanted to get folks out of non-Exchange tools as much as possible, and allow them to use Exchange tools (the Exchange Management Console and the Exchange Management Shell) to manage Exchange objects (such as a clustered mailbox server). Previous versions of Exchange didn’t provide any interface to manage Exchange Virtual Servers; instead, administrators had to use Cluster tools, even when it came to creating the EVS in the first place. We did create a rich and robust management experience with the Exchange tools, but there are still a few tasks that remain outside of Exchange tools and in Cluster tools. But we do recommend that you use the Exchange tools when performing certain tasks, such as:

  • Moving a CMS between nodes. Move-ClusteredMailboxServer in RTM and SP1, and the new Manage Clustered Mailbox Server Wizard in SP1, remain the preferred tools for regular handoffs (scheduled outages). In the case of CCR, extra health checks are done; in the case of SCC, checks for an existing CMS are done; and in both CCR and SCC, you can log an administrative reason for the move. In cases where the tasks are not available, or by design won’t work because replication is not healthy, you can use Cluster Administrator or Cluster.exe to move a CMS between nodes.
  • Stopping a CMS. Stop-ClusteredMailboxServer in RTM and SP1, and the new Manage Clustered Mailbox Server Wizard in SP1, remain the preferred tools for stopping a CMS (taking it offline). In both SCC and CCR, these tools take the CMS offline faster, and they enable you to log an administrative reason for the stop.
  • Starting a CMS. Start-ClusteredMailboxServer in RTM and SP1, and the new Manage Clustered Mailbox Server Wizard in SP1, remain the preferred tools for starting a CMS (bringing it online), because they include extra Exchange logic that does things like look for the presence of a CMS that is already online on the same node.

At the beginning of this post, I said that nothing bad will happen if you use the wrong tool because there is no wrong tool. But let’s say you’ve ignored our recommendations to use the Exchange tools for some tasks, and instead you decide to use the Cluster tools. Will anything bad happen in this case? Again, generally, no. I say generally, because it depends on your definition of “bad.”

Here’s what I mean by this.

Using Cluster Administrator or Cluster.exe to take a CMS offline, to bring a CMS online, or to move a CMS between nodes will not cause any harm to the cluster, to the cluster’s configuration, to the CMS, to the CMS’ configuration, to any databases hosted on the CMS, or to any other aspect of the cluster. So none of that bad stuff will happen.

But remember, if some of the Exchange checks that are performed only by the Exchange tools and not by the Cluster tools are skipped, then bad things might result from the action. For example, in a CCR environment, Move-ClusteredMailboxServer checks to see if replication is healthy before performing the move. If replication is broken or too far behind, the Exchange task will block the move, and no change will happen in the cluster. Everything will remain online, with no interruption in service.

By contrast, the cluster group move function knows nothing about the state of replication, and will happily perform the handoff of the CMS, even if replication is broken or too far behind. After the handoff, any storage group for which continuous replication is not healthy will have an offline database, and will incur an interruption in service until the copy can be brought up to date, or until some other administrative action is taken that corrects the Failed copy status for the storage group(s). So that kind of bad stuff could happen. And that is why we recommend using Exchange tools for certain tasks; they provide a level of protection for you that the Cluster tools do not.
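
The difference in outcomes can be sketched as a small Python model. This is a conceptual illustration only: the function names are hypothetical, the "Healthy" and "Failed" strings are illustrative labels rather than values read from a real server, and the model assumes a database mounts after a handoff only if its passive copy was healthy.

```python
HEALTHY = "Healthy"  # illustrative status labels, not read from a real server
FAILED = "Failed"


def state_after_cluster_move(copy_status):
    """Unchecked handoff: the move always happens, so each database ends up
    mounted only if its passive copy was healthy at the time of the move."""
    return {
        storage_group: "mounted" if status == HEALTHY else "offline"
        for storage_group, status in copy_status.items()
    }


def state_after_checked_move(copy_status):
    """Checked handoff: if any copy is unhealthy the move is blocked, and every
    database stays mounted on the node that is currently active."""
    if all(status == HEALTHY for status in copy_status.values()):
        return state_after_cluster_move(copy_status)
    return {storage_group: "mounted" for storage_group in copy_status}


copy_status = {"SG1": HEALTHY, "SG2": FAILED}
print("cluster tools: ", state_after_cluster_move(copy_status))
print("Exchange tools:", state_after_checked_move(copy_status))
```

With one failed copy, the unchecked path leaves that storage group's database offline, while the checked path leaves everything where it was. When all copies are healthy, the two paths produce the same result.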

I hope this clears things up for everyone!

Scott Schnoll

Comments (10)
  1. Andy Goodwin says:

    Thanks very much for this article.  I needed some specific reasons to explain this to clients.

  2. spiridon says:

    what about the moving of the Cluster Group (Cluster IP Address, cluster Name and Majority Node Set)?

    If I have to move it as well as the CMS, what is the preferred tool to use, cluster admin?

  3. Scott says:

    Spiridon, to move the cluster group, or any other non-Exchange resource group, you must use the cluster tools.  The Exchange Management Tools only know how to move a CMS; they don’t know anything about any other groups that are in the cluster.

  4. Scott says:

    Spiridon, slight correction: to move the cluster group, or any other non-Exchange resource group, you must use the cluster tools, if the non-Exchange application that created the resource group does not provide its own tool.

  5. nobody4 says:

    Sounds like a pain but I am sure we (The Customer) will adjust. I really liked being able to do everything from CluAdmin. Without "Bad things Happening"…

  6. The samples you cited are for creating or managing a cluster; none of them are for moving a CMS.

    Using CluAdmin/Cluster.exe vs. EMC/EMS with SP1 require different security levels by default and may not be the same individual. Now we will have to give Cluster Administrator Exchange rights to avoid "Bad things Happening"…

    Thankfully SQL and other Microsoft produts don’t have an issue with CluAdmin/Cluster.exe. Nor do they have any other way to manage them, K.I.S.S. in action.

  7. Scott says:

    Rodney, if you look in the Exchange 2007 content library at http://technet.microsoft.com/en-us/library/bb124558.aspx, you’ll see several places where we document moving a CMS in a CCR environment or in an SCC.  (See, for example, http://technet.microsoft.com/en-us/library/aa998282.aspx and http://technet.microsoft.com/en-us/library/aa998816.aspx.) There are other examples in the content where we use Exchange tools or Cluster tools to manage a CMS, as well.

    We also document the permissions needed to perform each task we document, and you can find those permissions in the Before You Begin section of each topic.

  8. Russ Kaufmann says:

    Scott,

    I think that you are missing the point. The major issues are these, as I see them:

    1. Using EMS does not allow for very granular delegation (please correct me if I am wrong). So, for example, I can’t delegate the ability to move the cluster to an operations team without giving them other permissions that are not required for their job and thus violating one of the major principles of the generally accepted security principles.

    2. The Exchange team, not the cluster team, failed to provide for proper use of existing tools (i.e. cluadmin and cluster.exe) which do allow for proper delegation.

    Exchange Server 2007 is the only product that failed to properly leverage existing tools and processes when it comes to clustering.

  9. Michael Hysen says:

    I have to agree with Russ, the fact that we now have to go to two places to perform a task that should be integrated simply makes my suggestion to use CCR look stupid to the cluster team.  The comment I get back is "so Microsoft implements an application that does not support the clustering standard they have implemented?".  I think you need to look deeper and see why this is something that is going to be generally unpopular.  From my side, the main issue I have with this relates to patching and the need to automate it out of a system such as SMS.  This will require me to develop and support a standalone process and checking just for Exchange, when every other MS cluster product continues to work the same way.  On top of that I cannot afford to be opening security up.  When introducing a new technology like CCR to an organisation, these kinds of issues can make life difficult.

  10. Scott Schnoll says:

    Hi Nobody, Rod, Russ, and Michael Hysen,

    Thanks very much for your additional comments and feedback.

    I understand the points being raised, and you are correct that I did not address the separation of administration
    scenario that you describe.  In my experience (and in the experience of many others whom I’ve asked about this), the separation of administration between the cluster and Exchange is very rare. 
    It sounds like your experience differs from mine. For customers I’ve worked with that run Exchange clusters, the clustered mailbox servers and the cluster itself are managed by the messaging team. Often there are cluster experts on staff both inside and
    outside of the messaging team (for example, on a Windows management team, or some other management/ops team), but these admins typically don’t perform tasks such as moving clustered mailbox servers, or taking databases offline and doing maintenance. Of course,
    there will always be exceptions, and I appreciate your pointing them out.

    From a management perspective, I would strongly argue though that an administrator who is responsible solely
    for the cluster and not for the clustered application should not be performing management tasks related to the clustered application. In other words, someone whose job it is to manage only the cluster should not be moving clustered mailbox servers. We did
    do a tremendous amount of work to improve the Exchange cluster experience, which was driven in large part by customer feedback. But, having functionality that allows a pure cluster admin without any Exchange permissions to fully manage a clustered mailbox
    server in Exchange 2007 or Exchange 2007 SP1 was not a goal of ours.

    If any readers believe this separation of cluster versus clustered application administration is an important
    scenario, I would like to hear more details about it, and I invite you to contact me offline to discuss this further. I’m particularly interested in where each line of separation is drawn, and why a non-Exchange administrator would be allowed to perform tasks
    that directly affect Exchange (such as causing a brief interruption in service by moving a clustered mailbox server between nodes). Obviously it is too late to change anything in SP1, but certainly your feedback is something to think about for future releases
    of Exchange Server.

    Nobody, you wrote that "I really liked being able to do everything from CluAdmin. Without "Bad things Happening"…".
    Let me reiterate that using CluAdmin to manage a CMS does not make bad things happen. The "bad things" I wrote about only apply to CCR environments; they do not apply to SCC. The "bad things" are not corruption, damage will not happen, and
    nothing bad is happening as a result of using the cluster tools. 
    The "bad things" I wrote about are bad things that have already happened. They are
    not the result of using CluAdmin.

    The reason we recommend using our tasks instead of the cluster tools is that they have logic built in
    that does some extra checks before the handoff is done. If the tools detect that the passive is not in a good state, and as a result, databases won’t be mountable, they will block the move. 
    The Exchange Management tools provide you with an additional level of protection. The cluster tools do not. 
    It was never our intent to put this or any other intelligence into the cluster tools; rather it was just the opposite. Our intention was to get folks out of non-Exchange tools and into Exchange tools. And when you do use the Exchange
    tools, be aware that they are using the same Cluster API and making the same API calls that the cluster tools use. In other words, our tool is an Exchange management layer on top of the cluster tools/API.

    Rod, you wrote that "Thankfully SQL and other Microsoft produts (sic) don’t have an issue with CluAdmin/Cluster.exe.
    Nor do they have any other way to manage them."  I disagree with that, and so do the SQL Server folks I asked about this. 
    SQL Server and many other cluster-aware applications include application-specific tools for management purposes. 
    For example, you don’t create a SQL Server database using the cluster tools; you create one with the SQL tools. 
    You don’t configure SQL Server log shipping with the cluster tools; you use the SQL tools. Can you move a clustered SQL Server from one node to another using the cluster tools? 
    Yes!  Can you do that with an Exchange 2007 clustered mailbox server? 
    Yes!  Will it cause any harm to Exchange if you do? 
    No!  Could it result in downtime because of external factors? 
    Yes!  And this is true of nearly all clustered applications.

    When it comes to management tasks related to the cluster, such as removing a node from a SQL Server cluster, SQL
    Server documentation provides instruction on using the SQL tools to do this (
    http://msdn2.microsoft.com/en-us/library/ms191545.aspx). 
    They don’t provide instructions to do this using cluster tools. For some tasks, such as changing the IP address of a clustered SQL Server, they provide instructions using the cluster tools (
    http://msdn2.microsoft.com/en-us/library/ms190460.aspx),
    and not the SQL tools.

    The cluster doesn’t generally know anything about the health and state of an application running in the cluster.
    For example, take SQL Server mirroring in a cluster. Mirroring and clustering work independently of each other. Mirroring knows nothing about clusters, and clusters know nothing about mirroring, just like clusters don’t know anything about continuous replication.

    Russ, you wrote that you "can’t delegate the ability to move the cluster to an operations team without giving
    them other permissions that are not required for their job." I agree that any operations team that is responsible for managing a cluster that contains Exchange should have the proper permissions. 
    I think we just disagree on what those permissions should be. I understand that you and others think the administrator should have only permissions to the cluster, and not have Exchange permissions. 
    I disagree, but if you feel strongly about this, please do contact me offline so that I can better understand these scenarios and take this feedback back to the team for consideration in future releases.

    Rod and Russ, no doubt with your clustering experience it is natural to view an Exchange cluster from a cluster
    perspective.  What we tried to do with Exchange 2007 is abstract away the cluster as much as possible and make the experience of managing a clustered mailbox server not more like managing other clustered applications, but rather more
    like managing a standalone Mailbox server. We don’t think that Exchange administrators should have to be cluster experts in order to deploy and run Exchange in a cluster. In the case of CCR, I don’t mind saying that I think we did a fantastic job of minimizing
    the need for cluster knowledge, particularly in the area of hardware and storage configuration. Certainly some cluster experience will be helpful, as they still need to build a cluster before they can install Exchange, but we help them out there, too, by giving
    them complete step-by-step instructions on how to do this (in RTM using GUI interfaces, and in the forthcoming SP1 content, using both GUI and command-line interfaces).

    As we move beyond Windows Server 2003 and into Windows Server 2008, as we’re doing with SP1, it is even more
    important to abstract away the cluster because failover clusters have changed significantly in Windows Server 2008. In fact, there is what we call a "clean break" in the Cluster API in Windows 2008. 
    This means, among other things, that you can’t call Cluster APIs between client levels. This means that for administration purposes:

      • Windows Vista/Windows 2008 can manage only Windows 2008 (and later) failover clusters
      • Windows XP/Windows 2003 can manage only Windows 2003 and earlier failover clusters

    In addition, given the substantial changes to failover clustering in Windows 2008, namely in the core areas
    of security, storage, and networking, it makes even more sense to abstract away the cluster for Exchange administrators, particularly those who are clustering Exchange 2007 today. There’s a lot of new and great stuff in Windows 2008 failover clusters, but
    there’s also a lot that has changed with respect to the management interfaces. Without the benefit of Exchange tools (including Exchange Setup) that lessen the need to be a cluster expert, the Exchange administrator would face a steep new learning curve.
    They would know one way to move a resource group using Cluster Administrator and they would have to learn the new way to move the group using the Failover Cluster Management tool in Windows 2008. As a result of the way in
    which we provide management tasks now, the Exchange administrator does not need to become a cluster expert in order to run Exchange 2007 SP1 on a Windows 2008 failover cluster. Instead, the Move-ClusteredMailboxServer task and the Manage Clustered Mailbox
    Server Wizard GUI, will look, act, and operate the same no matter what operating system you’re running. As a former, long-time messaging administrator in small, medium and very large organizations, I think this approach provides a lot of benefits for administrators.

    Michael, you wrote "the fact that we now have to go to two places to perform a task that should be integrated
    simply makes my suggestion to use CCR look stupid to the cluster team." You don’t need to go to two places; you can manage CCR using the Exchange Management tools. 
    You also wrote that you get comments from others like "so Microsoft implements an application that does not support the clustering standard they have implemented?" 
    This comment makes no sense, as we appropriately leverage the Cluster API when we interface with and manipulate resources in a cluster. 
    In fact, that is one of the core points I’ve been trying to make. The Exchange tools fully leverage the Cluster API. 
    We are not doing anything inside of a cluster that is in any way different from what the cluster tools do. 
    Rather, the Exchange tools do extra tasks outside the cluster, such as checking on the state of replication in a CCR environment before calling the Cluster APIs to do the move.

    CCR is new and different; it is not like a traditional Exchange cluster, and it is not like SCC. Thus, it
    requires new approaches to management. There is a bit more going on than your traditional Exchange cluster; namely, CCR uses log shipping and it does not use shared storage. These are very important differences. One of the reasons behind the naming of SCC
    is to emphasize there is a single copy of the data in the cluster, and to differentiate it from CCR, where you have two copies of the data. In a CCR environment, you have an active node that contains the production copy of the database – the one that users/clients
    are accessing. You also have a passive node, which contains a copy of the production database that is maintained and kept current through the use of log shipping and log replay. This second copy, the passive copy, is only useful if it is kept up-to-date.

    In an SCC, you only have one copy of each database, and that copy moves between nodes when you move the clustered
    mailbox server. In a CCR environment, each node has its own copy of each database. When the clustered mailbox server is moved from the active node to the passive node, it stops using one copy of the database and starts using the other copy of the database.
    If the other copy is no good (perhaps because replication is not up-to-date), then the clustered mailbox server will not be able to mount the database, and this will result in downtime.

    In a CCR environment, you should never move a clustered mailbox server between nodes without knowing the state
    and health of continuous replication. You need to know things are OK on the node that is about to take ownership because if they are not, then you don’t want to perform the move because it would result in downtime. To obviate the need for an administrator
    to have to manually check the state of replication before performing a move, we included checks in our Move-ClusteredMailboxServer task. Of course, an administrator could certainly run Get-StorageGroupCopyStatus, Get-ClusteredMailboxServerStatus, and (new
    in SP1) Test-ReplicationHealth, to determine the health and status of continuous replication. 
    And if everything checks out as healthy and up-to-date, they can use the cluster tools to move the clustered mailbox server to another node. But as you implied, Michael, why use two tasks to do this when you can use one? This is why we have Move-ClusteredMailboxServer. 
    It combines the health check with the same move that the cluster tools do.

    Hopefully this further clarifies the statements I made in my blog, and what we say in our documentation. 
    Please do feel free to contact me offline if you would like to discuss this further.

    And, thank you all very much for your feedback. It is much appreciated!

Comments are closed.
