Exchange 2007 Cluster Setup - Did it complete?

As you no doubt know by now, Exchange 2007 introduces different types of clustering.  It still retains the traditional Single Copy Cluster, where all data is stored in a central location (e.g. a SAN).  It also introduces Cluster Continuous Replication, or CCR, where centralized storage is not required; instead, each server holds its own copy of the data.  That copy is created by an initial "seeding" and is then kept current by replicating log files.  Service Pack 1 for Exchange 2007 adds a new technology for managing server failures, called Standby Continuous Replication, or SCR, but we'll leave that for another post.

[edit] as was pointed out to me, SCR isn't actually a clustered solution, as it does not use Windows clustering.  I've thus modified the original post to reflect that.  However, it can be used as the target for an Exchange 2007 cluster.  More details to come when I post about SCR.

For today, I've installed a new CCR cluster, but I didn't allow it to complete.  Let's have a look at the setup log.  This is located in the c:\ExchangeSetupLogs directory.  The one that contains the most information is ExchangeSetup.log.  Since this log file contains a lot of information, I'll only post part of it.

 

[8/3/2007 4:46:12 PM] [0] Starting Microsoft Exchange 2007 Setup
[8/3/2007 4:46:12 PM] [0] **********************************************
[8/3/2007 4:46:12 PM] [0] Setup version: 8.0.685.24.
[8/3/2007 4:46:12 PM] [0] Logged on user: E12CCR\Administrator.
[8/3/2007 4:46:12 PM] [0] Command Line Parameter Name='mode', Value='Install'.
[8/3/2007 4:46:12 PM] [0] Command Line Parameter Name='sourcedir', Value='D:\i386'.
[8/3/2007 4:46:12 PM] [0] Command Line Parameter Name='fromsetup', Value=''.
[8/3/2007 4:46:12 PM] [0] ExSetupUI was started with the following command: '-mode:install -sourcedir:D:\i386 /FromSetup'.
[8/3/2007 4:46:17 PM] [0] Setup is choosing the domain controller to use
[8/3/2007 4:46:18 PM] [0] Setup is choosing a local domain controller...
[8/3/2007 4:46:20 PM] [0] Setup has chosen the local domain controller e12dc.E12CCR.com for initial queries
[8/3/2007 4:46:21 PM] [0] PrepareAD has been run, and has replicated to this domain controller; so setup will use e12dc.E12CCR.com
[8/3/2007 4:46:21 PM] [0] Setup is choosing a global catalog...
[8/3/2007 4:46:21 PM] [0] Setup has chosen the global catalog server e12dc.E12CCR.com.
[8/3/2007 4:46:21 PM] [0] Setup will use the domain controller 'e12dc.E12CCR.com'.
[8/3/2007 4:46:21 PM] [0] Setup will use the global catalog 'e12dc.E12CCR.com'.
[8/3/2007 4:46:21 PM] [0] Exchange configuration container for the organization is 'CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=E12CCR,DC=com'.
[8/3/2007 4:46:21 PM] [0] Exchange organization container for the organization is 'CN=First Organization,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=E12CCR,DC=com'.
[8/3/2007 4:46:21 PM] [0] This machine is part of a Windows failover cluster.
[8/3/2007 4:46:21 PM] [0] This server is not an active node for any Clustered Mailbox servers.
[8/3/2007 4:46:21 PM] [0] Setup will search for an Exchange Server object for the local machine with name 'E12CCR2-N1'.
[8/3/2007 4:46:21 PM] [0] No Exchange Server with identity 'E12CCR2-N1' was found.
[8/3/2007 4:46:36 PM] [0] The following roles are unpacked:
[8/3/2007 4:46:36 PM] [0] The following roles are installed:
[8/3/2007 4:46:36 PM] [0] The local server does not have any Exchange files installed.
[8/3/2007 4:46:36 PM] [0] Setup will use the path 'D:\i386' for installing Exchange.
[8/3/2007 4:46:36 PM] [0] The server is cluster type: 'None'.
[8/3/2007 4:46:36 PM] [0] The requested cluster type: 'None'.
[8/3/2007 4:46:36 PM] [0] The installation mode is set to: 'Install'.
[8/3/2007 4:46:36 PM] [0] An Exchange organization with name 'First Organization' was found in this forest.
[8/3/2007 4:46:36 PM] [0] Active Directory Initialization status : 'True'.
[8/3/2007 4:46:36 PM] [0] Schema Update Required Status : 'False'.
[8/3/2007 4:46:36 PM] [0] Organization Configuration Update Required Status : 'False'.
[8/3/2007 4:46:36 PM] [0] Domain Configuration Update Required Status : 'False'.
[8/3/2007 4:46:37 PM] [0] Applying default role selection state

This simply shows an overview of the setup choices that have been made.

[8/3/2007 4:52:39 PM] [1] Setup launched task 'test-setuphealth -DomainController 'e12dc.E12CCR.com' -DownloadConfigurationUpdates $false -ExchangeVersion '8.0.685.24' -Roles 'ClusterMailbox' -ScanType 'PrecheckInstall' -SetupRoles 'AdminTools','Mailbox','ClusterMailbox' -CmsName 'CCR-EVS2' -CmsDataPath 'E:\Exchsrvr\MDBData' -CmsIPAddress '10.10.201.230' -CmsSharedStorage $false -CreatePublicDB $false'

This shows the prerequisite checks being performed, and that the setup roles selected are AdminTools, Mailbox, and ClusterMailbox.  Most of the rest of the log file just shows the progress of each role, so I won't bore you with that.  Let's get on to the interesting part.  Once the Mailbox role finishes installing, setup runs the new-ClusteredMailboxServer task to create the actual clustered mailbox server.  One nice thing is that you don't have to create the Exchange group in Cluster Admin yourself, as you had to with Exchange 2003; setup now creates it for you automatically.  Ok - here is the failure.  Note the lines marked [ERROR].

[8/6/2007 5:36:17 PM] [1] Processing component 'Mailbox System Attendant Dependent Tasks' (Configuring tasks dependent on System Attendant service).
[8/6/2007 5:36:17 PM] [1] Executing 'start-ClusteredMailboxServer -Identity:$RoleName', handleError = False
[8/6/2007 5:36:17 PM] [2] Launching sub-task '$error.Clear(); start-ClusteredMailboxServer -Identity:$RoleName'.
[8/6/2007 5:36:18 PM] [2] Beginning processing.
[8/6/2007 5:36:18 PM] [2] Administrator Active Directory session settings are: View Entire Forest: 'True', Configuration Domain Controller: 'e12dc.E12CCR.com', Preferred Global Catalog: 'e12dc.E12CCR.com', Preferred Domain Controllers: '{ e12dc.E12CCR.com }'
[8/6/2007 5:36:18 PM] [2] Searching objects "ccr-evs2" of type "Server" under the root "$null".
[8/6/2007 5:36:18 PM] [2] Previous operation run on domain controller 'e12dc.E12CCR.com'.
[8/6/2007 5:36:18 PM] [2] Start-ClusteredMailboxServer is trying to start clustered mailbox server ccr-evs2.
[8/6/2007 5:37:05 PM] [2] Start-ClusteredMailboxServer finished starting (bringing online) clustered mailbox server ccr-evs2.
[8/6/2007 5:37:05 PM] [2] [ERROR] Unexpected Error
[8/6/2007 5:37:05 PM] [2] [ERROR] Clustered mailbox server 'ccr-evs2' is not in a started (online) state (Failed). The cluster resource 'First Storage Group/Mailbox Database (ccr-evs2)' is in state (Failed).

[8/6/2007 5:37:05 PM] [2] Ending processing.
[8/6/2007 5:37:05 PM] [1] The following 1 error(s) occurred during task execution:
[8/6/2007 5:37:05 PM] [1] 0. ErrorRecord: Clustered mailbox server 'ccr-evs2' is not in a started (online) state (Failed). The cluster resource 'First Storage Group/Mailbox Database (ccr-evs2)' is in state (Failed).
[8/6/2007 5:37:05 PM] [1] 0. ErrorRecord: Microsoft.Exchange.Management.Tasks.NewCmsNotOnline: Clustered mailbox server 'ccr-evs2' is not in a started (online) state (Failed). The cluster resource 'First Storage Group/Mailbox Database (ccr-evs2)' is in state (Failed).
[8/6/2007 5:37:05 PM] [1] [ERROR] Clustered mailbox server 'ccr-evs2' is not in a started (online) state (Failed). The cluster resource 'First Storage Group/Mailbox Database (ccr-evs2)' is in state (Failed).
[8/6/2007 5:37:05 PM] [1] Setup is halting task execution because of one or more errors in a critical task.
[8/6/2007 5:37:05 PM] [1] Finished executing component tasks.
[8/6/2007 5:37:05 PM] [1] Ending processing.
[8/6/2007 5:40:48 PM] [0] End of Setup
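Since ExchangeSetup.log gets long, it can help to pull out just the [ERROR] lines.  Here is a minimal Python sketch of that idea, assuming every entry follows the "[timestamp] [number] message" layout seen in the excerpts above.  The names parse_line and find_errors are my own, and I'm only guessing that the bracketed number is a nesting level.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout assumed from the excerpts: [timestamp] [number] message
LINE_RE = re.compile(r"^\[(?P<stamp>[^\]]+)\]\s+\[(?P<level>\d+)\]\s+(?P<message>.*)$")


class SetupLogEntry(NamedTuple):
    stamp: str    # timestamp text, kept as-is
    level: int    # bracketed number (assumed to be a nesting level)
    message: str  # rest of the line, including any [ERROR] tag


def parse_line(line: str) -> Optional[SetupLogEntry]:
    """Return a SetupLogEntry, or None for blank or unrecognized lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return SetupLogEntry(match["stamp"], int(match["level"]), match["message"])


def find_errors(lines: Iterable[str]) -> List[SetupLogEntry]:
    """Collect every entry whose message starts with the [ERROR] tag."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.message.startswith("[ERROR]")]


# A few lines copied from the failed run above.
SAMPLE = [
    "[8/6/2007 5:37:05 PM] [2] Start-ClusteredMailboxServer finished starting (bringing online) clustered mailbox server ccr-evs2.",
    "[8/6/2007 5:37:05 PM] [2] [ERROR] Unexpected Error",
    "",
    "[8/6/2007 5:37:05 PM] [1] [ERROR] Clustered mailbox server 'ccr-evs2' is not in a started (online) state (Failed). The cluster resource 'First Storage Group/Mailbox Database (ccr-evs2)' is in state (Failed).",
    "[8/6/2007 5:37:05 PM] [1] Setup is halting task execution because of one or more errors in a critical task.",
]

errors = find_errors(SAMPLE)
for entry in errors:
    print(entry.stamp, entry.message)
```

To run it against a real log, pass an open file object to find_errors instead of SAMPLE; a file iterates line by line, so nothing else needs to change.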

The Setup UI will now show that setup has completed with an error.  Ok, so let's re-run setup to allow it to complete.  What's that?  The Setup UI only allows you to Add or Remove roles.  Oh yeah - since this was set up as a cluster, you can't have any other roles on it, and you can't de-select the Active Clustered Mailbox role.

Now at this point, I can actually go into Cluster Admin and check the state of the resources.  In the above case, where First Storage Group/Mailbox Database (ccr-evs2) was in a failed state, I simply brought it online, and it came online just fine.  Since all the resources are now online, you might think that everything is hunky dory.  Not so, my friend.  We've got a BIG problem.  The System Attendant object was never fully provisioned.

If you leave your cluster running in this state, you'll see lots of issues with the many tasks that the System Attendant mailbox is used for.  In my setup, if we look at the System Attendant object using ADSIEdit, we find that there are no proxy addresses (BAD BAD BAD), and msExchPoliciesIncluded is not set.  This will cause massive problems with Free/Busy, and you'll see NDRs being generated for any messages sent to your System Attendant mailbox (duh - it doesn't have an e-mail address).
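If you'd rather script that check than eyeball it in ADSIEdit, the test itself is tiny.  This is only a sketch: it assumes you've already pulled the System Attendant object's attributes into a plain dictionary with whatever LDAP tool you like, and the function name and the sample address are made up for illustration.

```python
from typing import Dict, List


def system_attendant_problems(attrs: Dict[str, List[str]]) -> List[str]:
    """List the two provisioning gaps described above, if present.

    attrs maps an AD attribute name to its list of values; a missing key
    and an empty list are both treated as "not set".
    """
    problems = []
    if not attrs.get("proxyAddresses"):
        problems.append("proxyAddresses is empty")
    if not attrs.get("msExchPoliciesIncluded"):
        problems.append("msExchPoliciesIncluded is not set")
    return problems


# An object in the half-provisioned state: neither attribute is populated.
broken = {"cn": ["Microsoft System Attendant"]}
print(system_attendant_problems(broken))

# Hypothetical healthy object; both values are placeholders, not real data.
healthy = {
    "cn": ["Microsoft System Attendant"],
    "proxyAddresses": ["SMTP:placeholder@example.com"],
    "msExchPoliciesIncluded": ["placeholder-policy-value"],
}
print(system_attendant_problems(healthy))
```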

OK - so how do you fix it?  Well, before we get there, there is one more thing you need to look at that will tell you if setup encountered any problems that need to be fixed.  Open up the registry, and navigate to

HKEY_LOCAL_MACHINE\Software\Microsoft\Exchange\v8.0

and see which subkeys exist.  You should see a separate key for each role that is installed.  For the above cluster, if setup had completed successfully, you would see only the following keys.

v8.0
   AdminTools
   MailboxRole
   Setup

In my case, however, I saw one more, ClusteredMailboxServer:

v8.0
   AdminTools
   ClusteredMailboxServer
   MailboxRole
   Setup

Viewing that key showed a Value with a name of "WaterMark" that contained a hex string.  This Watermark is very important!  It is how setup notes where it left off, and what still needs to be completed.
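If you have several servers to check, the same inspection can be scripted.  The sketch below is not anything setup itself provides: roles_with_watermark and read_role_values are names I made up, and the sample data is a stand-in for the keys listed above with the actual hex string left out.  The registry reader uses Python's standard winreg module and only works on Windows.

```python
from typing import Dict, List

# Key path from the post, relative to HKEY_LOCAL_MACHINE.
EXCHANGE_KEY = r"Software\Microsoft\Exchange\v8.0"


def roles_with_watermark(role_values: Dict[str, Dict[str, object]]) -> List[str]:
    """Return the subkeys that carry a Watermark value.

    role_values maps subkey name -> {value name: data}.  Registry value
    names are case-insensitive, so "WaterMark" and "Watermark" both match.
    """
    return sorted(
        role
        for role, values in role_values.items()
        if any(name.lower() == "watermark" for name in values)
    )


def read_role_values() -> Dict[str, Dict[str, object]]:
    """Read every subkey of the v8.0 key and its values (Windows only)."""
    import winreg  # imported here so the rest of the sketch runs anywhere

    result: Dict[str, Dict[str, object]] = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, EXCHANGE_KEY) as root:
        subkey_count = winreg.QueryInfoKey(root)[0]
        for i in range(subkey_count):
            role = winreg.EnumKey(root, i)
            with winreg.OpenKey(root, role) as key:
                value_count = winreg.QueryInfoKey(key)[1]
                values: Dict[str, object] = {}
                for j in range(value_count):
                    name, data, _value_type = winreg.EnumValue(key, j)
                    values[name] = data
                result[role] = values
    return result


# Stand-in for my failed cluster; other values under each key are omitted.
sample = {
    "AdminTools": {},
    "ClusteredMailboxServer": {"WaterMark": "(hex string)"},
    "MailboxRole": {},
    "Setup": {},
}
stalled = roles_with_watermark(sample)
print(stalled)
```

On the server itself you would call roles_with_watermark(read_role_values()) instead of using the sample dictionary.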

Ok - now on to the resolution.  If you experience a failure during setup (clustered or not), the best way to resolve it in almost all cases is to run setup from the command prompt.  The command-line version of setup is setup.com.  For my cluster issue, since the Mailbox role had completed, I didn't need to run setup and select a role to install; rather, I needed to set up the clustered mailbox server again.  The command I used was

setup.com /newcms /cmsname:ccr-evs2 /cmsipaddress:10.10.201.230 /cmsdatapath:e:\exchsrvr\mdbdata

When this command is run on an existing cluster with this same name, all setup will do is determine if there are any actions that have not been completed.  In other words, it checks for a Watermark.  In my case, it picked up at about 75% or so, and ran through to completion successfully.  Inspecting the setup log after completion shows the following tasks (which I'd consider fairly important!)

[8/6/2007 5:51:55 PM] [1] Executing 'enable-SystemAttendantMailbox -Identity:$RoleFqdnOrName -DomainController $RoleDomainController', handleError = True
[8/6/2007 5:51:55 PM] [2] Launching sub-task '$error.Clear(); enable-SystemAttendantMailbox -Identity:$RoleFqdnOrName -DomainController $RoleDomainController'.
[8/6/2007 5:51:55 PM] [2] Beginning processing.
[8/6/2007 5:51:55 PM] [2] Searching objects "ccr-evs2" of type "ADSystemAttendantMailbox" under the root "First Organization".
[8/6/2007 5:51:55 PM] [2] Previous operation run on domain controller 'e12dc.E12CCR.com'.
[8/6/2007 5:51:55 PM] [2] Processing object "Microsoft System Attendant".
[8/6/2007 5:51:55 PM] [2] Applying RUS policy to the given recipient "Microsoft System Attendant" with the home domain controller "e12dc.E12CCR.com".
[8/6/2007 5:52:17 PM] [2] The RUS server that will apply policies on the specified recipient is "CCR-EVS2.E12CCR.com".

[8/6/2007 5:52:17 PM] [2] Saving object "Microsoft System Attendant" of type "ADSystemAttendantMailbox" and state "Changed".
[8/6/2007 5:52:17 PM] [2] Previous operation run on domain controller 'e12dc.E12CCR.com'.
[8/6/2007 5:52:17 PM] [2] Ending processing.

If you aren't installing a cluster, you may still see a Watermark value for one of the other roles (ClientAccess, HubTransport, Mailbox, UnifiedMessaging, etc.).

If you have a watermark for one of those roles, you need to note which role has a watermark, then run the following

setup.com /roles:mb (or insert the name of the role that shows the watermark)

This will allow setup to process that role, and based on the watermark in the registry, it will continue where it left off.  Once setup completes successfully, the watermark value will be removed.
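To tie the two steps together, here is a rough Python sketch that turns the name of a registry key holding a watermark into the matching setup.com command line.  The only key name I've actually shown in this post is MailboxRole; that the other roles follow the same "<Name>Role" pattern is an assumption on my part, and resume_command is a made-up helper name.  The short role codes (ca, ht, mb, um) are the ones setup.com accepts for /roles.

```python
# Role names as listed above, mapped to the short codes for /roles.
ROLE_SWITCH = {
    "ClientAccess": "ca",
    "HubTransport": "ht",
    "Mailbox": "mb",
    "UnifiedMessaging": "um",
}


def resume_command(role_key: str) -> str:
    """Build the setup.com command for a role key that holds a watermark.

    Accepts either the bare role name or the registry key name with a
    trailing "Role" (assumed pattern, e.g. "MailboxRole").
    """
    name = role_key[: -len("Role")] if role_key.endswith("Role") else role_key
    if name not in ROLE_SWITCH:
        # A clustered mailbox server is resumed with /newcms, not /roles.
        raise ValueError(f"no /roles switch known for key {role_key!r}")
    return f"setup.com /roles:{ROLE_SWITCH[name]}"


command = resume_command("MailboxRole")
print(command)
```

The sketch only prints the command; I'd still run it by hand from the command prompt so I can watch setup's output.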

I'm hoping and expecting that Service Pack 1 for Exchange 2007 will be able to handle issues like this one much better, as it should include a "Reinstall" option.  If not, you can always fall back to the trusty command-line options.