New High Availability Features in Exchange 2010 SP1

As you may have read by now, we've started to release the first details on Exchange 2010 SP1. In addition to the features mentioned in the Exchange Team blog post, there's a lot of other goodness in there, too. In this post, I describe the new high availability features in Exchange 2010 SP1, and some of them are pretty awesome.

But a quick note: everything in this post is based on pre-release software and preliminary information that is subject to change. These are things we are working on or are about to work on. The feature names, behaviors, and descriptions used below might not be the final ones. The behaviors described may or may not make it into the final shipping version of SP1 or a future version of the product. Standard disclaimers apply regarding pre-Beta software and content.

EDITED December 1, 2010 to remove reference to cross-site feature that did not make it into SP1.

In the Exchange team blog post, and in my previous post on SP1, you read about the new UI enhancements for DAGs in the Exchange Management Console. These enhancements complete the GUI experience for DAGs. In the RTM version of Exchange 2010, there is a GUI for creating the DAG, managing DAG membership, and managing DAG networks. But the GUI assumed the use of DHCP for the DAG IP address(es). If you wanted to use one or more static IP addresses for your DAG, you had to use the Set-DatabaseAvailabilityGroup cmdlet in the Exchange Management Shell. In SP1, you can do all IP addressing for your DAG in the GUI. I'm proud to say that I had a hand in making this change happen. :-)

In addition, you can now use the DAG Properties GUI to manage the DAG's alternate witness server and directory settings. In RTM, only the DAG's witness server and directory could be managed through the GUI. If you wanted to configure an alternate witness server and alternate witness directory in RTM, you had to use the Exchange Management Shell.

OK, so on to the new stuff we hope to have for high availability in Exchange 2010 SP1. The following new high availability features, and improvements to existing ones, are available in SP1:

  • Continuous replication - block mode
  • Active mailbox database redistribution
  • Improved Outlook cross-site connection behavior and experience
  • Enhanced datacenter activation coordination support
  • New and enhanced management and monitoring scripts
  • Improvements in failover performance

These features are discussed in greater detail below.

Continuous Replication - Block Mode

In the RTM version of Exchange 2010 and in all versions of Exchange 2007, continuous replication operates by shipping copies of the log files generated by the active database copy to the passive database copies. Beginning with SP1, this form of continuous replication is known as continuous replication - file mode. SP1 also introduces a new form of continuous replication known as continuous replication - block mode. In block mode, as each update is written to the active database copy's active log file, it is also shipped to the passive database copies. If a failure affects the active copy, the passive copies will already have received most or all of the latest updates. The active copy does not wait for replication to complete, so that replication issues cannot affect the client experience. Continuous replication - block mode is active only when continuous replication in file mode is up to date. The log copier performs the transition into and out of block mode automatically. Block mode dramatically reduces the latency between the time a change is made on the active copy and the time that change is replicated to a passive copy. In addition to replicating individual log file writes, block mode also changes the activation process for a passive copy: if a copy is in block mode when a failure occurs, the system uses whatever partial log content is available during activation.
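The mode transitions described above can be pictured as a small state machine. The following Python sketch is a conceptual illustration only, not the log copier's actual implementation; the PassiveCopy class, its methods, and the string log and write names are hypothetical. It shows a passive copy entering block mode only after file mode has caught up, accepting individual log writes while in block mode, and using complete logs plus any partial content when it is activated.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    FILE = "file"
    BLOCK = "block"


@dataclass
class PassiveCopy:
    """Hypothetical model of one passive copy (not the real log copier)."""
    mode: Mode = Mode.FILE
    logs: list = field(default_factory=list)     # complete log files received
    partial: list = field(default_factory=list)  # writes to the log still open on the active

    def file_mode_catch_up(self, closed_logs):
        """File mode: copy the closed log files this copy is still missing."""
        self.logs.extend(closed_logs[len(self.logs):])
        # Now up to date in file mode, so the transition into block mode happens.
        self.mode = Mode.BLOCK

    def receive_write(self, write):
        """Block mode: accept each log write as soon as the active copy ships it."""
        if self.mode is Mode.BLOCK:
            self.partial.append(write)

    def log_closed(self, name):
        """The active closed its current log; in block mode its content is already here."""
        if self.mode is Mode.BLOCK:
            self.logs.append(name)
            self.partial.clear()

    def drop_to_file_mode(self):
        """Leave block mode (e.g., after a network problem).
        Assumption: partial data is discarded and file copying resumes."""
        self.mode = Mode.FILE
        self.partial.clear()

    def activate(self):
        """On failover, use the complete logs plus any partial log content received."""
        return self.logs + self.partial


passive = PassiveCopy()
passive.file_mode_catch_up(["log1", "log2"])  # caught up, so block mode starts
passive.receive_write("w1")
passive.receive_write("w2")
print(passive.activate())  # ['log1', 'log2', 'w1', 'w2']
```

In this model, writes are ignored until file mode has caught up, which mirrors the rule that block mode is active only when file mode replication is up to date.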

Active Mailbox Database Redistribution

This feature is present in two forms. The first form is a script that can be periodically run by administrators to balance the distribution of active database copies across a database availability group (DAG). The second form we hope to implement is the addition of copy distribution awareness to Active Manager's best copy selection (BCS) process.
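To make the balancing goal concrete, here is a minimal Python sketch of one way to spread active copies across DAG members. It is a hypothetical greedy heuristic, not the algorithm used by the SP1 script or by Active Manager's BCS process; the function name and the input format (each database mapped to the servers holding a copy, in activation preference order) are assumptions.

```python
from collections import Counter


def redistribute(databases):
    """Greedy sketch: choose the server that should host each active copy so that
    active copies are spread across DAG members as evenly as possible.

    databases: hypothetical mapping of database name -> servers holding a copy,
    listed in activation preference order.
    Returns a mapping of database name -> server for the active copy.
    """
    active_count = Counter()
    placement = {}
    for db in sorted(databases):
        servers = databases[db]
        # Least-loaded server wins; min() keeps the first (most preferred) on ties.
        target = min(servers, key=lambda s: active_count[s])
        placement[db] = target
        active_count[target] += 1
    return placement


dag = {
    "DB1": ["MBX1", "MBX2"],
    "DB2": ["MBX1", "MBX2"],
    "DB3": ["MBX1", "MBX3"],
    "DB4": ["MBX2", "MBX3"],
}
print(redistribute(dag))
# {'DB1': 'MBX1', 'DB2': 'MBX2', 'DB3': 'MBX3', 'DB4': 'MBX2'}
```

Activating every database on its most preferred server would put three of the four active copies on MBX1; the greedy pass keeps every server at two or fewer. A single greedy pass is not guaranteed to find the most even layout in every topology, and a real tool would also have to account for copy health and the cost of each move.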

Enhanced Datacenter Activation Coordination Support

Exchange 2010 RTM includes a special mode for DAG site resilience support called datacenter activation coordination (DAC) mode. In DAC mode, Exchange cmdlets can be used to perform a datacenter switchover. In the RTM version, DAC mode is limited to DAGs with at least three members, at least two of which are in the primary datacenter.

In SP1, DAC mode has been extended to support two-member DAGs that have each member in a separate datacenter. DAC mode support for two-member DAGs leverages the witness server to provide additional arbitration. In addition, DAC mode has been extended to support DAGs that have all members deployed in a single Active Directory site, including Active Directory sites that have been extended to multiple locations.

So basically in SP1, you can now use DAC mode for all DAGs with two or more members.
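The membership rules above can be stated as a small predicate. This Python sketch only restates the RTM and SP1 rules from this post and checks member counts alone; the function name and the datacenter-to-member-count input are hypothetical.

```python
def dac_mode_supported(members_by_datacenter, primary, sp1=True):
    """Return True if a DAG of this shape can use DAC mode, per the rules above.

    members_by_datacenter: hypothetical mapping of datacenter name -> number of
    DAG members located there. primary: name of the primary datacenter.
    """
    total = sum(members_by_datacenter.values())
    if sp1:
        # SP1: any DAG with two or more members.
        return total >= 2
    # RTM: at least three members, at least two of them in the primary datacenter.
    return total >= 3 and members_by_datacenter.get(primary, 0) >= 2


two_site_pair = {"DC1": 1, "DC2": 1}
print(dac_mode_supported(two_site_pair, "DC1", sp1=False))  # False
print(dac_mode_supported(two_site_pair, "DC1"))             # True
```

The two-member, two-datacenter DAG is exactly the case that SP1 adds.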

New and Enhanced Management and Monitoring Scripts

SP1 includes several new and enhanced scripts that greatly improve the management and monitoring experience. The following scripts are included in SP1:

  • CheckDatabaseRedundancy.ps1 (new) - This script is used to check the redundancy of replicated databases, and it will generate events if database resiliency is found to be in a compromised state (e.g., you are down to a single copy of a replicated database). This is accompanied by a System Center Operations Manager management pack change that can be used to monitor for databases without redundancy, which is particularly useful in an environment with JBOD.
  • StartDagServerMaintenance.ps1 and StopDagServerMaintenance.ps1 (new) - StartDagServerMaintenance.ps1 is used to take a DAG member out of service for maintenance. It moves active databases off the server and blocks databases from moving to that server. It also makes sure that any critical DAG support functionality on the server (e.g., the PAM role) is moved to another server and blocked from moving back. StopDagServerMaintenance.ps1 completes the operation and removes the blocks.
  • CollectOverMetrics.ps1 (enhanced)
  • CollectReplicationMetrics.ps1 (enhanced)
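The core of a redundancy check like CheckDatabaseRedundancy.ps1 is counting the healthy copies of each database. The Python sketch below does not reproduce the script: the function name, the input format, the choice of Mounted and Healthy as the only states that count, and the default threshold of two copies are all assumptions. Where the real script generates events, this sketch simply returns the at-risk databases and their healthy-copy counts.

```python
def find_non_redundant(copy_status, min_healthy=2):
    """Return {database: healthy_copy_count} for databases below min_healthy.

    copy_status: hypothetical mapping of database -> {server: copy status}.
    Assumption: only "Mounted" and "Healthy" copies count toward redundancy.
    """
    healthy_states = {"Mounted", "Healthy"}
    at_risk = {}
    for db, copies in copy_status.items():
        healthy = sum(1 for state in copies.values() if state in healthy_states)
        if healthy < min_healthy:
            at_risk[db] = healthy
    return at_risk


status = {
    "DB1": {"MBX1": "Mounted", "MBX2": "Healthy", "MBX3": "Healthy"},
    "DB2": {"MBX2": "Mounted", "MBX3": "FailedAndSuspended"},
}
print(find_non_redundant(status))  # {'DB2': 1}
```

DB2 is down to a single usable copy, which is the compromised state described for CheckDatabaseRedundancy.ps1.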

Improvements in Failover Performance

We're also looking at including changes in SP1 that improve failover and switchover performance and behavior. Other changes are targeted at tuning timeouts and other algorithmic details to improve failover performance, as well as I/O performance after failovers.