Database Availability Groups

A database availability group (DAG) is a set of up to 16 Exchange Server 2010 Mailbox servers that provide automatic database-level recovery from a database, server, or network failure.  DAGs use continuous replication and a subset of Microsoft Windows failover clustering technologies to provide continuous mailbox availability.  Mailbox servers in a DAG monitor each other for failures.  When a Mailbox server is added to a DAG, it works with the other servers in the DAG to provide automatic, database-level recovery from database failures.

When a DAG is created, it is initially empty, and a directory object that represents the DAG is created in Active Directory Domain Services (AD DS).  The directory object is used to store relevant information about the DAG, such as server membership information.  When an administrator adds the first server to a DAG, a failover cluster is automatically created for the DAG.  In addition, the infrastructure that monitors the servers for network or server failures is initiated.  The failover cluster heartbeat mechanism and cluster database are then used to track and manage information about the DAG that can change quickly, such as database mount status, replication status, and last mounted location.

Database Availability Group Design & Cluster Continuous Replication Design

Exchange Server 2010 uses the same continuous replication technology found in Exchange Server 2007.  However, Exchange Server 2010 combines on-site data replication (CCR) and off-site data replication (SCR) into a single structure known as a DAG.  Once servers have been added to a DAG, administrators can add replicated database copies (up to 16 in total), and Exchange Server 2010 switches between these copies automatically, as needed, to maintain availability.
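
As a hedged illustration of adding a replicated copy, the following sketch uses the Add-MailboxDatabaseCopy cmdlet in the Exchange Management Shell.  The database name DB1 is hypothetical, and CONSEAMB2 is the example server name used later in this article.  Parameter names follow the released Exchange Server 2010 cmdlet reference, so confirm them with Get-Help in your own build.

```powershell
# Assumption: DB1 is an existing mailbox database, and CONSEAMB2 is a DAG member
# that does not yet host a copy of it. Both names are illustrative only.
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer CONSEAMB2 -ActivationPreference 2

# Review the state of every copy of DB1 once seeding has started.
Get-MailboxDatabaseCopyStatus -Identity DB1 |
    Format-List Name,Status,CopyQueueLength,ReplayQueueLength
```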

This new high availability architecture also provides simplified recovery from a variety of failures (disk-level, server-level, and datacenter-level), and it can be deployed on a variety of storage types.

Architectural Changes to Continuous Replication from Exchange Server 2007

Exchange Server 2007 introduced a built-in data replication technology called continuous replication.  Continuous replication was available in three forms: local continuous replication (LCR), cluster continuous replication (CCR), and standby continuous replication (SCR).  It significantly reduced the cost of deploying a highly available Exchange infrastructure, and provided a much improved deployment and management experience over previous versions of Exchange.  Even with these cost savings and improvements, however, running a highly available Exchange Server 2007 infrastructure still required a great deal of time and expertise because the integration between Exchange and Windows failover clustering was not seamless.  In addition, customers wanted an easier way to replicate their e-mail data to a remote location, in order to protect their Exchange environment against site-level disasters.

Unlike Exchange Server 2007, where clustered mailbox servers required dedicated hardware, Mailbox servers in a DAG can host other Exchange roles (Client Access, Hub Transport, Unified Messaging), providing full redundancy of Exchange services and data with just two servers.

The underlying continuous replication technology previously found in CCR and SCR remains in Exchange Server 2010 and it has been further developed to support new high availability features such as database copies, database mobility, and database availability groups.  Some of these new architectural changes are briefly described below:

  • Since storage groups have been removed from Exchange Server 2010, continuous replication now operates at the database level.  Exchange Server 2010 still uses an Extensible Storage Engine (ESE) database that produces transaction logs which are replicated to one or more other locations and replayed into one or more copies of a mailbox database.
  • Log shipping and seeding no longer use Server Message Block (SMB) for data transfer.  Exchange Server 2010 continuous replication uses a single administrator-defined TCP port for data transfer.  In addition, Exchange Server 2010 includes built-in options for network encryption and compression for the data stream.
  • Database copies are for mailbox databases only.  For redundancy and high availability of public folder databases, we recommend that you use public folder replication.  Unlike CCR, where multiple copies of a public folder database could not exist in the same cluster, you can use public folder replication to replicate public folder databases between servers in a DAG.

Several concepts used in Exchange Server 2007 continuous replication also remain in Exchange Server 2010.  These include the concepts of failover management, divergence, the use of the auto database mount dial, and the use of replication and client access networks.

The Role of the Cluster Service in Exchange Server 2010

Exchange Server 2010 includes a new Active Manager component that replaces the resource model and failover management features provided by integration with the Cluster service in older versions of Exchange.  Exchange no longer uses the cluster resource model for high availability.  All Exchange cluster resources provided by exres.dll no longer exist, including the construct known as a clustered mailbox server.  A failover cluster is used by Exchange, but there are no cluster groups for Exchange, and there are no storage resources in the cluster.  Thus, if you examine the cluster using cluster management tools, you will see only the core cluster resources (Internet Protocol address and network name, and if needed, file share witness resource).  Cluster nodes and networks will also exist, but those are managed by Exchange and not by the Cluster service or cluster tools.

Active Manager runs on all Mailbox servers that are members of a DAG.  Within a DAG, there are two Active Manager roles: primary active manager (PAM) and standby active manager (SAM).  The PAM is the Active Manager in a DAG that decides which copies will be active and which will be passive.  It is responsible for getting topology change notifications and reacting to server failures.  The DAG member that holds the PAM role is always the member that currently owns the cluster quorum resource (default cluster group).  If the server that owns the cluster quorum resource fails, the PAM role automatically moves to a surviving server that takes ownership of the cluster quorum resource.  In addition, if you need to take the server that hosts the cluster quorum resource offline for maintenance or an upgrade, you must first move the PAM to another server in the DAG.  The PAM controls all movement of the active designation between a database’s copies (only one copy can be active at any given time, and that copy may be mounted or dismounted).  The PAM also performs the functions of the SAM role on the local system (detecting local database and local information store failures).

The SAM provides information on which server hosts the active copy of a mailbox database to other components of Exchange (for example, the RPC Client Access service or Hub Transport).  The SAM detects failures of local databases and the local information store.  It reacts to failures by asking the PAM to initiate a failover (if the database is replicated).  A SAM does not determine the target of failover, nor does it update a database’s location state in the PAM.  It reads the active database copy location state in order to answer the queries it receives for the active copy of a database.

Active Manager (AM)

Active Manager (AM) manages the live relationship between mailbox databases and the Mailbox servers that have replicated copies of the databases. 

The AM has the following functionality:

  • Mount and dismount databases
  • Provide database availability information
  • Provide an interface for administrative tasks
  • Monitor for failures
  • Maintain database and server state information

Active Manager Roles

At any given point of time, AM assumes one of the following roles.

  • Standalone – On a Mailbox server that is not part of a DAG, the role is always Standalone.  This role can only change if the server is added to a DAG.
  • Standby Active Manager (SAM) – When a Mailbox server is a member of a DAG but does not currently host the default cluster group for the DAG, the assumed role is SAM.  The server can assume the PAM role only if it becomes the host of the default cluster group.
  • Primary Active Manager (PAM) – When a Mailbox server is a member of a DAG and currently hosts the default cluster group for the DAG, the assumed role is PAM.  The server can relinquish the PAM role and become a SAM if the default cluster group is moved to another server in the DAG.

The PAM role holder is responsible for making all decisions that affect database availability in a DAG.  Only one AM can operate as the PAM role holder at a time.  All other servers in the DAG operate as SAM role holders until conditions change.
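
To see which member currently holds the PAM role, a command along the following lines can be used.  This is a minimal sketch that assumes the DAG is named CONDAG1, as in the examples later in this article, and that your build exposes the PrimaryActiveManager property when the Status switch is supplied (as the released version of Exchange Server 2010 does).

```powershell
# Assumption: CONDAG1 is the DAG name used in this article's examples.
# The -Status switch requests real-time values, which include the current PAM holder.
Get-DatabaseAvailabilityGroup -Identity CONDAG1 -Status |
    Format-List Name,Servers,PrimaryActiveManager
```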

Creating Database Availability Groups

A DAG can be created using the New Database Availability Group wizard in the Exchange Management Console, or by running the New-DatabaseAvailabilityGroup cmdlet in the Exchange Management Shell. 

When you create a DAG, an empty object representing the DAG with the name you specified and an object class of msExchMDBAvailabilityGroup is created in AD DS.

After a DAG has been created, you can add servers to or remove servers from the DAG by using the Manage Database Availability Group wizard in the Exchange Management Console, or by using the Add-DatabaseAvailabilityGroupServer or the Remove-DatabaseAvailabilityGroupServer cmdlets in the Exchange Management Shell.

If the Mailbox server being added to a DAG is running Microsoft Windows Server 2008 and does not have the failover clustering component installed, then you must run the Add-DatabaseAvailabilityGroupServer cmdlet or use the Manage Database Availability Group wizard locally on the server being added.  This is because the failover clustering component is installed on the Mailbox server when it is added to a DAG, and there is no way to install failover clustering remotely.

When the first Mailbox server is added to a DAG, the following occurs:

  • The failover clustering component is installed, if it is not already installed.
  • A failover cluster is created using the name of the DAG.
  • A cluster name object (CNO) is created in the built-in computers organizational unit (OU).
  • An IP address is assigned to the DAG.  This is done by using the DatabaseAvailablityGroupIpAddresses parameter of the Add-DatabaseAvailabilityGroupServer cmdlet or by omitting this parameter and allowing the DAG to obtain an IP address by using a Dynamic Host Configuration Protocol (DHCP) server on your network.
  • The name and IP address of the DAG are registered as a Host (A) record in Domain Name System (DNS).

DAGs use a subset of failover cluster technologies, namely, the cluster heartbeat, cluster networks, and the cluster database (for storing data that changes or can change quickly, such as database state changes from active to passive or vice versa, or from mounted to dismounted and vice versa).

When creating a DAG, you will need to specify a name for the DAG no longer than 15 characters that is unique within the AD DS forest.  In addition, you will also need to provide a file share witness and witness directory for use by the DAG.  You do not need to create the directory ahead of time.  Exchange will automatically create and secure the directory for you on the file share witness you specify.  The directory should not be used for any purpose other than for the DAG witness.
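
The following is a minimal sketch of creating a DAG with a witness from the Exchange Management Shell.  The witness server name CONHUB1 and the directory path are hypothetical.  The WitnessServer and WitnessDirectory parameter names are those of the released Exchange Server 2010 cmdlet and may differ in pre-release builds, so check Get-Help New-DatabaseAvailabilityGroup first.

```powershell
# Assumptions: CONDAG1 is the DAG name (15 characters or fewer); CONHUB1 is a
# hypothetical Hub Transport server outside the DAG; the directory path is hypothetical
# and does not need to exist beforehand.
New-DatabaseAvailabilityGroup -Name CONDAG1 -WitnessServer CONHUB1 -WitnessDirectory C:\DAGWitness\CONDAG1

# Confirm what was stored on the DAG object in AD DS.
Get-DatabaseAvailabilityGroup -Identity CONDAG1 |
    Format-List Name,WitnessServer,WitnessDirectory
```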

The requirements for the file share witness are as follows:

  • The file share witness cannot be a member of the DAG.
  • The file share witness must be in the same forest as the DAG.
  • The file share witness must be running Windows Server 2003 or Windows Server 2008.
  • A single server can serve as a witness for multiple DAGs; however, each DAG requires its own directory.

We recommend that you use a Hub Transport server in the AD DS site containing the DAG.  This allows the file share witness and directory to remain under the control of an Exchange administrator.

When a DAG is formed, the failover cluster that is created will initially use the Node Majority quorum model.  When the second Mailbox server is added to the DAG, the cluster quorum is automatically changed to the Node and File Share Majority quorum model.  When this change occurs, the DAG will begin using the specified Universal Naming Convention (UNC) path and directory for the cluster quorum.  If the witness directory does not exist, Exchange will automatically create it, and provision it with full control permissions for local administrators and the CNO computer account for the DAG.

Managing Database Availability Group Membership

When a server is added to a DAG, it works with the other servers in the DAG to provide automatic, database-level recovery from database, server, or network failures.  When a server is removed from a DAG, it is no longer automatically protected from failures. 

Before performing either procedure below, review the following prerequisites:

  • A DAG has been created. 
  • Because DAGs use failover clustering technology, all servers added to a DAG must be running Windows Server 2008 Enterprise or Windows Server 2008 Datacenter.
  • All servers being added to the DAG must have at least two network interface cards.  Each network interface card must be on a different subnet.
  • If the Windows Failover Clustering feature is not installed on the server being added to a DAG, then the following procedures cannot be performed remotely and must be performed locally on the Mailbox server that is being added to the DAG.
  • You must remove all replicated database copies from a server before you can remove it from a DAG.
  • When the first Mailbox server is added to the DAG, the DAG must be assigned an IP address.  The default behavior is to use DHCP to obtain an IP address for the DAG.  If DHCP is not available in your organization, or if you want to use a static IP address for the DAG, you can use the DatabaseAvailablityGroupIpAddresses parameter of the Add-DatabaseAvailabilityGroupServer cmdlet to specify an IP address for the DAG.  The IP address is needed only when adding the first Mailbox server to the DAG.

Using the Exchange Management Console to Manage DAG Membership

To perform this procedure, you must be assigned, either directly or using a universal security group, to the Organization Management Role Group.

1. In the console tree, expand Organization Configuration.

2. Select Mailbox, and then select the Database Availability Group tab.

3. Right-click the DAG you want to manage and then select Manage Database Availability Group Membership.

4. On the Manage Database Availability Group Membership page, you can either:

  • To add a server, click Add, select the server (the local server, if failover clustering is not yet installed on it), and then click OK.
  • To remove a server, select it in the list of members, and then click the red X.

5. Click Manage to perform the configured management action (adding or removing a server) on the DAG.

6. On the Completion page, the Summary states whether the operation was successful.  The summary also displays the Exchange Management Shell command that was used to perform this procedure.

7. Click Finish.

Using the Exchange Management Shell to Manage DAG Membership

To perform this procedure, you must be assigned, either directly or using a universal security group, to the Organization Management Role Group.

In this example, a Mailbox server named CONSEAMB2 is added to a DAG named CONDAG1.  The DatabaseAvailablityGroupIpAddresses parameter is not used when the DAG already has been assigned IP addresses, or when you want the DAG to use DHCP to obtain an IP address.

Add-DatabaseAvailabilityGroupServer -Identity CONDAG1 -MailboxServer CONSEAMB2

In this example, a Mailbox server named CONSEAMB2 is added to a DAG named CONDAG1.  CONDAG1 is configured with an IP address of 10.0.0.20.

Add-DatabaseAvailabilityGroupServer -Identity CONDAG1 -MailboxServer CONSEAMB2 -DatabaseAvailablityGroupIpAddresses 10.0.0.20

In this example, a Mailbox server named CONSEAMB2 is removed from a DAG named CONDAG1.  Before running this command, you must ensure that no replicated databases exist on the Mailbox server.

Remove-DatabaseAvailabilityGroupServer -Identity CONDAG1 -MailboxServer CONSEAMB2
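
Before running the removal, it can help to confirm that the server no longer hosts replicated copies.  The sketch below is one possible way to check and clean up; DB1 is a hypothetical database name, and the cmdlets shown are from the released Exchange Server 2010 shell.

```powershell
# List every database copy currently hosted on the server that is leaving the DAG.
Get-MailboxDatabaseCopyStatus -Server CONSEAMB2 | Format-Table Name,Status -AutoSize

# Assumption: DB1 is a hypothetical database with a passive copy on CONSEAMB2.
# A copy is identified as <Database>\<Server>.
Remove-MailboxDatabaseCopy -Identity DB1\CONSEAMB2 -Confirm:$false
```

Only passive copies can be removed this way; an active copy would first need to be moved to another DAG member.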

Configuring Database Availability Group Properties

You can use the Exchange Management Console or the Exchange Management Shell to configure the properties of a DAG, including the file share witness and directory used by the DAG.

Configurable properties include:

  • File Share Witness – The name of the server that you want to host the file share for the file share witness.  Microsoft recommends that you specify a Hub Transport server outside the DAG as the file share witness.  This enables the system to automatically configure, secure, and use the share, as needed.
  • Witness Directory – The name of a directory that will be used to store file share witness data.  This directory will automatically be created by the system on the specified file share witness.

The Exchange Management Shell enables you to configure DAG properties that are not available in the Exchange Management Console, such as encryption and compression settings, network discovery, the Transmission Control Protocol (TCP) port used for replication, alternate file share witness settings, and datacenter activation mode.
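
As a hedged example of these shell-only settings, the commands below change the replication port and the datacenter activation mode.  The port number 63132 is an arbitrary illustration, not a recommendation, and the parameter names are those of the released Exchange Server 2010 Set-DatabaseAvailabilityGroup cmdlet.

```powershell
# Assumption: CONDAG1 is the example DAG; 63132 is an arbitrary example port.
# Any firewall between DAG members would need to allow the new port.
Set-DatabaseAvailabilityGroup -Identity CONDAG1 -ReplicationPort 63132

# Turn on datacenter activation coordination (valid values are Off and DagOnly).
Set-DatabaseAvailabilityGroup -Identity CONDAG1 -DatacenterActivationMode DagOnly

# Read the values back.
Get-DatabaseAvailabilityGroup -Identity CONDAG1 |
    Format-List Name,ReplicationPort,DatacenterActivationMode
```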

Quorum Model

Exchange Server 2010 uses only two of the four quorum models available in Windows Server 2008 failover clustering:

  • Node Majority – Each node that is available and in communication can vote.
  • Node and File Share Majority – Each node plus a designated file share (the “file share witness”) can vote, whenever they are available and in communication.

Using these two models, the DAG cluster provides automatic monitoring and failover only when there is a majority of votes, that is, more than half of the voters are functioning.

The quorum model used by Exchange depends on the number of mailbox servers in the DAG and is automatically updated as servers are added or removed from the DAG.  When the first mailbox server is joined to the DAG cluster, the Node Majority quorum model is used.  When a second server is added, the quorum model is changed to Node and File Share Majority.  If a third server is added, the quorum model is changed back to Node Majority.  This process continues as servers are added or removed from the DAG cluster so that the following rules emerge:

  • Odd Number of Servers – The DAG uses the Node Majority quorum model.
  • Even Number of Servers – The DAG uses the Node and File Share Majority quorum model.

Notice that each model results in an odd number of voters.  This ensures that, in all cases, the cluster is able to maintain functionality as long as a majority of voters are functioning.  In the case where there is an even number of servers in the DAG, if it were not for the vote of the file share witness, a failure of half of the DAG members would result in a failure of the cluster.
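
The odd/even rule can be sketched as a small helper function.  Get-ExpectedQuorumModel is a hypothetical name, not an Exchange cmdlet; it only restates the rule described above and does not query a real cluster.

```powershell
# Illustrative helper (not part of Exchange): applies the odd/even rule described above.
function Get-ExpectedQuorumModel {
    param(
        [Parameter(Mandatory = $true)]
        [ValidateRange(1, 16)]
        [int]$MemberCount
    )

    if ($MemberCount % 2 -eq 0) {
        $model  = 'Node and File Share Majority'
        $voters = $MemberCount + 1    # the file share witness adds one vote
    }
    else {
        $model  = 'Node Majority'
        $voters = $MemberCount
    }

    # Quorum requires more than half of the voters.
    $votesNeeded = [int][math]::Floor($voters / 2) + 1

    New-Object PSObject -Property @{
        Members     = $MemberCount
        QuorumModel = $model
        Voters      = $voters
        VotesNeeded = $votesNeeded
    }
}

# Show the expected model for DAGs of one to four members.
1..4 | ForEach-Object { Get-ExpectedQuorumModel -MemberCount $_ } |
    Format-Table Members,QuorumModel,Voters,VotesNeeded -AutoSize
```

For a four-member DAG, for instance, the witness brings the voter count to five, so three votes are needed and the cluster can lose any two voters and keep quorum.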

Database Availability Group Network Encryption

DAG networks support the use of encryption by leveraging the encryption capabilities of the Windows Server operating system.  DAG networks use Kerberos authentication between Exchange servers.  The EncryptMessage and DecryptMessage APIs of the Microsoft Kerberos security support provider (SSP) handle encryption of DAG network traffic.  The Kerberos SSP supports multiple encryption algorithms.  The Kerberos authentication handshake selects the strongest encryption protocol supported in the list: typically Advanced Encryption Standard (AES) 256-bit, potentially with a Secure Hash Algorithm (SHA) Hash-based Message Authentication Code (HMAC) to maintain integrity of the data.

You can configure DAG network encryption by using the Set-DatabaseAvailabilityGroup cmdlet in the Exchange Management Shell.  The possible encryption settings for DAG network communications are detailed in the following table.

Encryption Settings for DAG Network Communications

Setting            Description
Disabled           Network encryption is not used.
Enabled            Network encryption is used on all DAG networks for replication and seeding.
InterSubnetOnly    Network encryption is used only for DAG network communication between different subnets.
SeedOnly           Network encryption is used on all DAG networks for seeding only.

Database Availability Group Network Compression

DAG networks also support built-in compression.  When compression is enabled, DAG network communication uses XPRESS, which is Microsoft’s implementation of the LZ77 algorithm. 

You can configure DAG network compression by using the Set-DatabaseAvailabilityGroup cmdlet in the Exchange Management Shell.  The possible compression settings for DAG network communications are detailed in the following table.

Compression Settings for DAG Network Communications

Setting            Description
Disabled           Network compression is not used.
Enabled            Network compression is used on all DAG networks for replication and seeding.
InterSubnetOnly    Network compression is used only for DAG network communication between different subnets.
SeedOnly           Network compression is used on all DAG networks for seeding only.

The Exchange Management Shell enables you to configure DAG encryption and compression settings that are not available in the Exchange Management Console.
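
A minimal sketch of setting both options in one command follows.  It assumes the example DAG name CONDAG1; the chosen values are illustrative, not a recommendation.

```powershell
# Encrypt all replication and seeding traffic, and compress only traffic
# that crosses subnets. CONDAG1 is the example DAG name used in this article.
Set-DatabaseAvailabilityGroup -Identity CONDAG1 -NetworkEncryption Enabled -NetworkCompression InterSubnetOnly

# Read the settings back from the DAG object.
Get-DatabaseAvailabilityGroup -Identity CONDAG1 |
    Format-List Name,NetworkEncryption,NetworkCompression
```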

Page Patching

Microsoft Exchange Server 2010 high availability introduces a new Extensible Storage Engine mechanism known as page patching.  When database corruption is caused by minor disk faults, Exchange Server 2010 page patching automatically repairs the corrupted database, using one of the database copies configured for high availability. 

Exchange Server 2007 incremental reseed provided the ability to correct divergences in the transaction log stream between a source and target storage group, but did not provide a means to correct divergences in the passive copy of a database after divergent logs had been replayed.  This limitation forced a complete reseed of the database copy.

Exchange Server 2010 incremental reseed automatically corrects divergences in database copies using a new Extensible Storage Engine (ESE) mechanism known as Page Patching. 

Using Database Availability Group Networks

You can create multiple networks in a DAG, and dedicate them to client access or to replication.

You can use the Exchange Management Console or the Exchange Management Shell to configure the properties of a DAG, including the file share witness and the witness directory used by the DAG.

The Exchange Management Shell enables you to configure DAG properties that are not available in the Exchange Management Console, such as network discovery, the TCP port used for replication, alternate file share witness settings, and datacenter activation mode.

Creating a Database Availability Group Network

This section provides instructions for creating a DAG network using two different methods: the Exchange Management Console and the Exchange Management Shell.

Using the Exchange Management Console to Create a Database Availability Group Network

To perform this procedure, you must be assigned, either directly or using a universal security group, to the Organization Management Role Group.

1. In the console tree, expand Organization Configuration.

2. Select Mailbox, and then select the Database Availability Group tab.

3. Right-click the DAG for which you want to create the new network, and then select New Database Availability Group Network.

4. On the New Database Availability Group Network page, provide configuration information for the new DAG network:

  • Network Name – Provide a unique name for the DAG network of up to 128 characters.
  • Network Description – Provide an optional description for the DAG network of up to 256 characters.
  • Database Availability Group Network Subnets – Click Add to add each network subnet to the DAG network.  Subnets should be entered using a format of IP Address/Bitmask (for example, 192.168.1.0/24).  If you add a subnet that is currently associated with another DAG network, the subnet will be removed from the other DAG network and associated with the network being created.
  • Enable Replication – Leave the check box selected to enable the DAG network for use by replication.  When a DAG network is enabled for replication, MAPI traffic is restricted on that network.  Clear the check box to prevent replication from using the DAG network, and to enable MAPI traffic on that network.

5. Click New to create the DAG network. On the Completion page, the Summary states whether the operation was successful.  The summary also displays the Exchange Management Shell command that was used to perform this procedure.

6. Click Finish to exit the wizard.

Using the Exchange Management Shell to Create a Database Availability Group Network

To perform this procedure, you must be assigned, either directly or using a universal security group, to the Organization Management Role Group.

In the following example, a network named DAGNetwork01 is created in a DAG named CONDAG1, with a subnet of 10.0.0.0 and a bitmask of 8.  Replication is enabled for the network.  The command as shown does not set a description; the optional Description parameter can be added to supply one.

New-DatabaseAvailabilityGroupNetwork -DatabaseAvailabilityGroup CONDAG1 -Name DAGNetwork01 -Subnets 10.0.0.0/8 -ReplicationEnabled:$True

Configuring Database Availability Group Network Properties

Each DAG network has several properties that you can configure, including the name of the DAG network, a description field for the DAG network, a list of subnets that are used by the DAG network, and whether or not the DAG network is enabled for replication.

This section provides instructions for configuring DAG network properties using two different methods: by using the Exchange Management Console, and by using the Exchange Management Shell.

Using the Exchange Management Console to Configure DAG Network Properties

To perform this procedure, you must be assigned, either directly or using a universal security group, to the Organization Management Role Group.

1. In the console tree, navigate to Organization Configuration -> Mailbox.

2. In the result pane, on the Database Availability Group tab, select the DAG you want.

3. In the work pane, on the Networks tab, right-click the DAG network you want, and then click Properties.

4. Use the General tab to configure DAG network properties as follows:

  • DAG Network Name – Each DAG network name must be unique and cannot contain more than 128 characters.
  • Network Description – Use this box to provide an optional description of up to 256 characters for the DAG network.
  • DAG Network Subnets – Each DAG network must contain at least one subnet.  Subnets should be added using a format of IP Address/Bitmask (for example, 192.168.1.0/24).
  • Enable Replication – Leave this check box selected to enable the DAG network for use by replication.  When a DAG network is enabled for replication, MAPI traffic is restricted on that network.  Clear this check box to prevent replication from using the DAG network and to enable MAPI traffic on that network.

5. Click OK.

Using the Exchange Management Shell to Configure DAG Network Properties

To perform this procedure, you must be assigned, either directly or using a universal security group, to the Organization Management Role Group.

In this example, a DAG network in a DAG named CONDAG1 is being renamed from its default name of DAGNetwork01 to a new name of RepNet.

Set-DatabaseAvailabilityGroupNetwork -Name RepNet -Identity CONDAG1\DAGNetwork01

In this example, a subnet of 10.0.0.0 and subnet mask of 255.0.0.0 is being added to a DAG network named RepNet in a DAG named CONDAG1.

Set-DatabaseAvailabilityGroupNetwork -Subnets 10.0.0.0/8 -Identity CONDAG1\RepNet

Database Availability Group Network Cmdlets

The following new cmdlets are available for use in configuring DAG networks:

Set-DatabaseAvailabilityGroupNetwork

Use the Set-DatabaseAvailabilityGroupNetwork cmdlet to configure a network for a DAG.  You can configure a variety of network properties, such as:

  • Name for the network
  • Description for the network
  • List of one or more subnets that comprise the network
  • Whether the network can be used for replication activity (log shipping and seeding)

Get-DatabaseAvailabilityGroupNetwork

Use the Get-DatabaseAvailabilityGroupNetwork cmdlet to display configuration and state information for a DAG network.  State information is returned for subnets and for network interfaces.

New-DatabaseAvailabilityGroupNetwork

Use the New-DatabaseAvailabilityGroupNetwork cmdlet to manually create a network for a DAG.  After you create DAG networks, you can use them for log shipping and seeding for mailbox databases hosted on servers in the DAG, or use them for client access to mailbox databases in the DAG.

Remove-DatabaseAvailabilityGroupNetwork

Use the Remove-DatabaseAvailabilityGroupNetwork cmdlet to remove a network from a DAG.
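
As a brief, hedged illustration of the Get and Remove cmdlets, the sketch below lists the networks of the example DAG and then removes one.  DAGNetwork02 is a hypothetical network name, and the property names are those returned by the released Exchange Server 2010 cmdlet.

```powershell
# List all networks in the example DAG, with their subnets and interfaces.
Get-DatabaseAvailabilityGroupNetwork -Identity CONDAG1 |
    Format-List Name,Subnets,Interfaces,ReplicationEnabled

# Assumption: DAGNetwork02 is a hypothetical DAG network that is no longer needed.
# A DAG network is identified as <DAG name>\<network name>.
Remove-DatabaseAvailabilityGroupNetwork -Identity CONDAG1\DAGNetwork02
```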

 

I’d like to express a HUGE thanks to Niyaz Mohamed for compiling content and providing screenshots, which hopefully will help loads of people on DAGs.