Exchange 2010 and High Availability – Part I - DAG

Great post, Mark. The changes for Exchange 2010 keep coming. It's great to see the maturity of the product!

Today I’d like to update everyone on HA options in Exchange 2010. In Exchange 2007 we introduced CCR, LCR, SCC, and SCR (in SP1). So what is new and improved?

Improved Mailbox Uptime

CCR and SCR have evolved into a unified solution. Failover now happens at the database level rather than the server level and is significantly faster, which will improve SLAs. We also increased the number of replicated copies that can be configured: we now support up to 16 copies of each database.

Spending less time deploying and managing the solution also improves uptime for users. Exchange manages the failover process, lets you deploy the solution incrementally, and makes it easy to stretch the solution across datacenters in different sites. These changes will help reduce the operational costs of deploying and managing the solution.

Storage Flexibility

By improving the performance of Exchange, we are able to support more storage options (RAID-less, JBOD), which gives users more flexibility. These reductions in IO mean that users can take advantage of larger low-cost disks and, by combining them with the high availability features, can consider new deployment scenarios built around RAID-less disk configurations. The net result is a reduction in storage costs while still providing users with larger mailboxes.

End to End Availability

Beyond the mailbox databases themselves, the bigger issue of end-to-end availability has been addressed by reducing the number of messages that can be lost while being sent between transport servers, and by enabling users to stay online while their mailbox is being moved.

image

Here is Don McLovin's new Exchange 2010 environment at Contoso University. Don is the lead Exchange and Active Directory administrator for Contoso U. He has overall responsibility for providing messaging and communications services to all of Contoso's employees. Don's primary challenge is to maintain high levels of availability with a flat or shrinking budget year-over-year.

There are 5 servers in the main datacenter in Paris that host mailboxes. These mailbox servers are grouped to provide automatic failover. The group of servers is known as a Database Availability Group. Each mailbox database has 3 instances, which we'll refer to as copies, placed on separate servers to provide redundancy. At any given time, only 1 of the 3 database copies is active and accessible to clients. This gives us database-centric failover, and all failover is managed within Exchange.
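To make the copy rules above concrete, here is a minimal Python sketch of the layout. It is only an illustrative model, not an Exchange API: the class names, the method names, and the server names MBX1 to MBX3 are all made up for this example. The 16-copy limit is the figure quoted in this post.

```python
from dataclasses import dataclass, field

MAX_COPIES_PER_DATABASE = 16  # limit quoted in this post


@dataclass
class DatabaseCopy:
    server: str           # hypothetical server name, e.g. "MBX1"
    active: bool = False  # True for the one copy that serves clients


@dataclass
class MailboxDatabase:
    name: str
    copies: list = field(default_factory=list)

    def add_copy(self, server, active=False):
        # Copies are placed on separate servers to provide redundancy.
        if any(c.server == server for c in self.copies):
            raise ValueError(f"{self.name} already has a copy on {server}")
        # Only one copy may be active at any given time.
        if active and any(c.active for c in self.copies):
            raise ValueError(f"{self.name} already has an active copy")
        if len(self.copies) >= MAX_COPIES_PER_DATABASE:
            raise ValueError(f"{self.name} already has the maximum number of copies")
        self.copies.append(DatabaseCopy(server, active))

    def active_server(self):
        # Return the server hosting the active copy, or None if there is none.
        for copy in self.copies:
            if copy.active:
                return copy.server
        return None


# Usage: DB1 with 3 copies, active on the first server.
db1 = MailboxDatabase("DB1")
db1.add_copy("MBX1", active=True)
db1.add_copy("MBX2")
db1.add_copy("MBX3")
print(db1.active_server())
```

The point of the model is the two invariants: one copy per server, and exactly one active copy per database.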

The Client Access Server manages all communications between clients and databases. Outlook clients no longer connect directly to mailbox servers, as they did in previous versions of Exchange.

When a client such as Outlook connects to Exchange, it first contacts the CAS Server.

The CAS Server determines where the user's active database is located (in our case the user is on DB1, which is currently active on Mailbox Server 1), and forwards the request to the appropriate server.

When the client sends an e-mail, the active database is updated. Then, through log shipping, the 2 passive copies of the database are updated.
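The log shipping idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Exchange implements replication: the classes `DatabaseCopy` and `ReplicatedDatabase` and their methods are hypothetical, and each "log record" is just a string.

```python
class DatabaseCopy:
    def __init__(self, server):
        self.server = server
        self.applied = []  # log records already replayed into this copy

    def replay(self, records):
        # Replaying a log record brings the copy one step closer to the active copy.
        self.applied.extend(records)


class ReplicatedDatabase:
    def __init__(self, active, passives):
        self.active = active
        self.passives = passives
        self.log = []  # transaction log generated on the active copy

    def write(self, record):
        # A client change is logged and applied on the active copy first.
        self.log.append(record)
        self.active.replay([record])

    def ship_logs(self):
        # Each passive copy receives only the records it has not replayed yet.
        for copy in self.passives:
            missing = self.log[len(copy.applied):]
            copy.replay(missing)


# Usage: one active copy and two passive copies, as in the walkthrough.
db1 = ReplicatedDatabase(
    DatabaseCopy("MBX1"), [DatabaseCopy("MBX2"), DatabaseCopy("MBX3")]
)
db1.write("new mail for user A")
db1.ship_logs()
```

After `ship_logs()` runs, every passive copy has replayed the same records as the active copy, which is what lets any of them take over later.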

Let's say that a disk fails, affecting one of the databases on Mailbox Server 1. In previous versions of Exchange, the administrator would need to fail over all the databases on Mailbox Server 1 to recover from this failure, or else restore Database 1 from a tape backup. However, Exchange's new architecture supports database-level failover, so Database 1 automatically fails over to Mailbox Server 2 without affecting the other databases.

The Outlook client, having lost its connection to the database, automatically contacts the CAS Server to reconnect.

The CAS Server determines which mailbox server has the active copy of the user's database. It connects the client to Mailbox Server 2.
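The failover and reconnect steps above can be sketched as follows. This is a simplified Python model, not Exchange code: `Dag`, `ClientAccessServer`, `fail_over`, and `route` are hypothetical names, and activating the first passive copy in the list is an assumption made to keep the example short.

```python
class Dag:
    def __init__(self):
        # database name -> {"active": server, "passive": [servers]}
        self.databases = {}

    def add_database(self, name, active, passive):
        self.databases[name] = {"active": active, "passive": list(passive)}

    def fail_over(self, name):
        # Database-level failover: only this database moves, others are untouched.
        db = self.databases[name]
        if not db["passive"]:
            raise RuntimeError(f"no passive copy of {name} to activate")
        failed_server = db["active"]
        db["active"] = db["passive"].pop(0)  # assumption: take the first passive copy
        return failed_server


class ClientAccessServer:
    def __init__(self, dag, mailbox_to_database):
        self.dag = dag
        self.mailbox_to_database = mailbox_to_database

    def route(self, user):
        # Look up the user's database, then the server hosting its active copy.
        database = self.mailbox_to_database[user]
        return self.dag.databases[database]["active"]


# Usage: DB1 and DB2 are both active on MBX1; a disk failure affects only DB1.
dag = Dag()
dag.add_database("DB1", active="MBX1", passive=["MBX2", "MBX3"])
dag.add_database("DB2", active="MBX1", passive=["MBX3", "MBX4"])
cas = ClientAccessServer(dag, {"user_a": "DB1", "user_b": "DB2"})
dag.fail_over("DB1")
print(cas.route("user_a"), cas.route("user_b"))
```

Because the client always asks the CAS Server where the active copy is, it never needs to know which mailbox server it ends up talking to.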

When new mail is sent, the active database on Mailbox Server 2 is updated. The second copy of the database is also updated through log shipping. The end user is unaware that anything has happened, and McLovin can replace the failed disk drive at his leisure.

The administrator can set up to 16 copies per database to meet the Service Level Agreements for his users. For a special category of users, Don keeps a 4th database copy on a mail server in a geographically remote location. This server is located in a different Active Directory site, but is kept up-to-date over the Wide Area Network using the same replication technology as the other servers (no stretching of subnets is required). If a hurricane, earthquake, or other catastrophe should shut down the main datacenter, this remote server can be activated and readied for client access in about 15 minutes.

 

Fundamentals

Database Availability Group (often referred to as a 'DAG') - A set of up to 16 Mailbox servers that communicate to manage failures that affect individual databases. Any server in a DAG can host a copy of a mailbox database from any other server in the DAG.

Mailbox Servers - When a server is added to a database availability group (DAG), it works with the other servers in the DAG to provide automatic, database-level recovery from database, server, or network failures.

Mailbox Databases - Databases are ‘disconnected’ from servers and Exchange 2010 adds support for up to 16 copies of a single database. Only Mailbox databases, not Public Folder databases, can be replicated.

Database Copies - Storage groups have been removed, so log shipping replication now operates at the database level. Transaction logs are replicated to one or more other Mailbox servers and replayed into a copy of the mailbox database that is stored on those servers. Note that you can't replicate outside the DAG (a key difference from SCR).

Active Manager - DAGs use a new component in Exchange 2010 called Active Manager, a process that runs on each Mailbox server. Active Manager manages which database copies are active and which are passive.
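As a rough illustration of the kind of decision Active Manager makes, here is a small Python sketch that picks which passive copy to activate. The selection rule is an assumption for illustration only (prefer a healthy copy with the fewest logs still waiting to be copied). This post does not describe the real selection logic, and `CopyStatus` and `choose_copy_to_activate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CopyStatus:
    server: str
    healthy: bool
    copy_queue_length: int  # assumed metric: logs not yet copied to this server


def choose_copy_to_activate(copies):
    # Ignore unhealthy copies, then prefer the copy that is least behind.
    candidates = [c for c in copies if c.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.copy_queue_length).server


# Usage: MBX2 is healthy and only slightly behind, so it is chosen.
passive_copies = [
    CopyStatus("MBX2", healthy=True, copy_queue_length=2),
    CopyStatus("MBX3", healthy=True, copy_queue_length=40),
    CopyStatus("MBX4", healthy=False, copy_queue_length=0),
]
print(choose_copy_to_activate(passive_copies))
```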

Now that we've introduced the concept of a DAG, Part II will dive into more detail on DAG operation.