What's new in Exchange 2007 clustering

Most of you out there have of course heard about Exchange 2007 and maybe have even seen some demos. But I have a feeling very few of you have actually deployed it yet - not that you don't want to, of course, but these things take time. In this blog I'm going to give you the low-down on what's new in Exchange 2007 clustering and how it might apply in your current Exchange 2000/2003 environment (I'm not even going to mention Exchange 5.5 as I know you have all migrated by now!). This way, when your users (or your boss) want "more" - better availability, better performance, "why can't I do X, Y or Z" etc. - you'll have a bit of ammunition ... "well if you gave me some budget I think I could do something for you ...". You know, be the hero.

4 Cluster Flavours & 4 New Acronyms

Yippee - 4 new acronyms to confuse our colleagues and bosses and make us look smart. They are:

  • SCC - Single Copy Cluster
  • LCR - Local Continuous Replication
  • CCR - Cluster Continuous Replication
  • SCR - Standby Continuous Replication

Let's have a closer look at them.

1. SCC - Single Copy Clusters - "Classic Clustering"

The original of the species is still with us but is now known as Single Copy Clustering. This is where we provide redundancy for our servers but still have a single point of failure - the shared disks which host the cluster quorum and the Exchange databases and logs. In Exchange 2007, the fundamentals of how this works have not changed but Exchange 2007 introduces a number of optimisations designed to improve on what came before. The big changes are:

  • In previous versions, when you installed Exchange you pretty much got everything on the one box and you had to turn off what you didn't want. This applied equally to clusters and had a tendency to add complexity where it was not always needed. In Exchange 2007, using the concept of server roles, you can choose what role you want the server to perform - mailbox, client access, hub, edge or unified messaging. In this world, SCC (and all of the other cluster types) only applies to the mailbox server role. In general this simplifies things, which is always good. Failure resilience for the other roles is provided for in different ways e.g. NLB etc.
  • In previous versions of Exchange, if a single database failed in your storage group due to disk issues, the whole server would have to fail over and, as many of you know, on a big server this causes major disruption. No longer. In Exchange 2007 the failed database is dismounted while the rest of the databases keep on going. You can then fix the broken database in isolation.
  • SCC in previous versions required serious competence with Cluster Administrator and all that that entails. In Exchange 2007, a lot of cluster-related management is now part of the Exchange Management Shell (EMS). You still need to do some work with Cluster Administrator (or more likely get someone else to do it for you), but you can take back control by using the new cmdlets in the EMS. OK, you'll have to learn EMS but it's worth it.

2. Continuous Replication - LCR, CCR, SCR

In the world of e-mail the replication paradigm has delivered significant benefits - think of cached-mode Outlook. Replication is GOOD - we like it. Continuous Replication (CR), AKA log shipping, is a proven concept in the enterprise database world. So how does it work in Exchange? Here's how:

  • Since Exchange 2000, we have had storage groups and in each storage group are databases and logs. In Exchange 2007 the same applies - but with one key change required to support CR - only one database and its associated set of logs is allowed per storage group (you can have up to 50 storage groups though). This change facilitates log shipping, which is at the heart of CR.
  • In CR, you have a "live copy" of your data - updated directly by the end-users. On disk (not including what is in memory), this is represented by the sum of the database file (.edb) and the non-committed transactions in the log files. This is the classic method for delivering a highly reliable database service.
  • When you set up CR initially (using EMS cmdlets), you take your live database i.e. your .edb file, and you copy it to another location (ideally on different physical disks, either on the same server or on a different server altogether). Exchange then continuously copies the log files to this new location as soon as they are closed on the live server, and replays them there. This way you keep your live and standby databases pretty much in sync (otherwise known as asynchronous log shipping). Log file sizes are reduced from 5MB to 1MB in Exchange 2007 to facilitate this process - smaller, faster file transfer and replay means less potential for a gap between the active and passive copies. But your users, I hear you say, don't want to lose ANYTHING. How picky can you get? Anyway, in Exchange 2007 there is a new way to handle this scenario - the "Transport Dumpster".
  • The Transport Dumpster is a component of the Hub server role. What happens here is that the hub server saves a copy of every mail sent to a user (on CCR or LCR clusters only) for a pre-defined period. In the event of a failover, the newly activated cluster mailbox server will request all hub servers to redeliver everything in the dumpster. If the mailbox server already has a redelivered message it discards the duplicate, and any message it lost gets delivered again. This does not cover absolutely everything but will allow you to get your e-mail service back up and running very quickly with only minor loss of data.
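
If you like to see these things in code, here is a toy Python sketch of the two ideas above: logs are shipped only once they are closed, and after a failover the dumpster fills the gap. Nothing here is Exchange code - the class, the function names and the "log closes after two messages" rule are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MailboxCopy:
    """Toy database copy (hypothetical): the set of message IDs it holds."""
    messages: set = field(default_factory=set)

    def replay(self, log):
        # Replaying a closed log applies every message recorded in it.
        self.messages.update(log)


def ship_closed_logs(closed_logs, passive, already_shipped):
    """Replay on the passive copy every closed log it has not seen yet."""
    for index in range(already_shipped, len(closed_logs)):
        passive.replay(closed_logs[index])
    return len(closed_logs)


def redeliver_from_dumpster(dumpster, copy):
    """Resubmit retained mail; duplicates are discarded, gaps are filled."""
    recovered = [msg for msg in dumpster if msg not in copy.messages]
    copy.messages.update(recovered)
    return recovered


active, passive = MailboxCopy(), MailboxCopy()
dumpster = []        # hub-side copies of recently delivered mail
closed_logs = []     # logs that are complete and safe to ship
open_log = []        # the log currently being written on the live server
shipped = 0

for msg_id in ["m1", "m2", "m3", "m4", "m5"]:
    active.messages.add(msg_id)
    dumpster.append(msg_id)
    open_log.append(msg_id)
    if len(open_log) == 2:   # assumption: a log "fills up" after two messages
        closed_logs.append(open_log)
        open_log = []
        shipped = ship_closed_logs(closed_logs, passive, shipped)

# Failover: "m5" was still in the open log, so the passive copy never got it.
lost = active.messages - passive.messages
recovered = redeliver_from_dumpster(dumpster, passive)
print(sorted(lost), recovered)
```

The point of the sketch is the gap: whatever sits in the log that has not been closed yet is exactly what asynchronous shipping can lose, and that is the hole the dumpster redelivery is there to plug.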

LCR, CCR and SCR are different variations on this theme with different pros and cons. With LCR the replication process takes place on the same server. In CCR you use a different flavour of Windows Clustering called Majority Node Set to have a 2-node cluster. SCR can be combined with SCC, LCR and CCR to give you a whole raft of interesting options. The devil, as usual, is in the detail but, once mastered, these options will put you in the driving seat when it comes to delivering higher service levels to your end-users.
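
A quick aside on why "majority" matters in a 2-node cluster. The arithmetic below is a minimal Python sketch, not anything from the Windows Clustering API; the function names are mine, and the third "witness" vote is an assumption I'm adding to show why two votes alone are not enough.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def has_quorum(total_votes: int, votes_online: int) -> bool:
    """True if the votes still online are enough to keep the cluster up."""
    return votes_online >= votes_needed(total_votes)


# Two nodes, no extra vote: losing either node loses the majority.
two_nodes_one_down = has_quorum(total_votes=2, votes_online=1)

# Two nodes plus one witness vote (assumed): one node can fail safely.
with_witness_one_down = has_quorum(total_votes=3, votes_online=2)

print(two_nodes_one_down, with_witness_one_down)
```

So with only two votes a single failure takes the whole cluster down, which is why a majority-based 2-node design needs a tie-breaker of some kind.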

There's obviously a bit more to it than that (and if you want to know more have a look here - High Availability). There's also lots of good stuff on the Exchange Team Blog - for example have a look at this Video series - Exchange 2007 Cluster Continuous Replication (CCR).

To wrap up, I just want to emphasize that no matter how good the technology, some things never change. Technology alone will never deliver the goods. The delivery of a high quality e-mail service depends on the combination of people, process and technology. So expect to put some work and budget into training for your messaging engineering and operations teams, the creation of lots of new and updated documentation, and plenty of testing - both in the lab and in the production environment.

Be the hero.