CCR or Stretched CCR?

Having spoken with a few customers about whether a local CCR with SCR or a CCR stretched across two data centres is the better solution, I thought I'd write a post.

There is no right or wrong answer to that question; in typical consulting style, 'it depends'. There are various factors to take into consideration when designing the right solution for your customer:

  • Network infrastructure, such as data centre locations, network bandwidth, latency and redundant links (including switches)
  • Customer requirements (do they require full site resilience without manual intervention?)
  • Cost
  • Does the customer currently have the right skills to manage the environment?
  • How many copies of the database are required? (2 with a stretched cluster, 3 with CCR and SCR)
  • Can a 3rd data centre be used to host the File Share Witness (FSW)?

There are also some factors to think about from the client side, such as DNS refresh. If the customer doesn't have a stretched Virtual LAN (VLAN) between data centres, the cluster will be assigned 1 Network Name resource and 2 IP address resources (since the nodes are in separate IP subnets). When the clustered mailbox server (CMS) fails over to the other node, it will be assigned a different IP address, and clients will keep resolving the old address until their cached DNS record expires. For that reason, as part of the cluster configuration in Windows 2008, we recommend changing the default DNS TTL value for the CMS Network Name resource.

By default the cluster service uses a TTL of 20 minutes (1200 seconds). Be careful if you change the DNS TTL value through the DNS management console, as the change will be overwritten by the cluster settings. So if you want to change the default value from 20 minutes to our recommended setting of 5 minutes, you'll need to make the change on the cluster itself.

To make this change you'll need to be a local administrator on each node in the cluster and have Full Control permission on the cluster.

From a cmd prompt run: cluster.exe res <CMSNetworkNameResource> /priv HostRecordTTL=300 (the value is in seconds, so 300 is the recommended 5 minutes mentioned above)

Take the CMS offline by running the Stop-ClusteredMailboxServer cmdlet in PowerShell (the Exchange Management Shell).

Bring the CMS back online by running the Start-ClusteredMailboxServer cmdlet.
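
Put together, the TTL change might look something like the sketch below. The CMS name CMS01 and the resource name "Network Name (CMS01)" are assumptions for illustration only, so list the cluster resources first and substitute the names from your own environment. The -StopReason text is also just an example.

```powershell
# CMS01 and "Network Name (CMS01)" are hypothetical names; replace them with your own.

# List the cluster resources to find the CMS Network Name resource.
cluster.exe res

# Show the current private properties of that resource, including HostRecordTTL.
cluster.exe res "Network Name (CMS01)" /priv

# Set the DNS TTL to 300 seconds (5 minutes).
cluster.exe res "Network Name (CMS01)" /priv HostRecordTTL=300

# Take the CMS offline and bring it back online so the new value is picked up.
Stop-ClusteredMailboxServer -Identity CMS01 -StopReason "Change HostRecordTTL to 300"
Start-ClusteredMailboxServer -Identity CMS01
```

Bear in mind that stopping the CMS disconnects users, so this is best done in a maintenance window.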

 

I’ve listed below a few risks, and how they can be mitigated, if you do decide to go with a stretched CCR over CCR + SCR.

Risk: File Share Witness (FSW) location
Mitigation: Locate the FSW at an alternate location to provide additional resilience to the cluster

Risk: Client cache IP refresh interval
Mitigation: This can be configured on the cluster in Windows 2008, or a stretched VLAN can be used

Risk: Logical corruption of the databases
Mitigation: SCR would provide protection against this, but take into consideration your Recovery Time Objective (RTO)

Risk: Is the network link between physical locations resilient?
Mitigation: Ensure there are alternate routes available

Risk: Does the network link between physical locations have low latency (below 50ms)?
Mitigation: Test network latency

Risk: Does the network link between physical locations have enough bandwidth?
Mitigation: Test network bandwidth

Risk: Backup solution must be able to back up any node in any physical location
Mitigation: Ensure your chosen backup solution can back up both locations in the event of a site failure

Risk: Manual configuration required to control message routing within a data centre (SubmissionServerOverrideList)
Mitigation: Ensure your operational guides are up to date with how to configure mail routing

Risk: Controlling client access within a data centre
Mitigation: ISA Server

Risk: Querying of AD may take place across the data centre interconnect
Mitigation: None

Risk: Potential loss of email data in the event of a site failure
Mitigation: Email will be stored in the transport dumpster of the Hub Transport (HT) server in the failed site

Risk: Operational management
Mitigation: Have an in-depth understanding of cluster technology, along with Windows 2008 and Exchange 2007 experience
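
For the message routing risk listed above, the kind of entry you might put in your operational guide is sketched below. The server names CMS01, HT01 and HT02 are hypothetical, and I'm assuming the Hub Transport servers listed are the ones in the data centre currently hosting the active node. As far as I'm aware the SubmissionServerOverrideList parameter needs Exchange 2007 SP1 or later, so check your build before relying on it.

```powershell
# Hypothetical names: CMS01 is the clustered mailbox server; HT01 and HT02 are
# Hub Transport servers in the data centre that currently hosts the active node.
Set-MailboxServer -Identity CMS01 -SubmissionServerOverrideList HT01,HT02

# Confirm the setting has been applied.
Get-MailboxServer -Identity CMS01 | Format-List Name,SubmissionServerOverrideList
```

After a failover to the other data centre the list would need updating to point at the Hub Transport servers there, which is why keeping the guide current matters.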

Written by Daniel Kenyon-Smith