Exchange Server 2007 High Availability Storage Considerations



Introduction

Note: for more comprehensive coverage of Exchange 2007 storage please see this blog post:

http://msexchangeteam.com/archive/2007/01/15/432199.aspx

This is the second of a four-part blog on the features in Exchange Server 2007 that are designed to increase availability, and the hardware strategies you can use to increase fault tolerance, service availability and service continuity.  Over the next few months I’ll be addressing other Exchange 2007 storage best practices in two upcoming blogs:


  • Exchange 2007 Backup and Restore Mechanisms

    • This focuses on the features and strategies you can use to back up and restore your Exchange 2007 data.

  • Exchange 2007 Storage Planning, Configuration, and Validation

    • This will build upon the three prior blogs and tie everything together to outline our recommendations on how the storage solution should be configured, validated, and monitored.


In my first blog, Exchange 2007 Server Roles & Disk I/O, I focused on the new features in Exchange 2007 that impact storage.  I briefly covered server roles and introduced the new log shipping functionality used by continuous replication.  The focus of this blog is three high-availability features in Exchange 2007:


  • Local Continuous Replication (LCR)
  • Cluster Continuous Replication (CCR)
  • Single Copy Clusters (SCC)


Continuous Replication Overview


Continuous replication is a new Exchange 2007 feature where the storage group’s database and log files are copied to a secondary location.  The storage group being accessed by clients contains the active copy of the database, and the storage group in the secondary location contains the passive copy of the database.

As new transaction logs are closed (that is, filled up), they are copied to the secondary location, validated, and then replayed into the passive copy of the database.  The net effect is a backup of the database that has already been restored to a mountable location before a disaster happens. This backup is up to date, with all (or nearly all) transaction log replay already done. If the primary database is destroyed or unavailable, you can be up and running on the secondary copy within minutes.
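
As a rough illustration of the copy, validate, and replay sequence described above, here is a minimal Python sketch. It is not how Exchange implements replication: the folder layout, the E00-prefixed file names, the SHA-256 comparison used as a stand-in for log validation, and the list used as a stand-in for the passive database are all assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksum(path: Path) -> str:
    """Stand-in for log validation: hash the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def ship_closed_logs(active_logs: Path, inspector: Path,
                     passive_logs: Path, replayed: list[str]) -> None:
    """Copy, validate, and replay closed logs in generation order."""
    expected = len(replayed) + 1
    for source in sorted(active_logs.glob("E00*.log")):
        generation = int(source.stem[3:], 16)  # hypothetical naming scheme
        if generation < expected:
            continue  # already replayed on an earlier pass
        if generation != expected:
            break  # gap in the sequence: wait for the missing log
        staged = inspector / source.name
        shutil.copy2(source, staged)              # 1. copy to the secondary location
        if checksum(staged) != checksum(source):  # 2. validate the copy
            staged.unlink()
            break
        final = passive_logs / source.name
        shutil.move(str(staged), str(final))      # 3. move to the log folder
        replayed.append(final.name)               # 4. "replay" into the passive copy
        expected += 1


def demo() -> list[str]:
    """Ship three closed logs where generation 3 has not arrived yet."""
    with tempfile.TemporaryDirectory() as root:
        base = Path(root)
        active, inspector, passive = (
            base / name for name in ("active", "inspector", "passive"))
        for folder in (active, inspector, passive):
            folder.mkdir()
        for generation in (1, 2, 4):
            (active / f"E00{generation:08X}.log").write_bytes(b"x" * 1024)
        replayed: list[str] = []
        ship_closed_logs(active, inspector, passive, replayed)
        return replayed
```

In this sketch, calling demo() replays generations 1 and 2 and then stops, because generation 4 cannot be replayed until generation 3 has been copied and validated.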

To support continuous replication, transaction log file size is now 1 MB in Exchange 2007. In previous versions of Exchange, transaction log files were 5 MB.

Storage Terminology

Throughout this blog series, I’ll be discussing storage solutions quite a bit. To ensure a common frame of reference, I recommend that you familiarize yourself with the following terms, which are from http://en.wikipedia.org:


  • Logical Unit Number (LUN) In computer storage, a logical unit number or LUN is an address for an individual disk drive and by extension, the disk device itself. The term originated in the SCSI protocol as a way of differentiating individual disk drives within a common SCSI target device like a disk array. The term has become common in storage area networks (SAN) and other enterprise storage fields. Today, LUNs are normally not entire disk drives but rather virtual partitions (or volumes) of a RAID set.
  • Serial Attached SCSI (SAS) SAS is a serial communication protocol for computer storage devices. It is designed for the corporate and enterprise market as a replacement for SCSI, allowing for much higher speed data transfers than previously available, and is backwards-compatible with SATA. As the name suggests, SAS uses serial communication instead of the parallel method found in traditional SCSI devices, but still uses SCSI commands for interacting with SAS devices.
  • Internet SCSI (iSCSI) In the context of computer storage, iSCSI allows a machine to use an iSCSI initiator to connect to remote targets such as disks and tape drives on an IP network for block level I/O. iSCSI protocol uses TCP/IP for its data transfer. Unlike other network storage protocols, such as Fibre Channel (which is the foundation of most SANs), it requires only the simple and ubiquitous Ethernet interface (or any other TCP/IP-capable network) to operate. This enables low-cost centralization of storage without all of the usual expense and incompatibility normally associated with Fibre Channel storage area networks.

Continuous Replication Design Considerations


Isolated Storage

To achieve storage resiliency, we recommend that the passive copy be placed on a storage array that is completely isolated from the active copy’s storage array.  Isolating the arrays from one another also provides the flexibility to use a variety of storage solutions. If the storage solutions used by the active copy and the passive copy are isolated from each other, then they don’t even need to be the same type or brand.  For example, the active copy could be housed on SAS storage, and the passive copy could be housed on iSCSI storage.  Regardless of the storage solution(s) you choose, we do recommend that you use storage controllers with battery-backed caching.

Performance

We recommend that you size the active and passive storage solutions equivalently. The purpose of the passive copy is to provide a quick switch to a second copy of the data if something catastrophic happens to the active copy. In the case of LCR, activation is manual; in the case of CCR, activation is automatic. The storage solution used by the passive copy should be sized, in terms of both performance and capacity, to handle the production load in the event of a failure.

Legacy streaming backups can be performed against the active copy, but with VSS you can back up either the active copy or the passive copy.  Backing up from the passive copy with a software snapshot is available on any storage type and removes the I/O overhead on the storage for the active copy, assuming you isolate the active and passive copies on separate storage.

Storage Options

Without continuous replication, a storage failure requires you to restore from backup media. As a result, very fast, expensive, and resilient storage devices are typically used.  When using shared storage and VSS clones for fast recovery, the solutions often have two or three copies or clones of the data.  Continuous replication provides storage resiliency, and using it only requires you to store a single copy of the data on disk.  With continuous replication, restoration from backup is no longer your first line of defense, and you can make VSS backups from the replica instead of from the live database. Thus, continuous replication not only makes recovery more resilient and faster, but can also reduce transactional I/O requirements, making other storage options, such as direct attached SAS, iSCSI, and single-path Fibre Channel, feasible. You may also be able to implement a much less expensive backup solution, because fast restoration from backup is now your second line of defense instead of your primary recovery strategy.

Regardless of your storage solution, you should refer to the vendor’s stated best practices for Exchange storage. For solutions submitted to the ESRP program, you can reference the vendor’s submission at http://www.microsoft.com/technet/prodtechnol/exchange/2003/esrp.mspxge. By selecting a solution from the ESRP program, you can ensure that the solution has been validated by the vendor and reviewed by Microsoft. Of course, you should always test your implementation before putting it into production, to ensure that the configuration is not affected by environmental dependencies that the standard tests cannot account for. Finally, make sure that the entire solution is properly monitored.


Continuous Replication Availability Considerations


LUN Design

When using continuous replication, each storage group can only contain a single database. As a result, each storage group will optimally use 4 LUNs. Each copy of the database will use one LUN and each set of transaction log files (the set for the active copy and the set for the passive copy) will also use one LUN.
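
To make the arithmetic concrete, here is a small Python sketch of the LUN count under the layout described above. The names LunPlan and luns_for are hypothetical, and the sketch assumes every storage group gets its own four dedicated LUNs, with no LUNs shared between storage groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LunPlan:
    """LUN counts for a dedicated-LUN continuous replication layout."""
    active_database: int
    active_logs: int
    passive_database: int
    passive_logs: int

    @property
    def total(self) -> int:
        return (self.active_database + self.active_logs
                + self.passive_database + self.passive_logs)


def luns_for(storage_groups: int) -> LunPlan:
    """One database LUN and one log LUN per copy, per storage group."""
    if storage_groups < 0:
        raise ValueError("storage_groups must not be negative")
    return LunPlan(storage_groups, storage_groups,
                   storage_groups, storage_groups)
```

For example, luns_for(50).total is 200, which shows how quickly the LUN count grows on a server with many storage groups when nothing is shared.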

When creating LUNs, it is a best practice to configure the storage as individual LUNs at the hardware level, and not to create multiple logical partitions of a LUN within the operating system. In addition, it is a best practice to separate the transaction logs and databases by storing them on separate physical disks. This increases fault tolerance, because losing both the transaction logs and the database of the same storage group can result in extended downtime or the loss of important data.  We recommend that you place the active and passive LUNs on entirely different storage arrays, using different controllers or HBAs, to eliminate the storage as a single point of failure.

LCR


When using LCR, your storage design should maximize fault tolerance by using separate storage controllers on different PCI buses.  Continuous replication is your first line of defense in the event of a catastrophic failure; however a single failed disk should not be classified as catastrophic. Every LUN should use a RAID level greater than RAID0 that is built into a storage controller that has a battery-backed cache.


Continuous Replication Performance Considerations


LUN Design

We recommend that you design the storage for your passive copy to match the storage for your active copy in terms of both capacity and performance.  The passive copy’s storage is the first line of defense in the event of a catastrophic failure of the active copy’s storage. Placing log and database LUNs on separate physical disks will keep the database workload consistent.  This also ensures that any actions performed against the passive copy’s storage, such as a backup, do not impact the active copy’s storage.

RAID Selection


RAID10 provides the best performance and is strongly recommended for the LUNs containing the transaction logs.  Because writes make up a larger percentage of total disk I/O in Exchange 2007, RAID5 performance is lower than in previous versions.  An often overlooked performance consideration is how different RAID levels behave in failed-disk and array-rebuild scenarios.  RAID5 and RAID6 suffer from long rebuild times, significantly increased latency, and lowered transactional throughput during failure and rebuild. As a result, we recommend RAID10 for the transaction log LUNs.
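
To see why a higher share of writes hurts RAID5 more than RAID10, the following Python sketch applies the conventional rule-of-thumb write penalties (2 disk I/Os per host write for RAID10, 4 for RAID5, 6 for RAID6). The function name backend_iops is hypothetical, the host I/O figures are illustrative rather than measured Exchange numbers, and the model ignores controller cache effects.

```python
# Rule-of-thumb disk I/Os generated by one host write at each RAID level.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}


def backend_iops(host_iops: float, write_fraction: float,
                 raid_level: str) -> float:
    """Estimate the disk I/Os per second needed to serve a host workload."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    reads = host_iops * (1.0 - write_fraction)
    writes = host_iops * write_fraction
    return reads + writes * WRITE_PENALTY[raid_level]


if __name__ == "__main__":
    # Illustrative comparison: the same host load at two write ratios.
    for write_fraction in (0.25, 0.5):
        for level in WRITE_PENALTY:
            needed = backend_iops(1000, write_fraction, level)
            print(f"{level} at {write_fraction:.0%} writes: {needed:.0f} disk IOPS")
```

Under these assumptions, moving a 1000 IOPS workload from 25% writes to 50% writes raises the RAID5 disk load from 1750 to 2500 IOPS, while RAID10 rises from 1250 to 1500.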

Figure: Example LUN creation.


I/O Overhead

More transactional I/O occurs on the transaction log LUN on servers using continuous replication than on servers not using it. In previous versions of Exchange, log files are written, but never read, during normal operation. With continuous replication, each log is also read so that it can be copied to the replica location. This must be taken into consideration when sizing your server.  On the active copy’s transaction log LUN, which otherwise sees sequential writes, each log is read after it has been closed and copied to the passive copy’s transaction log inspection folder.  The log is then inspected at the passive copy’s location and moved to its final destination on the passive copy’s LUN.  Finally, the log is read and replayed into the passive copy of the database.  Both the active and passive transaction log LUNs must therefore perform reads and writes, in contrast to the nearly 100% sequential write activity found on a mailbox server without continuous replication.  This change in behavior may require a re-evaluation of the cache settings on your storage controller.  Our recommended settings are 25% read and 75% write on a battery-backed storage controller. Both replica and primary log LUNs should be tuned for similar performance, because the replica may suddenly become the primary after a disaster.
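
The extra traffic can be tallied with a short Python sketch. This is a simplified model, not a measurement: it assumes each 1 MB log is read in full once for the copy, once for inspection, and once for replay, and that moving a log from the inspection folder to its final folder on the same LUN costs no data I/O. The function names log_lun_traffic and split_cache are hypothetical.

```python
from collections import Counter

LOG_SIZE_MB = 1  # Exchange 2007 transaction log size


def log_lun_traffic(closed_logs: int, replicated: bool) -> dict[str, Counter]:
    """Tally MB read and written on the active and passive log LUNs."""
    volume = closed_logs * LOG_SIZE_MB
    active = Counter(write_mb=volume)   # logs are always written sequentially
    passive = Counter()
    if replicated:
        active["read_mb"] += volume     # closed log is read for the copy
        passive["write_mb"] += volume   # copy lands in the inspection folder
        passive["read_mb"] += volume    # log is read for inspection
        passive["read_mb"] += volume    # log is read again for replay
    return {"active": active, "passive": passive}


def split_cache(cache_mb: int, read_percent: int = 25) -> tuple[int, int]:
    """Split controller cache into (read, write) MB; default is 25%/75%."""
    read_mb = cache_mb * read_percent // 100
    return read_mb, cache_mb - read_mb
```

With 1000 closed logs, the model shows 1000 MB of reads on the active log LUN that would not exist without replication, and 2000 MB of reads on the passive log LUN. A 512 MB cache split at the recommended ratio gives 128 MB for reads and 384 MB for writes.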


LCR Storage Options


Local continuous replication (LCR) enables log replication on a single standalone server.  In the event of a catastrophic failure of the active copy of the database or logs, the administrator can quickly manually activate the passive copy.  The storage for the passive copy should be completely separate from the storage for the active copy, and to protect against potential driver instability, the storage can be of a different brand and model. To adhere to our best practices:


  • Controller cards should be on different PCI buses.
  • Active and passive storage LUNs should be on different arrays.

    • Example:  Primary on SAS and Replica on iSCSI storage.
    • Example:  Primary on SAS “array 1” and Replica on SAS “array 2”.

For a demo of LCR please go here.


CCR Storage Options

With cluster continuous replication (CCR) the second copy of data is stored on the passive node in the same cluster as the active node.  Since storage is not shared, you can choose servers listed in the Servers category of the Windows Server Catalog. Unlike single copy clusters, which require a solution that is listed in the Cluster category, CCR only requires servers in the Servers category.

CCR, which includes both automated failover and failback, provides higher availability than LCR.  By storing the passive copy on a completely different server, the operational impact to the active copy is decreased, and you have fault tolerance on the server.  VSS backups can also be taken from the passive node.

For a demo of CCR please go here.

Geographically-Dispersed Deployment

In a geographically-dispersed CCR deployment, the passive copy can be on a node that is in a different physical location than the active copy, thereby providing site resiliency. The guidelines in our replication document apply, yet because replication is a pull technology, high latency will not impact the user experience. This is in sharp contrast to a geographically dispersed cluster that uses synchronous replication, where replication latency does impact the live production LUN.  The replication process may run behind, increasing the amount of time the primary and the secondary copy are out of sync. If a disaster occurs on the primary, any mail that had not yet replicated may be recovered from the Hub Transport servers if it is still available. Proper Hub Transport sizing and configuration is required to ensure that the length of time mail is stored exceeds the projected downtime of the active copy.
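
The sizing rule in the last sentence can be written as a simple check. The Python sketch below uses a hypothetical function name, and adding the replication lag to the projected downtime is an assumption of this example, based on the observation above that replication may run behind.

```python
from datetime import timedelta


def retention_covers_outage(retention: timedelta,
                            projected_downtime: timedelta,
                            replication_lag: timedelta = timedelta(0)) -> bool:
    """True if Hub Transport keeps mail longer than the outage window."""
    # Assumption: mail delivered during the lag window must survive
    # until the passive copy is activated, so the lag is added.
    return retention > projected_downtime + replication_lag
```

For instance, a 7-day retention period covers a 48-hour outage with a 2-hour replication lag, whereas a 4-hour retention period does not cover an 8-hour outage.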


Single Copy Cluster (SCC)

Exchange 2003 servers that utilize Windows Clustering use shared storage such as Fibre Channel or iSCSI SANs.  In Exchange 2007, Windows Clustering with shared storage is designated Single Copy Cluster (SCC).  This helps differentiate it from CCR, which uses Windows Clustering but does not use shared storage for Exchange databases and log files; with CCR, the storage is local to each node in the cluster.  With SCC, all of the hardware, including the disks used for Exchange data, must be listed in the Cluster category of the Windows Server Catalog, and there are a few special considerations for backup. With CCR, the disks used for Exchange databases are local to each system and are not controlled or failed over as part of the cluster. This allows you much greater flexibility.

On a single copy cluster, some administrators use streaming backup to disk, and then fail over the backup LUNs to a passive node, which offloads a secondary backup process to tape. VSS solutions require a backup server to mount the volume shadow copy to run a checksum integrity check.  SCC provides redundancy for the server, but not for storage. CCR allows you to simplify backup administration and offload backup I/O demands completely to the passive replica server.


Geographically-Dispersed Single Copy Cluster

A clustered Exchange server using shared storage has the same fundamental storage considerations as a standalone server.  In addition, a geographically-dispersed single copy cluster must be on the geographically-dispersed cluster list in the Windows Server Catalog to be fully supported. When using synchronous replication, disk latency on the production LUNs can be artificially increased by the replication process.  A CCR deployment, by contrast, has no latency impact on the production database LUNs.  More details can be found in Deployment Guidelines for Exchange Server Multi-Site Data Replication.


Summary

Continuous replication provides service availability and service continuity for an Exchange 2007 mailbox server, without the cost and complexity of a shared storage cluster.  VSS and continuous replication are features that assist in enabling larger mailboxes and databases because they offer fast recovery in the event of a storage failure.  It is important to maximize the benefits of continuous replication by placing the active copy of the data and the passive copy of the data on separate storage.  While the pool of possible storage solutions will grow with continuous replication, the importance of validating and monitoring your storage solution remains the same.

Robert Quimbey

Comments (33)
  1. wolf70 says:

    Very interesting article,

    but one thing about CCR logfile replication is not clear to me:

    You write that the active node will read the Log file again and copy it to the passive node. When I asked the Exchange team at the Teched in Boston I got told that one agent on the passive node does the work to transfer the logs (pull replication). Maybe you can explain the details here a little bit more.

    Thx

    Wolf

  2. Josh Maher says:

    Are there any mechanisms in the works to replicate the Hub Transport Dumpster data to the passive node’s site?

  3. lee says:

    Based on my speaking to the folks at Tech Ed earlier this year, the "leftover" messages in the Hub servers that hadn’t been committed yet, do get committed to the new server once the system determines the original primary is no longer available.

    The question I have is if you are setting up a geographically dispersed system and the Hub server acts as a witness, what is the best design for placement of this server.  IE, how can I best design a system that will only failover to the passive side if the primary side is actually gone.  As opposed to a fibre cut between the sites that results in both side coming up live.

  4. Elf says:

    I get the part about wanting each storage group to span 4 LUN’s in a CCR cluster, but I have a question about the number of storage groups to put on each LUN?  For example, if I have 50 storage groups, do I necessarily have 200 LUN’s to manage on this cluster or can several storage groups share a group of 4 LUN’s?

  5. Robert Quimbey says:

    Wolf70:

    You are correct, it is a pull.  Here is my attempt to describe it:

    The transaction log is closed and then pulled by the passive node using a copy mechanism such as \\server\share.  After the logs are ‘pulled’ to the target/passive LUN, eseutil is run against each one to ensure that it is not corrupt, that it belongs to the right database, and that it is in sequence.  If a log passes this test, it is moved from the inspector directory to the configured log directory.  Once inspected, the logs are replayed into the target database, assuming the storage group is not blocked.

    Josh Maher:

    I’ll investigate.  I am not aware of a feature in the product that will do this.

    Lee:

    In my next blog I talk a little bit about capacity and performance on the Hub server.  Geographically dispersed solutions are being tested, and best practices are being worked on now.

    Elf:

    Very good question.  In the next blog I spend a bit of time on this, and it should be completed very soon.  The short answer is yes, you _can_ place more than 1 storage group’s databases on a single LUN (and logs as well, but be sure to separate log/db LUNs at both the logical and physical level).  I will try to make pro and con arguments for this strategy.  It is particularly appealing if you group a couple of storage groups into a ‘backup set’ and place them on the same LUN.  There are many cases where restoring a database with this strategy will impact the performance of other databases; with a VSS clone restore, for example, restoring a LUN that houses other databases will cause downtime for them during log replay.  Be patient, more details are coming soon.

    Thank you,

    Robert Quimbey

  6. dergo says:

    Hi.

    I have a problem to understand the calculation for the DumpsterQueue.

    It is recommended to set the size to 1.25*MaxMailSize and the hold time to 7 days.

    The storage requirement for the queue should be MaxDumpsterSizePerStorageGroup*NumberOfStorageGroups.

    I think this is a little small?

    If the MaxDumpsterSizePerStorageGroup is the size of the whole queue, this means that I can hold only 1 mail per storage group?!

    I thought I would get more than one mail per week ;)

    Please can you tell me if this calculation is wrong, and if so, how I should set the queue size correctly.

    Is there a way to list the mails in the queue and is the path to the queue configurable?

    Thank you,

    Christian

  7. bravo22 says:

    Hi, very informative blog, many thanks.  I’m looking to clarify the following points:

    1.  CCR – is not asynchronous, so how does that protect me from the "1018,1019,1022" type problems.

    2.  Can you implement either LCR or CCR in a single SCC setup with multi-node multi-EVS configuration?

    Thanks

  11. Scott says:

    bravo22,  CCR is asynchronous, as only closed log files are shipped.  Also, while in the replication pipeline, each log file is first checksummed and validated in an Inspector directory before being copied to its final home and replayed into the copy of the database.

    Also, you cannot combine SCC, CCR, and LCR in any manner.

  15. gazzoni says:

    Hello,

    But there is one thing that I do not like about Exchange clusters (we have used them since 2001): the extra work (and time) needed to get out of one. When all cluster functions fail (for example, a complete loss of all nodes) and you have only standalone servers to work with, you must remove all Exchange attributes from all users, regenerate them (if you have proxy-address backups), and reconnect all your mailboxes with ExMerge. This scenario is not necessary if you do not use a cluster: you simply install new servers and restore the entire databases, with no user ‘objects’ involved.

  16. wolf70 says:

    Gazzoni,

    you are not correct. Moving users out of the SCC cluster to a standalone server is quite simple in Exchange 2003. The move back is trickier, but it is doable in a short timeframe as well.

    CCR in E7 allows an easy move between servers. In fact, you always have 2 working databases in this configuration, and it is quite unlikely that you will lose both at the same time if you do your job correctly.

    Cheers

    Wolf

  17. gazzoni says:

    Wolf,

    Thanks for your attention. In fact, this issue is about clusters in general, not necessarily CCR.

    But as I understand it, the ‘resurrection’ of a previous ExchangeServerName from scratch (when there are no conditions to recreate the entire cluster) means ‘claiming’ a lot of information about the server from AD. In a non-clustered installation, we simply run setup with /disasterrecovery on new hardware. But if this EVS is clustered, there are various steps to perform, because we need to "re-home" each user mailbox. In Active Directory, there are significant differences between the attributes of clustered and non-clustered mailboxes. The article http://support.microsoft.com/kb/323016/ starts to explain it.

    I really do not know another way to recreate an EVS on a new standalone server that does not force recreation of all Exchange attributes for each user.

    Thanks.

    Gazzoni

  18. wolf70 says:

    Gazzoni,

    I didn’t say that the /disasterrecovery switch would work. You need to change the attributes that tell Exchange that it now runs on a standalone machine. It is a rather undocumented task, but it works. Four years ago I ran into a similar situation where the SAN storage failed (FTDISK errors, etc.). I moved all users over within 4 hours (the restore of the DB took most of the time), and I used a script to update the attributes.

    But this has been the only time in my whole working life that I had to do this. Normally SAN failures are quite limited, and it is unusual for all servers of a cluster to go down as well.

    A cluster requires more planning, administrative knowledge, and maintenance, that is for sure. But it also provides higher availability to ensure that your SLA goals are met. Following these rules, at least none of my Exchange server installations has failed in recent years.

    Brgds

    Wolf

    PS: I dont work for Microsoft nor do i get paid for my statements :-)

  19. gazzoni says:

    Hello,

    I’m sorry for this basic question, but: in a CCR configuration, why do we need a SAN? Since storage devices do not need to be "shared", and log shipping is done through the network, can we use local disks for Exchange storage?

    Thanks

  20. wolf70 says:

    Gazzoni,

    you are mixing things up here. The SAN example I brought up was for SCC, the same thing you are doing now. For CCR you can use DAS (direct attached storage). The configuration does not even have to be on the certified cluster hardware list in this case.

    Large customers might tend to stay on a SAN configuration even with CCR. The reason is that, from a TCO point of view, it does not make sense to move to DAS when you already have everything else running on SAN (think about 40-200 servers here).

    In CCR, which uses MNS (majority node set), you should also be able to restore the cluster config quite easily. You only need to take care of the witness share and one node, that’s it.

    Wolf

  21. gazzoni says:

    Wolf,

    (still) About the first question (cluster recovery).

    In fact, complete recovery of an Exchange 2003 cluster is a bit less complicated than it was in the previous version (we still use Exchange 2000). There is a concept that is new to me, called ‘standby cluster’.

    The key is: do not re-home mailboxes (i.e., do not convert previously clustered mailboxes into standard mailboxes), although this scenario is not possible for Exchange 2000.

    But even for this "standby cluster" method, we must have a new computer ready that meets the cluster requirements (SAN connectivity, etc.).

    We can NOT USE a single (normal) computer to recover.

    And at the time of the disaster, we may not have one. This is the question.

    The recovery is more difficult than recovering site-only standalone servers.

    And the result is delayed.

    Well, it is the same after all:   Cluster – love it, or leave it.

    Btw:

    The Exchange Team could think of a way, in the future, to eliminate the differences between clustered and non-clustered mailboxes, and between Servers and Virtual Servers. If Exchange needs cluster information in particular, put it in another place.

    Thanks for all news on this discussion !

    Gazzoni

  22. Andre says:

    Hello,

    I’ve got 2 identical servers and a direct attached shared storage device (MD1000) that both servers can access. I can go SCC (like I did for 2003 active/passive) or go CCR. In either case the storage will be in the same device (so no real isolated storage). In this case, is there any reason I should consider CCR over SCC?

    You mention RAID 10 for log drives; does that same recommendation extend to DB drives?

    Thanks, Andre.

  23. wolf70 says:

    Andre,

    I am not a specialist for Dell storage, but if your MD1000 has two EMMs in split configuration and the internal bus is separated, you could go for CCR. You lose half of your storage in this config, and you need two additional servers for the HUB/CAS roles, at least if you want redundancy. Keep in mind that E2k7 does not allow any additional role to be installed on clustered mailbox servers. And you need a Windows server for the file share witness for CCR (it can be one of the HUB servers).

    So whether you should go for CCR depends on whether a failure on one EMM leaves the rest of the MD1000 unaffected (-> ask Dell). E.g., if a bus failure on one side affects the complete MD1000, you only waste storage and CCR will not make sense here.

    Another limit with CCR is that you can only have one MBX store per storage group. If you assign mailbox limits on a per MBX store basis, this could affect you as well.

    Your second question is hard to answer without additional details. But I can provide one hint: the DB write/read ratio is (almost) 1:1 now in E2k7 (the latest recommendation I heard of was 75/25 – it might change again for different solutions). This has a bad impact on the performance of any SATA RAID 5 system. On the other hand, the I/O requirement is reduced if you provide your servers with sufficient memory. I am still not able to provide detailed numbers. The load simulation tools for E2k7 are still not final. For this reason I cannot name any real-life values for different configurations. Maybe one of the Microsoft guys can add some values here.

    Wolf
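
The hint above about a near 1:1 write/read ratio hurting RAID 5 can be made concrete with the common write-penalty rule of thumb. The sketch below assumes the usual textbook penalties (2 back-end I/Os per write for RAID 10, 4 for RAID 5) and hypothetical host I/O figures; it also assumes that "75/25" means 75% reads, which the comment does not state. It is not a sizing tool and ignores controller cache and disk type.

```python
# Rule-of-thumb back-end I/Os caused by one host write (assumed textbook values).
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4}


def backend_iops(host_iops: float, read_fraction: float, raid_level: str) -> float:
    """Estimate disk-level IOPS for a given host load and read/write mix.

    host_iops and read_fraction are hypothetical inputs for illustration.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    reads = host_iops * read_fraction
    writes = host_iops * (1.0 - read_fraction)
    # Each read costs one back-end I/O; each write costs the RAID penalty.
    return reads + writes * WRITE_PENALTY[raid_level]


if __name__ == "__main__":
    for mix in (0.5, 0.75):
        for level in ("RAID10", "RAID5"):
            print(f"read fraction {mix}: {level} -> {backend_iops(1000, mix, level)} IOPS")
```

With 1000 host IOPS at a 50/50 mix, this estimate gives 1500 back-end IOPS on RAID 10 versus 2500 on RAID 5; at 75% reads the gap narrows to 1250 versus 1750.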

  24. Chris says:

    How do small companies handle fault tolerance with Exchange?  We have about 45 users and having our email up all the time is critical.  Most of our orders/sales are generated by email, and downtime directly affects our bottom line.  To me, CCR is the only real way to have 100% uptime, but the expense of the software and servers is a major concern.

    Is it possible to have three nodes when using CCR?  Two nodes would be on-site and the third would be located at a different geo-location.

    Thanks,

    Chris

  25. wolf70 says:

    Hi Chris,

    With CCR you can only have 2 nodes. This is a limitation of the majority node cluster.

    If your mail environment is critical (even for only 45 users), you could use CCR as your first line of defence. That is out of the box and will not cost you anything more. Now we get to the more interesting part:

    A GEO cluster (stretched cluster over subnets) is NOT supported by Microsoft right now. You will have to wait for Longhorn and an Exchange 2007 service pack to achieve this.

    If you still think about a third node, you will have to use 3rd party tools. But all of them are based on SAN replication (+ software, of course) – and here it gets really expensive.

    But calculate wisely here: How much money do I lose when my servers are offline for, e.g., 4 hours? How much does the SAN solution cost? Is it better to have a cold standby server and only ship the backup files to the remote location? Will the NW resolution work, and what is necessary for it? Etc.

    Hope I could provide you with some ideas.

    Wolf
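
The "calculate wisely" questions above boil down to comparing the expected cost of outages with the price of a solution. A minimal sketch follows, in which every figure and name is a hypothetical placeholder rather than real pricing or real outage data:

```python
def expected_outage_cost(loss_per_hour: float, hours_per_outage: float,
                         outages_per_year: float, years: float) -> float:
    """Money lost to downtime over the planning period (all inputs hypothetical)."""
    return loss_per_hour * hours_per_outage * outages_per_year * years


def solution_pays_off(solution_cost: float, loss_per_hour: float,
                      hours_per_outage: float, outages_per_year: float,
                      years: float) -> bool:
    """True when the expected downtime loss exceeds what the solution costs."""
    loss = expected_outage_cost(loss_per_hour, hours_per_outage,
                                outages_per_year, years)
    return loss > solution_cost


if __name__ == "__main__":
    # Made-up example: 2,000 lost per hour, one 4-hour outage a year, 3 years.
    loss = expected_outage_cost(2000, 4, 1, 3)
    print("expected loss:", loss)
    print("replication worth 100,000?", solution_pays_off(100_000, 2000, 4, 1, 3))
    print("standby server worth 10,000?", solution_pays_off(10_000, 2000, 4, 1, 3))
```

With these invented numbers, the expected loss of 24,000 would justify a 10,000 cold standby server but not a 100,000 replication setup; real inputs would have to come from your own sales and outage history.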

  26. Chris says:

    Thanks Wolf!

    The Enterprise version is required in order to use CCR, correct?

    Our corporate office is located in FL so having a true operational backup plan is required with the potential hurricanes.  Just trying to figure out the best plan of action with minimal down time.

  27. wolf70 says:

    Hi,

    Yes, Enterprise Edition is necessary for CCR.

    Wolf

  28. Chris says:

    Wolf, do you think we will see some solutions in the near future that will easily allow for smaller companies to spread servers and storage to remote locations for geo-fault-tolerance?  Will iSCSI be something that will allow us to do this easier and cheaper?

    Thanks,

    Chris

  29. wolf70 says:

    Chris,

    Microsoft is working right now on making CCR WAN-capable. But, as mentioned before, you will need to wait about a year for it.

    Wolf70

  30. Chris says:

    Can servers that are in a cluster and using CCR have more than the mailbox role installed?  Is this possible and/or would it be considered a bad idea?

    Thanks,

    Chris

  31. Wolf70 says:

    Chris,

    No, you cannot install any other role on a clustered server (you don’t even get the option in the setup program).

    Merry Christmas!

    Wolf70
