What to consider when deploying Exchange clusters...

After some time of being inactive... here I go again..... :)  From now on I'll write some of my posts in English, due to the fact that I have some Indian friends who want to understand my blog's content...

Well... let's get to the point... During the last month I've been working on the design of an Exchange 2003 4-node cluster (A/A/A/P); the solution should provide high performance, stability and availability, and in order to accomplish this goal different infrastructure aspects had to be considered. Below you'll find a list of aspects and recommendations that you have to take care of when you deploy a cluster solution for Exchange... just to mention... this list was provided by an MS colleague (Venkatesh R); needless to say, it was very useful and quite complete... enjoy it hehehehe :)

Disk Subsystem

  • Size the disk subsystem on user IOPS: the user IOPS figure determines the number of spindles in the RAID set (see the sizing sketch after this list).
  • When weighing the SAN vendor's recommendation of RAID 5 against Microsoft's recommendation of RAID 10, base the decision on the IOPS calculation only.
  • The fluff factor has to be taken into account: allow nothing less than 40% on top of each individual mailbox size.
  • The net effective space available for data on a 143 GB disk is no greater than 130 GB.
  • The optimal configuration of a RAID group varies between vendors (HP EVA works in multiples of 8 disks with helical striping, vs. the EMC CX series in sets of 9 disks for the same RAID 5 configuration). Moreover, the disk IOPS figures quoted by the SAN vendor are measured under standard test conditions, which are not representative of an Exchange database with its dynamic/random reads and writes.
  • Using volume mount points on the A/A/A/P cluster (especially for drive-letter mapping) helps reduce drive-letter consumption. It is also easier to define a LUN ID and present it as a basic disk, to ensure that the Exchange store and the transaction logs are placed on different spindles / RAID groups.
  • The SMTP resource needs to be on RAID 0 rather than RAID 10: when optimally designed, the space consumed by SMTP for an EVS instance should not exceed 40%, above which there will be a queue build-up.
  • Stress testing with LoadSim and MMB should be done in the lab before the system is introduced into production, and the results can be leveraged for a performance baseline only.
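To make the spindle and capacity bullets concrete, here is a minimal sizing sketch in Python. The RAID write penalties (2 for RAID 10, 4 for RAID 5), the 40% fluff factor and the 143 GB / 130 GB figures come from the list above; the per-user IOPS, read ratio and per-spindle IOPS are illustrative assumptions you would replace with your vendor's measured values.

```python
import math

def spindles_needed(users, iops_per_user, read_ratio,
                    raid_write_penalty, iops_per_spindle):
    """Estimate the spindle count for a RAID set from user IOPS.

    Backend IOPS = reads + writes * RAID write penalty
    (penalty 2 for RAID 10, 4 for RAID 5).
    """
    frontend = users * iops_per_user
    reads = frontend * read_ratio
    writes = frontend * (1 - read_ratio)
    backend = reads + writes * raid_write_penalty
    return math.ceil(backend / iops_per_spindle)

def capacity_needed_gb(mailboxes, quota_mb, fluff=0.40):
    """Capacity once the 40% fluff factor is added on top of quotas."""
    return mailboxes * quota_mb * (1 + fluff) / 1024

# Illustrative profile: 4000 users at 0.5 IOPS each, 66% reads,
# 130 IOPS per 143 GB spindle (of which only ~130 GB is usable).
print(spindles_needed(4000, 0.5, 0.66, raid_write_penalty=2,
                      iops_per_spindle=130))  # RAID 10
print(spindles_needed(4000, 0.5, 0.66, raid_write_penalty=4,
                      iops_per_spindle=130))  # RAID 5
print(capacity_needed_gb(4000, 100))          # ~547 GB incl. fluff
```

The same workload needs roughly 50% more spindles on RAID 5 than on RAID 10 because of the write penalty, which is why the decision should rest on the IOPS calculation rather than on raw capacity.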

GC Placement

  • The rule of thumb is one Global Catalog processor for every four Exchange processors, so for a 4-node cluster with 16 processors it is necessary to ensure that there are at least 4 Global Catalog servers in the same IP site (a quick check of that arithmetic follows below). From experience, it is also necessary to ensure that general user authentication is not redirected to these GCs. Referrals to the PDC likewise need to be curtailed if the PDC emulator is also present in this site.
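As a quick check of that ratio, a throwaway calculation; the 1:4 figure is the guideline from the bullet above, while the single-processor-GC default is my own assumption:

```python
import math

# 1:4 guideline: one GC processor per four Exchange processors.
# gc_processors_each = 1 assumes single-processor GC boxes.
def gcs_needed(exchange_processors, gc_processors_each=1):
    return math.ceil(exchange_processors / 4 / gc_processors_each)

print(gcs_needed(16))  # 4 GCs for the 4-node, 16-processor cluster
```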


Failover and Failback

  • Failover and failback times cannot be determined by a rule of thumb; the following parameters need to be clearly understood and analyzed (a rough illustrative model follows this list):
    • No. of User Mailboxes / RPC sessions
    • No. of SMTP Sessions
    • I/O of the disk subsystem, specifically the transaction log disk set under peak load
    • Size of the Exchange Database
    • Failover mode: manual or automatic (better to set failover to manual; a crucial decision, as it involves 24x7 support)
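None of these parameters yields an exact number, but a back-of-the-envelope model can at least frame expectations. The sketch below is my own illustration, not a formula from the list: it treats failover time as cluster resource takeover plus transaction-log replay, and every constant except the 5 MB Exchange 2003 log-file size is an assumption to be replaced with lab measurements.

```python
def estimated_failover_seconds(outstanding_logs,
                               log_size_mb=5,             # Exchange 2003 log files are 5 MB
                               replay_mb_per_sec=10,      # assumed replay throughput
                               resource_takeover_sec=60): # assumed cluster overhead
    """Very rough failover model: resource takeover + log replay.

    Real numbers depend on RPC/SMTP session counts, database size
    and transaction-log disk I/O under peak load, so measure in the
    lab with LoadSim rather than trusting this estimate.
    """
    replay_sec = outstanding_logs * log_size_mb / replay_mb_per_sec
    return resource_takeover_sec + replay_sec

# e.g. 500 uncommitted log files pending at the moment of failover
print(estimated_failover_seconds(500))  # ~310 seconds
```

In practice, LoadSim runs against the lab cluster (as recommended earlier) are the only trustworthy source for these inputs.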

LUN Design and Meta LUNs

  • Though the LUN design is to be provided by the SAN vendor, it is preferable to draft a LUN design of your own in order to verify the correctness of the disk subsystem configuration.
  • On EMC we utilized MetaLUNs, which allow dynamic expansion of disk sets; otherwise we would need downtime covering SAN boot-up, the LUNs being re-presented by the SAN to the Exchange nodes, and initialization, which can extend up to 4 hours.

Cache

  • SAN write-back cache is another critical area of design. The cache quoted by SAN vendors is inclusive of the SAN subsystem / OS loading, the memory-resident cache and the write-back cache on the array controller. For example, the EMC CX700 claims 8 GB of cache, but the actual physical pipe available for write-back is only 2.6 GB per controller (SPA/SPB). This also depends on the data throughput across the fibre switch (a quick sanity check follows).
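A quick sanity check on those figures: how long 2.6 GB of per-SP write-back cache can absorb a sustained write burst before destaging to disk has to keep pace. The 4 KB I/O size matches the Exchange 2003 ESE page size; the write rate is an illustrative assumption.

```python
# How long the per-SP write-back cache can absorb a write burst
# before destaging to the spindles must keep pace.
def burst_seconds(cache_gb, write_iops, io_kb=4):
    cache_kb = cache_gb * 1024 * 1024
    return cache_kb / (write_iops * io_kb)

# 2.6 GB usable cache (the CX700 figure above) at an assumed
# 2000 writes/s of 4 KB pages:
print(round(burst_seconds(2.6, write_iops=2000)))  # ~341 seconds
```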

SLA Definition

  • The critical design factor of a SAN-based Exchange system is the time taken for backup and restore. It is always recommended to stage it (disk to disk, then disk to tape). For the disk-to-disk stage it is necessary to size an array of ATA disks (320 GB × 10 is ideal). The staged backup to removable media can then be scheduled during working hours. For all calculation purposes we decided to limit the store size to 25 GB (user mailbox of 100 MB, 40% fluff factor, scaling to a maximum of 4000 users/EVS with the above-mentioned limit on an A/A/A/P cluster); the arithmetic is restated after this list.
  • An ideal design should have no single point of failure (NSPF) across all components (total failure of the SAN should be the only cause of complete downtime).
  • Hot spares: the configuration needs to specify hot-spare disks for each RAID group.
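Restating the sizing numbers from the first bullet in Python; the mailbox quota, fluff factor, store limit and ATA array figures all come from the list above, and nothing else is assumed:

```python
# Sizing arithmetic from the SLA bullet above.
quota_mb = 100
fluff = 0.40
store_limit_gb = 25
users_per_evs = 4000

effective_mb = quota_mb * (1 + fluff)                 # 140 MB per user
mailboxes_per_store = int(store_limit_gb * 1024 / effective_mb)
total_data_gb = users_per_evs * effective_mb / 1024

print(mailboxes_per_store)       # ~182 mailboxes per 25 GB store
print(round(total_data_gb))      # ~547 GB of data per EVS

# Disk-to-disk staging target of 320 GB * 10 ATA disks:
print(total_data_gb < 320 * 10)  # True: the staged copy fits
```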

Client Dependency

  • Cached mode in Outlook 2003 is recommended because, during a failover, the MAPI session reconnects automatically without user intervention.
  • Leverage RPC over HTTPS inside the corporate network if possible, to enable rate limiting and scenarios where communication needs to be restricted to standard ports rather than raw RPC traffic.


Processors

  • Enable Hyper-Threading on the processors.
  • For more than 3 GB of RAM, use the /3GB and /USERVA=3030 switches (add /PAE if the memory modules are dynamic hot-swap); see the boot.ini example below.
  • Place the page file away from the Exchange binaries.
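For reference, those switches go on the operating-system line of boot.ini. A typical Windows Server 2003 entry might look like the following; the ARC path and description vary per machine, so treat it as an illustrative sketch only:

```
[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS

[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect /3GB /USERVA=3030 /PAE
```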


As you can see... it's not that easy to design a cluster (some of my friends used to say it's only: next -> next -> yes -> finish!!! hehehehe, repeated for as many nodes as you have)... If we want to optimize our cluster implementation, all these aspects should be considered from the beginning (at design time, not once it's already deployed), and a specialist should do it to get the real value of these solutions.