Exchange Disk Sizing (it's not just for cluster anymore!)

10/14/2004

Nicole Allen has a great post over at the Exchange Team blog with step-by-step info on disk sizing for Exchange servers. This is the same sort of information that you can find in the Optimizing Storage for Exchange 2003 whitepaper, but Nicole does a great job of making it make sense!

In support, it's become very common to see servers sized with "400 GB of storage" (designing primarily for required capacity), for instance, rather than with "6 spindles" (designing primarily for required performance). This seems to be getting even more common as Exchange design engineers work with their company's storage people to get space carved out on the SAN. Often neither the Exchange people nor the storage people are fully aware of what Exchange needs from its storage, and so design mistakes are made early in the process.

Here are some brief best practices I've seen listed for Exchange Storage design:
(credit to Dave Lalor for the high-level list)

1) First, size the number of disk spindles to accommodate the total number of IOPS required by the system. Size for peak load, understand your user profile, and remember that a disk has two important measures: capacity and performance.

2) Then, select the capacity of disk spindles to satisfy required data sizes. Total DB size = Number of users * mailbox size. Consider database overhead for maintenance activities (leave space for utilities to run). Also leave space for overhead like DB indexes, deleted items retention, etc. Don’t forget to include SMTP queues -- their random read/write IO profile is a better fit with the database drives than the transaction log drives if you choose to double them up. Also, remember that more items in the mailboxes will require more IOPS.

3) Separate data and log volumes for each Exchange storage group. A storage group holds a maximum of 5 databases plus one set of transaction logs. The recommendation is to put all 5 databases on one LUN and the transaction logs on a second LUN. Placing the databases on independent LUNs (rather than one larger LUN) can make it difficult to meet the 10 second quiescence window for VSS.

4) Tune storage array parameters. Some suggestions: 4 KB cache page size (only if Exchange is the only thing on the array; otherwise leave it at 8 KB). Maximize the write cache -- this is HUGE for Exchange performance, because Exchange makes very effective use of write cache. Keep the read cache minimal (50-100 MB). Enable cache watermarks. Enable read and write caching for all LUNs. Use a stripe element size of 64 blocks (32 KB).

5) Align disk partitions to stripe size. If you've got a stripe element size of 32 KB (see #4, just above), you'll want to make sure you've aligned your partitions to a 32 KB boundary to prevent inefficient access to some of the blocks of data. Most of the SAN vendors have much better docs on this than I can provide here. :)

6) Cluster for high availability. Always design for Active/Passive rather than Active/Active. Use mountpoints for log and data per Storage Group or EVS to reduce number of drive letters required.

7) Validate the design. This may just be the most important item of the seven... Get peer reviews, copy a known configuration, then build it and test it with JetStress (monitor server performance with perfmon counters, 3rd party analyzer tools, etc). Resolve any issues early in the rollout.
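
To make steps 1, 2, and 5 a little more concrete, here is a rough Python sketch of the arithmetic. Every input below (IOPS per user, IOPS per spindle, mailbox size, overhead fraction) is a made-up illustration rather than a recommendation, and the function names are my own; plug in numbers measured from your own user profile and hardware.

```python
import math

def spindles_for_iops(peak_disk_iops, iops_per_spindle):
    """Step 1: number of spindles needed to carry the peak IO load."""
    return math.ceil(peak_disk_iops / iops_per_spindle)

def required_capacity_gb(users, mailbox_mb, overhead_fraction):
    """Step 2: users * mailbox size, padded for maintenance space,
    indexes, deleted item retention, and so on."""
    base_gb = users * mailbox_mb / 1024
    return base_gb * (1 + overhead_fraction)

def is_aligned(partition_offset_bytes, stripe_element_bytes):
    """Step 5: a partition is aligned when its starting offset is an
    exact multiple of the stripe element size."""
    return partition_offset_bytes % stripe_element_bytes == 0

# Hypothetical inputs, for illustration only.
users = 2000
iops_per_user = 0.75        # assumed peak IOPS per user
iops_per_spindle = 150      # assumed random IOPS one disk can sustain
mailbox_mb = 200            # assumed mailbox size
overhead_fraction = 0.2     # assumed 20% headroom

peak_disk_iops = users * iops_per_user
spindles = spindles_for_iops(peak_disk_iops, iops_per_spindle)
capacity_gb = required_capacity_gb(users, mailbox_mb, overhead_fraction)

print(f"Spindles for performance: {spindles}")
print(f"Capacity needed: {capacity_gb:.1f} GB")

# A 32 KB stripe element with two example starting offsets.
stripe = 32 * 1024
print(is_aligned(63 * 512, stripe))   # False: 31.5 KB offset
print(is_aligned(64 * 512, stripe))   # True: 32 KB offset
```

Note the order: the spindle count comes from the performance requirement first, and only then do you pick a disk size that lets that many spindles hold the required capacity. The sketch ignores RAID write penalties, so `peak_disk_iops` here should already be the load as seen at the disks.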

So that's fun, but I'm not done yet. Another thing I've talked with a handful of customers about lately is the disk performance characteristics of various parts of Exchange.

As Nicole mentioned in her post, if you consider only logs and database IO, roughly 90% of the IO on the system goes to the databases and only 10% goes to the logs. So right off the top, the database drive performance is a huge consideration in terms of total system performance.

The transaction logs are 100% write -- and sequential write at that. Doing nothing but sequential writes to the disk is a comparatively low impact activity. But be mindful of another thing Nicole pointed out: write penalties for various RAID levels. If you have a single disk, a single write IO will take a single IO to the disk. If you have RAID 1+0, it will take two write IOs. And if you put your logs on RAID5, every single write operation will take 4 IOs at the disk (two reads and two writes, to update both the data and the parity). But, all that said, I've very rarely seen transaction log drives that are dedicated to that task become IO bound and encounter high latency.

Database drives are typically going to be somewhere between 66%-75% read IO with the remainder being write. And, of course, this IO is randomly distributed across the disk. Each of the read IOs takes a single IO, but each write (25%-33% of the total IO to the disk) takes 1, 2 (RAID 1+0), or 4 (RAID5) IOs to complete. Nicole ran through a lot of this calculation, so I won't cover it further here.
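
As a quick illustration of the write penalty math described above (this is my own rough sketch, not Nicole's calculation), the following Python snippet converts the IO that Exchange issues into the IO the disks actually have to perform. The penalty values of 1, 2, and 4 come straight from the paragraphs above; the IOPS figures and the function name are hypothetical.

```python
# Disk IOs needed per host write, per the RAID levels discussed above.
RAID_WRITE_PENALTY = {"single": 1, "raid10": 2, "raid5": 4}

def disk_iops(host_iops, read_fraction, raid_level):
    """Return the IOPS the disks must service for a given host load.

    Each read costs one disk IO; each write costs 1, 2, or 4 disk IOs
    depending on the RAID level.
    """
    penalty = RAID_WRITE_PENALTY[raid_level]
    reads = host_iops * read_fraction
    writes = host_iops * (1 - read_fraction)
    return reads + writes * penalty

# Hypothetical database volume: 1000 host IOPS at 75% read.
for level in ("single", "raid10", "raid5"):
    print(level, disk_iops(1000, 0.75, level))
# single 1000.0
# raid10 1250.0
# raid5 1750.0

# Hypothetical log volume: 100 host IOPS, 100% write.
for level in ("single", "raid10", "raid5"):
    print(level, disk_iops(100, 0.0, level))
# single 100.0
# raid10 200.0
# raid5 400.0
```

The result is the number to divide by per-spindle IOPS when counting disks, which is why the same user load can need noticeably more spindles on RAID5 than on RAID 1+0.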

There are a couple of other things that need to be placed on disk somewhere, and the best thing to consider for each is what type of IO profile it most closely matches with. You might be surprised by what you find:

SMTP Queues, MTAData, Full-Text Indexes, System TEMP/TMP directories - These tend to be highly randomized reads and writes, so you will find they are most closely aligned with the database drives in terms of IO profile. In fact, placing these on the same disk as the transaction logs -- placing ANYTHING not write-sequential on those disks, actually -- may cause IO latency to increase dramatically on the log disks since they will no longer perform as 100% write-sequential.
