Capacity Planning – Yes Transaction Log Space is Critical to Keeping your Databases Healthy and Mounted


The other day I was chatting with one of our Supportability Program Managers, Nino Bilic, and he mentioned something that was rather alarming  – the number one reason why our Premier customers open Exchange 2010 critical situations is because Mailbox databases dismount due to running out of disk space on the transaction log LUN. 

I’ll let that sink in for a moment.  Naturally I’m shocked…to be completely honest, I thought with the Mailbox Requirements Calculator and our guidance on TechNet, we’d have wiped out this issue by now.  After sharing this information with me, Nino decided that I, not he, should write a blog article on the topic of transaction log capacity planning (gee, thanks Nino!).

Capacity Planning 101

In order to properly size a transaction log LUN, we need to understand a few things about the environment:

  1. How many mailboxes will reside in the database?
  2. What is the message profile of the mailboxes in the database?
  3. What is the average message size?
  4. What is the average mailbox size?
  5. How many mailboxes are moved per day?
  6. What is the backup and restore solution?
  7. Does the solution need to take into account any other failure scenarios, like network failures?

For the purposes of this discussion, let’s assume that each database will house 250 mailboxes.  Each mailbox sends/receives 150 messages per day, with an average message size of 100KB.  Based on the table in Understanding Mailbox Database and Log Capacity Factors, we know that a 150 message profile with a 75KB average message size generates 30 transaction logs per mailbox per day (24 hour period).  Since our message size is greater than 75KB, we need to account for that in the number of transaction logs generated per mailbox.  The guidance stipulates:

If the average message size doubles to 150 KB, the logs generated per mailbox increases by a factor of 1.9. This number represents the percentage of the database that contains the attachments and message tables (message bodies and attachments).

Therefore, we can determine the impact our 100KB average message size has with this formula:

150 / 1.9 = [average message size of profile] / x

x = (100 * 1.9) / 150

x = 1.266666666666667 ~ 1.27

So by having a message size that is 25KB larger than the baseline, the number of transaction logs generated per day per mailbox increases by a factor of 1.27.  Therefore, 30 transaction logs * 1.27 = 38.1, which rounds up to 39 transaction logs / day / mailbox.  This means that each database of 250 mailboxes will generate 39 * 250 = 9,750 mailbox generated transaction logs / day / database.
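
Here is a minimal Python sketch of the calculation above, in case you want to plug in your own profile. The function names are mine, not part of any Exchange tooling, and rounding the per-mailbox figure up to a whole log is an assumption I made to match the 39 logs quoted above.

```python
import math

BASELINE_LOGS_PER_DAY = 30   # logs per mailbox per day: 150 messages/day at 75KB
DOUBLED_SIZE_KB = 150        # message size for which the guidance quotes a factor
DOUBLED_FACTOR = 1.9         # log growth factor quoted for a 150KB average


def log_growth_factor(avg_message_kb):
    """Solve 150 / 1.9 = avg_message_kb / x for x, rounded to two decimals."""
    return round(avg_message_kb * DOUBLED_FACTOR / DOUBLED_SIZE_KB, 2)


def logs_per_mailbox_per_day(avg_message_kb):
    # Assumption: a partial log counts as a whole log, to match the article's 39.
    return math.ceil(BASELINE_LOGS_PER_DAY * log_growth_factor(avg_message_kb))


def logs_per_database_per_day(mailboxes, avg_message_kb):
    """Mailbox generated logs per day for one database."""
    return mailboxes * logs_per_mailbox_per_day(avg_message_kb)


if __name__ == "__main__":
    print(log_growth_factor(100))               # growth factor for 100KB messages
    print(logs_per_mailbox_per_day(100))        # logs / day / mailbox
    print(logs_per_database_per_day(250, 100))  # logs / day / database
```

For the scenario in this article, call it with 250 mailboxes and a 100KB average message size.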

Mailbox moves also generate transaction logs.  Each mailbox moved to the destination database generates roughly enough logs (at the destination, not the source) to equal the size of the mailbox (including the contents of the Recoverable Items folders).  For example, moving 1% of the mailboxes per day will mean that 2.5 mailboxes are moved into a database each day.  If each mailbox is 5.4GB in size on average (including 14 day deleted item retention with Single Item Recovery enabled), then, with 1MB transaction log files, 2.5 * 5.4GB * 1024 is roughly 13,888 mailbox move transaction logs / day / database.
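
The mailbox move estimate can be sketched the same way. This is only an illustration: the function name is hypothetical, and it assumes 1MB transaction log files. With the rounded 5.4GB average it gives about 13,824 logs per day, in the same ballpark as the 13,888 used in this article.

```python
LOG_FILE_MB = 1  # assumption: each transaction log file is 1MB


def move_logs_per_day(mailboxes, moved_fraction, avg_mailbox_gb):
    """Logs generated per day at the destination database by inbound moves."""
    mailboxes_moved = mailboxes * moved_fraction          # e.g. 250 * 1% = 2.5
    moved_mb = mailboxes_moved * avg_mailbox_gb * 1024    # GB -> MB
    return moved_mb / LOG_FILE_MB


if __name__ == "__main__":
    print(move_logs_per_day(250, 0.01, 5.4))
```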

From a backup/restore perspective, we need to take into account the type of backup architecture we are leveraging.  With each backup scenario, there is a recommended number of additional days you should provision from a capacity perspective for your mailbox generated transaction logs.  By provisioning extra space, you can survive multiple failures without suffering an outage event.  For more information on transaction log truncation, see Understanding Backup, Restore and Disaster Recovery.

Backup Methodology                           Transaction Log Truncation       Recommended Backup Failure Protection
Daily Full Backup                            Daily                            3 days
Weekly Full Backup / Daily Incremental       Daily                            3 days
Weekly Full Backup / Daily Differential      Weekly                           7 days
Bi-Monthly Full Backup / Daily Incremental   Daily                            3 days
Exchange Native Data Protection              As logs are no longer required   3 days

Of course, there are other scenarios that you may need to consider.  For example, if you are deploying a stretched Database Availability Group (DAG) across two datacenters, log truncation will only occur if the network link between the two datacenters is operational and the database copies are healthy.  If you know that an outage of the WAN link could take 5 days to repair, you should adjust your backup failure protection to take that into account.
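
One way to think about this adjustment is to take the larger of the recommended protection from the table above and the longest outage you expect to ride out. The sketch below is just that idea in Python; the dictionary keys and the function name are labels I made up for illustration.

```python
# Recommended backup failure protection, in days, from the table above.
# The keys are hypothetical labels for the backup methodologies.
RECOMMENDED_PROTECTION_DAYS = {
    "daily_full": 3,
    "weekly_full_daily_incremental": 3,
    "weekly_full_daily_differential": 7,
    "bimonthly_full_daily_incremental": 3,
    "native_data_protection": 3,
}


def protection_days(backup_type, worst_case_outage_days=0):
    """Days of logs to provision for: the table value or the outage, whichever is longer."""
    return max(RECOMMENDED_PROTECTION_DAYS[backup_type], worst_case_outage_days)


if __name__ == "__main__":
    # Stretched DAG with Exchange Native Data Protection and a 5 day WAN repair time.
    print(protection_days("native_data_protection", worst_case_outage_days=5))
```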

For our scenario, let’s assume we only need to ensure we can survive 3 days of truncation failure events. This means that we need 9,750 / 1024 * 3 = 28.5GB of disk space for our mailbox generated transaction logs.

In addition, we need to account for the amount of disk space required for our mailbox move events for the entire week: 13,888 / 1024 * 7 days = 94.9GB of disk space for our mailbox move operations.

All told, this means that each database needs 123GB of disk space for transaction logs.  We should also include a 20% overhead factor to account for any unexplained phenomena that may occur: 123GB * 1.2 = 148GB of disk space for transaction logs.
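
If you want to rerun these numbers for your own environment, the steps can be sketched as follows. The function names are hypothetical and 1MB log files are assumed. Because the sketch does not round the intermediate values, it lands a fraction of a GB above the figures quoted here (roughly 123.5GB before overhead and 148.2GB after).

```python
def logs_to_gb(log_count, log_file_mb=1):
    """Convert a number of transaction logs to GB (assumes 1MB log files)."""
    return log_count * log_file_mb / 1024


def log_space_gb(mailbox_logs_per_day, protection_days,
                 move_logs_per_day, move_days, overhead=0.2):
    """Disk space for transaction logs, including the overhead factor."""
    mailbox_gb = logs_to_gb(mailbox_logs_per_day) * protection_days
    move_gb = logs_to_gb(move_logs_per_day) * move_days
    return (mailbox_gb + move_gb) * (1 + overhead)


if __name__ == "__main__":
    # 9,750 mailbox logs/day for 3 days plus 13,888 move logs/day for 7 days.
    print(log_space_gb(9750, 3, 13888, 7, overhead=0))  # before overhead
    print(log_space_gb(9750, 3, 13888, 7))              # with 20% overhead
```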

If we are deploying a dedicated LUN for the transaction logs, we would not provision a LUN of 150GB as that would mean that we could consume all of the disk space if we were having backup failures and excessive mailbox moves.  Typically you want to ensure that each LUN is provisioned such that only 80% of the disk capacity is utilized.  The formula is:

LUN Space = [projected disk space utilization] / (1 – [desired free space percentage])

LUN Space = 148GB / (1 – .2) = 148GB / .8 = 185GB LUN Space for Dedicated Transaction Log Volume

If you are deploying the transaction logs on the same LUN as the database, you would simply combine the transaction log disk space requirements with the database disk space requirements for the [projected disk space utilization] value.
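
The LUN formula is simple enough to script. Here is a small sketch, with a hypothetical function name, that covers both the dedicated log LUN and the combined log and database LUN. The 1,000GB database size in the second call is made up purely for illustration.

```python
def lun_size_gb(log_gb, free_space_fraction=0.2, database_gb=0.0):
    """LUN Space = projected utilization / (1 - desired free space percentage)."""
    if not 0 <= free_space_fraction < 1:
        raise ValueError("free_space_fraction must be at least 0 and below 1")
    projected_gb = log_gb + database_gb
    return projected_gb / (1 - free_space_fraction)


if __name__ == "__main__":
    print(lun_size_gb(148))                    # dedicated transaction log LUN
    print(lun_size_gb(148, database_gb=1000))  # logs and database on the same LUN
```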

How can I prevent consuming all of my transaction log disk space?

First and foremost, you need to obtain a baseline of your environment to determine your typical log generation rate per day.  In addition, you must set up monitoring and take action on any alerts that are generated.  Monitoring should cover the following scenarios:

  1. Transaction Log LUN disk space.  Set up several thresholds and different alerting mechanisms.  Your first alert should not be the one that indicates 90% of your disk has been consumed.  If you know your typical log generation baseline, you can set up a threshold to report if you are 20% over, for example.
  2. Monitor for successful completion of your backups (if you aren’t leveraging Exchange Native Data Protection).  Your first indication of backup failures should not be when you run out of disk space.
  3. Monitor for the truncation events in the Application Log.
  4. Monitor your database copy replication health. 
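
To make the first item concrete, here is a rough sketch of tiered disk space alerts plus a baseline check. It is not a replacement for a real monitoring product. The threshold values, alert names and function names are all assumptions of mine; only the "20% over baseline" idea comes from the list above.

```python
import shutil

# Hypothetical tiers: fraction of the volume used -> alert level, most severe first.
USAGE_THRESHOLDS = [(0.90, "critical"), (0.75, "warning"), (0.60, "notice")]


def usage_alert(used_bytes, total_bytes):
    """Return the most severe alert level reached, or None if below all tiers."""
    used_fraction = used_bytes / total_bytes
    for threshold, level in USAGE_THRESHOLDS:
        if used_fraction >= threshold:
            return level
    return None


def over_baseline(logs_today, baseline_logs_per_day, tolerance=0.2):
    """True when today's log count exceeds the baseline by more than the tolerance."""
    return logs_today > baseline_logs_per_day * (1 + tolerance)


def check_volume(path):
    """Check the volume that holds the given path (for example the log LUN)."""
    usage = shutil.disk_usage(path)
    return usage_alert(usage.used, usage.total)


if __name__ == "__main__":
    print(check_volume("."))
    print(over_baseline(12000, 9750))
```

The point of the tiers is that the first notification arrives well before the 90% mark, while there is still time to investigate.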

What if I’m having unexplained growth in my Transaction Logs?

My friend, Mike Lagase, wrote a great article on how to troubleshoot this scenario – http://blogs.technet.com/b/mikelag/archive/2009/07/12/troubleshooting-store-log-database-growth-issues.aspx (please note that the article was written with Exchange 2007 in mind, so several of the tools and/or recommendations may no longer apply with Exchange 2010).  In addition to the steps Mike mentions, you can utilize the following in Exchange 2010 to help determine the unexplained transaction log growth (thanks to Todd Luttinen for putting this list together):

  1. You can use the store usage statistics cmdlet (Get-StoreUsageStatistics, filtering on DigestCategory = 'LogBytes') to identify mailboxes generating a high log byte count.  Note that this doesn’t always work in cases where the log bytes aren’t generated by the mailbox owner or the operation is performed on behalf of the client (like CopyOnWrite), and it doesn’t include log bytes generated by system services (reported in Event ID 9826).  These stats provide a summary of the last 10 minutes of activity for the top mailboxes generating log activity (up to 6 samples covering the last hour).  The following shows how to use store usage stats to find the top mailbox generating log bytes over the last hour:

    [PS] C:\>$stats = Get-StoreUsageStatistics -Database <Database Name>
    [PS] C:\>$stats | ? {$_.DigestCategory -eq 'LogBytes'} | group MailboxGuid | sort Count -Descending | Select -First 1 -ExpandProperty Group | sort SampleTime | ft -a MailboxGuid,Sample*,Log*

    MailboxGuid                          SampleID SampleTime           LogRecordCount LogRecordBytes
    c007c87a-e030-4414-b741-9cf61e88b9de        5 11/7/2011 4:25:05 PM            237         274163
    c007c87a-e030-4414-b741-9cf61e88b9de        4 11/7/2011 4:35:05 PM            451         387362
    c007c87a-e030-4414-b741-9cf61e88b9de        3 11/7/2011 4:45:06 PM            483         144999
    c007c87a-e030-4414-b741-9cf61e88b9de        2 11/7/2011 4:55:06 PM            734         293433
    c007c87a-e030-4414-b741-9cf61e88b9de        1 11/7/2011 5:05:06 PM            933         411485
    c007c87a-e030-4414-b741-9cf61e88b9de        0 11/7/2011 5:15:06 PM            247         209987

  2. There are also application events generated for administrative clients (Event ID 9826).  These stats represent 2 hours of activity:

    Starting from <date/time> service <name> has performed this activity on the server:
    RPC Operations: 24168.
    Database Pages Read: 1329 (of which 629 pages preread).
    Database Pages Updated: 12418 (of which 11555 pages reupdated).
    Database Log Records Generated: 13906.
    Database Log Records Bytes Generated: 660331.
    Time in Server: 19142 ms.
    Time in User Mode: 6100 ms.
    Time in Kernel Mode: 63 ms.

  3. The performance monitor counter “MSExchangeIS Client(*)\JET Log Record Bytes/sec” can be used to identify what client type is causing log growth.
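
Once you have the samples from item 1, it helps to roll them up into a single hourly figure that you can compare against your baseline. The sketch below does that for the sample output shown in item 1. The values are copied from that output, the 10 minute sample length comes from the description in item 1, and the function name is hypothetical.

```python
# (SampleID, LogRecordCount, LogRecordBytes) copied from the sample output in item 1.
SAMPLES = [
    (5, 237, 274163),
    (4, 451, 387362),
    (3, 483, 144999),
    (2, 734, 293433),
    (1, 933, 411485),
    (0, 247, 209987),
]
SAMPLE_MINUTES = 10  # each sample summarizes 10 minutes of activity


def summarize(samples):
    """Total the log records and bytes, and scale the bytes to MB per hour."""
    total_records = sum(count for _, count, _ in samples)
    total_bytes = sum(size for _, _, size in samples)
    minutes_covered = len(samples) * SAMPLE_MINUTES
    mb_per_hour = total_bytes / 2**20 * 60 / minutes_covered
    return {"records": total_records, "bytes": total_bytes, "mb_per_hour": mb_per_hour}


if __name__ == "__main__":
    print(summarize(SAMPLES))
```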

I think all of us understand how critical it is to have enough transaction log capacity so that your database availability is not affected.  Hopefully this information helps in planning your transaction log capacity.

Ross Smith IV
Principal Program Manager
Exchange Customer Experience

Comments (8)
  1. Brendan Hamilton says:

    Do activesync devices cause more transaction logs to be generated?  I have found this to be true in our environment.  I worked extensively with MS support but we never found a way to stop excessive log growth other than disabling activesync for a user.

  2. Jason W says:

    I think the reason why it's such a big problem is that DAG's have introduced much more complexity into planning for databases & backups that organizations are not prepared to properly plan out their new Exchange 2010 infrastructure.  you could write an entire book just on DAG planning and backups and Exchange Native Data Protection and some of the most common gotchas that companies need to think about when planning these things (like when and where to turn on CRCL and which copy do you backup if you do need point in time backups).  

    I know this stuff is covered under the technet article on understanding backups, restores, & DR; but considering that for a decade or more the question of how to handle log growth was as simple as backup (daily full, or weekly full and incremental's) or just turn on circular logging and hope for the best, it's no wonder that companies are having problems with this.  Now, even when you turn on circular logging on a db copy in a DAG, you can still run into a situation where your log files fill up your space and cause a dismount.  I'd imagine not many people realize that.

  3. Brian says:

    one of the challenges for many organizations is that things change.

    1.How many mailboxes will reside in the database? Changes over time

    2.What is the message profile of the mailboxes in the database?  Changes over time

    3.What is the average message size? Changes over time (usually not too much)

    4.What is the average mailbox size? Changes over time

    5.How many mailboxes are moved per day? Depends.

    6.What is the backup and restore solution? Usually static.

    7.Does the solution need to take into account any other failure scenarios, like network failures?  Always!

    So you capacity plan today and how often do you update that plan…when you run out of trans log space and get bit by it.

    A litigation hold or Single Item Recovery significantly changes this growth too…

  4. Kevin says:

    Easy solution… Do what we did and use Exchange Native Protection and get rid of backups(turning on circular logging).  Use cheap storage… 2TB 7200RPM SATA II disks

  5. @Brian – Your comments bring up a good point.  So first off, our guidance is to design your Exchange deployment for the end state; in other words, where you expect to be at the end of the hardware lifecycle. Now many times that may not be realistic as organizations change and can grow in size.  It's important to understand that with any IT solution, you have to do proper baselining and trending analysis to determine how the message profile evolves and adjust the design as appropriate.  This means that if you onboard a significant number of users or the message profile significantly increases such that it can impact the capacity planning originally performed, then you need to re-architect the solution and make adjustments where necessary (e.g., add more disks and more databases, reducing the footprint per database).

    Ross

  6. Frank T says:

    What about if mailbox moves could be suspended if log space was below a certain threshold?

  7. @Frank – You can set the IsSuspendedFromProvisioning parameter via Set-MailboxDatabase.  This parameter specifies that the database is temporarily not considered by the mailbox provisioning load balancer (assuming you are not specifying the target database when executing New-MoveRequest).  See technet.microsoft.com/…/ff477621.aspx for more information.

    Ross

  8. Brian Hampson says:

    We got caught by the massive logs generated in moving a Mailbox DB.  Now, when we execute a full backup on a passive DB (Commvault), the logs don't get truncated..  All our capacity planning sort of thrown out the window if our logs just grow eternally.  Do ALL DB's need to be backed up before logs are truncated???  You are definitely right though.. logs make the world go 'round and without enough space, it's unhappy user time.
