Jetstress 2010 and Background Database Maintenance

Last week a colleague from the MCM Exchange 2010 community asked some questions about disabling Exchange Server 2010 Background Database Maintenance (BDM). Essentially, they wanted to know what would happen on the passive database copies if they cleared the BDM checkbox.


A response came back fairly quickly from Matt Gossage, explaining that although clearing the checkbox disables 24x7 ESE scanning on the active database copies, it has no impact on the passive copies. Essentially, BDM is hard-coded to run on all passive database copies, regardless of how the active copy is configured.

This started us thinking about a potential problem with Jetstress testing. Jetstress has no real concept of a passive database copy. When you configure the number of copies in Jetstress, we use that value to simulate the read I/O that would be required in production, but we do not actually copy any data over the wire. As a by-product of this, Jetstress treats all configured databases as if they were active copies. This is also why we recommend that, when configuring your test, you account for all database copy types on each server: active, passive and lagged.

…a quick example…

The following layout is a 4-node Database Availability Group with 24 databases and 2 copies of each database. It suggests that we require 6 active databases and 6 passive databases on each Mailbox role server. This means that each server will have 12 LUNs, and our Jetstress test will need to be configured as if it had 12 active databases.

DAG Example - SCalc
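
The per-server numbers in this layout are simple arithmetic, and the sketch below reproduces them. It assumes that databases and copies are spread evenly across the DAG members; the function name `copies_per_server` is made up for illustration and is not part of Jetstress or the Mailbox Role Calculator.

```python
from typing import Tuple


def copies_per_server(databases: int, copies_per_db: int, servers: int) -> Tuple[int, int]:
    """Return (active, passive) database copies per server.

    Assumes an evenly balanced DAG: every server hosts the same number of
    active copies and the same number of passive copies.
    """
    total_copies = databases * copies_per_db
    if databases % servers or total_copies % servers:
        raise ValueError("layout does not divide evenly across the servers")
    active = databases // servers
    passive = total_copies // servers - active
    return active, passive


# The example from this post: 4 nodes, 24 databases, 2 copies of each.
active, passive = copies_per_server(databases=24, copies_per_db=2, servers=4)
print(active, passive, active + passive)  # 6 6 12
```

The last value, 12, is the number of databases the Jetstress test needs to be configured with on each server.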

…where's the problem?

So… imagine the scenario where the storage fails the Jetstress test. You have configured your test with 12 databases of appropriate size, but when you investigate the reason behind the failure, you discover that the storage system is unable to meet the throughput requirements for Background Database Maintenance (5MB/sec per database copy). BDM I/O is entirely sequential and the disk spindles are usually able to cope with it without a problem. In some circumstances, however, the disk controller or interfaces can become the throughput bottleneck in medium to large scale deployments. Obviously in this simple example our BDM requirements are quite low (12 x 5MB/sec = 60MB/sec), but imagine this in a larger deployment with many more database copies per server.
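
To see how quickly this grows, here is a minimal sketch of the same calculation. It uses the 5MB/sec per copy figure quoted above; the 40 and 100 copy counts are hypothetical larger deployments chosen purely for illustration, not sizing guidance.

```python
BDM_MB_PER_SEC_PER_COPY = 5  # per database copy, as quoted in this post


def bdm_throughput_mb_per_sec(database_copies: int) -> int:
    """Sequential BDM throughput a server must sustain for its database copies."""
    return database_copies * BDM_MB_PER_SEC_PER_COPY


# 12 copies is the example server; 40 and 100 are hypothetical larger servers.
for copies in (12, 40, 100):
    print(f"{copies} copies -> {bdm_throughput_mb_per_sec(copies)}MB/sec")
# 12 copies -> 60MB/sec
# 40 copies -> 200MB/sec
# 100 copies -> 500MB/sec
```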

After discussion with your storage and messaging teams, you decide that the best way forward is to disable BDM and revert to maintenance window CRC checking. This is how Exchange 2007 worked and you figure that this is the best compromise for your project. Your databases are already <1TB in size and you calculate that you can provide a sufficient maintenance window to complete the scanning in an acceptable timeframe. You revisit your Jetstress test and this time you decide to disable BDM scanning… the test passes and you carry your project on into production.

The problem here is that although your Jetstress test has validated that you have sufficient IOPS for 12 databases (6 active + 6 passive), you have not accounted for the BDM throughput requirements of your passive database copies. Jetstress can only simulate active database copies, and BDM in Jetstress is a global setting that affects all databases. If you disable BDM in Jetstress, the test will always be missing the throughput I/O for passive database copies.

In this example, that would mean that your Jetstress test was missing 30MB/sec of BDM throughput (6 passive copies x 5MB/sec).
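
The gap can be expressed as the difference between the BDM load that production will generate and the BDM load that Jetstress actually exercised. The sketch below is only a model of the reasoning in this post: the function names are hypothetical, it uses the 5MB/sec per copy figure from above, and it ignores maintenance window scanning on active copies.

```python
BDM_MB_PER_SEC_PER_COPY = 5  # per database copy, as quoted in this post


def production_bdm(active: int, passive: int, bdm_enabled_on_active: bool) -> int:
    """BDM throughput in production: passive copies always scan,
    active copies scan 24x7 only if the checkbox is left enabled."""
    scanning_copies = passive + (active if bdm_enabled_on_active else 0)
    return scanning_copies * BDM_MB_PER_SEC_PER_COPY


def jetstress_bdm(configured_databases: int, bdm_enabled: bool) -> int:
    """BDM throughput exercised by Jetstress: the setting is global,
    so either every configured database scans or none does."""
    return configured_databases * BDM_MB_PER_SEC_PER_COPY if bdm_enabled else 0


def untested_bdm(active: int, passive: int,
                 production_bdm_on_active: bool, jetstress_bdm_enabled: bool) -> int:
    """MB/sec of production BDM throughput that the Jetstress run never validated."""
    needed = production_bdm(active, passive, production_bdm_on_active)
    tested = jetstress_bdm(active + passive, jetstress_bdm_enabled)
    return max(0, needed - tested)


# The scenario above: BDM disabled on active copies and disabled in Jetstress.
print(untested_bdm(6, 6, production_bdm_on_active=False, jetstress_bdm_enabled=False))  # 30

# Same production design, but Jetstress run with BDM enabled.
print(untested_bdm(6, 6, production_bdm_on_active=False, jetstress_bdm_enabled=True))   # 0
```

With BDM enabled, Jetstress exercises 60MB/sec, which more than covers the 30MB/sec that the passive copies will generate in production.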

… so what do I do about it?

The recommendation is to ensure that your storage is designed to deal adequately with BDM throughput requirements. These requirements are predicted by the latest Mailbox Role Calculator, so make sure that whatever storage solution you choose is capable of providing both the Total Database Required IOPS and the Background Database Maintenance Throughput Requirements.

BDM

The bottom line is that we do not recommend that you disable BDM in your Jetstress test. There is only one test scenario where disabling BDM in Jetstress is acceptable: you are testing a Mailbox role server that hosts only active database copies and you do not wish to have BDM enabled in production. If you are testing ANYTHING other than this scenario, you MUST test with BDM enabled in Jetstress.

Neil Johnson [neiljohn@microsoft.com] Senior Consultant, Microsoft Consulting Services, UK