New maximum database cache size guidance for Exchange 2007 Hub Transport Server role


Summary

Exchange 2007 SP1 has a default transport database cache size of 128MB. This size doesn't allow as much dynamic growth as may be necessary on Hub servers with higher than normal message rates, or when an unexpected load from increased message size comes into a server. To better allow for cache growth, the new guidance is to increase the DatabaseMaxCacheSize value from 128MB to 512MB on Hub servers with 4 GB or more memory installed.

Background

In Exchange 2007, the transport service utilizes the Extensible Storage Engine (ESE) for mail transport functionality. This provides several benefits over previous versions that used the NTFS file system:

  • Enhanced performance, because transactions are written first to log files and memory, and then to the database file
  • Increased transactional integrity of data stored in the queue
  • A single location for all mail queues: the transport mail queue database. In Exchange 2003, mail could be in two locations during processing: the file folder structure and the local information store.

The essentials of ESE are covered in http://msexchangeteam.com/archive/2007/11/30/447640.aspx and http://technet.microsoft.com/en-us/library/bb331958.aspx. ESE utilizes a transactional architecture and follows this process:

  1. An operation occurs against the database (e.g. a new message is received), and the page that requires updating is read from the file and placed into the ESE cache (if it is not already in memory), while the log buffer is notified and records the operation in memory.
  2. The changes are recorded by the database engine but are not immediately written to disk; instead, these changes are held in the ESE cache and are known as "dirty" pages because they have not been committed to the database. The version store (also referred to as version buckets) is used to keep track of these changes, thus ensuring isolation and consistency are maintained.
  3. As the database pages are changed, the log buffer is notified to commit the change, and the transaction is recorded in a transaction log file (which may or may not require a log roll and a start of a new log generation).
  4. Eventually the dirty database pages are flushed to the database file.
  5. The checkpoint is advanced.
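
The ordering of the five steps above can be illustrated with a toy model. This is only a sketch and is not how ESE is implemented: the class `ToyTransactionalStore` and its methods are hypothetical names, and the log buffer, version store, and log generations are left out. It shows only that a change lands in the cache and the log first, and reaches the database file later.

```python
class ToyTransactionalStore:
    """Toy model of log-first writes. Hypothetical; not ESE's real design."""

    def __init__(self, database_file):
        self.database_file = dict(database_file)  # pages as stored on disk
        self.cache = {}       # pages currently held in memory
        self.dirty = set()    # cached pages not yet written to the database file
        self.log = []         # transaction log records, in order
        self.checkpoint = 0   # log records before this index are in the database file

    def update(self, page_id, value):
        # Step 1: read the page into the cache if it is not already in memory.
        if page_id not in self.cache:
            self.cache[page_id] = self.database_file.get(page_id)
        # Step 2: change the page in memory only, which makes it "dirty".
        self.cache[page_id] = value
        self.dirty.add(page_id)
        # Step 3: record the change in the transaction log.
        self.log.append((page_id, value))

    def flush(self):
        # Step 4: write the dirty pages to the database file.
        for page_id in sorted(self.dirty):
            self.database_file[page_id] = self.cache[page_id]
        self.dirty.clear()
        # Step 5: advance the checkpoint past the flushed log records.
        self.checkpoint = len(self.log)


store = ToyTransactionalStore({"page1": "empty"})
store.update("page1", "message A")
dirty_before_flush = set(store.dirty)
on_disk_before_flush = store.database_file["page1"]
store.flush()
```

Before `flush()` the database file still holds the old page while the log already holds the change. The longer dirty pages wait in memory, the more uncommitted work accumulates.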

Unlike the Mailbox Server role, the transport service does not dynamically grow the ESE cache. Instead, the ESE cache is capped at 128MB in Exchange 2007 SP1 and earlier. This value is known as DatabaseMaxCacheSize and is specified in the EdgeTransport.exe.config file. The Exchange 2007 resource monitor keeps track of how many used version buckets are currently sitting in memory. When the number of used version buckets in memory exceeds the thresholds specified in the EdgeTransport.exe.config file of each Hub server, the resource monitor invokes back pressure and logs Event ID 15004 to indicate that the server is experiencing resource pressure. A resource pressure event is a staged process: a Hub server first prevents new inbound SMTP messages, and then, when the next threshold is reached, it prevents new Mailbox server connections in an attempt to clean uncommitted transactions out of memory and into the queue database file. There are a number of causes of this behavior, including large messages or an increased message load. The version buckets thresholds were changed in Exchange 2007 Service Pack 1, with an increase over the default RTM values. The new values are 120 for the medium threshold and 200 for the high threshold.
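
The staged behavior described above can be sketched as a simple classification. This is an illustration under assumptions, not the resource monitor's actual logic: the function name, the constant names, and the action strings are hypothetical, and treating "exceeds" as a strict greater-than comparison is a guess. Only the values 120 and 200 come from this article.

```python
MEDIUM_THRESHOLD = 120  # SP1 medium version buckets threshold quoted in the article
HIGH_THRESHOLD = 200    # SP1 high version buckets threshold quoted in the article

# Simplified summary of the staged response described in the article.
ACTIONS = {
    "normal": "accept all connections",
    "medium": "prevent new inbound SMTP messages",
    "high": "also prevent new Mailbox server connections",
}


def version_bucket_pressure(used_version_buckets):
    """Return the pressure level for a used version bucket count.

    Assumption: a threshold is "exceeded" only when the count is strictly greater.
    """
    if used_version_buckets > HIGH_THRESHOLD:
        return "high"
    if used_version_buckets > MEDIUM_THRESHOLD:
        return "medium"
    return "normal"


levels = {count: version_bucket_pressure(count) for count in (50, 121, 250)}
```

For example, `ACTIONS[version_bucket_pressure(121)]` gives the first-stage action, and a count of 250 gives the second-stage one.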

Note: the version buckets threshold value should not be set higher in an attempt to resolve resource pressure, as this will likely have a negative impact on the server availability.

For more information on how back pressure is configured and applied, please see "Understanding Back Pressure" (http://technet.microsoft.com/en-us/library/bb201658.aspx).

New Guidance

To increase the performance with version buckets, and to better allow for cache growth, the new guidance is to increase the DatabaseMaxCacheSize value from 128MB to 512MB on Hub servers with 4 GB or more memory installed.

Update the DatabaseMaxCacheSize entry in the EdgeTransport.exe.config file as follows:

Current setting:

      <add key="DatabaseMaxCacheSize" value="134217728" />

New recommended setting:

      <add key="DatabaseMaxCacheSize" value="536870912" />
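
The two values are simply 128MB and 512MB expressed in bytes. As a rough sketch, the change could be scripted as shown below. The function name and the sample XML are hypothetical, and the assumption that the key sits in an `<add>` element under `<appSettings>` should be checked against your own EdgeTransport.exe.config. The example works on a string rather than on the real file.

```python
import xml.etree.ElementTree as ET

CURRENT_CACHE_BYTES = 128 * 1024 * 1024      # 134217728
RECOMMENDED_CACHE_BYTES = 512 * 1024 * 1024  # 536870912


def set_database_max_cache_size(config_xml, new_bytes=RECOMMENDED_CACHE_BYTES):
    """Return config_xml with the DatabaseMaxCacheSize value replaced."""
    root = ET.fromstring(config_xml)
    changed = False
    for entry in root.iter("add"):
        if entry.get("key") == "DatabaseMaxCacheSize":
            entry.set("value", str(new_bytes))
            changed = True
    if not changed:
        raise KeyError("DatabaseMaxCacheSize entry not found")
    return ET.tostring(root, encoding="unicode")


# Hypothetical, minimal stand-in for the real configuration file.
SAMPLE_CONFIG = """<configuration>
  <appSettings>
    <add key="DatabaseMaxCacheSize" value="134217728" />
  </appSettings>
</configuration>"""

updated = set_database_max_cache_size(SAMPLE_CONFIG)
```

ElementTree drops XML comments and may change whitespace when it rewrites a document, so editing the single line by hand on a backed-up copy of the file is just as reasonable. Per the author's reply in the comments, only the Transport service needs to be restarted after the change.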

Why Change this Guidance?

Increasing the cache size allows ESE to dynamically manage the cache size based on memory pressure. With the default limit of 128MB, ESE could not effectively grow the cache, which in turn increased disk activity and delayed the flushing of data to the database. As a result, the version buckets could not be flushed fast enough, resulting in back pressure. When the increase in version buckets causes back pressure events, mail can queue up on the Hub server, and also on remote SMTP servers attempting to deliver inbound mail to the Hub server. In some environments this caused a cascading effect that required more than just the single Hub server to recover from the denied connections. Increasing the database cache size should help eliminate version buckets back pressure events.

However, this change will not eliminate version buckets back pressure events where slow-performing disks are the root cause. To ensure that your disk storage subsystem is properly designed, please review the Transport Server Storage Design article (http://technet.microsoft.com/en-us/library/bb738141(EXCHG.80).aspx).

There is also a potential performance gain when the transport dumpster size is less than the database max cache size, resulting in fewer database read I/Os. IOPS is a key metric for storage sizing guidelines, and is calculated as the amount of database input/output (I/O) per second consumed by each mail item. In one of our tests with 12 storage groups there was a 57 percent decrease in I/Os per message. This substantial performance gain was seen in a lab using the default MaxDumpsterSizePerStorageGroup of 18MB, resulting in a max dumpster size of 216MB (12 x 18MB), which is greater than the DatabaseMaxCacheSize default of 128MB but less than the recommended DatabaseMaxCacheSize of 512MB. More information on the test hardware and configuration:

  • Exchange 2007 Service Pack 1
  • Windows 2003 Service Pack 2
  • 8 processor cores with 16GB of memory
  • Forefront Anti-Virus with 5 engines, max certainty
  • 21 messages per second, of approximately 46 KB average message size
  • Logs and Database were on separate, dedicated drives

The following results were seen in that test.

Hub Transport server database I/O (~21 msg/sec)

                                            SP1 Default Cache   SP1 Modified Cache
Total IOPS per message                      26.71               11.67
Log write I/Os per message (sequential)     3.44                3.28
Database write I/Os per message (random)    11.69               7.13
Database read I/Os per message (random)     10.48               0.42
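
To see where the savings come from, the percentage decrease for each metric can be worked out from the figures reported for this test. The snippet below is a small sketch; the dictionary and function names are hypothetical, and the numbers are copied from the results as published.

```python
# (SP1 default cache, SP1 modified cache) figures as published for this test.
RESULTS = {
    "Total IOPS per message": (26.71, 11.67),
    "Log write I/Os per message (sequential)": (3.44, 3.28),
    "Database write I/Os per message (random)": (11.69, 7.13),
    "Database read I/Os per message (random)": (10.48, 0.42),
}


def percent_decrease(before, after):
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


decreases = {
    name: round(percent_decrease(before, after), 1)
    for name, (before, after) in RESULTS.items()
}

for name, value in decreases.items():
    print(f"{name}: {value}% lower")
```

The published totals work out to a reduction of roughly 56 percent, in line with the figure of about 57 percent quoted earlier. Nearly all of the random database reads disappear (about 96 percent), while sequential log writes barely change.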

 

 

In the internal Microsoft IT (MSIT) environment, where there is an Active Directory site with ~800 storage groups with a transport dumpster size of 15MB per storage group, we only observed a 15% decrease in I/Os per message after the increase to the database cache size. This decrease in IOPS per message was less than that observed in the lab due to the total size of the transport dumpster. In this case, the increased cache size resulted in increased availability of the transport service because back pressure due to version buckets was eliminated.
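
The difference between the lab and MSIT results follows from simple arithmetic on the total dumpster size. The sketch below uses hypothetical function names and takes the approximate figure of 800 storage groups at face value.

```python
MB = 1024 * 1024

DEFAULT_CACHE_BYTES = 128 * MB      # DatabaseMaxCacheSize default
RECOMMENDED_CACHE_BYTES = 512 * MB  # new recommended DatabaseMaxCacheSize


def max_dumpster_bytes(storage_groups, dumpster_mb_per_group):
    """Upper bound on total transport dumpster size, in bytes."""
    return storage_groups * dumpster_mb_per_group * MB


def fits_in_cache(dumpster_bytes, cache_bytes):
    """True when the whole dumpster could be held within the cache limit."""
    return dumpster_bytes < cache_bytes


lab = max_dumpster_bytes(12, 18)    # lab test: 12 storage groups at 18MB each
msit = max_dumpster_bytes(800, 15)  # MSIT site: ~800 storage groups at 15MB each

print(lab // MB, fits_in_cache(lab, DEFAULT_CACHE_BYTES), fits_in_cache(lab, RECOMMENDED_CACHE_BYTES))
print(msit // MB, fits_in_cache(msit, RECOMMENDED_CACHE_BYTES))
```

In the lab the 216MB dumpster fits under the 512MB limit but not under 128MB, which is why read I/O dropped so sharply. In the MSIT site the dumpster can reach roughly 12,000MB, far beyond either limit, so the larger cache removes fewer reads.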

These results may lead an administrator to wonder whether going beyond the new 512MB value provides even more benefit. In testing on MSIT Hub servers with 8GB of total memory, a cache size of 2GB did not show a significant benefit over 512MB. If you are deploying servers with 12GB or more of memory, you should conduct testing in your environment to assess the benefit of going beyond 512MB. However, this is not recommended and has not been tested in a production environment.

More Information

For more information, please see the following:

Exchange 2007, Managing Shared Transport Database Configuration Options.
http://technet.microsoft.com/en-us/library/bb232166.aspx

Exchange 2007, Understanding Back Pressure.
http://technet.microsoft.com/en-us/library/bb201658.aspx

Bill Thompson

Comments (10)
  1. Ben says:

    That’s definitely a nice thing to know. What about Edge Transport Servers ?

  2. Sankar Ramesh says:

    # Exchange 2007 Service Pack 1

    # Windows 2003 Service Pack 1

    How did you get E2k7 SP1 to install on a Windows 2003 SP1 server?  Or is that a typo?

  3. cannedsoda says:

    Same question as Ben.  What about Edge Transport servers?

  4. Clifton2 says:

    This is great!…Chris.

  5. Eric Sabo says:

    Is there a TechNet article for these settings?  Do you need a restart for them to take effect?

  6. Exchange says:

    Sankar Ramesh – thanks, this was a typo, now fixed.

    Eric Sabo – we are working on updating our TechNet documentation, but wanted to let everyone know as soon as possible so we posted to the blog first.

  7. bill says:

    For Edge Transport Servers, there really isn’t as much pressure on the database cache.  This is especially true when comparing the additional overhead of the transport dumpster.  However, there is no negative consequence from changing this setting on an Edge server, assuming you have at least 4 GB of memory.

    After changing the setting, only the Transport service needs to be restarted.

    Thanks!

  8. Brian A. says:

    Do these recommendations apply to Exchange 2007 SP1 servers that have more than just the Hub Transport Role installed (Hub, CAS, and Mailbox or Hub and CAS)?

  9. bill says:

    Yes, this change applies to every Hub server with at least 4gb of RAM installed.  However, following the published guidance if you have a multi-role server the recommended memory configuration is 8gb, plus a per mailbox consideration if you also have the Mailbox role installed.  

    For more guidance see: http://technet.microsoft.com/en-us/library/bb738124(EXCHG.80).aspx

       

  10. Will this guidance end up in an update of the Best Practice Analyzer tool?
