A study of Exchange 2007 SP1 Hub throughput with different message sizes


EDIT 8/6/2008: Please see this post for additional information related to Hub throughput and different message sizes.

Large message size: effect of transport database cache size on throughput.

Background

Recently one of our support engineers came to us requesting performance data for a client deploying Exchange 2007 SP1 (E2K7 from now on).

The client wanted to know what level of steady state throughput was achievable by a Hub Transport server receiving 4 widely different average message sizes:

  • 25KB
  • 1MB
  • 5MB
  • 10MB

We had some of the data but needed to complete the table, so we employed the test bed used to measure transport performance for E2K7 and E2K7 SP1.

Test Platform and Server Configuration

Hub Hardware: 2 processors x 2 cores, 2.2 GHz, 800 MHz FSB, 1MB L2 cache per core, 4 GB RAM, 400 MHz memory, Ultra 3 SCSI disk controller ("entry level") with 128 MB Write-Back Cache, 3 x Ultra320 Universal SCSI 15K RPM disks.

Optimized E2K7 transport database queue configuration:

  • 1 disk for DB logs,
  • 1 disk for DB queue file,
  • 1 disk for OS and other transport logs: Message Tracking, Connectivity, Agent logs, etc.

Transport dumpster is not being used in this environment: 1 Hub and 1 Mailbox without replication.

Mailbox Hardware: A "good" Mailbox server with enough CPU cycles and storage bandwidth to accept message delivery without slowing down the Hub.

Gigabit network.

Test Description and Results

The battery of tests was based on the benchmarking automation we used during Exchange 2007 development, changing the "message mix" for each test to inject a different average message size (25KB, 1MB...).

The benchmarking infrastructure is designed to inject messages into transport through an SMTP receive connector at constant speed, seeking a steady state throughput while monitoring baseline performance counters.

Ideally the test stabilizes after a few minutes of "warm up" flow, when the DB cache reaches a stable size (128MB if using the default DatabaseMaxCacheSize setting). We consider steady state reached when:

  1. Throughput counters are roughly constant, with at most a 5% oscillation.
  2. Queue length stays low (see the table of results for values).
  3. Transport DB cache size reaches a constant value (capped by DatabaseMaxCacheSize).
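The three criteria can be sketched as a simple check over a window of sampled counter values. This is only an illustration, not the benchmarking automation itself: the function name, the reading of "5% oscillation" as deviation from the window mean, the queue threshold, and the cache tolerance are all assumptions.

```python
from statistics import mean


def is_steady_state(throughput, queue_len, cache_mb,
                    max_oscillation=0.05,    # "at most a 5% oscillation"
                    max_queue=400,           # hypothetical threshold for a "low" queue
                    cache_tolerance_mb=1.0): # hypothetical tolerance for a "constant" cache
    """Return True when all three steady-state criteria hold for the window."""
    avg = mean(throughput)
    if avg <= 0:
        return False
    # 1. Every throughput sample stays within 5% of the window mean.
    throughput_ok = all(abs(s - avg) <= max_oscillation * avg for s in throughput)
    # 2. The queue never grows beyond the chosen threshold.
    queue_ok = max(queue_len) <= max_queue
    # 3. The DB cache size has stopped changing.
    cache_ok = max(cache_mb) - min(cache_mb) <= cache_tolerance_mb
    return throughput_ok and queue_ok and cache_ok


# Samples resembling a stable 25KB run, then an oscillating run that drops to 0.
stable = is_steady_state([158, 160, 161, 159], [120, 300, 329, 250], [128] * 4)
unstable = is_steady_state([0.0, 0.9, 0.0, 0.7], [29, 29, 29, 29], [128] * 4)
print(stable, unstable)
```

The thresholds would need tuning per message size, since a queue that counts as "low" in messages can still be large in MB.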

Yes, I said "ideally," but sometimes the test doesn't stabilize: throughput oscillates, frequently dropping to 0, or a queue builds up (Remote, Delivery or Submission queues).

Then you have to work a bit to understand why. Start the investigation by looking at the server event logs.

One possibility is heavy resource pressure, in which case transport applies back pressure to the system, indicated by Event Log ID 15004. The event details tell you which resource is under strain. You can see an example of this in the 3rd test of the suite shown below.

Then you have to diagnose why the server went into back pressure, as we did for the 3rd test. At the end of the post you'll find more detail on what to look for when analyzing performance bottlenecks.

Table of Results

Metric                            | Test 1    | Test 2   | Test 3               | Test 4   | Test 5
----------------------------------|-----------|----------|----------------------|----------|---------
DB cache size                     | 128MB     | 128MB    | 128MB                | 512MB    | 512MB
Limiting resource                 | CPU bound | IO bound | Configuration bound* | IO bound | IO bound
Message size                      | 25KB      | 1MB      | 5MB                  | 5MB      | 10MB
SMTP Receive Throughput (msg/sec) | 159.32    | 14.05    | 0.40                 | 2.03     | 1.34
Aggregate Queue Length (max)      | 329       | 63       | 29                   | 27       | 2
Queue size in MB (max)            | 8.65      | 64.51    | 148.48               | 138.24   | 20.48
%CPU                              | 69.86     | 56.03    | 15.37                | 48.00    | 40.03
Msg Cost (MCyc/msg)               | 38.68     | 351.07   | 3131.76              | 2068.69  | 2591.67
Msg Cost (cycles/byte of msg)     | 1470.72   | 342.84   | 611.67               | 404.04   | 253.09
Disk Writes/sec (log)             | 92.80     | 185.00   | (omitted)            | 133.00   | 181.00
Disk Writes/sec (queue)           | 35.30     | 729.00   | (omitted)            | 876.00   | 622.00
Disk Write KB/sec (log)           | 9,876     | 32,800   | (omitted)            | 23,279   | 30,796
Disk Write KB/sec (queue)         | 1,086     | 23,900   | (omitted)            | 19,788   | 23,556
Disk Writes/msg (log)             | 0.58      | 13.17    | (omitted)            | 64.53    | 138.17
Disk Writes/msg (queue)           | 0.22      | 51.89    | (omitted)            | 425.04   | 474.81
Disk Write KB/msg (log)           | 61.99     | 2,335    | (omitted)            | 11,295   | 23,508
Disk Write KB/msg (queue)         | 6.82      | 1,701    | (omitted)            | 9,601    | 17,982
Disk Reads/sec (log)              | 0.00      | 0.00     | 0.00                 | 0.00     | 0.00
Disk Reads/sec (queue)            | 0.00      | 0.00     | 567                  | 0.00     | 0.00

*Back pressure, High Version Buckets: Event Log ID 15004
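The per-message rows in the table can be re-derived from the per-second rows and the receive throughput. The sketch below does this for Test 2 (1MB messages). The helper names are hypothetical, and the byte count used for "1MB" (1,024,000) is an inference: it is the value that reproduces the table's per-byte cost, not something the post states.

```python
def per_message(rate_per_sec, msgs_per_sec):
    """Convert a per-second disk counter into a per-message figure."""
    return rate_per_sec / msgs_per_sec


def cycles_per_byte(mcyc_per_msg, msg_bytes):
    """Convert a per-message cost in megacycles into cycles per byte of message."""
    return mcyc_per_msg * 1_000_000 / msg_bytes


# Test 2 inputs taken from the table of results.
throughput = 14.05                                        # msg/sec
log_writes_per_msg = per_message(185.00, throughput)      # Disk Writes/sec (log)
queue_kb_per_msg = per_message(23_900, throughput)        # Disk Write KB/sec (queue)
cost_per_byte = cycles_per_byte(351.07, 1_024_000)        # assumed byte count for "1MB"

print(round(log_writes_per_msg, 2))  # table lists 13.17
print(round(queue_kb_per_msg))       # table lists 1,701
print(round(cost_per_byte, 2))       # table lists 342.84
```

Note that the per-byte figure comes out in cycles per byte, not megacycles per byte.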

Analysis of Results

In the 3rd test, the transport service transitions rapidly in and out of back pressure, so the disk counters show a heavily serrated pattern and perfmon cannot compute accurate averages. Those inaccurate values were left out of the table.

Nevertheless, throughput on that test is computed by the following ratio: (Total Messages Received)/(Test Duration), so it's accurate. See below for summary data that compares the two 5MB runs.

After testing the first 2 message sizes (25KB and 1MB), we couldn't reach steady state throughput on the 3rd and 4th test with default server settings.

Attempting to inject a steady flow of large (5MB) messages triggered back pressure, with the well-known Event Log ID 15004 reporting that version buckets were above the high watermark.

The first suspect to examine when version buckets are high is disk I/O performance. We immediately discovered that the flow of large messages produces a large queue. In this case the queue "only" contained 29 messages, but at this message size that translates to 149MB in the queue, which overflows the default database cache size of 128MB.
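The arithmetic behind the overflow can be sketched as queue length times average message size, compared against the cache size. The function names are hypothetical, and the 5.12 MB per-message figure is an inference: it is the value that reproduces the queue sizes reported for the two 5MB runs in the table of results.

```python
DEFAULT_CACHE_MB = 128  # default DatabaseMaxCacheSize, per the post


def queue_footprint_mb(queue_length, avg_msg_mb):
    """Approximate size of the queued messages in MB."""
    return queue_length * avg_msg_mb


def overflows_cache(queue_length, avg_msg_mb, cache_mb=DEFAULT_CACHE_MB):
    """True when the queued messages no longer fit in the DB cache."""
    return queue_footprint_mb(queue_length, avg_msg_mb) > cache_mb


# Test 3: 29 queued 5MB messages against the default 128MB cache.
test3_mb = queue_footprint_mb(29, 5.12)
print(round(test3_mb, 2), overflows_cache(29, 5.12))

# Test 4: 27 queued 5MB messages against a 512MB cache.
test4_mb = queue_footprint_mb(27, 5.12)
print(round(test4_mb, 2), overflows_cache(27, 5.12, cache_mb=512))
```

This treats the queue as the only consumer of the cache, which is a simplification; it is meant to show why a queue that looks short in messages can still be the problem.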

In the table above, notice that the queue size (in MB) never approached the DB cache size in the previous tests. Looking at the disk counters, we found that overflowing the cache triggered a large number of disk reads, which don't appear in the regular steady state tests.

To avoid overflowing the cache and triggering back pressure, we decided to experiment with increasing the transport DB cache size. Initially we tested with a 1GB cache, but found that 512MB (up from the default 128MB) was enough to eliminate the overhead of additional disk reads associated with the flow of very large messages.

Here is a fragment from the EdgeTransport.exe.config file that shows the changes made:

<configuration>
<runtime>
<gcServer enabled="true" />
</runtime>
<appSettings>
<!-- Optimized Transport DB storage -->
<add key="QueueDatabasePath" value="e:\data\"/>
<add key="QueueDatabaseLoggingPath" value="c:\logfiles\"/>
....
<!-- For very large message test: commented out the default 128MB DB cache -->
<!-- add key="DatabaseMaxCacheSize" value="134217728" / -->
<!-- Using 512MB DB cache: -->
<add key="DatabaseMaxCacheSize" value="536870912" />
...
</appSettings>
</configuration>
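The DatabaseMaxCacheSize values in the fragment are byte counts, with 1MB taken as 1024 x 1024 bytes. A small sketch of the conversion follows; the helper names are hypothetical and only illustrate the arithmetic.

```python
def cache_size_bytes(megabytes):
    """Convert a cache size in MB (1MB = 1024 * 1024 bytes) to bytes."""
    return megabytes * 1024 * 1024


def app_setting(megabytes):
    """Render the appSettings line for a given cache size in MB."""
    return f'<add key="DatabaseMaxCacheSize" value="{cache_size_bytes(megabytes)}" />'


print(cache_size_bytes(128))  # the default value shown commented out in the fragment
print(app_setting(512))       # the setting used for the large message tests
```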

Additionally, here are a few more interesting statistics for the test that triggers back pressure: the server is not receiving messages 31% of the time, and throughput during the non-back-pressure windows is only 0.57 msg/sec, compared to a steady 2.03 msg/sec when back pressure is avoided by using a bigger DB cache.

5MB message size stats: back pressure vs. steady state

Database Cache size (MB)                       | 128  | 512
-----------------------------------------------|------|-----
Duration (min)                                 | 20   | 20
Total Messages Received                        | 475  | 2436
# of Transitions into back pressure            | 41   | 0
Total Minutes in back pressure mode            | 6.17 | 0.00
% of Time in back pressure                     | 31%  | 0%
Max back pressure Windows (sec)                | 65   | 0
Average Throughput (msg/sec)                   | 0.40 | 2.03
Throughput for the non back pressure intervals | 0.57 | 2.03
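The derived rows in this summary follow from the raw counts (duration, messages received, minutes in back pressure). The sketch below reproduces them; the variable and function names are hypothetical, and it assumes that all messages in the 128MB run were received outside the back pressure windows.

```python
DURATION_S = 20 * 60        # both runs lasted 20 minutes
BACK_PRESSURE_MIN = 6.17    # total minutes in back pressure, 128MB run
RECEIVED_128MB = 475
RECEIVED_512MB = 2436


def throughput(messages, seconds):
    """Messages per second over the given interval."""
    return messages / seconds


back_pressure_s = BACK_PRESSURE_MIN * 60
pct_in_back_pressure = back_pressure_s / DURATION_S

avg_128 = throughput(RECEIVED_128MB, DURATION_S)
# Throughput counted only over the time the server was accepting messages.
open_128 = throughput(RECEIVED_128MB, DURATION_S - back_pressure_s)
avg_512 = throughput(RECEIVED_512MB, DURATION_S)

print(f"{pct_in_back_pressure:.0%}")                     # table lists 31%
print(round(avg_128, 2), round(open_128, 2), avg_512)    # table lists 0.40, 0.57, 2.03
```

Even when the server was accepting mail, the 128MB run managed well under a third of the steady state rate, so the cost of back pressure goes beyond the time spent rejecting connections.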


Bill Thompson, from the Exchange Center of Excellence, gives the official guidance on which DatabaseMaxCacheSize settings to use in his blog post "New maximum database cache size guidance for Exchange 2007 Hub Transport Server role".

A disclaimer: storage is key for transport performance. All the above data applies only to a Hub server with at least an "entry level" SCSI controller with 128 MB of BBWC (battery-backed write-back cache), which optimizes the I/O pattern that transport generates in steady state flow: continuous writes with very few or no reads.

Performance Counters

Some useful counters when doing E2K7 transport benchmarking:

1. Throughput counters

MSExchangeTransport SmtpReceive(_total)\Average bytes/message
MSExchangeTransport SmtpReceive(_total)\Messages Received/sec
MSExchangeTransport SmtpSend(_total)\Messages Sent/sec
MSExchange Store Driver(_total)\Inbound: MessageDeliveryAttemptsPerSecond
MSExchange Store Driver(_total)\Inbound: Recipients Delivered Per Second
MSExchangeTransport Queues(_total)\Messages Queued for Delivery Per Second
MSExchangeTransport Queues(_total)\Messages Completed Delivery Per Second

2. Queue counters, others

MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues)
MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length
MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length
MSExchangeTransport Queues(_total)\Submission Queue Length
MSExchangeTransport DSN(_total)\Failure DSNs Total
MSExchangeTransport Dumpster\Dumpster Size
MSExchange Database(edgetransport)\Database Cache Size (MB)
MSExchange Database(edgetransport)\Version buckets allocated

3. Accessory counters to diagnose whether the server is CPU, disk, or network bound (see Bottleneck-Detection Counters)

PhysicalDisk(_Total)\Current Disk Queue Length
PhysicalDisk(_Total)\Disk Writes/sec
PhysicalDisk(_Total)\Disk Reads/sec
PhysicalDisk(_Total)\Avg. Disk sec/Write
PhysicalDisk(_Total)\Avg. Disk sec/Read
Processor(_Total)\% Processor Time
Process(Edgetransport)\% Processor Time
Process(Edgetransport)\Private Bytes
Memory\Available MBytes
Network Interface\Bytes Total/sec
.....
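One way to collect a subset of these counters is the Windows typeperf tool, which can read counter paths from a file. The sketch below only builds the counter paths and the command line; it does not run anything. The function names, file names, and the 15 second interval are assumptions, and the counter names are copied from the lists above, so they should be checked against what the server actually exposes.

```python
COUNTERS = [
    r"MSExchangeTransport SmtpReceive(_total)\Messages Received/sec",
    r"MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues)",
    r"MSExchange Database(edgetransport)\Database Cache Size (MB)",
    r"MSExchange Database(edgetransport)\Version buckets allocated",
    r"Processor(_Total)\% Processor Time",
]


def to_counter_paths(names):
    """Add the leading backslash that local counter paths use."""
    return ["\\" + name.lstrip("\\") for name in names]


def typeperf_command(counter_file, output_csv, interval_s=15, samples=80):
    """Build a typeperf command line: -cf counter file, -si interval, -sc sample count."""
    return ["typeperf", "-cf", counter_file,
            "-si", str(interval_s), "-sc", str(samples),
            "-f", "CSV", "-o", output_csv]


paths = to_counter_paths(COUNTERS)
command = typeperf_command("counters.txt", "hub_run.csv")
print("\n".join(paths))       # contents for counters.txt, one path per line
print(" ".join(command))      # 80 samples at 15 seconds covers a 20 minute run
```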

Future Testing

If you're wondering how the results differ for other average sizes, we'll be posting more data on some other sizes (40KB, 70KB) later, so stay tuned.

We are currently testing servers with different storage: SATA disk, 7200 RPM, without the advantage of BBWC. More data on this scenario will be coming in a future blog post.

Elias Kaplan

Comments (5)
  1. Dustin Lema says:

    Teriffic post!  Perhaps you can benchmark with SAS and U320 controllers as well?

  2. Elias says:

    Thanks Dustin for the comment.

    The next post I’m working on is for lower end storage, SATA disk, helping to drive a lower TCO on data center deployment.

    For higher end storage data you may ping Bill Thomson from the ECOE team, see linked post.

  3. BK says:

    Good information as we’re moving to E2K7 environments.  Appreciate it.

  4. elias:

    where’s the link of Bill Thomson ?

  5. Elias says:

    Hi Derek

    It’s linked in the doc, New maximum database cache size guidance for Exchange 2007 Hub Transport Server role : http://msexchangeteam.com/archive/2008/05/14/448890.aspx

    Elias
