How to test the disks on your Exchange server


If there’s one thing that’s true of all busy Exchange servers, it’s that they generate massive amounts of disk I/O. There’s a joke around here that Exchange is the world’s biggest hard disk diagnostics program.

Typically, your disks will be the first component of your Exchange server to start groaning as you add load. And, frequently, you’ll find that once you get your disks out of the redline area of the dial, other performance issues suddenly heal themselves too. Why is this so?

Exchange databases use transactional logging. As new data comes in, the most urgent priority is getting the new stuff secured on disk in a log file. If you are experiencing “log stalls,” then everything else that needs to happen with that data must wait. This can lead to a cascade of other bottlenecks.

There is a very good KB article on log stalls. The article tells you how to use System Monitor to tell if you have a log stall problem, and how to tune Exchange if necessary:

XADM: Log Stalls/sec Are Regularly Greater than 0 (Zero)

http://support.microsoft.com/?id=188676

But all the fine tuning in the world won’t help if you are simply demanding too much from your disk system. How can you tell what kind of load your disk system can really sustain?

For years, the Exchange database test team here at Microsoft has used a homegrown tool called JetStress to simulate heavy disk I/O loads. It can be downloaded from:

http://www.microsoft.com/downloads/details.aspx?FamilyId=94B9810B-670E-433A-B5EF-B47054595E9C&displaylang=en

You may have used LoadSim in the past. JetStress has some similarities, but is not a replacement for LoadSim. JetStress is a more sharply focused tool than LoadSim. It is intended only to simulate Exchange disk I/O activity. LoadSim lets you simulate network and client activity, and thus indirectly works out the disk system. JetStress goes right at the disk system, no indirection about it.

You don’t even have to have Exchange installed to use JetStress. You simply copy a few files to a server and start pounding it to its limits. JetStress generates a test database from scratch, of whatever size you want. Typically, to get valid results, you only need to generate a database that is 5% of the size of your intended real database. You can then tell JetStress to make the same kinds of changes to the database that happen during normal operation: it adds, deletes, replaces and reads records. By using System Monitor, you can see how much real Exchange load your disks can handle. You can change your disk configuration and re-run the same tests to see what kind of difference it makes.

There are two basic kinds of testing you can do with JetStress:


  • Performance Testing
  • Disk Subsystem Stability Testing

We usually recommend that you let JetStress run at least 2 hours when you’re testing to see what kind of sustained throughput your disk system can handle. If you’re doing stability testing, the recommendation is 24 hours. Now, what exactly do I mean by stability testing?

Exchange can subject your server to very complex random I/O. As you push computer systems closer and closer to their tested limits, and as you run huge amounts of data through the system, you’re more likely to encounter glitches and even bugs in the ability of the system to reliably process and preserve data. JetStress will let you load your system up till it’s running as fast as it can, and will keep it under stress to see if it remains reliable in both storing and retrieving data.

The way you can tell if the system is performing reliably is to look for error -1018 from your Exchange database. This error occurs whenever a page is read from the database, and the checksum on the page is wrong. Every page in an Exchange database is checksummed as it is written, and the checksum is verified every time the page is looked at again. If even a single bit is wrong on the page, Exchange declares the page bad and reports a -1018 error. You can learn more about Exchange page checksums and how we detect corruption in the database in this KB article:

XADM: Understanding and Analyzing -1018, -1019 and -1022 Exchange Database Errors

http://support.microsoft.com/?id=314917

If your database has -1018 pages after the stress test, then the disk system cannot be considered reliable at the load level under which it was tested.
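
To make the checksum idea concrete, here is a minimal sketch of "checksum on write, verify on read." It is not the actual Exchange database code: the CRC-32 from Python's zlib module stands in for the real page checksum, the 4-byte header layout is an assumption, and the names write_page, read_page and PageChecksumError are made up for illustration.

```python
import zlib

PAGE_SIZE = 4096      # 4 KB database pages
CHECKSUM_BYTES = 4    # assumption: checksum kept in the first 4 bytes of the page


class PageChecksumError(Exception):
    """Stored checksum does not match the page contents (the -1018 case)."""


def write_page(payload: bytes) -> bytes:
    """Pad the payload to a full page body and prepend its checksum."""
    body_len = PAGE_SIZE - CHECKSUM_BYTES
    if len(payload) > body_len:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(body_len, b"\x00")
    checksum = zlib.crc32(body)  # stand-in for the real page checksum
    return checksum.to_bytes(CHECKSUM_BYTES, "little") + body


def read_page(page: bytes) -> bytes:
    """Recompute the checksum on every read and refuse a page that fails."""
    stored = int.from_bytes(page[:CHECKSUM_BYTES], "little")
    body = page[CHECKSUM_BYTES:]
    if zlib.crc32(body) != stored:
        raise PageChecksumError("page checksum mismatch")
    return body


if __name__ == "__main__":
    page = write_page(b"some mailbox data")
    damaged = bytearray(page)
    damaged[100] ^= 0x01  # flip a single bit, as a flaky disk system might
    try:
        read_page(bytes(damaged))
    except PageChecksumError as err:
        print("bad page detected:", err)
```

The point of the sketch is that the database cannot tell who damaged the page, only that what came back from disk is not what was written.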

When you download the JetStress utility, you get excellent documentation along with it. The documentation will walk you through every phase, from setting up tests and monitoring their progress, to interpreting their results. It even tells you which System Monitor (Performance Monitor) counters to look at, and what values are OK. It also tells you how to validate the integrity of the database after a stability test.

Mike Lee

Comments (6)
  1. Snorrk says:

    Very nice. Insightful yet short.

  2. Colin Walker says:

    Thanks for the tip.

  3. Anonymous says:

    Exchange-faq.dk – Your portal to Microsoft Exchange Server information

  4. Donna Tatro says:

    Your pointer to XADM: Log Stalls/sec Are Regularly Greater than 0 (Zero) (http://support.microsoft.com/?id=188676) is helpful. That support article refers to Exchange 5.5. Does the same suggested registry change apply to an Exchange 2000 server?

    Thanks.

  5. Mike Smith says:

    Donna, in Exchange 2000 there is no registry key to increase the log buffers; it is now done by modifying the ‘msExchESEParamLogBuffers’ value via the ADSIEDIT utility. The default in Exchange 2000 is 84, but you should manually set the value to 500 (it is not set to 500 in SP3 either).

    You should set the value on the ‘Information store’ & ‘storage group’ objects for the Exchange server via ADSIEDIT.

    Here’s how to do it:

    1. Use ADSI Edit to connect to the Configuration Container Naming Context of your Active Directory.

    2. Go to the following path:

    Configuration Container | CN=Information Store,CN=<server>,CN=Servers,CN=<Admin Group>,CN=Administrative Groups,CN=<org>,CN=Microsoft Exchange,CN=Services,CN=Configuration

    3. Right-click the Information Store object, and then click Properties.

    4. Change the Select which properties to view drop-down list box to Both.

    5. Select the msExchESEParamLogBuffers attribute and type in the value of 500. Although no value will be present, the default will be 84.

    6. Remember to click Set after changing the edit field for the attribute.

    7. Repeat steps 3 to 6 to set the msExchESEParamLogBuffers attribute on each Storage Group object below the Information Store object of the server.

    8. Close the ADSI Edit tool by closing the MMC console application.

    9. Wait for Active Directory replication to replicate this new value throughout the forest.

    10. Restart the Information Store service on the Exchange 2000 server.

  6. Mike Lee says:

    Thanks, Mike Smith, for answering the question about setting log buffers in Exchange 2000/2003, and my apologies to Donna for referencing an obsolete KB article. An updated article on setting log buffers for Exchange 2000/2003 can be found here:

    http://support.microsoft.com/?id=328466

    Something else to keep in mind when setting the log buffers is that Exchange rounds down whatever you use as a value to the nearest value evenly divisible by 128, with a minimum of 128. Therefore, if you set 500 as the buffer size, the actual number of buffers will be 384. If you set 512, the number of buffers will be 512.
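
    The rounding rule is easy to sketch in a few lines of Python. The function name effective_log_buffers is made up for illustration; this is just the arithmetic described above, not code from Exchange.

```python
LOG_BUFFER_STEP = 128  # values are rounded down to a multiple of 128


def effective_log_buffers(requested: int) -> int:
    """Round the requested value down to a multiple of 128, never below 128."""
    rounded = (requested // LOG_BUFFER_STEP) * LOG_BUFFER_STEP
    return max(rounded, LOG_BUFFER_STEP)


if __name__ == "__main__":
    for requested in (84, 500, 512):
        print(requested, "->", effective_log_buffers(requested))
    # 500 -> 384 and 512 -> 512, matching the examples above
```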

    Why then does Exchange set the number to 500 by default? It’s a bug, but not a very important one. The functional difference between 384 and 512 buffers is usually negligible.

    There have been recommendations made in the past to set the log buffers to 9000. This recommendation does not apply to Exchange 2000 SP3 or later, though there’s not much harm in it–you’re just wasting some memory.

    Each log buffer is 512 bytes in size and buffers one log sector. The maximum value you can set for the buffers is 10240, which will buffer an entire log file (each log file has 10240 sectors in it: 10240 x 512 = 5 MB, which is the file size of a log file).

    As a general rule, when playing with the log buffers, start with 512. If your log stalls (mostly) go away with this setting, leave it there. If not, you can increase the buffers in multiples of 128 until the log stalls are relieved. If you get to the max of 10240 and you’re still having log stalls, then the next likely suspect is a disk I/O bottleneck.
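
    That rule of thumb amounts to a simple search, sketched below. The callback stalls_relieved is hypothetical: it stands for whatever you do by hand, that is, applying the value, restarting the Information Store, and watching the Log Stalls/sec counter in System Monitor. Nothing here talks to Exchange.

```python
from typing import Callable, Optional


def find_log_buffer_setting(
    stalls_relieved: Callable[[int], bool],
    start: int = 512,
    step: int = 128,
    maximum: int = 10240,
) -> Optional[int]:
    """Return the first value that relieves log stalls, or None.

    None means the maximum was reached and stalls continue, so the
    disk system itself is the next suspect.
    """
    buffers = start
    while buffers <= maximum:
        if stalls_relieved(buffers):  # hypothetical manual check
            return buffers
        buffers += step
    return None


if __name__ == "__main__":
    # Pretend the stalls disappear once there are at least 1024 buffers.
    print(find_log_buffer_setting(lambda buffers: buffers >= 1024))
```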

    FYI, if you have restored from online backup, and you are replaying a large number of transaction log files (thousands of them), you may increase the speed of replay by temporarily changing both msExchESEParamLogBuffers (to 10240) and msExchESEParamCacheSizeMax (to 307200). Don’t forget to set them back to their previous values after you’re finished, especially the cache. If you leave the ESE cache that large (307200 x 4K = 1.2 GB), you may start running out of virtual address space. For more information about memory and buffer tuning for Exchange see this article:

    http://support.microsoft.com/?id=815372
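
    If you want to check the memory arithmetic for other values, here is a small sketch. The helper names are made up for illustration; the sizes (4 KB cache pages, 512-byte log buffers) are the ones quoted above.

```python
CACHE_PAGE_BYTES = 4 * 1024  # each ESE cache page is 4 KB
LOG_BUFFER_BYTES = 512       # each log buffer holds one 512-byte log sector
MIB = 1024 * 1024


def cache_size_mib(cache_pages: int) -> float:
    """Memory used by the ESE cache at a given msExchESEParamCacheSizeMax."""
    return cache_pages * CACHE_PAGE_BYTES / MIB


def log_buffer_size_mib(buffers: int) -> float:
    """Memory used by the log buffers at a given msExchESEParamLogBuffers."""
    return buffers * LOG_BUFFER_BYTES / MIB


if __name__ == "__main__":
    print(cache_size_mib(307200))      # 1200.0 MB, i.e. about 1.2 GB
    print(log_buffer_size_mib(10240))  # 5.0 MB, one full log file
```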
