Backstage at the Server Sizing Concert

A question we hear over and over again from customers sounds like a simple one: "How many and what kind of servers do I need to support X users?"  Have you ever wondered why it’s so difficult to get a straight answer?

Let's take these X users that you want to support.  How will they use Exchange?  The first step in server sizing is understanding the types of users.  An executive, her assistant, a part-time employee, and a salesperson don't have the same usage characteristics.

Let's create some simple categories for these users: light, medium and heavy.  What’s a light user?  What’s a medium user?  What’s a heavy user?  To answer this, let's define the specific tasks users perform: send mail (how big, how many attachments, how many recipients, any DLs, how many users per DL, etc.), read mail, delete mail, create appointment, create recurring appointment with some exceptions, view contact, create contact, and the other 100 or so tasks people commonly execute.  We'll need to assign frequencies for how often each type of user executes these tasks.  What should the frequencies be?  Well, we have two approaches to getting this data: install something to record user activity on the client, or install something to record user activity on the server.  We could then mine the output and use some fancy statistics to determine what the frequencies should be.
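To make this concrete, here is a minimal Python sketch of what such a definition could look like: each user category is a table of tasks and how often one user executes them per day.  The task names and every number below are made up for illustration; they are not measured values and not the defaults of any tool.

```python
# Hypothetical user profiles: tasks executed per user per working day.
# Task names and frequencies are illustrative assumptions, not measurements.
PROFILES = {
    "light":  {"send_mail": 5,  "read_mail": 20,  "delete_mail": 5,  "create_appointment": 1},
    "medium": {"send_mail": 15, "read_mail": 60,  "delete_mail": 20, "create_appointment": 2},
    "heavy":  {"send_mail": 40, "read_mail": 150, "delete_mail": 60, "create_appointment": 5},
}


def tasks_per_day(profile):
    """Total number of tasks one user with this profile executes per day."""
    return sum(profile.values())


for name, profile in PROFILES.items():
    print(f"{name}: {tasks_per_day(profile)} tasks/day")
```

A real definition would cover far more tasks, and each task would carry its own parameters (message size, attachment count, recipients), which is a large part of why the data is hard to collect.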

Let's suppose you have identified all the tasks users execute, and assigned different frequencies to all the tasks to define what light, medium and heavy users are.  This is great, but next you need to know what percentage of your users are light, medium and heavy.  How do you gather that information from your production environment in a non-intrusive, reliable, secure way, without revealing any confidential information?
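Once you have both pieces, combining them is simple arithmetic.  The sketch below assumes a hypothetical mix of user categories and a hypothetical number of tasks per user per day for each category, then computes the total daily task volume for a given user count.  All of the numbers are placeholders.

```python
# Hypothetical inputs: fraction of users in each category, and the number of
# tasks one user in that category executes per day. Both are assumptions.
USER_MIX = {"light": 0.50, "medium": 0.35, "heavy": 0.15}
TASKS_PER_USER_PER_DAY = {"light": 30, "medium": 100, "heavy": 250}


def total_tasks_per_day(user_count, mix, tasks_per_user):
    """Blend per-category task rates by the share of users in each category."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("user mix fractions must sum to 1")
    return sum(user_count * mix[c] * tasks_per_user[c] for c in mix)


print(total_tasks_per_day(1000, USER_MIX, TASKS_PER_USER_PER_DAY))
```

Note how sensitive the result is to the mix: in this made-up example the heavy users are 15% of the population but generate the largest share of the tasks.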

Once you know all the tasks users execute, how frequently light, medium and heavy users execute these tasks, and what percentage of your users are light, medium and heavy, you then need to take this defined workload, run it against multiple types of hardware, and determine which hardware characteristics (CPU, memory, disk) suffice for the number of users in question.  But we’ll need a tool to generate the workload.  This tool also needs to be consistent with the versions of the client that generate the load.  For example, if you have users on Outlook XP, the tool should simulate the load generated by Outlook XP.  If your users are all over the map, the tool would need to be capable of simulating all the major clients currently deployed.  Plus, as new features are added to the clients that affect how the client uses the server, we'll need to update the tool accordingly (take for example cached mode or search folders in Outlook 2003).  And don’t forget about other client access methods such as Outlook Web Access, Exchange ActiveSync, POP and IMAP.  The tool would need to be updated to support those methods as well (create mail becomes create OWA mail, create Exchange ActiveSync mail & send immediately (or not), create POP mail, etc.), and we'll need to understand the frequencies of these new tasks and the fraction of our users that are light, medium and heavy for each access mechanism.  Are we having fun yet?

So now we need to run the tool on various hardware configurations to see how many users can be supported.  The major pain here is the number of possible combinations.  For example: dual-proc system, quad-proc system, 8-proc system; does hyper-threading make a difference; 4 GB of RAM, or how about 16 GB of RAM with PAE; direct attached storage, storage area network, network attached storage; should we enable the write-back cache on the adapter for DBs; should we disable caching altogether for log file drives; do different vendors' HBAs or SCSI adapters have different performance characteristics; and so on.  All these tests will be time consuming and expensive.  Somebody’s got to pay for all that hardware, after all.
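To get a feel for how quickly the combinations pile up, here is a small Python sketch that enumerates just the options named above.  Treating them as six independent dimensions is a simplification made for illustration, and the dimension names are ours.

```python
from itertools import product

# Options taken from the examples in the text. Grouping them into independent
# dimensions (and the dimension names) is an assumption for illustration.
DIMENSIONS = {
    "processors": ["dual", "quad", "8-way"],
    "hyper_threading": [True, False],
    "memory": ["4 GB", "16 GB with PAE"],
    "storage": ["direct attached", "SAN", "NAS"],
    "db_write_back_cache": [True, False],
    "log_drive_caching": [True, False],
}

# One dict per test configuration, covering every combination of options.
configurations = [
    dict(zip(DIMENSIONS, combo)) for combo in product(*DIMENSIONS.values())
]

print(len(configurations))  # 3 * 2 * 2 * 3 * 2 * 2 = 144
```

That is 144 test configurations before we even consider different adapter vendors, and each one needs its own series of test runs.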

Assume we've obtained some state-of-the-art hardware to cover the different configurations, set up a test bed, and built automation to run our tool over and over again to find how many users a particular configuration can support.  Is the answer correct?  Not completely.  Why not?  How will VSS or backup software affect the server?  (For this we'll need to add the notion of "time" to our simulation, because backups are scheduled.  We then need to understand when our tasks happen and modify our simulator accordingly.)  How about virus protection or spam software?  What about other third-party applications?  Which third-party applications?  What about public folder servers, front-end servers and clusters (2-node, 4-node, 8-node and their different configurations)?  Will network latency affect performance?  The list goes on and on.  Furthermore, you’ll need to take into account the effect on the DNS server(s) and Active Directory server(s).
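Here is a rough Python sketch of what adding "time" means for something like a scheduled backup.  It models one day as 24 hourly buckets of relative user load and adds a fixed backup load during the backup window.  The load curve, the backup cost and the units are all invented for illustration; real numbers have to come from measurement.

```python
# Hypothetical relative user load for each hour of the day (24 entries):
# quiet overnight, busy during business hours. Units are arbitrary.
USER_LOAD = [5] * 7 + [60, 90, 100, 100, 80, 90, 100, 90, 70, 40] + [10] * 7

BACKUP_LOAD = 50  # assumed extra load while a backup is running


def hourly_load(user_load, backup_hours, backup_load):
    """Add the backup load to every hour that falls inside the backup window."""
    return [
        load + (backup_load if hour in backup_hours else 0)
        for hour, load in enumerate(user_load)
    ]


def peak(load):
    """Return (hour, load) for the busiest hour; the earliest hour wins ties."""
    hour = max(range(len(load)), key=load.__getitem__)
    return hour, load[hour]


night_backup = hourly_load(USER_LOAD, range(1, 5), BACKUP_LOAD)    # 01:00 to 04:59
midday_backup = hourly_load(USER_LOAD, range(9, 13), BACKUP_LOAD)  # 09:00 to 12:59

print(peak(night_backup))
print(peak(midday_backup))
```

In this toy model the overnight backup leaves the daily peak unchanged, while the same backup during business hours raises it by half.  The point is only that the answer depends on when things happen, not just on what happens.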

So, "How many and what kind of servers do I need to support X users?" is not so simple after all.  We have addressed most of the items described above by providing Loadsim 2003 to simulate Outlook clients; ESP to generate SMTP, POP, IMAP, DAV, LDAP and NNTP load; performance whitepapers; and performance tuning guidelines.  We are currently working to provide better, more accurate server sizing guidelines by refining our tools, building new automation, updating our definition of types of users, and testing on the latest and greatest hardware.  We do all this primarily to answer this #1 question for our customers.  Going through all this, we learn a tremendous amount about our users, tailor our product to their needs, push the envelope in terms of performance and scalability, and have quite a bit of fun in the process.

- Sam Khavari

Comments (3)
  1. Matt Drnovscek says:

    Hi There,

    ESP is a wonderful tool for OWA & now OMA load testing, but the unfortunate thing is that you need a degree in WebDAV voodoo to create your own testing scripts. Even if you use canned scripts (I received some from MS consulting a while ago for ESP2000, which don’t work on ESP2003), most of the time they are difficult to set up. Has there been any discussion within MS to simplify ESP so mere mortals (not web development gurus) can use it?

    Many Thanks,


    P.S. Love the site – for some reason you always seem to post info on stuff I’m currently looking into/troubleshooting,etc…

  2. Chris says:

    Doesn’t everyone just design their systems with a mirrored OS and 168 drives for the databases?

  3. Sam Khavari says:

    Chris, you bring up an excellent point. The benchmark results do not reflect real world scenarios. There are many aspects of real world deployments that the benchmark does not reflect. As you mention, disk configuration is one of these aspects, and it’s one of the biggest. This is exactly why we require the FDR to mention in bold type that the benchmark cannot be used as a deployment guide for production environments. The FDR then goes on to list some of the other characteristics the benchmark does not account for.
