Ask the Perf Guy: How big is too BIG?


We’ve seen an increasing amount of interest lately in deployment of Exchange 2013 on “large” servers. By large, I mean servers that contain significantly more CPU or memory resources than what the product was designed to utilize. I thought it might be time for a reminder of our scalability recommendations and some of the details behind those recommendations. Note that this guidance is specific to Exchange 2013 – there are many architectural differences in prior releases of the product that will impact scalability guidance.

In a nutshell, we recommend not exceeding the following sizing characteristics for Exchange 2013 servers, whether single-role or multi-role (and you are running multi-role, right?).

Recommended Maximum CPU Core Count

24

Recommended Maximum Memory

96 GB

Note: Version 7.5 and later of the Exchange Server 2013 Role Requirements Calculator aligns with this guidance and will flag server configurations that exceed these guidelines.

As we have mentioned in various places like TechNet and our Preferred Architecture, commodity-class 2U servers with 2 processor sockets are our recommended server type for deployment of Exchange 2013. The reason for this is quite simple: we utilize massive quantities of these servers for deployment in Exchange Online, and as a result this is the platform that we architect for and have the best visibility into when evaluating performance and scalability.

You might now be asking the fairly obvious follow up question: what happens if I ignore this recommendation and scale up?

It’s hard, if not impossible, to provide a great answer to this question, because there are so many things that could go wrong. We have certainly seen a number of issues raised through support related to scale-up deployments of Exchange in recent months. An example of this class of issue appears in the “Oversizing” section of Marc Nivens’ recent blog article on troubleshooting high CPU issues in Exchange 2013. Many of the issues we see are in some way related to concurrency and reduced throughput due to excessive contention amongst threads. This essentially means that the server is trying to do so much work (believing that it has the capability to do so given the massive amount of hardware available to it) that it is running into architectural bottlenecks and actually spending a great deal of time dealing with locks and thread scheduling instead of handling transactions associated with Exchange workloads. Because we architect and tune the product for mid-range server hardware as described above, no tuning has been done to get the most out of this larger hardware and avoid this class of issues.

We have also seen some cases in which the patterns of requests being serviced by Exchange, the number of CPU cores, and the amount of physical memory deployed on the server resulted in far more time being spent in the .NET Garbage Collection process than we would expect, given our production observations and tuning of memory allocation patterns within Exchange code. In some of these cases, Microsoft support engineers may determine that the best short-term workaround is to switch one or more Exchange services from the Workstation Garbage Collection mode to Server Garbage Collection mode. This allows the .NET Garbage Collector to manage memory more efficiently but with some significant tradeoffs, like a dramatic increase in physical memory consumption. In general, each individual service that makes up the Exchange server product has been tuned as carefully as possible to be a good consumer of memory resources, and wherever possible, we utilize the Workstation Garbage Collector to avoid a dramatic and typically unnecessary increase in memory consumption. While it’s possible that adjusting a service to use Server GC rather than Workstation GC might temporarily mitigate an issue, it’s not a long-term fix that the product group recommends. When it comes to .NET Garbage Collector settings, our advice is to ensure that you are running with default settings and the only time these settings should be adjusted is with the advice and consent of Microsoft Support. As we make changes to Exchange through our normal servicing rhythm, we may change these defaults to ensure that Exchange continues to perform as efficiently as possible, and as a result, manual overrides could result in a less optimal configuration.

As server and processor technology changes, you can expect that we will make adjustments to our production deployments in Exchange Online to ensure that we are getting the highest performance possible at the lowest cost for the users of our service. As a result, we anticipate updating our scalability guidance based on our experience running Exchange on these updated hardware configurations. We don’t expect these updates to be very frequent, but change to hardware configurations is absolutely a given when running a rapidly growing service.

It’s a fact that many of you have various constraints on the hardware that you can deploy in your datacenters, and often those constraints are driven by a desire to reduce server count, increase server density, etc. Within those constraints, it can be very challenging to design an Exchange implementation that follows our scalability guidance and the Preferred Architecture. Keep in mind that in this case, virtualization may be a feasible option rather than a risky attempt to circumvent scalability guidance and operate extremely large Exchange servers. Virtualization of Exchange is a well understood, fairly common solution to this problem, and while it does add complexity (and therefore some additional cost and risk) to your deployment, it can also allow you to take advantage of large hardware while ensuring that Exchange gets the resources it needs to operate as effectively as possible. If you do decide to virtualize Exchange, remember to follow our sizing guidance within the Exchange virtual machines. Scale out rather than scale up (the virtual core count and memory size should not exceed the guidelines mentioned above) and try to align as closely as possible to the Preferred Architecture.

When evaluating these scalability limits, it’s really most important to remember that Exchange high availability comes from staying as close to the product group’s guidance and Preferred Architecture as possible. We want you to have the very best possible experience with Exchange, and we know that the best way to achieve that is to deploy like we do.

Jeff Mealiffe
Principal PM Manager
Office 365 Customer Experience

Comments (2)
  1. John Barsodi says:

    I understand that we are looking at preferred architecture(PA)…many of us rely on the Exchange calc to help with design and sizing, but the Exchange 2013 Calc doesn’t approach it following the PA way necessarily. Would it be possible for the calculator
    to have another scenario tab that takes the requirements, excluding server counts, and produce a design based on PA? In other words if the calc spits out a single site 3 node DAG to house 7k users with 48 cores and 192GB, could it produce an alternate config
    based on PA for comparison?

  2. John Barsodi says:

    …and I should have gotten further down in today’s posts to see the updated calc now warns you.

Comments are closed.