Checklist for Troubleshooting Performance-Related Issues in Exchange 2013, 2016 and 2019 (on-prem)

The_Exchange_Team · May 31 2016

Note: a major update to this post (newer links, tools etc.) was completed on 2/5/2020.

Some of the relatively common and difficult issues we see in support are related to client connectivity to Exchange. There are several variations that we classify as connectivity issues (related to server performance or otherwise). They can include:

Clients prompting for credentials (intermittently or continuously)
Clients getting disconnected
Clients unable to establish a connection
Clients freezing or going unresponsive
ActiveSync clients experiencing delays receiving new mail

There are many factors that can contribute to these symptoms, and each one can lead down a completely different troubleshooting path. In this post we will focus on methods and tools to troubleshoot these symptoms. There are too many potential underlying causes to cover them all, but hopefully this will serve as a good starting point for troubleshooting.

This post is not necessarily meant to be read from start to finish; rather, it should serve as guidance when you need to troubleshoot hard-to-find issues related to client connectivity. It can be a bit overwhelming, sure, but depending on the depth that you need to get into, it might be well worth it!

In every case, it's best to spend some time understanding what the user is experiencing. Understanding the full client experience and its effect can help determine the logging and troubleshooting path. When you define the user's issue, you should also run the Office 365 Support and Recovery Assistant (aka SaRA) on the client to get a good view of the configuration that can impact their experience. To obtain the configuration information, select “Advanced Diagnostics”, click “Next”, choose “Outlook”, and follow the prompts to obtain the report.

Note: This post covers on-premises deployments. If you are an Office 365 customer with users who have mailboxes in Exchange Online, we highly recommend using the Office 365 Support and Recovery Assistant to troubleshoot Outlook connectivity and other common issues with Office 365.

Outlook client configuration

Here are some items to consider on the client side when troubleshooting these issues.

Are clients in online or cached mode? Cached mode is recommended, and moving critical users to cached mode to improve their experience is a good idea. You can control the amount of mail users download by using the “Sync Slider”. Additionally, you can enable BitLocker if there are local storage security considerations.
What devices are clients traversing to hit the Exchange server? Run Tracert to the Exchange server to determine the devices in the path of the client.

Test several clients with a hosts file entry that maps the external host name to the IP address of a specific Exchange server. The hosts file is in the C:\Windows\System32\Drivers\etc directory.
What is the Outlook full build number? Always make sure that you are working with an updated client. Information about recent updates can be found in the following references:

MSI Builds: Outlook and Outlook for Mac Update File Versions
C2R Builds: Update history for Office 365 ProPlus (listed by date)

Using the SaRA Advanced Diagnostics logging discussed earlier, examine which add-ins are being used and consider disabling those for testing. Make sure that you restart Outlook and verify the add-ins remain disabled.
Try running Outlook in safe mode on a few clients. Not all add-ins are disabled in safe mode, so if you suspect some are still loaded, use Process Explorer to verify that they are not.
Enable Outlook logging, which is discussed later in this post. Use caution, since logging can also have a performance impact on the client.

Exchange configuration

The first step for the Exchange server is to verify that the preferred architecture is in place by reviewing the documentation for your Exchange version.

Several items, such as setting power management to “High Performance”, verifying that the OS isn't turning off power to the NIC, and making sure the server is running an updated and supported .NET version, are extremely important:

Run the Exchange Performance Health Checker script and review the output for any settings that must be changed. Some of the changes that we require are also listed below.
Verify that the most recent supported version of .NET is installed. See the Supportability Matrix for the currently supported version.
Set the paging file minimum and maximum values to the same size. In Exchange 2016, the value should be 32 GB + 10 MB; if the server has less than 32 GB of RAM, use the amount of installed RAM plus 10 MB instead. In Exchange 2019, the paging file minimum and maximum should be set to 25% of installed memory.
Turn off hyper-threading.
Update outdated Outlook clients.
Update NIC drivers.
Eliminate any slow disk issues identified in Perfmon. The threshold counters “LogicalDisk\Average Disk sec/Write” and “LogicalDisk\Average Disk sec/Read” should average 20 ms or less, ideally without spikes. Without getting too deep into this, some spikes can be OK; however, if they are too high, they can cause issues. Consider working with CSS support teams if you see spikes in disk performance.
Power settings: Set Power Management to “High Performance”.
If you are seeing high CPU utilization, review the value of the performance counter “Processor Information (_total)\% Processor Performance”. As the description of that counter says: “Some processors are capable of regulating their frequency outside of the control of Windows. Processor Performance will accurately reflect the performance of these processors.” If this value is below 100%, a server BIOS/UEFI setting may be allowing the CPU to run at less than maximum performance even though the OS is set to High Performance. Check with your server hardware vendor for the settings that allow “Maximum Performance”.
In the NIC properties, under “Configure | Power Management”, verify that the box for “Allow the computer to turn off this device to save power” is unchecked.
Max Core Count - In Exchange, you can encounter performance problems if you go too far off the preferred architecture, particularly regarding core count. This even includes having too many cores. The number of cores on a server should be no more than 24 in Exchange 2016 and 48 in Exchange 2019. Hyper-threading should be disabled, as mentioned previously. For more information, refer to the following articles:
- Troubleshooting High CPU utilization issues in Exchange 2013
- Ask the Perf Guy: Update to scalability guidance for Exchange 2016
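The numeric guidance in the list above can be encoded as a few quick checks. This is a minimal sketch, not a supported tool: the function names are hypothetical, and it assumes you have already collected installed RAM, core count, and counter samples yourself (for example, from Perfmon exports). Note that the LogicalDisk “sec” counters report seconds, so the 20 ms threshold is compared after converting to milliseconds.

```python
from statistics import mean
from typing import Sequence

MB_PER_GB = 1024
MAX_CORES = {2016: 24, 2019: 48}  # maximum core counts from this post


def paging_file_size_mb(exchange_version: int, installed_ram_mb: int) -> int:
    """Paging file size (use the same value for minimum and maximum)."""
    if exchange_version == 2016:
        # Installed RAM + 10 MB, capped at 32 GB + 10 MB.
        return min(installed_ram_mb, 32 * MB_PER_GB) + 10
    if exchange_version == 2019:
        # 25% of installed memory.
        return installed_ram_mb // 4
    raise ValueError("this sketch only covers Exchange 2016 and 2019")


def core_count_ok(exchange_version: int, cores: int, hyperthreading_on: bool) -> bool:
    """Core count within the documented maximum, with hyper-threading off."""
    return cores <= MAX_CORES[exchange_version] and not hyperthreading_on


def disk_latency_ok(samples_seconds: Sequence[float], avg_limit_ms: float = 20.0) -> bool:
    """Average Disk sec/Read or sec/Write samples (in seconds) vs. a 20 ms average."""
    return mean(samples_seconds) * 1000 <= avg_limit_ms


def cpu_may_be_throttled(processor_performance_pct: Sequence[float]) -> bool:
    """An average '% Processor Performance' below 100 hints at BIOS/UEFI power management."""
    return mean(processor_performance_pct) < 100
```

For example, an Exchange 2016 server with 128 GB (131072 MB) of RAM gets a 32778 MB paging file, while an Exchange 2019 server with 64 GB (65536 MB) gets 16384 MB.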

The Exchange TCP KeepAliveTime should be set to 30 minutes, and never lower than 15 minutes. If there's no KeepAliveTime entry in the registry, the value is 2 hours. If this value is not set correctly, it can affect both connectivity and performance. You must also make sure that the load balancer and any other devices in the path from the client to Exchange are set correctly; more information is in the Load balancer configuration section below. The goal is to give Exchange the lowest value, so that idle client sessions are ended by Exchange and not by a device in the path.

Path: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\
Value name: KeepAliveTime
Value type: REG_DWORD
Value data: 1800000 (Decimal; 30 minutes, in milliseconds)
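Because KeepAliveTime is stored in milliseconds, it is easy to mis-set by a factor of 1000 or 60. The sketch below (hypothetical helper names; it assumes you have already read the registry value, or found it missing) converts between minutes and milliseconds and checks the result against the 15 to 30 minute guidance, treating a missing value as the 2-hour default.

```python
from typing import Optional

DEFAULT_KEEPALIVE_MS = 2 * 60 * 60 * 1000  # value in effect when KeepAliveTime is absent
MIN_MINUTES, TARGET_MINUTES = 15, 30        # guidance from this post


def minutes_to_keepalive_ms(minutes: int) -> int:
    """Convert minutes to the millisecond value stored in the registry."""
    return minutes * 60 * 1000


def effective_keepalive_minutes(registry_value_ms: Optional[int]) -> float:
    """Minutes actually in effect; None means the value is not in the registry."""
    value_ms = DEFAULT_KEEPALIVE_MS if registry_value_ms is None else registry_value_ms
    return value_ms / 60_000


def keepalive_within_guidance(registry_value_ms: Optional[int]) -> bool:
    """True when the effective value is between 15 and 30 minutes."""
    return MIN_MINUTES <= effective_keepalive_minutes(registry_value_ms) <= TARGET_MINUTES
```

For instance, `minutes_to_keepalive_ms(30)` gives the 1800000 shown above, and a server with no KeepAliveTime entry fails the check because it is running with a 120-minute value.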

Load balancer configuration

Please note that for third-party load balancer configuration, you should always refer to the product documentation and guidance. The following are some general best practices and things we often see misconfigured:

Verify that the load balancer's client TCP idle time-out is set slightly higher than the Exchange Keep Alive setting noted earlier. Each vendor uses different terminology; the setting could be listed as client time-out or simply timeout.

Clients > Firewall > Load Balancer > Exchange

In this example, if you have a firewall in the path from the client to Exchange, we are referring to the firewall “idle” time-out and not the persistence time-out. This value should be greater than the load balancer's, and the load balancer time-out should be greater than Exchange's. Note that it is not recommended to go below 15 minutes for Keep Alive on Exchange or for the TCP idle time-out on the load balancer.

Firewall time out = 40 minutes
LB TCP Idle time out = 35 minutes
Exchange Keep Alive = 30 minutes
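The ordering rule in this example (firewall > load balancer > Exchange, with neither the load balancer nor Exchange below 15 minutes) can be checked mechanically. This is a minimal sketch with a hypothetical function name; all values are in minutes, and you supply the settings you read from each device.

```python
from typing import List

MIN_MINUTES = 15  # lowest recommended Keep Alive / LB TCP idle time-out


def check_timeout_chain(firewall_idle: float, lb_idle: float, exchange_keepalive: float) -> List[str]:
    """Return a list of problems; an empty list means the chain is ordered correctly."""
    problems = []
    if not firewall_idle > lb_idle:
        problems.append("firewall idle time-out should be greater than the LB TCP idle time-out")
    if not lb_idle > exchange_keepalive:
        problems.append("LB TCP idle time-out should be greater than Exchange Keep Alive")
    if lb_idle < MIN_MINUTES:
        problems.append("LB TCP idle time-out is below 15 minutes")
    if exchange_keepalive < MIN_MINUTES:
        problems.append("Exchange Keep Alive is below 15 minutes")
    return problems
```

With the example values above, `check_timeout_chain(40, 35, 30)` returns an empty list; swapping the firewall to 30 minutes would report that the firewall can drop idle sessions before the load balancer or Exchange does.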

If the load balancer supports it, the preferred option is to configure it to use “Least Connections” with “Slow start” during typical operation.

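With the “least connections” method, be mindful that a server can become overloaded and unresponsive during a server outage or patching maintenance: a server that has just returned to service has few connections, so it can suddenly receive a large share of new client connections. In the context of Exchange performance, this matters because authentication is an expensive operation.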
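To illustrate why “Slow start” helps, here is a simplified model of the selection logic. It is a sketch under stated assumptions, not how any particular load balancer implements the feature: the 300-second window, the 0.1 minimum ramp, and all names are hypothetical. A server that has just rejoined the pool is treated as more heavily loaded until its window elapses, so it is not flooded with new (and expensive-to-authenticate) connections.

```python
from dataclasses import dataclass
from typing import List

SLOW_START_SECONDS = 300.0  # hypothetical ramp-up window; vendors name and size this differently
MIN_RAMP = 0.1              # a server that just returned still gets a small share


@dataclass
class Backend:
    name: str
    active_connections: int
    seconds_in_service: float  # time since the server (re)joined the pool


def pick_least_connections(pool: List[Backend]) -> Backend:
    """Plain least connections: always the server with the fewest connections."""
    return min(pool, key=lambda b: b.active_connections)


def pick_least_connections_slow_start(pool: List[Backend]) -> Backend:
    """Least connections, but a recently (re)started server looks more loaded
    until its slow-start window has elapsed."""
    def weighted_load(b: Backend) -> float:
        ramp = min(1.0, max(MIN_RAMP, b.seconds_in_service / SLOW_START_SECONDS))
        return (b.active_connections + 1) / ramp
    return min(pool, key=weighted_load)
```

For example, with EX01 carrying 1000 connections and EX02 back in service for 10 seconds with 200 connections, plain least connections keeps sending new sessions to EX02, while the slow-start variant sends them to EX01 until EX02 has ramped up.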

Additional troubleshooting steps

If you have completed the previous steps and are still experiencing issues, collect the following data:

Run Perfwiz on the Exchange and Mailbox servers for up to 4 hours during the busiest part of the day (assuming this is when the issues are happening). Download ExPerfwiz from here. Example command line: .\experfwiz.ps1 -server MBXServer -interval 5 -filepath D:\Logs. Once you have the performance data, see the blog post Troubleshooting High CPU Utilization issues in Exchange 2013.
Review the Application log for any Event ID 4999 events that occurred around the time of the problems.
In the Application log, review Event ID 2080 to make sure that all domain controllers are responding with the correct Boolean values. If any responses are not accurate, the DCs should be repaired or excluded.

The expected values are “CDG 1 7 7 1 0 1 1 7 1”, as shown in the following table.

For testing, a DC can be excluded by using the Set-ExchangeServer -StaticExcludedDomainControllers parameter; however, you should also troubleshoot Global Catalog access as soon as your testing is completed. Statically excluding a GC takes effect immediately, and the excluded DC will show all zero values in the next Event ID 2080. The PDC column should show 0, and the PDC should not be used by Exchange.

Some additional resources on the subject:

More information for Set-ExchangeServer is located here
More information about Event ID 2080 from MSExchangeDSAccess can be found here
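When many DCs are listed, comparing each Event ID 2080 row by eye is error-prone. The sketch below compares one row against the expected “CDG 1 7 7 1 0 1 1 7 1” and names the columns that differ. The column names and their order are our assumption based on the header line that Event ID 2080 prints; compare them with the header in your own event before relying on the output. The function name is hypothetical.

```python
from typing import Dict, Tuple

# Assumed column order from the Event ID 2080 header (after the server name).
COLUMNS = ["Roles", "Enabled", "Reachability", "Synchronized", "GC capable",
           "PDC", "SACL right", "Critical Data", "Netlogon", "OS Version"]
EXPECTED = "CDG 1 7 7 1 0 1 1 7 1".split()


def check_2080_row(row: str) -> Tuple[str, Dict[str, str]]:
    """Return (server name, {column: actual value}) for every column that differs."""
    server, *values = row.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values after the server name, got {len(values)}")
    mismatches = {col: got for col, want, got in zip(COLUMNS, EXPECTED, values) if got != want}
    return server, mismatches
```

A healthy row such as `DC1.contoso.com CDG 1 7 7 1 0 1 1 7 1` returns no mismatches, while a DC that reports `0` for Reachability and Synchronized is a candidate for repair or, for testing, static exclusion.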

Also check for Event ID 2070 or 2095. A 2070 event occurs when Exchange tries to query a DC and fails because it cannot contact or communicate with the DC. If this event occurs during the time when clients have issues (or occurs frequently), it should be investigated. If you only see this event occasionally over several days, it could be the result of the DC being restarted during maintenance. The same is true of event 2095: infrequent occurrences aren't a concern, but continued logging of this event could be a sign of a problem.
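To decide whether 2070 or 2095 events line up with the client problems, it helps to split them by the times users reported issues. A minimal sketch, assuming you have exported the event timestamps and written down the incident windows yourself; the function name is hypothetical.

```python
from datetime import datetime
from typing import List, Tuple


def split_by_incident_windows(
    event_times: List[datetime],
    incident_windows: List[Tuple[datetime, datetime]],
) -> Tuple[List[datetime], List[datetime]]:
    """Return (events inside any incident window, events outside all windows)."""
    inside, outside = [], []
    for t in event_times:
        if any(start <= t <= end for start, end in incident_windows):
            inside.append(t)
        else:
            outside.append(t)
    return inside, outside
```

Many events inside the windows point toward a DC communication problem worth investigating; a few scattered events outside them are more consistent with DC maintenance restarts.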

Always ensure that MaxConcurrentApi bottlenecks are not present in the environment. To avoid this problem now or in the future, review the following information: You are intermittently prompted for credentials or experience time-outs when you connect to Authenti...

LDAP latencies can impact server and client performance. When you work on client connectivity issues, the LDAP counters can help pinpoint delays in communication with the DCs. The counters are MSExchange ADAccess Domain Controllers(*)\LDAP Read Time and LDAP Search Time; we recommend that the average be within 50 ms, with spikes no greater than 100 ms. More information is located here.
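Applying the 50 ms average and 100 ms spike thresholds per DC instance is straightforward once the counter samples are exported. This sketch assumes you have grouped the LDAP Read Time or LDAP Search Time samples (in milliseconds) by DC instance name; the function name is hypothetical.

```python
from statistics import mean
from typing import Dict, List

AVG_LIMIT_MS = 50.0
SPIKE_LIMIT_MS = 100.0


def dcs_over_ldap_thresholds(samples_by_dc: Dict[str, List[float]]) -> Dict[str, str]:
    """Return {dc: reason} for each DC whose samples exceed the average or spike limit."""
    problems = {}
    for dc, samples in samples_by_dc.items():
        if not samples:
            continue
        avg, peak = mean(samples), max(samples)
        if avg > AVG_LIMIT_MS:
            problems[dc] = f"average {avg:.1f} ms exceeds {AVG_LIMIT_MS:.0f} ms"
        elif peak > SPIKE_LIMIT_MS:
            problems[dc] = f"spike of {peak:.1f} ms exceeds {SPIKE_LIMIT_MS:.0f} ms"
    return problems
```

A DC with a good average but occasional spikes above 100 ms is still reported, since the spikes alone can line up with client prompts or freezes.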

Troubleshooting Logs and Tools

Some logging and tools to help troubleshoot issues are listed below.

Exchange Log Collector Script

A great option for collecting logs, whether you need to submit them to support teams or just review them yourself, is the “Exchange Log Collector Script”. This script collects a wide range of the logs that are enabled by default on the server and needed to investigate an issue. You can customize which logs the script collects by using the switch parameters provided within the script. If you don’t know what to collect, the -AllPossibleLogs switch is a good way to capture all the default logging. For more information, review the Exchange Log Collector Script’s GitHub page.

HTTP Proxy RPCHTTP and MAPI Logs

In Exchange, there are several logs in the Logging folder. For Outlook clients, one of the first logs to examine is the HTTP Proxy log on Exchange. The connection walk-through section shows the process that is used to connect to Exchange 2013. This complete process is logged in the HTTP Proxy log. Also, if possible, add a hosts file entry on the client that points to one specific Exchange server, to reduce the number of logs you need to review.

The RPCHTTP logs on Exchange are located here by default: C:\Program Files\Microsoft\Exchange Server\V15\Logging\HttpProxy\RpcHttp

The MAPI logs are located here by default: C:\Program Files\Microsoft\Exchange Server\V15\Logging\HttpProxy\Mapi
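The HTTP Proxy logs are comma-separated text with their column names on a “#Fields:” header line, so they are easy to filter for a single user. A minimal sketch: the column names AuthenticatedUser and HttpStatus are assumptions, so check them against the #Fields line in your own logs, and the helper names are hypothetical.

```python
import csv
from typing import Dict, List


def read_exchange_csv_log(text: str) -> List[Dict[str, str]]:
    """Parse a CSV log whose column names come from a '#Fields:' header line."""
    fields = None
    data_lines = []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            fields = next(csv.reader([line[len("#Fields:"):].strip()]))
        elif line.startswith("#") or not line.strip():
            continue  # other header lines (#Software, #Version, ...) and blanks
        else:
            data_lines.append(line)
    if fields is None:
        raise ValueError("no #Fields: header found")
    return list(csv.DictReader(data_lines, fieldnames=fields))


def failed_requests(rows: List[Dict[str, str]], user: str,
                    user_column: str = "AuthenticatedUser",
                    status_column: str = "HttpStatus") -> List[Dict[str, str]]:
    """Rows for one user whose HTTP status is 400 or higher."""
    failures = []
    for row in rows:
        status = (row.get(status_column) or "").strip()
        if row.get(user_column) == user and status.isdigit() and int(status) >= 400:
            failures.append(row)
    return failures
```

Read a log file with `open(path, encoding="utf-8").read()` and pass the text to `read_exchange_csv_log`; narrowing the client to one server with a hosts file entry keeps the number of files to search small.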

HTTP Proxy AutoDiscover Logs

Exchange also has HTTP Proxy logs for AutoDiscover, similar to the logs described earlier, which can be used to determine whether AutoDiscover is failing.

The logs on Exchange are located here by default: C:\Program Files\Microsoft\Exchange Server\V15\Logging\HttpProxy\AutoDiscover

HTTP Error Logs

HTTP Error logs record failures that occur in HTTP.SYS before a request reaches IIS. However, not all errors for connections to websites and app pools appear in the httperr log; for example, if ASP.NET threw the error, it may not be logged in the HTTP Error log. By default, HTTP Error logs are in C:\Windows\System32\LogFiles\HTTPERR. Information on the httperr log and its codes can be found here.
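A quick way to see what HTTP.SYS is rejecting is to count the reason codes in the httperr log. This sketch reads the space-separated W3C-style format and takes column names from the “#Fields:” line, so it does not depend on a fixed column order; it assumes a column named s-reason is present, and the function name is hypothetical.

```python
from collections import Counter


def count_httperr_reasons(text: str) -> Counter:
    """Count rows per s-reason value in httperr log text."""
    fields = None
    counts = Counter()
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line.startswith("#") or not line.strip():
            continue
        elif fields:
            row = dict(zip(fields, line.split()))
            counts[row.get("s-reason", "-")] += 1
    return counts
```

A large number of reasons such as QueueFull around the time of the problem points at the server or application pool, while connection idle time-outs are usually benign on their own.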

IIS Logs

IIS logs can be used to review connections for RPC/HTTP, MAPI/HTTP, EWS, OAB, and AutoDiscover. IIS does not always log the full data for MAPI/HTTP and RPC/HTTP, so you may not see a 200 (connection successful) entry. This link is useful for identifying the IIS codes.

In Exchange, IIS logs should contain all user connections on port 443. IIS logs on the Mailbox server (back end) should only contain connections from Exchange servers on port 444.

Most HTTP connections are first sent anonymously, which results in a 401 challenge response. This response includes the available authentication types in its header. The client should then connect again by using one of these authentication methods. Therefore, a 401 status in an IIS log does not necessarily indicate an error.

Note that an anonymous request is expected to show a 401 response. You can identify anonymous requests because the domain\username is not listed in the request.
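Because anonymous 401 challenges are expected, it is more useful to separate them from 401s returned to requests that already carry a username. This sketch parses W3C-format IIS log text using the “#Fields:” line and assumes the default cs-username and sc-status columns are logged (an anonymous request shows “-” for cs-username). The function name is hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def split_401s(text: str) -> Tuple[List[Row], List[Row]]:
    """Return (anonymous 401 rows, 401 rows that include a username)."""
    fields = None
    anonymous, authenticated = [], []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not line.strip() or fields is None:
            continue
        row = dict(zip(fields, line.split()))
        if row.get("sc-status") != "401":
            continue
        if row.get("cs-username", "-") == "-":
            anonymous.append(row)  # expected challenge, usually not an error
        else:
            authenticated.append(row)  # worth a closer look
    return anonymous, authenticated
```

Repeated 401s in the second list for the same user are more likely to correspond to the credential prompts described at the start of this post.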

RPC Client Access (RCA) Logs

For Outlook Anywhere sessions, the RCA logs show when a user connected to their own mailbox or to an alternate mailbox, any errors that occurred on the connection, and more. RCA logs are in the logging directory, located at %ExchangeInstallPath%\Logging\RpcClientAccess. By default, these logs have a maximum size of 10 MB and roll over when the size limit is reached or at the end of the day (based on GMT), and the server keeps 1 GB in the log directory.

Outlook ETL Logging (requires a support case with Microsoft to analyze the log)

ETL logs are in %temp%\Outlook Logging and are named Outlook-#####.ETL, where the numbers are randomly generated by the system.

To enable Outlook logging

Open Outlook.
Click File, Options, Advanced.
Select “Enable troubleshooting logging (requires restarting Outlook)”.
Restart Outlook.

How to enable Outlook logging in the registry:

Browse to HKEY_CURRENT_USER\Software\Microsoft\Office\xx.0\Outlook\Options\Mail
DWORD: EnableLogging
Value: 1

Note: xx.0 is a placeholder for your version of Office: 16.0 = Office 2016 and later, 15.0 = Office 2013.
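If you need to turn on logging for several test users, generating a .reg file avoids typos in the key path. A minimal sketch: the helper name is hypothetical, and it only produces the text of a standard “Windows Registry Editor Version 5.00” file for the EnableLogging value described above. Save the output with a .reg extension and import it on the client (for example, with `reg import`), then restart Outlook.

```python
def outlook_logging_reg(office_version: str = "16.0", enable: bool = True) -> str:
    """Return .reg file text that sets EnableLogging for the given Office version."""
    if office_version not in ("15.0", "16.0"):
        raise ValueError("expected 15.0 (Office 2013) or 16.0 (Office 2016 and later)")
    key = rf"HKEY_CURRENT_USER\Software\Microsoft\Office\{office_version}\Outlook\Options\Mail"
    value = 1 if enable else 0
    return (
        "Windows Registry Editor Version 5.00\r\n\r\n"
        f"[{key}]\r\n"
        f'"EnableLogging"=dword:{value:08x}\r\n'
    )
```

Generating a second file with `enable=False` gives a matching way to turn logging off again, which matters because logging can have a performance impact on the client.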

ExPerfwiz (Perfmon for Exchange)

You can use Perfmon for issues that you suspect are caused by performance. See Perfwiz. Example command line: .\experfwiz.ps1 -server MBXServer -interval 5 -filepath D:\Logs.

Exchange has daily performance logs that capture the majority of what is needed; however, for most troubleshooting we will require Perfwiz. By default, these logs are located in C:\Program Files\Microsoft\Exchange Server\V15\Logging\Diagnostics\DailyPerformanceLogs.

Log Parser Studio

Log Parser Studio is a GUI for Log Parser 2.2. LPS greatly reduces complexity when parsing logs. Additionally, it can parse many kinds of logs, including IIS logs, HTTPErr logs, Event Logs (both live and EVT/EVTX/CSV), all Exchange protocol logs, any text-based logs, CSV logs, and ExTRA traces that were converted to CSV logs. LPS can parse many GB of logs concurrently (we have tested with total log sizes of >60 GB).

Blog post with tips and information on using Log Parser Studio.

Exmon tool (aka Microsoft Exchange Server User Monitor)

We use this tool to get detailed information about client traffic.

Thanks to David Paulson, Jim Martin, Nino Bilic, Bhalchandra Atre and Rob Whaley for technical review.

Charlene Weber
