How To Troubleshoot Microsoft Exchange Server Latency or Connection Issues

Written by Samuel Drey, Premier Field Engineer. This article is meant to be a useful guide to help Microsoft Exchange Server IT Operations teams understand, troubleshoot and remedy situations where users have trouble connecting to the Exchange messaging service via Outlook or OWA. I’ve included information relating to Exchange Server 2003, 2007 and 2010. The following process helps rule out server latencies and determine whether a poor messaging user experience comes from a client-side configuration issue, a client-side performance issue or a server-side issue.

Step 1: Check the Application Log and System Log

The first place to look is the Application Log, then the System Log, for errors. Poor messaging experiences caused by server issues usually surface as obvious, recurring warnings or errors about memory or disk problems. For example: Event ID 9582, stating that “the virtual memory necessary to run your Exchange server is fragmented in such a way that performance may be affected”, or Event ID 51 from the Disk source, stating that “an error was detected on device \Device\Harddisk3\DR3”.
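If you export the Application and System logs, a short script can surface the recurring events described above. The sketch below is a hypothetical helper, not an Exchange tool: it assumes you have already parsed the export into (source, event ID) pairs, and it assumes the two example events are logged by the MSExchangeIS and Disk sources.

```python
from collections import Counter

# (source, event ID) pairs to watch. The sources are assumptions for the two
# examples in Step 1; add any other events that matter in your environment.
WATCHED_EVENTS = {
    ("MSExchangeIS", 9582): "virtual memory fragmented",
    ("Disk", 51): "error detected on a disk device",
}


def recurring_events(events, min_count=2):
    """Return watched (source, event_id) pairs seen at least min_count times.

    `events` is any iterable of (source, event_id) tuples, for example parsed
    from an Application/System log export (the export format is up to you).
    """
    counts = Counter(e for e in events if e in WATCHED_EVENTS)
    return {key: n for key, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    sample = [("MSExchangeIS", 9582)] * 3 + [("Disk", 51), ("Other", 51)]
    for (source, event_id), n in recurring_events(sample).items():
        print(f"{source} {event_id} x{n}: {WATCHED_EVENTS[(source, event_id)]}")
```

A single occurrence is filtered out by default because the point of this step is to find recurring events; lower `min_count` to 1 to see everything that matched.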

Step 2: Check for Issues Using Key Performance Counters

The second, less obvious, place to look is the performance counters, to check for latencies. The RPC latency counters are the first to reveal performance issues, because every action a user performs corresponds to RPC requests sent to the Exchange server.
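As a first pass over RPC latency, samples of MSExchangeIS\RPC Averaged Latency and MSExchangeIS\RPC Requests can be compared with the Exchange 2007/2010 thresholds listed later in this article (average at or below 25 ms, RPC Requests below 70). This is a minimal sketch: the function name is hypothetical and the samples are assumed to be plain numbers already collected from a performance log.

```python
from statistics import mean

# Exchange 2007/2010 thresholds from the counters table in this article.
RPC_LATENCY_AVG_MAX_MS = 25   # MSExchangeIS\RPC Averaged Latency, Avg <= 25 ms
RPC_REQUESTS_MAX = 70         # MSExchangeIS\RPC Requests, Max < 70


def rpc_findings(latency_ms, rpc_requests):
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    avg_latency = mean(latency_ms)
    if avg_latency > RPC_LATENCY_AVG_MAX_MS:
        findings.append(
            f"RPC Averaged Latency avg {avg_latency:.1f} ms exceeds "
            f"{RPC_LATENCY_AVG_MAX_MS} ms")
    peak_requests = max(rpc_requests)
    if peak_requests >= RPC_REQUESTS_MAX:
        findings.append(
            f"RPC Requests peaked at {peak_requests} "
            f"(should stay below {RPC_REQUESTS_MAX})")
    return findings
```

For Exchange 2003, swap in the 2003 values from the second table (RPC Requests below 30, for example).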

Here are the steps to follow:

  • Check for RPC latencies
  • Check for CPU performance issues
  • Check for Memory load issues
  • Check for Disk bound issues
  • Check for Network issues
  • Check for Active Directory related issues
  • Check for Virus scanning issues

If an issue does not show up in the Application or System Log, analyzing performance logs will point out its cause(s) most of the time, provided you follow the methodology introduced above.
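The checklist above can be expressed as data and walked in order. This is a minimal sketch under stated assumptions: each step gets a single starter counter whose threshold is taken from the Exchange 2007/2010 table later in this article (for disks, the mailbox database read latency), the samples dictionary is assumed to be keyed by those exact counter paths, and `failing_steps` is a hypothetical name.

```python
import operator
from statistics import mean

# (step, starter counter, statistic, comparison, limit) in checklist order.
# Limits come from the Exchange 2007/2010 table; disk latency is in seconds.
CHECKLIST = [
    ("RPC latencies", r"MSExchangeIS\RPC Averaged Latency", "avg", "<=", 25),
    ("CPU", r"Processor(_Total)\% Processor Time", "avg", "<=", 75),
    ("Memory", r"Memory\Available MBytes", "min", ">=", 100),
    ("Disk", r"LogicalDisk(*)\Avg. Disk sec/Read", "avg", "<=", 0.020),
    ("Network", r"Network Interface(*)\Packets Outbound Errors", "max", "<=", 0),
    ("Active Directory",
     r"MSExchange ADAccess Domain Controllers(*)\LDAP Read Time", "avg", "<=", 50),
    ("Virus scanning", r"MSExchangeIS\Virus Scan Queue Length", "max", "<=", 10),
]

STATS = {"avg": mean, "max": max, "min": min}
COMPARE = {"<=": operator.le, ">=": operator.ge}


def failing_steps(samples):
    """Return (step, counter, observed) for each step whose counter breaks its limit.

    `samples` maps counter path -> list of numeric samples; steps whose
    counter is missing are skipped rather than reported either way.
    """
    failing = []
    for step, counter, stat, comparison, limit in CHECKLIST:
        values = samples.get(counter)
        if not values:
            continue
        observed = STATS[stat](values)
        if not COMPARE[comparison](observed, limit):
            failing.append((step, counter, observed))
    return failing
```

Results come back in checklist order, so the first entry is the earliest step that needs attention.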

Conduct Performance Analysis to Fine-Tune Exchange Components and Help Identify Issues

  • If users can still connect to the Exchange server but experience severe latency, performance analysis will tell you where the issue is.
  • Because an Exchange server exposes hundreds of counters, it is essential to start the performance analysis with a small subset of them.
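To turn a performance log into the Avg, Min and Max figures used in the tables below, one option is to export it as CSV, for example with `typeperf -cf counters.txt -si 15 -f CSV -o perf.csv` while collecting, or `relog perf.blg -f csv -o perf.csv` afterwards. The parser below is a sketch that assumes the usual layout of those files (timestamp in the first column, one counter path per remaining header cell, blank cells for missed samples); `summarize_perf_csv` is a hypothetical name.

```python
import csv
from statistics import mean


def summarize_perf_csv(lines):
    """Return {counter path: (avg, min, max)} from typeperf/relog-style CSV lines.

    Assumes column 0 holds the timestamp and the header row holds counter
    paths; blank or non-numeric cells are skipped.
    """
    reader = csv.reader(lines)
    header = next(reader)
    columns = {name: [] for name in header[1:]}
    for row in reader:
        for name, cell in zip(header[1:], row[1:]):
            try:
                columns[name].append(float(cell))
            except ValueError:
                continue  # missed sample, e.g. a single space
    return {name: (mean(v), min(v), max(v)) for name, v in columns.items() if v}


# Usage:
# with open("perf.csv", newline="") as f:
#     for counter, (avg, low, high) in summarize_perf_csv(f).items():
#         print(counter, avg, low, high)
```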

Once the component causing the Exchange issue (e.g. Disk, Memory, Network) has been identified, we can dig further into that component by adding more of its counters.

  • For example, for the Memory component we first check the “Available MBytes” and “Pages/sec” counters; if either shows an issue, we add more Memory counters (the Memory object has 35 counters in total). Starting with just these two keeps the initial analysis manageable. The same principle applies to every other component: start with 2 to 4 significant counters, then dig further.
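The start-small-then-dig-further principle can also drive the counter list you collect next. In this sketch the starter sets and the wildcard expansions (for example `\Memory\*`, which collects every Memory counter) are illustrative assumptions, as are the names `STARTER`, `EXPANDED` and `next_collection`; the resulting list can be saved one path per line and passed to typeperf with `-cf`.

```python
# Two starter counters per component, following the principle above.
STARTER = {
    "Memory": [r"\Memory\Available MBytes", r"\Memory\Pages/sec"],
    "Disk": [r"\LogicalDisk(*)\Avg. Disk sec/Read",
             r"\LogicalDisk(*)\Avg. Disk sec/Write"],
    "Processor": [r"\Processor(_Total)\% Processor Time",
                  r"\System\Processor Queue Length"],
}

# Wider sets to add once a component looks suspicious (wildcards collect all).
EXPANDED = {
    "Memory": [r"\Memory\*"],
    "Disk": [r"\LogicalDisk(*)\*"],
    "Processor": [r"\Processor(*)\*", r"\Process(*)\% Processor Time"],
}


def next_collection(suspects):
    """Starter counters for every component plus the full set for suspects."""
    counters = [path for group in STARTER.values() for path in group]
    for component in suspects:
        for path in EXPANDED[component]:
            if path not in counters:
                counters.append(path)
    return counters


if __name__ == "__main__":
    # Memory looked suspicious in the first pass, so collect all its counters.
    print("\n".join(next_collection(["Memory"])))
```

Keeping the starter counters in every collection makes it easy to confirm that the other components stay healthy while you dig into the suspect one.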

Key Exchange Performance Counters for Monitoring and Troubleshooting

Below are two tables that I originally created for early versions of Microsoft’s Premier Exchange Risk Assessment Programs and have since updated to reflect evolving best practices:

  • Exchange 2007/2010 counters table
  • Exchange 2003 counters table

We strongly encourage administrators to focus on these specific counters to monitor their Exchange infrastructure effectively and to identify potential performance issues proactively. These counters are largely a subset of the System Center Operations Manager (SCOM) Exchange Management Pack rules, so use the tables below to tune SCOM alerts and focus on the most important ones. If you use another monitoring application, integrate these counters into your monitoring solution.
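When feeding these counters into SCOM or another monitoring tool, it can help to keep the thresholds as data tagged with the server roles used in the Exchange 2007/2010 table, so each server only receives the rules for its roles. The `Rule` class and `rules_for` function are hypothetical, and only a handful of table rows are transcribed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    role: str            # MAILBOX, HUB, CAS or COMMON, as in the table
    counter: str
    stat: str            # which statistic to test: "avg", "max" or "min"
    limit: float
    higher_is_bad: bool = True

    def breached(self, observed):
        """True when the observed statistic is on the wrong side of the limit."""
        return observed > self.limit if self.higher_is_bad else observed < self.limit


# A few rows from the Exchange 2007/2010 table.
RULES = [
    Rule("COMMON", r"Memory\Available MBytes", "min", 100, higher_is_bad=False),
    Rule("COMMON", r"Processor(_Total)\% Processor Time", "avg", 75),
    Rule("MAILBOX", r"MSExchangeIS\RPC Averaged Latency", "avg", 25),
    Rule("HUB", r"MSExchangeTransport Queues(_total)\Submission Queue Length",
         "max", 100),
    Rule("CAS", r"RPC/HTTP Proxy\Number of Failed Back-End Connection Attempts per Second",
         "max", 0),
]


def rules_for(roles):
    """COMMON rules plus those for the roles installed on a given server."""
    wanted = {"COMMON", *roles}
    return [rule for rule in RULES if rule.role in wanted]
```

A multi-role server (for example Mailbox plus Client Access) simply passes both roles, e.g. `rules_for(["MAILBOX", "CAS"])`.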

Exchange Server 2007/2010 Key Performance Counters

Here is the selection of the key Exchange 2007/2010 counters that will help point out where the issue is (you can copy/paste the relevant counter names):

For additional information, check out Monitoring Without System Center Operations Manager.

Each entry gives the server role in brackets, the counter path, then the statistic to check (Avg, Max or Min over the capture) and its expected value. “Info only” counters have no threshold; “special” counters have no fixed threshold and must be read in context.

Database and Database ==> Instances

  • [MAILBOX AND HUB] MSExchange Database(Information Store)\Database Page Fault Stalls/sec – Avg <10; Max <100
  • [MAILBOX] MSExchange Database ==> Instances(*)\Log Generation Checkpoint Depth – Max <=500
  • [MAILBOX] MSExchange Database(Information Store)\Version buckets allocated – Max <=12000
  • [HUB] MSExchange Database ==> Instances(edgetransport/Transport Mail Database)\Log Generation Checkpoint Depth – Max <=1000
  • [HUB] MSExchange Database ==> Instances(edgetransport/Transport Mail Database)\Version buckets allocated – Max <=200

LogicalDisk (or substitute PhysicalDisk if LogicalDisk is unavailable)

  • [MAILBOX] Temp/page file disks: LogicalDisk\Avg. Disk sec/Read – Avg <10ms; Max <=50ms
  • [MAILBOX] Temp/page file disks: LogicalDisk\Avg. Disk sec/Write – Avg <10ms; Max <=50ms
  • [HUB] SMTP disks: LogicalDisk\Avg. Disk sec/Read – Avg <20ms; Max <=50ms
  • [HUB] SMTP disks: LogicalDisk\Avg. Disk sec/Write – Avg <20ms; Max <=50ms
  • [MAILBOX] Database disks: LogicalDisk\Avg. Disk sec/Read – Avg <=20ms
  • [MAILBOX] Database disks: LogicalDisk\Avg. Disk sec/Write – Avg <=100ms
  • [MAILBOX] Transaction log disks: LogicalDisk\Avg. Disk sec/Read – Avg <=20ms
  • [MAILBOX] Transaction log disks: LogicalDisk\Avg. Disk sec/Write – Avg <=10ms
  • [CAS] All disks: LogicalDisk(_Total)\Disk Reads/sec – Max <=50
  • [CAS] All disks: LogicalDisk(_Total)\Disk Writes/sec – Max <=50

Memory

  • [COMMON] Memory\Available MBytes – Min >=100 MB
  • [COMMON] Memory\Pages/sec – Max <1,000

MSExchange ADAccess

  • [COMMON] MSExchange ADAccess Domain Controllers(*)\LDAP Read Time – Avg <=50ms; Max <=100ms
  • [COMMON] MSExchange ADAccess Domain Controllers(*)\LDAP Search Time – Avg <=50ms; Max <=100ms
  • [COMMON] MSExchange ADAccess Domain Controllers(*)\LDAP Searches timed out per minute – Max <=10
  • [COMMON] MSExchange ADAccess Domain Controllers(*)\Long running LDAP operations/Min – Max <=50

MSExchangeIS

  • [MAILBOX] MSExchangeIS Public(_Total)\Replication Receive Queue Size – Max <=100
  • [MAILBOX] MSExchangeIS\RPC Averaged Latency – Avg <=25ms
  • [MAILBOX] MSExchangeIS\RPC Num. of Slow Packets – Avg <=1; Max <=3
  • [MAILBOX] MSExchangeIS\RPC Operations/sec – Avg, Min, Max: info only
  • [MAILBOX] MSExchangeIS\RPC Packets/sec – Avg, Min, Max: info only
  • [MAILBOX] MSExchangeIS\RPC Requests – Max <70
  • [MAILBOX] MSExchangeIS\Virus Scan Queue Length – Max <=10
  • [MAILBOX] MSExchangeIS\VM Largest Block Size – Min: info only
  • [MAILBOX] MSExchangeIS\VM Total 16MB Free Blocks – Min: info only
  • [MAILBOX] MSExchangeIS\VM Total Free Blocks – Min: info only
  • [MAILBOX] MSExchangeIS\VM Total Large Free Block Bytes – Min: info only

Network Interface

  • [COMMON] Network Interface\Bytes Total/sec – Max <=7 MBps (100 Mbps adapter) or <=70 MBps (1 Gbps adapter)
  • [COMMON] Network Interface\Current Bandwidth – special (the adapter's nominal bandwidth; use it to interpret Bytes Total/sec)
  • [COMMON] Network Interface\Packets Outbound Errors – Max =0

Process, Processor, and System

  • [COMMON] Processor(_Total)\% Processor Time – Avg <=75%
  • [COMMON] Processor(_Total)\% User Time – Avg <=75%
  • [COMMON] Processor(_Total)\% Privileged Time – Avg <=75%
  • [COMMON] Process(*)\% Processor Time – special
  • [COMMON] System\Processor Queue Length (all instances) – Avg <=5 per processor

SMTP Server (transport queues)

  • [HUB] MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) – Avg <=3000; Max <=5000
  • [HUB] MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length – Max <=250
  • [HUB] MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length – Max <=250
  • [HUB] MSExchangeTransport Queues(_total)\Submission Queue Length – Max <=100
  • [HUB] MSExchangeTransport Queues(_total)\Active Non-Smtp Delivery Queue Length – Max <=250
  • [HUB] MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length – Max <=100
  • [HUB] MSExchangeTransport Queues(_total)\Retry Non-Smtp Delivery Queue Length – Max <=100
  • [HUB] MSExchangeTransport Queues(_total)\Retry Remote Delivery Queue Length – Max <=100
  • [HUB] MSExchangeTransport Queues(_total)\Unreachable Queue Length – Max <=100

CAS Server

  • [CAS] Outlook Web Access: MSExchange OWA\Average Response Time – Max <=100ms
  • [CAS] Outlook Web Access: MSExchange OWA\Average Search Time – Max <=31000ms
  • [CAS] CAS to Mailbox connection: RPC/HTTP Proxy\Number of Failed Back-End Connection Attempts per Second – Max =0
  • [CAS] OAB download: MSExchangeFDS:OAB(*)\Download Task Queued – Max =0

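Two entries in the Exchange 2007/2010 table need context rather than a fixed number: Bytes Total/sec depends on the adapter speed reported by Current Bandwidth, and the Processor Queue Length limit scales with the number of processors. The helpers below sketch both under stated assumptions: MBps is read as 10^6 bytes per second, the 7 MBps and 70 MBps limits are mapped to 100 Mbps and 1 Gbps adapters, and the function names are hypothetical.

```python
BYTES_PER_MB = 1_000_000  # assumption: MBps in the table means 10^6 bytes/s


def bytes_total_limit(current_bandwidth_bps):
    """Max Bytes Total/sec for an adapter, given Current Bandwidth in bits/s."""
    if current_bandwidth_bps >= 1_000_000_000:
        return 70 * BYTES_PER_MB  # 1 Gbps adapter (faster links not covered)
    return 7 * BYTES_PER_MB       # 100 Mbps adapter


def network_ok(bytes_total_samples, current_bandwidth_bps):
    """Check the peak Bytes Total/sec sample against the adapter's limit."""
    return max(bytes_total_samples) <= bytes_total_limit(current_bandwidth_bps)


def processor_queue_ok(queue_samples, processor_count, per_processor_limit=5):
    """Average processor queue length against the per-processor limit."""
    average = sum(queue_samples) / len(queue_samples)
    return average <= per_processor_limit * processor_count
```

For example, a 4-processor server may average up to 20 queued threads before this check fails.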
Exchange Server 2003 Key Counters

Here is the selection of the key Exchange 2003 counters that will help point out where the issue is (you can copy/paste the relevant counter names):

Each entry gives the counter path, then the statistic to check (Avg, Max or Min over the capture) and its expected value. “Info only” counters have no threshold; “special” counters have no fixed threshold and must be read in context.

Database ==> Instances

  • Database ==> Instances(*)\Log Record Stalls/sec – Avg <10; Max <100

LogicalDisk (or substitute PhysicalDisk if LogicalDisk is unavailable)

  • Temp/page file disks: LogicalDisk\Avg. Disk sec/Read – Avg <10ms; Max <=50ms
  • Temp/page file disks: LogicalDisk\Avg. Disk sec/Write – Avg <10ms; Max <=50ms
  • Temp/page file disks: Paging File\% Usage – Avg <50%
  • SMTP disks: LogicalDisk\Avg. Disk sec/Read – Avg <10ms; Max <=50ms
  • SMTP disks: LogicalDisk\Avg. Disk sec/Write – Avg <10ms; Max <=50ms
  • Database disks (including any additional database disk): LogicalDisk\Avg. Disk sec/Read – Avg <20ms; Max <=50ms
  • Database disks (including any additional database disk): LogicalDisk\Avg. Disk sec/Write – Avg <20ms; Max <=50ms
  • Transaction log disks (including any additional log disk): LogicalDisk\Avg. Disk sec/Read – Avg <5ms; Max <=50ms
  • Transaction log disks (including any additional log disk): LogicalDisk\Avg. Disk sec/Write – Avg <10ms; Max <=50ms

Memory

  • Memory\Available MBytes – Min >=50 MB
  • Memory\Free System Page Table Entries – Min >=5000
  • Memory\Pages/sec – Max <1,000

MSExchangeDSAccess

  • MSExchangeDSAccess Process(*)\LDAP Read Time (all processes) – Avg <50ms; Max <=100ms
  • MSExchangeDSAccess Process(*)\LDAP Search Time (all processes) – Avg <50ms; Max <=100ms

MSExchangeIS

  • MSExchangeIS Public\Replication Receive Queue Size – Max <=1000
  • MSExchangeIS\RPC Averaged Latency – Max <=50ms or 100ms; Avg: no threshold given
  • MSExchangeIS\RPC Operations/sec – Avg, Min, Max: info only
  • MSExchangeIS\RPC Packets/sec – Avg, Min, Max: info only
  • MSExchangeIS\RPC Requests – Max <30
  • MSExchangeIS\Virus Scan Queue Length – Max <=10
  • MSExchangeIS\VM Largest Block Size – Min >32 MB
  • MSExchangeIS\VM Total 16MB Free Blocks – Min >=1
  • MSExchangeIS\VM Total Free Blocks – Min >=1
  • MSExchangeIS\VM Total Large Free Block Bytes – Min >50 MB

Network Interface

  • Network Interface\Bytes Total/sec – Max <7 MBps (100 Mbps adapter) or <70 MBps (1 Gbps adapter)
  • Network Interface\Current Bandwidth – special
  • Network Interface\Packets Outbound Errors – Max =0

Process, Processor, and System

  • Processor(_Total)\% Processor Time – Avg <80%
  • Processor(_Total)\% Privileged Time – Avg: special
  • Process(*)\% Processor Time (_Total) – special
  • Process(*)\% Privileged Time (_Total) – special
  • Process(store)\Virtual Bytes – Max <2.8 GB
  • System\Processor Queue Length – Avg <2

SMTP Server

  • SMTP Server\Categorizer Queue Length – Max <10; Avg: no threshold given
  • SMTP Server\Local Queue Length – Max <1000
  • SMTP Server\Remote Queue Length – Max <1000; Avg: info only

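The Exchange 2003 virtual memory thresholds above can be checked together against samples collected for the Information Store. This sketch assumes MB and GB in the table are binary units (1 MB = 1,048,576 bytes), that samples are keyed by the counter paths shown, and that `store_vm_findings` is a hypothetical helper name.

```python
MB = 1024 * 1024  # assumption: table units are binary megabytes
GB = 1024 * MB

# (counter, statistic to test, predicate that must hold) from the 2003 table.
STORE_VM_CHECKS = [
    (r"MSExchangeIS\VM Largest Block Size", min, lambda v: v > 32 * MB),
    (r"MSExchangeIS\VM Total 16MB Free Blocks", min, lambda v: v >= 1),
    (r"MSExchangeIS\VM Total Free Blocks", min, lambda v: v >= 1),
    (r"MSExchangeIS\VM Total Large Free Block Bytes", min, lambda v: v > 50 * MB),
    (r"Process(store)\Virtual Bytes", max, lambda v: v < 2.8 * GB),
]


def store_vm_findings(samples):
    """Return the counters whose worst sample violates the 2003 threshold."""
    findings = []
    for counter, worst, holds in STORE_VM_CHECKS:
        values = samples.get(counter)
        if values and not holds(worst(values)):
            findings.append(counter)
    return findings
```

The "worst" statistic is the minimum for the free-block counters and the maximum for Virtual Bytes, matching the Min and Max checks in the table.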
I hope you found this helpful. In a later article, I will provide the equivalent procedure for troubleshooting client-side latencies.