Recently I had the (un)fortunate experience of troubleshooting an issue with Exchange 2010 DAG database copies failing over to other servers. This happened in several different environments that I was supporting, so I know it can happen to anyone. Here is a short synopsis of the issue (not all symptoms are listed):
- SCOM Alert: A significant portion of the database buffer cache has been written out to the system paging file. This may result in severe performance degradation.
- SCOM Alert: Hard I/O error will dismount or terminate replication on a database copy.
- The event log contained the following error: The database could not allocate memory. Please close some applications to make sure you have enough memory for Exchange Server. The exception is Microsoft.Exchange.Isam.IsamOutOfMemoryException: Out of Memory (-1011)
- Performance Monitor (perfmon) showed that the disk subsystem itself was not having performance issues. However, the server clearly could not sustain acceptable performance, which led us to suspect that the system did not have enough memory:
  - The system cache was constantly paging and repurposing pages
  - Available RAM on the server was consistently under 300 MB, even though the database cache (store) was consuming only 60% of total RAM
  - MSExchange Database(Information Store)\Database Cache % Hit fluctuated between 90% and 95%, even though most clients were in Cached Exchange Mode
- Process Explorer showed a number of processes with high memory consumption. On further review, we identified known memory issues in some of these processes.
- Disabling applications and services (such as antivirus, backup, and monitoring) did not free up a significant amount of the consumed memory
- Low-memory conditions occurred when importing content into mailboxes, possibly because of the additional workload from Exchange-aware antivirus (background scanning) and content indexing of the new content
- Antivirus exclusions were not properly configured for the SCOM monitoring services and the File Share Witness directory
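The memory symptoms above were spotted by watching a couple of perfmon counters. As a rough illustration, here is a minimal Python sketch that scans a perfmon log exported to CSV (for example with relog) and flags samples where Available MBytes drops below 300 MB or the database cache hit rate falls below a threshold. The function names, the 95% hit-rate default, and the assumption that the first CSV column is the timestamp are all hypothetical choices for this sketch, not Microsoft guidance.

```python
import csv
import io

# Counter paths as shown in perfmon. Exported CSV headers usually prefix
# these with "\\SERVERNAME", so columns are matched by suffix below.
AVAILABLE_MB = r"\Memory\Available MBytes"
CACHE_HIT = r"\MSExchange Database(Information Store)\Database Cache % Hit"


def find_column(headers, counter):
    """Return the first header that ends with the counter path, or None."""
    for header in headers:
        if header.endswith(counter):
            return header
    return None


def flag_memory_pressure(csv_text, min_available_mb=300.0, min_cache_hit_pct=95.0):
    """Return (timestamp, reasons) for each sample that breaches a threshold.

    The thresholds are illustrative defaults (hypothetical), not official guidance.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    time_col = headers[0] if headers else None  # assumed: timestamp is column 1
    avail_col = find_column(headers, AVAILABLE_MB)
    hit_col = find_column(headers, CACHE_HIT)

    flagged = []
    for row in reader:
        reasons = []
        # Skip blank samples, which perfmon sometimes writes as " ".
        if avail_col and row[avail_col].strip():
            if float(row[avail_col]) < min_available_mb:
                reasons.append("low available memory")
        if hit_col and row[hit_col].strip():
            if float(row[hit_col]) < min_cache_hit_pct:
                reasons.append("low database cache hit rate")
        if reasons:
            flagged.append((row[time_col], reasons))
    return flagged
```

Matching columns by suffix keeps the sketch independent of the server name embedded in each header, so the same check can be run against logs collected from every DAG member.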
Step 1: Confirmed that all servers within the DAG were consistently and properly configured, specifically the network configuration and antivirus exclusions
Step 2: Installed the latest drivers, firmware, and recommended hotfixes (two of the fixes resolved memory leak issues)
Step 3: Added RAM to the servers (the amount will vary by environment and need)
Step 4: Rebooted the servers
NOTE: Before changing the RAM in an Exchange 2010 server, understand how the change will directly affect the database cache. Review "Understanding the Mailbox Database Cache." Also understand that other settings may need to be adjusted (for example, the paging file configuration).
Some may ask, “Didn’t you follow the RAM guidance in the mailbox storage calculator?” Yes, we did, but several factors changed after we completed that phase of the design, including the mailbox configuration, additional processes running on the servers, and user profiles and load.
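To show why a change in user profile can outgrow the original RAM sizing, here is a small Python sketch of how database cache requirements scale with message volume. It assumes roughly 3 MB of database cache per mailbox for every 50 messages sent/received per day, rounded up to the next 50-message tier; that ratio is our reading of Microsoft's Exchange 2010 sizing table and should be checked against the current guidance and the storage calculator. The function names and the 4,000-mailbox figure are hypothetical.

```python
import math

# Assumption: ~3 MB of database cache per mailbox per 50 messages/day.
# Verify against Microsoft's Exchange 2010 sizing guidance before use.
MB_PER_50_MESSAGES = 3


def cache_per_mailbox_mb(messages_per_day):
    """Round the profile up to the next 50-message tier and scale the cache."""
    tiers = math.ceil(messages_per_day / 50)
    return tiers * MB_PER_50_MESSAGES


def required_cache_gb(mailboxes, messages_per_day):
    """Total database cache (GB) for the active mailboxes on one server."""
    return mailboxes * cache_per_mailbox_mb(messages_per_day) / 1024


# Hypothetical server: the design assumed 100 messages/day, but users
# actually send/receive 150 messages/day.
designed = required_cache_gb(4000, 100)  # 23.4375 GB
actual = required_cache_gb(4000, 150)    # 35.15625 GB
print(f"designed: {designed:.1f} GB, actual: {actual:.1f} GB")
```

In this example, a 50% increase in message volume raises the database cache requirement by about 12 GB per server, before accounting for any extra processes running alongside Exchange.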