Ensuring that your servers are operating reliably and that your mailbox database copies are healthy are primary objectives of daily Exchange 2010 messaging operations. Of course you must actively monitor the hardware, the Windows operating system, and the Exchange 2010 services. But when running in an Exchange 2010 mailbox resiliency environment, it is important that you monitor the health and status of the database availability group (DAG) and your mailbox database copies. It is especially vital to perform data redundancy risk management and monitor for periods in which a replicated database is down to just a single copy. This is particularly critical in environments that do not use RAID and instead deploy Just a Bunch Of Disks (JBOD). In a RAID environment, a single disk failure does not affect an active mailbox database copy. However, in a JBOD environment, a single disk failure will trigger a database failover. It is therefore a top priority for administrators to know when they are down to a single healthy copy of a database.
Note: It's important to understand how we count copies. When you create a new database, but before you run Add-MailboxDatabaseCopy, you have one copy of the database. When you run Add-MailboxDatabaseCopy for the first time, you are creating your second database copy.
Exchange 2010 includes several built-in tools and features that should be used as part of regular proactive monitoring of a highly available Exchange environment, such as the Get-MailboxDatabaseCopyStatus and Test-ReplicationHealth cmdlets, and the CollectOverMetrics.ps1 and CollectReplicationMetrics.ps1 scripts.
Today, we are releasing an additional PowerShell script called CheckDatabaseRedundancy.ps1. As its name implies, the script monitors the redundancy of replicated mailbox databases by validating that there are at least two configured, healthy, and current copies, and it alerts you when only a single healthy copy of a replicated database exists. Both active and passive copies are counted when determining redundancy.
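As a rough illustration of the redundancy rule described above (at least two healthy copies, counting active and passive alike), the following sketch counts copies by their Status value. The helper names Get-HealthyCopyCount and Get-RedundancyState are hypothetical, and treating only Mounted and Healthy as healthy is an assumption made for this example. The script's own checks for configured, healthy, and current copies are more thorough and are not reproduced here.

```powershell
# Hypothetical helper (not part of CheckDatabaseRedundancy.ps1).
# Assumption: a copy counts as healthy when its Status is Mounted (active)
# or Healthy (passive).
function Get-HealthyCopyCount {
    param([object[]]$CopyStatus)
    $healthyStates = 'Mounted', 'Healthy'
    return @($CopyStatus | Where-Object { $healthyStates -contains [string]$_.Status }).Count
}

# Maps a healthy-copy count to the red/green state used in this post.
function Get-RedundancyState {
    param([int]$HealthyCopyCount)
    if ($HealthyCopyCount -ge 2) { return 'Green' }
    return 'Red'
}

# Sample objects standing in for Get-MailboxDatabaseCopyStatus output.
$sampleCopies = @(
    (New-Object PSObject -Property @{ Name = 'DB1\MBX1'; Status = 'Mounted' }),
    (New-Object PSObject -Property @{ Name = 'DB1\MBX2'; Status = 'Failed' })
)

$count = Get-HealthyCopyCount -CopyStatus $sampleCopies
$state = Get-RedundancyState -HealthyCopyCount $count
"{0} healthy copy(ies): {1}" -f $count, $state
```

In the Exchange Management Shell, the same helper could be fed real data by passing it the output of Get-MailboxDatabaseCopyStatus -Identity followed by the database name.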
When executing the script, you must specify either a database name or a DAG member name. To specify a database, you use the MailboxDatabaseName parameter and to specify a DAG member, you use the MailboxServerName parameter. When run interactively in the console, the script performs the redundancy check only once, and outputs the CurrentState (red or green) on the screen:
[PS] CheckDatabaseRedundancy.ps1 -MailboxDatabaseName "Mailbox Database 1928496050"
DatabaseName : Mailbox Database 1928496050
LastRedundancyCount : 0
CurrentRedundancyCount : 2
LastState : Unknown
CurrentState : Green
LastStateTransitionUtc : 5/11/2010 7:51:19 PM
LastGreenTransitionUtc : 5/11/2010 7:51:19 PM
LastGreenReportedUtc : 5/11/2010 7:51:19 PM
PreviousTotalRedDuration : 00:00:00
TotalRedDuration : 00:00:00
IsTransitioningState : True
HasErrorsInHistory : False
Like other scripts and cmdlets, CheckDatabaseRedundancy.ps1 can also be run in monitoring mode and generate events by adding the MonitoringContext parameter. This enables the script to be invoked by a monitoring solution, such as Microsoft System Center Operations Manager (SCOM). In monitoring mode, the script logs red alert and green alert events into the local server's Application event log. A red alert event (event ID 4113) is fired only if the database has been "red" for a total of 20 minutes or more (cumulative, not necessarily consecutive) during the hour-long run of the script. A green alert event (event ID 4114) is fired when the database has been "green" for 10 consecutive minutes. By default, once a red alert event is generated, it will continue to be reported every 15 minutes.
[Figure: example of a red alert event (event ID 4113)]
[Figure: example of a green alert event (event ID 4114)]
Note: These events will not appear as shown above until the event resource binary file containing the updated strings for these events is installed on the system. This binary file (clusmsg.dll) will be updated with the first update rollup that includes the CheckDatabaseRedundancy.ps1 strings (most likely Update Rollup 4). Until then, the description of the event will read as follows: "The description for Event ID 4114 from source MSExchangeRepl cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer." The lack of these strings in the event will not affect monitoring, as event 4113 always indicates a red alert (and it will contain the name of the database and the errors that caused that database to be down to a single copy), and event 4114 always indicates a green alert.
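To make the red and green alert thresholds concrete, here is a small model of the two rules as this post describes them. It is an illustration only, not the script's actual implementation: the function names are hypothetical, and sampling the state once per minute is an assumption made for the example.

```powershell
# Illustrative model only; not the script's actual implementation.
# Assumption: the state is sampled once per minute during the run.

# Red alert: 20 or more red minutes in total (they need not be consecutive).
function Test-RedAlertDue {
    param([string[]]$MinuteStates, [int]$RedThresholdMinutes = 20)
    $redMinutes = @($MinuteStates | Where-Object { $_ -eq 'Red' }).Count
    return ($redMinutes -ge $RedThresholdMinutes)
}

# Green alert: the most recent 10 or more minutes were all green.
function Test-GreenAlertDue {
    param([string[]]$MinuteStates, [int]$GreenThresholdMinutes = 10)
    $run = 0
    foreach ($state in $MinuteStates) {
        if ($state -eq 'Green') { $run++ } else { $run = 0 }
    }
    return ($run -ge $GreenThresholdMinutes)
}

# 12 red minutes, 5 green, then 9 red: 21 red minutes in total.
$states = @('Red') * 12 + @('Green') * 5 + @('Red') * 9
Test-RedAlertDue -MinuteStates $states
Test-GreenAlertDue -MinuteStates $states
```

With this sample, the first call returns True because the 21 red minutes exceed the threshold even though they are split into two runs, and the second returns False because the sequence does not end with 10 green minutes.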
In addition, the script has some other useful options. For example, you can add the ShowDetailedErrors parameter to get greater detail about any errors that occur, and you can add the Verbose parameter for additional troubleshooting information. The script also includes a SendSummaryMailTos parameter which can be used to send a summary report by email to a list of specified email addresses when the script has finished running. This enables administrators to quickly look at hourly reports to see if any redundancy issues have occurred. If you do use the email functionality, you'll need to include the SummaryMailFrom parameter whenever you use the SendSummaryMailTos parameter.
We recommend running this script regularly, as part of your normal monitoring operations. To ensure you don't have lengthy periods in which database redundancy is compromised, run the script every 60 minutes. The script includes a parameter called TerminateAfterDurationSecs which, when set to -1 or 0, causes the script to run indefinitely. If you're not running a monitoring solution such as SCOM, you can create a Windows scheduled task to automate and schedule script execution. However, be aware that there are known issues in the Windows Server 2008 SP2 Task Scheduler that may cause Task Scheduler to crash when you have scheduled a long-running task. These issues do not exist in Windows Server 2008 R2, so if possible, run the script from Windows Server 2008 R2.
If you can't run the script from Windows Server 2008 R2, and you're running it from Windows Server 2008 SP2, we recommend two modifications. First, instead of running the script with its built-in transient suppression of 60 minutes, run the script every 5 minutes by using the following parameters:
CheckDatabaseRedundancy.ps1 -MonitoringContext -SleepDurationBetweenIterationsSecs:0 -TerminateAfterDurationSecs:1 -SuppressGreenEventForSecs:0 -ReportRedEventAfterDurationSecs:0 -ReportRedEventIntervalSecs:0 -ShowDetailedErrors
Second, if possible, use SCOM to define the transient suppression behavior (e.g., if 3 red alert events are logged within a 20 minute period, generate an alert; and if a green alert event is logged, change the CurrentState to Green).
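The example suppression rule above (three red alert events within a 20-minute period) can be sketched as a sliding-window check. This is only an illustration of the rule's logic; in practice it would be defined in SCOM, not in PowerShell. Test-RedAlertBurst is a hypothetical name, and treating a span of exactly 20 minutes as inside the window is an assumption.

```powershell
# Illustrative model of the example rule above; in practice the rule would
# be defined in SCOM, not in PowerShell. Test-RedAlertBurst is hypothetical.
function Test-RedAlertBurst {
    param(
        [datetime[]]$RedEventTimes,
        [int]$Threshold = 3,
        [int]$WindowMinutes = 20
    )
    $sorted = @($RedEventTimes | Sort-Object)
    # Slide over each run of $Threshold consecutive events and check
    # whether the first and last fall within the window.
    for ($i = 0; ($i + $Threshold - 1) -lt $sorted.Count; $i++) {
        $span = $sorted[$i + $Threshold - 1] - $sorted[$i]
        if ($span.TotalMinutes -le $WindowMinutes) { return $true }
    }
    return $false
}

# Made-up timestamps for illustration.
$start  = [datetime]'2010-05-11T10:00:00'
$burst  = $start, $start.AddMinutes(5), $start.AddMinutes(10)
$spread = $start, $start.AddMinutes(15), $start.AddMinutes(40)
Test-RedAlertBurst -RedEventTimes $burst
Test-RedAlertBurst -RedEventTimes $spread
```

The first call returns True (three events within 10 minutes) and the second returns False (the three events span 40 minutes).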
Here are the steps you can use to schedule this script:
- Copy the script to the Exchange server or management workstation from which you want to run it. Do not copy this into the \Scripts folder. Instead, choose a unique location for the script (for example, C:\Operations).
- Configure a scheduled task through the Windows Task Scheduler by running the following command:
schtasks /create /TN "Check Database Redundancy" /TR "Powershell.exe -NonInteractive -WindowStyle Hidden -command 'C:\Program Files\Microsoft\Exchange Server\V14\bin\RemoteExchange.ps1'; Connect-ExchangeServer -auto; C:\Operations\CheckDatabaseRedundancy.ps1 -MonitoringContext -ShowDetailedErrors -SummaryMailFrom:'SMTPFromAddress@contoso.com' -SendSummaryMailTos:@('SMTPToAddress@contoso.com') -ErrorAction:Continue" /RU SYSTEM /SC HOURLY
Replace the parameters in the above command with the script parameters you want to use. Additional parameters are described in the script itself.
When using the schtasks command-line tool to create a scheduled task, the /TR option is limited to 261 characters, which is easy to exceed when using multiple script parameters. The above example exceeds that limit. If the parameters and paths you use cause the /TR option to exceed 261 characters, you must manually create the scheduled task using the Task Scheduler applet on the Administrative Tools menu. Alternatively, you can download this XML file, edit it appropriately, save it, and import it using the Task Scheduler applet.
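Before running schtasks, it can help to measure the /TR value you intend to use. The following is a minimal sketch: Test-TaskRunLength is a hypothetical helper, and the string it checks is the /TR value from the example in this post.

```powershell
# Hypothetical helper: checks a /TR value against the 261-character limit.
function Test-TaskRunLength {
    param([string]$TaskRun, [int]$MaxLength = 261)
    return ($TaskRun.Length -le $MaxLength)
}

# The /TR value from the schtasks example in this post (single-quoted
# here-string, so nothing inside it is expanded).
$taskRun = @'
Powershell.exe -NonInteractive -WindowStyle Hidden -command 'C:\Program Files\Microsoft\Exchange Server\V14\bin\RemoteExchange.ps1'; Connect-ExchangeServer -auto; C:\Operations\CheckDatabaseRedundancy.ps1 -MonitoringContext -ShowDetailedErrors -SummaryMailFrom:'SMTPFromAddress@contoso.com' -SendSummaryMailTos:@('SMTPToAddress@contoso.com') -ErrorAction:Continue
'@

if (Test-TaskRunLength -TaskRun $taskRun) {
    "OK: /TR value is $($taskRun.Length) characters."
} else {
    "Too long: /TR value is $($taskRun.Length) characters; create the task in Task Scheduler instead."
}
```

If the check reports that the value is too long, use the Task Scheduler applet or the XML import described above.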
We're releasing this script to you now because we think it is very important that all customers monitor for situations in which database redundancy is compromised and immediately take action to restore database redundancy and avoid catastrophic data loss. Eventually, a version of the script will be released in a forthcoming update rollup for Exchange 2010 (most likely Update Rollup 4), and after that it is expected to ship in Service Pack 1. Note that when it does ship with SP1, the Release Notes may include updated information for scheduling the script to run regularly on your servers.
We hope you find this useful, and welcome your feedback. You can download the script here.