Exchange 2013 Managed Availability HealthSet Troubleshooting

Dealing with annoying HealthSets was intimidating at the beginning of my experience with Exchange 2013.

Managed Availability, a feature introduced in Exchange 2013, is a built-in monitoring system that can take automatic recovery actions. While trying to solve a small issue, those actions can sometimes cause a serious one.

A HealthSet is built from three types of components:

  • Probe: determines whether Exchange components are active and working.
  • Monitor: when probes report a state that differs from the patterns stored in the monitoring engine, the monitoring engine uses the monitor to determine whether a component or feature is unhealthy.
  • Responder: takes action when a monitor alerts it about an unhealthy state. The action depends on the type of component or feature: it can start with simply recycling the application pool and go as far as restarting the server or, even worse, taking the server offline so it no longer accepts connections.

In this blog post I will walk through troubleshooting several HealthSets in Exchange 2013.

Troubleshooting Exchange HealthSet MailboxSpace

Start by getting a health report for the Exchange 2013 server with the Get-HealthReport cmdlet:

Get-HealthReport -Identity EXCH2K13

Image 1

To list only the HealthSets that are not Healthy (for example Unhealthy, Degraded or Disabled), filter the output:

Get-HealthReport -Identity EXCH2K13 | Where-Object { $_.AlertValue -ne "Healthy" }

Next, list the components of the MailboxSpace HealthSet:

Get-MonitoringItemIdentity -Identity MailboxSpace -Server EXCH2K13 | ft Identity,ItemType,TargetResource -autosize

Image 2

As you can see, the HealthSet has multiple probes, monitors and responders.

What if a HealthSet is Unhealthy or Repairing, like MailboxSpace here because of a test database?

We need to investigate further to find out which monitors are putting the HealthSet into the Unhealthy state:

Get-ServerHealth -Identity EXCH2K13 -HealthSet "MailboxSpace"

Image 3

As you can see above, many of the monitors are Unhealthy.

Suppose your production environment has a test database on the C: drive that you don't want to move or delete, and the limited free space on C: is what turns these monitors Unhealthy.

We can use Add-ServerMonitoringOverride to disable these monitors.

Add-ServerMonitoringOverride documentation:

https://technet.microsoft.com/en-us/library/jj218628(v=exchg.150).aspx

The limitation is that a server override can last at most 60 days. The general syntax is:

Add-ServerMonitoringOverride -Server EXCH2K13 -Identity "HealthSetName\MonitoringItemName\TargetResource" -ItemType Monitor -PropertyName Enabled -PropertyValue 0 -Duration 60.00:00:00

By combining the Get-MonitoringItemIdentity output from Image 2 with the Get-ServerHealth output, we can identify the monitors that need to be overridden.
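When many monitors are involved, picking them out of the Get-ServerHealth output by hand gets tedious. The following is a minimal sketch of one way to script it, assuming the Get-ServerHealth objects expose HealthSetName, Name, TargetResource and AlertValue, and that override identities use the HealthSet\MonitorName\TargetResource form shown in this post. Get-UnhealthyMonitorIdentity is a hypothetical helper name, not an Exchange cmdlet.

```powershell
# Hypothetical helper (not an Exchange cmdlet): converts Get-ServerHealth output
# into HealthSet\MonitorName[\TargetResource] identities for every monitor
# that is neither Healthy nor already Disabled.
function Get-UnhealthyMonitorIdentity {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [psobject] $Monitor
    )
    process {
        if ([string]$Monitor.AlertValue -notin 'Healthy', 'Disabled') {
            if ([string]::IsNullOrEmpty($Monitor.TargetResource)) {
                # Monitors without a target resource, e.g. ...EscalationProcessingMonitor
                '{0}\{1}' -f $Monitor.HealthSetName, $Monitor.Name
            }
            else {
                '{0}\{1}\{2}' -f $Monitor.HealthSetName, $Monitor.Name, $Monitor.TargetResource
            }
        }
    }
}

# Usage in the Exchange Management Shell:
# Get-ServerHealth -Identity EXCH2K13 -HealthSet "MailboxSpace" | Get-UnhealthyMonitorIdentity
```

Review the resulting list before overriding anything; a monitor that is Unhealthy for a reason you do care about should be fixed, not silenced.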

The following monitors are Unhealthy or Repairing:

MailboxSpace\DatabaseLogicalPhysicalSizeRatioEscalationNotification\DB01

MailboxSpace\DatabaseLogicalPhysicalSizeRatioEscalationNotification\DB02

MailboxSpace\DatabaseLogicalPhysicalSizeRatioEscalationProcessingMonitor

MailboxSpace\DatabaseSizeMonitor\DB01

MailboxSpace\DatabaseSizeMonitor\DB02

MailboxSpace\StorageLogicalDriveSpaceMonitor\C:

Add-ServerMonitoringOverride -ItemType Monitor -Identity "MailboxSpace\DatabaseLogicalPhysicalSizeRatioEscalationNotification\DB01" -PropertyValue 0 -PropertyName Enabled -Duration "60.00:00:00" -Server EXCH2K13

Add a server override in the same way for each of the monitors above, making sure the ItemType matches the item (Probe, Monitor or Responder).
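Rather than typing the override once per monitor, you can loop over the list. This is a minimal sketch that reuses exactly the parameters of the Add-ServerMonitoringOverride command above; New-DisableOverrideParams and Disable-MonitoringItem are hypothetical helper names. The parameters are built as a hashtable and splatted, so you can inspect them before anything changes on the server.

```powershell
# Hypothetical helper: builds the parameters for one Add-ServerMonitoringOverride
# call that disables a monitoring item for a fixed duration.
function New-DisableOverrideParams {
    param(
        [Parameter(Mandatory)] [string] $Identity,
        [Parameter(Mandatory)] [string] $Server,
        [ValidateSet('Probe', 'Monitor', 'Responder')]
        [string] $ItemType = 'Monitor',
        # Server overrides are limited to 60 days (d.hh:mm:ss format)
        [string] $Duration = '60.00:00:00'
    )
    @{
        Server        = $Server
        Identity      = $Identity
        ItemType      = $ItemType
        PropertyName  = 'Enabled'
        PropertyValue = '0'
        Duration      = $Duration
    }
}

# Hypothetical helper: applies the override to each identity
# (run in the Exchange Management Shell).
function Disable-MonitoringItem {
    param(
        [Parameter(Mandatory)] [string[]] $Identity,
        [Parameter(Mandatory)] [string] $Server,
        [ValidateSet('Probe', 'Monitor', 'Responder')]
        [string] $ItemType = 'Monitor'
    )
    foreach ($id in $Identity) {
        $params = New-DisableOverrideParams -Identity $id -Server $Server -ItemType $ItemType
        Add-ServerMonitoringOverride @params
    }
}

# Usage:
# $monitors = 'MailboxSpace\DatabaseSizeMonitor\DB01', 'MailboxSpace\DatabaseSizeMonitor\DB02'
# Disable-MonitoringItem -Identity $monitors -Server EXCH2K13
```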

Finally, verify your server overrides with Get-ServerMonitoringOverride -Server EXCH2K13:

Image 4

Now check the server health again to confirm that the monitors have been disabled:

Get-ServerHealth -Identity EXCH2K13 -HealthSet "MailboxSpace" | ft -Autosize

Image 5

The MailboxSpace HealthSet is now Healthy.

Image 6

Troubleshooting FEP HealthSet

If you don't have Forefront installed, you will probably want to disable this HealthSet on the server.

You can do this simply by editing the XML file that corresponds to the FEP HealthSet.

Browse to C:\Program Files\Microsoft\Exchange Server\V15\Bin\Monitoring\Config\

Find FEPActiveMonitoringContext.xml and open it with Notepad.

On line 12 you will find Enabled="True".

Replace True with False to disable FEP monitoring.

The beginning of the file should then look something like this:

<?xml version="1.0" encoding="iso-8859-1"?>
<Definition xsi:noNamespaceSchemaLocation="..\..\WorkItemDefinition.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!--FEPService Maintenance definition section-->
  <MaintenanceDefinition
    AssemblyPath="Microsoft.Exchange.Monitoring.ActiveMonitoring.Local.Components.dll"
    TypeName="Microsoft.Exchange.Monitoring.ActiveMonitoring.FEP.FEPDiscovery"
    Name="FEP.Maintenance.Workitem"
    ServiceName="FEP"
    RecurrenceIntervalSeconds="0"
    TimeoutSeconds="30"
    MaxRetryAttempts="0"
    Enabled="false">

After saving the change, restart the Microsoft Exchange Health Manager service (MSExchangeHM) on the server where you modified the XML file.
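If you have several servers, editing the file in Notepad each time is error-prone. Here is a sketch that makes the same change with PowerShell's XML support, assuming the layout shown above (a MaintenanceDefinition element with an Enabled attribute, and no default XML namespace on the root). Disable-MaintenanceDefinition is a hypothetical helper; it keeps a .bak copy of the file, and the usage line assumes the ExchangeInstallPath environment variable defined by Exchange setup.

```powershell
# Hypothetical helper: sets Enabled="false" on every MaintenanceDefinition element
# in a Managed Availability definition file, after saving a .bak copy.
function Disable-MaintenanceDefinition {
    param(
        [Parameter(Mandatory)] [string] $Path,
        # Restarts MSExchangeHM (Microsoft Exchange Health Manager) afterwards
        [switch] $RestartHealthManager
    )
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    Copy-Item -LiteralPath $fullPath -Destination "$fullPath.bak" -Force

    $xml = New-Object System.Xml.XmlDocument
    $xml.PreserveWhitespace = $true   # keep the original layout when saving
    $xml.Load($fullPath)

    # The root element only declares the xsi prefix, so plain element names match.
    $nodes = $xml.SelectNodes('//MaintenanceDefinition')
    if ($nodes.Count -eq 0) { throw "No MaintenanceDefinition element found in $fullPath" }
    foreach ($node in $nodes) { $node.SetAttribute('Enabled', 'false') }
    $xml.Save($fullPath)

    if ($RestartHealthManager) { Restart-Service -Name MSExchangeHM }
}

# Usage on the Exchange server (elevated shell):
# $file = Join-Path $env:ExchangeInstallPath 'Bin\Monitoring\Config\FEPActiveMonitoringContext.xml'
# Disable-MaintenanceDefinition -Path $file -RestartHealthManager
```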

Troubleshooting CAS Proxy HealthSets

What if you have TMG in your organization and need to configure OWA and ECP for Basic authentication?

You will probably disable forms-based authentication on OWA and ECP.

Soon after you disable forms-based authentication, you will notice that some server components, such as OwaProxy, EcpProxy and RwsProxy, go into an Inactive state.

You can check this with:

Get-ServerComponentState -Identity EXCH2K13

Image 7

You can set a component back to Active manually with:

Set-ServerComponentState -Identity EXCH2K13 -Component EcpProxy -State Active -Requester HealthAPI
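To check for and reactivate all three proxy components in one go, something like the following could be used. It is a sketch, assuming Get-ServerComponentState returns objects with Component and State properties; Select-InactiveProxyComponent and Enable-InactiveProxyComponent are hypothetical helper names. It only resets the state for the HealthAPI requester and does not fix the underlying cause.

```powershell
# Hypothetical helper: keeps only the listed proxy components that are Inactive.
function Select-InactiveProxyComponent {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [psobject] $ComponentState,
        [string[]] $Component = @('OwaProxy', 'EcpProxy', 'RwsProxy')
    )
    process {
        if ($ComponentState.Component -in $Component -and [string]$ComponentState.State -eq 'Inactive') {
            $ComponentState
        }
    }
}

# Hypothetical helper: sets each inactive proxy component back to Active
# (Exchange Management Shell only).
function Enable-InactiveProxyComponent {
    param([Parameter(Mandatory)] [string] $Server)
    # Collect the results first: in a remote Exchange Management Shell, starting a second
    # Exchange cmdlet while the first pipeline is still running can fail.
    $inactive = @(Get-ServerComponentState -Identity $Server | Select-InactiveProxyComponent)
    foreach ($item in $inactive) {
        Set-ServerComponentState -Identity $Server -Component $item.Component -State Active -Requester HealthAPI
    }
}

# Usage:
# Enable-InactiveProxyComponent -Server EXCH2K13
```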

However, after one hour the components return to the Inactive state, because the probe failure behind them has not been fixed.

If you continue troubleshooting and check the crimson channel logs on the server, you will find events related to the ECP.Proxy probe.

More information about crimson channel event logging is available here:

https://technet.microsoft.com/en-us/library/dd351258(v=exchg.150).aspx#Crimson

In Event Viewer, go to Application and Services Logs > Microsoft > Exchange > ActiveMonitoring > ProbeResult.

Find the ProbeResult event for the ECP proxy probe (Name=ECPProxyTestProbe/MSExchangeECPAppPool), open the Details tab, and in StateAttribute3 you will see:

"FailurePoint=FrontEnd,HttpStatusCode=401,Error=Unauthorized,Details=,HttpProxySubErrorCode=,WebExceptionStatus=,LiveIdAuthResult="

The ECP.Proxy probe is failing with a 401 Unauthorized error. The credential it used can be seen in StateAttribute2.
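Instead of clicking through Event Viewer, the probe results can also be read from PowerShell. This sketch assumes each ProbeResult event carries an EventXML payload with ResultName and StateAttribute fields (the ones visible on the Details tab), and that StateAttribute3 uses the comma-separated Name=Value format shown above. Get-ProbeResult and ConvertFrom-ProbeStateAttribute are hypothetical helper names.

```powershell
# Hypothetical helper: splits a probe StateAttribute value such as
# "FailurePoint=FrontEnd,HttpStatusCode=401,..." into an ordered Name/Value table.
function ConvertFrom-ProbeStateAttribute {
    param([string] $Text)
    $result = [ordered]@{}
    foreach ($pair in $Text -split ',') {
        if ($pair) {
            # Split on the first '=' only; empty values (e.g. "Details=") become ''.
            $name, $value = $pair -split '=', 2
            $result[$name] = $value
        }
    }
    $result
}

# Hypothetical helper: reads recent ProbeResult events for one probe from the
# crimson channel and returns their EventXML payload (Windows only).
function Get-ProbeResult {
    param(
        [string] $ResultName = 'ECPProxyTestProbe/MSExchangeECPAppPool',
        [int] $MaxEvents = 5
    )
    $filter = "*[UserData[EventXML[ResultName='$ResultName']]]"
    Get-WinEvent -LogName 'Microsoft-Exchange-ActiveMonitoring/ProbeResult' -FilterXPath $filter -MaxEvents $MaxEvents |
        ForEach-Object { ([xml]$_.ToXml()).Event.UserData.EventXML }
}

# Usage: parse the failure details of the latest ECP proxy probe results
# Get-ProbeResult | ForEach-Object { ConvertFrom-ProbeStateAttribute $_.StateAttribute3 }
```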

Now verify the HealthSets for ECP and OWA:

Get-HealthReport -Identity EXCH2K13

You will see that the ECP, OWA, ECP.Proxy, OWA.Proxy and RWS.Proxy HealthSets are Unhealthy.
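To narrow the health report down to just the HealthSets involved here, you could filter on the HealthSet name. A small sketch, assuming the Get-HealthReport objects expose HealthSet and AlertValue properties; Select-ClientAccessHealthSet is a hypothetical helper name.

```powershell
# Hypothetical helper: keeps the ECP, OWA and RWS related HealthSets
# (ECP, ECP.Proxy, OWA, OWA.Proxy, RWS.Proxy) that are not Healthy.
function Select-ClientAccessHealthSet {
    param([Parameter(Mandatory, ValueFromPipeline)] [psobject] $Report)
    process {
        if ($Report.HealthSet -match '^(ECP|OWA|RWS)(\.Proxy)?$' -and [string]$Report.AlertValue -ne 'Healthy') {
            $Report
        }
    }
}

# Usage in the Exchange Management Shell:
# Get-HealthReport -Identity EXCH2K13 | Select-ClientAccessHealthSet | Format-Table HealthSet, AlertValue -AutoSize
```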

To stop this behavior, you can disable the monitoring probes for OWA, ECP and RWS.

Open Windows Explorer and browse to:

C:\Program Files\Microsoft\Exchange Server\V15\Bin\Monitoring\Config\

Open ClientAccessProxyTest.xml with Notepad

Change the value of the following monitoring probe settings from "true" to "false":

ECPProbeEnabled = "false"

OWAProbeEnabled = "false"

ReportingProbeEnabled = "false"

Save ClientAccessProxyTest.xml and close it.

Restart the Microsoft Exchange Health Manager service (MSExchangeHM) on the server where you modified the XML file.
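The same edit can be scripted. This sketch assumes the three settings are XML attributes (as their Name = "value" form suggests) and sets them to false wherever they appear, without assuming which element carries them. Disable-ClientAccessProxyProbe is a hypothetical helper; it keeps a .bak copy of the file and, by default, locates the file through the ExchangeInstallPath environment variable defined by Exchange setup.

```powershell
# Hypothetical helper: sets the probe switch attributes in ClientAccessProxyTest.xml
# to "false" wherever they appear, after saving a .bak copy of the file.
function Disable-ClientAccessProxyProbe {
    param(
        # Default assumes the ExchangeInstallPath variable set by Exchange setup
        [string] $Path = (Join-Path $env:ExchangeInstallPath 'Bin\Monitoring\Config\ClientAccessProxyTest.xml'),
        [string[]] $Attribute = @('ECPProbeEnabled', 'OWAProbeEnabled', 'ReportingProbeEnabled'),
        [switch] $RestartHealthManager
    )
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    Copy-Item -LiteralPath $fullPath -Destination "$fullPath.bak" -Force

    $xml = New-Object System.Xml.XmlDocument
    $xml.PreserveWhitespace = $true   # keep the original layout when saving
    $xml.Load($fullPath)

    foreach ($name in $Attribute) {
        # Match any element that carries the attribute, whatever its name or namespace
        $nodes = $xml.SelectNodes("//*[@$name]")
        if ($nodes.Count -eq 0) { Write-Warning "$name not found in $fullPath" }
        foreach ($node in $nodes) { $node.SetAttribute($name, 'false') }
    }
    $xml.Save($fullPath)

    # MSExchangeHM is the service name of Microsoft Exchange Health Manager
    if ($RestartHealthManager) { Restart-Service -Name MSExchangeHM }
}

# Usage on the Exchange server (elevated shell):
# Disable-ClientAccessProxyProbe -RestartHealthManager
```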

Disabling these monitoring probes has no impact on the Exchange server's proxy functionality.

If you want to modify any other settings in the XML files located in Bin\Monitoring\Config\, please consult a Microsoft Exchange support engineer before making any changes to those files.

To conclude, the root cause is the authentication method configured on the ECP and OWA virtual directories in IIS.

The monitoring probes can only use forms-based authentication or Windows authentication to test ECP, OWA and RWS functionality.

I hope the information provided was helpful for you.

If you have any questions, please feel free to send an email to a-crtimo@microsoft.com