Starting with Skype for Business Cloud Connector (Cloud Connector) version 1.4.1, we introduced an automated recovery process: if the Cloud Connector management service detects that a monitored service is not running, Cloud Connector tries to recover the appliance automatically.
Auto Recovery Requirements
For details on the Auto Update requirements, see Understanding Cloud Connector Edition Auto Update.
Auto Recovery Process
Detection: A process that checks the appliance status runs every 60 seconds. The status is updated in the online tenant and cached locally in “CCE Site Directory\Site_EdgeFQDN”.
Monitoring: The following services are actively monitored:
- Mediation Server: RTCSRV and MEDSVC
- Edge Server: RTCSRV
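As a rough illustration, the monitored set above can be written as a lookup table. The Python sketch below is not Cloud Connector code: the function name and the shape of the status input are assumptions made for illustration, and only the role and service names come from the list above.

```python
# Monitored Windows services per Cloud Connector role (from the list above).
MONITORED_SERVICES = {
    "Mediation Server": ("RTCSRV", "MEDSVC"),
    "Edge Server": ("RTCSRV",),
}


def find_stopped_services(status_by_role):
    """Return (role, service) pairs that are monitored but not running.

    status_by_role is a hypothetical input: a dict mapping a role name to a
    dict of service name -> state string, for example {"RTCSRV": "Running"}.
    A monitored service that is missing from the input counts as not running.
    """
    stopped = []
    for role, services in MONITORED_SERVICES.items():
        role_status = status_by_role.get(role, {})
        for service in services:
            if role_status.get(service) != "Running":
                stopped.append((role, service))
    return stopped
```

An empty result means every monitored service reported Running; any other result is the condition that would start the recovery process described next.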
Recovery Process: If any monitored service is detected as stopped:
- The appliance status is set to Error, and its network connection is disconnected.
- The AutoMaintenanceStatus value is set to 3, indicating Auto Recovery status, in the “Appliance Root\CceSevicePersistent” file.
- The management service tries three (3) times to restart the failed service(s). The HAHealthKeepRecover value in “C:\ProgramData\CloudConnector\ApplianceManager.ini” increments with each attempt.
- After three failed attempts, the Cloud Connector virtual machines are restarted in this order: AD, CMS, MS, Edge.
- The network connection is restored when the appliance is running.
- If recovery cannot complete, the Cloud Connector management service logs the following events in the Windows Application event log:
  - Event ID 304: Completed recovery of the appliance, recover result: CannotRecover
  - Event ID 20006: Appliance cannot be recovered automatically, please recover manually
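The escalation described above (service restarts first, then virtual machine restarts in a fixed order) can be modeled in a few lines. This is a simplified Python sketch, not the management service's actual implementation: the three callables are hypothetical hooks supplied by the caller, and the "Recovered" result string is a placeholder. Only the attempt count, the restart order, and the CannotRecover result come from the text above.

```python
MAX_RESTART_ATTEMPTS = 3                      # service restart attempts
VM_RESTART_ORDER = ("AD", "CMS", "MS", "Edge")  # order given in the text


def run_auto_recovery(restart_services, restart_vm, appliance_is_healthy):
    """Model the escalation: service restarts first, then VM restarts.

    All three arguments are hypothetical callables supplied by the caller:
      restart_services()     tries to start the stopped services
      restart_vm(name)       restarts one virtual machine
      appliance_is_healthy() returns True when all monitored services run
    Returns "Recovered" (placeholder name) or "CannotRecover".
    """
    # Stage 1: try restarting the failed services up to three times.
    for _attempt in range(MAX_RESTART_ATTEMPTS):
        restart_services()
        if appliance_is_healthy():
            return "Recovered"

    # Stage 2: restart every virtual machine in the documented order.
    for vm_name in VM_RESTART_ORDER:
        restart_vm(vm_name)
    return "Recovered" if appliance_is_healthy() else "CannotRecover"
```

Passing the actions in as callables keeps the sketch independent of how services and virtual machines are actually restarted on the host.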
To manually recover the appliance, first review the Cloud Connector management service log for details on what prevented automatic recovery from being successful.
Log location: “C:\Program Files\Skype for Business Cloud Connector Edition\ManagementService\CceManagementService.log”
If connectivity to the Cloud Connector virtual machines fails with an access denied error, manually check network connectivity to the machines.
If any services cannot be started on the Edge or Mediation server, connect to the affected virtual machine, review the Lync Server event log for relevant error messages, and try to start the services manually.
Reset the Appliance
Once the services are running, reset the appliance status to a running state by running the following cmdlets, in order, in an administrative PowerShell session on the host appliance:
- Run Enter-CcUpdate and wait for the process to complete.
- Run Exit-CcUpdate and wait for the process to complete.
Confirm the Appliance Status
Confirm the appliance has returned to a running state as follows:
- Open the “Appliance Root\CceSevicePersistent” file in Notepad and confirm that the value of AutoMaintenanceStatus is 0 and the value of IsInManualMaintenance is false.
- Connect to Remote PowerShell, run Get-CcHybridPSTNAppliance, and confirm that the Status of the appliance is Running.
Note: You can also check the status of appliances on the on-premises PSTN tab in the Voice section of the Skype for Business Admin Center in your Office 365 tenant portal.
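The file check above can also be scripted. The layout of the CceSevicePersistent file is not described here, so the Python sketch below makes a loud assumption: each key name is followed by some punctuation or whitespace and then its value (which would cover forms such as key=value, XML elements, or JSON). The function names are hypothetical, and the pattern should be adjusted once the real layout is known.

```python
import re


def read_status_value(text, key):
    """Return the first word-like token that follows `key`, or None.

    Assumption: the value appears right after the key, separated only by
    non-word characters such as '=', ':', '>' or quotes.
    """
    match = re.search(re.escape(key) + r"\W+(\w+)", text)
    return match.group(1) if match else None


def is_back_to_normal(text):
    """True when AutoMaintenanceStatus is 0 and IsInManualMaintenance is false."""
    auto_status = read_status_value(text, "AutoMaintenanceStatus")
    manual = read_status_value(text, "IsInManualMaintenance")
    return auto_status == "0" and (manual or "").lower() == "false"
```

To use it, read the file contents into a string and pass that string to is_back_to_normal; a False result means at least one of the two values still needs attention.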
Worst Case Scenario
If the current version of the appliance cannot be recovered, run Switch-CcVersion to switch to the backup version. After you confirm that the backup version is running, uninstall the non-working version with: Uninstall-CcAppliance -Version “# of non-working version”.
Note that while the backup version is running, High Availability is not supported because the running version and the Cloud Connector script version do not match. Update to the current version as soon as possible, either by modifying the auto update schedule or by updating manually. For manual update instructions, see Upgrade a single site to a new version in the Cloud Connector Edition configuration guide.
Cmdlets to check versions
- Installed Cloud Connector script version: Get-CcVersion
- Appliance running version: Get-CcRunningVersion
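Comparing the two values is the useful part of this check: a mismatch is the situation in which High Availability is not supported. A minimal Python sketch follows, assuming each cmdlet's output has already been captured as a plain version string; the exact output format of the cmdlets is not documented here, and the function name is hypothetical.

```python
def versions_match(script_version, running_version):
    """True when the installed script version equals the running version.

    Both arguments are assumed to be plain strings such as "1.4.1";
    surrounding whitespace is ignored, nothing else is normalized.
    """
    return script_version.strip() == running_version.strip()
```

If the function returns False, the appliance is running a version other than the installed script version, for example after switching to the backup version.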