The R2 Cumulative Update 4 hotfix, which I recently wrote about HERE, adds a new feature.
This new feature enables the OpsMgr Management Server to better tolerate, in specific cases, an outage of the SQL server hosting the OperationsManager database. This feature is NOT enabled by default, by design. In order to enable this feature you MUST have previously applied R2-CU4 or later to your RMS role. You should only enable this feature if you feel you have been impacted by this issue, and you find you have to restart your RMS services frequently to get things flowing again after a SQL connectivity outage.
Under typical situations, the Root Management Server reconnects to SQL pretty well if the SQL server is unavailable for a short time. This might happen if your SQL cluster fails over (there is a short period where the SQL instance is unavailable during a failover) or when patching/rebooting a stand-alone (non-clustered) SQL server.
However, in larger environments, or when the SQL outage extends beyond a short reboot/failover, we have seen cases where the RMS does not reconnect/recover successfully. Subsequently, the RMS might start logging errors in the event log from the Health Service, including 2115 (bind) events and 4506 (Data dropped) events. Previously, this situation did not recover until the RMS OpsMgr services were restarted (and in some cases the HealthService on the Management Server as well).
To enable this feature, on the RMS, under the “HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\DAL” key, create two new DWORD values, as below:
DALInitiateClearPool should be set to Decimal value “1” to enable it.
DALInitiateClearPoolSeconds should be set to Decimal value “60” to represent a 60-second retry interval.
Here is a screenshot:
This change will take effect after you restart the RMS HealthService (System Center Management Service).
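If you would rather not create the values by hand, the two registry entries described above can be written out as a .reg file. The following is only a minimal sketch in Python that builds the file text; the function name `build_reg_file` and the idea of scripting this step are my own assumptions for illustration, not part of the hotfix. The key path, value names, and values are the ones listed above.

```python
# Sketch only: builds the text of a .reg file for the two DAL values.
# build_reg_file is a hypothetical helper name, not part of OpsMgr.

DAL_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft"
    r"\Microsoft Operations Manager\3.0\DAL"
)


def build_reg_file(retry_seconds=60):
    """Return .reg file text that sets both DWORD values under the DAL key."""
    # A DWORD is an unsigned 32-bit value.
    if not 0 <= retry_seconds <= 0xFFFFFFFF:
        raise ValueError("retry_seconds must fit in a DWORD")

    values = {
        "DALInitiateClearPool": 1,  # 1 enables the feature
        "DALInitiateClearPoolSeconds": retry_seconds,  # retry interval
    }

    lines = ["Windows Registry Editor Version 5.00", "", "[" + DAL_KEY + "]"]
    for name, value in values.items():
        # .reg files store DWORDs as 8 hex digits, so decimal 60 becomes 3c.
        lines.append('"{}"=dword:{:08x}'.format(name, value))
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_reg_file())
```

Saving the output to a .reg file and importing it on the RMS with Registry Editor should create both values. Note that the file shows the values in hexadecimal, so the 60 second interval appears as 0000003c. You still need to restart the RMS HealthService afterwards.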