SCOM Management Server grayed out with event description "A module of type "System.DataSubscriber" reported an error 0x80FF0003"

Posts in this blog are provided "AS IS" with no warranties, and confer no rights. Use of included script samples is subject to the terms specified in the Terms of Use.

 

Let me start with something generic: my Management Server is in a grayed-out state, and this is what I do next.

I start by running the SQL query below against the Operations Manager database.

--Replace the name SCOMMS with the name of your Management Server
select BME.Path,AV.ReasonCode,AV.TimeStarted,AV.TimeFinished from AvailabilityHistory AV
join BaseManagedEntity BME on AV.BaseManagedEntityId=BME.BaseManagedEntityId
where BME.FullName like '%SCOMMS%'
order by AV.TimeStarted desc

Here is the output from my lab.

The reason code descriptions are given below.

17 The Health Service windows service is paused.
25 The Health Service Action Account is misconfigured or has invalid credentials.
41 The Health Service failed to parse the new configuration.
42 The Health Service failed to load the new configuration.
43 A System Rule failed to load.
49 Collection of Object State Change Events is stalled.
50 Collection of Monitor State Change Events is stalled.
51 Collection of Alerts is stalled.
97 The Health Service is unable to register with the Event Log Service. The Health Service cannot log additional Heartbeat and Connector events.
98 The Health Service is unable to parse configuration XML.
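If you export the query results and want the descriptions next to the codes, a small lookup can help. The following is only a minimal sketch: the hashtable contains just the codes listed above, and the function name Get-ReasonCodeDescription is a hypothetical helper, not a SCOM cmdlet.

```powershell
# Reason codes and descriptions copied from the list above.
# Codes not in that list are reported as unknown rather than guessed.
$ReasonCodeDescriptions = @{
    17 = 'The Health Service windows service is paused.'
    25 = 'The Health Service Action Account is misconfigured or has invalid credentials.'
    41 = 'The Health Service failed to parse the new configuration.'
    42 = 'The Health Service failed to load the new configuration.'
    43 = 'A System Rule failed to load.'
    49 = 'Collection of Object State Change Events is stalled.'
    50 = 'Collection of Monitor State Change Events is stalled.'
    51 = 'Collection of Alerts is stalled.'
    97 = 'The Health Service is unable to register with the Event Log Service. The Health Service cannot log additional Heartbeat and Connector events.'
    98 = 'The Health Service is unable to parse configuration XML.'
}

# Hypothetical helper: translate a ReasonCode value from AvailabilityHistory into text.
function Get-ReasonCodeDescription {
    param(
        [Parameter(Mandatory = $true)]
        [int]$ReasonCode
    )
    if ($ReasonCodeDescriptions.ContainsKey($ReasonCode)) {
        return $ReasonCodeDescriptions[$ReasonCode]
    }
    return "Unknown reason code $ReasonCode (not in the list above)"
}

Get-ReasonCodeDescription -ReasonCode 43
```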

 

In our case, the Reason Code is 43 which says "A System Rule failed to load".

If you look at the Event Viewer on the Management Server, you will see these events.

These events definitely tell you that some rules have been unloaded. In this case, however, they did not really give us an idea of the problem. I have worked on many cases where the event gives the rule name and the issue right away. In our case, the rule named is a Data Warehouse collection rule, so I did not see a need to check it at this point.

I looked through the Event Viewer and found another interesting event.

I checked the status of the server SQL2016 in my console and found that it has an entry under both Agent Managed and Agentless Managed. The only way I can think of to arrive at such a scenario is to add the server as agentless managed, then install the agent manually and approve it from Pending Management.

Since it is not supported or recommended to have the same server under agentless managed and agent managed at the same time, we ended up in this situation.

I deleted the entry from Agentless Managed, and everything went back to normal and healthy.

To avoid such a situation, make sure the option "Automatically approve new manually installed agents" is not selected in the SCOM console. If you have a lot of agentless managed computers, check that list before approving agents from Pending Management. You can use the PowerShell cmdlets below to do a quick check.

Get-SCOMAgentlessManagedComputer | select computername
Get-SCOMAgentlessManagedComputer | where {$_.computername -eq 'SQL2016'} | select computername
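If you have many servers to check, the comparison can be done in one pass instead of one name at a time. The following is only a sketch under assumptions: Find-DualManagedComputer is a hypothetical helper, the names are compared on the short host name (so it will not work for entries listed by IP address), and the commented usage lines assume the DisplayName and AgentName properties on the output of Get-SCOMAgent and Get-SCOMPendingManagement. Verify those property names in your environment before relying on it.

```powershell
# Hypothetical helper: return every agent (or pending agent) name that also
# appears in the agentless managed list. Names are compared on the short host
# name, and -contains is case-insensitive for strings by default.
function Find-DualManagedComputer {
    param(
        [string[]]$AgentNames = @(),
        [string[]]$AgentlessNames = @()
    )
    $agentlessShort = @($AgentlessNames | ForEach-Object { ($_ -split '\.')[0] })
    $AgentNames | Where-Object { $agentlessShort -contains ($_ -split '\.')[0] }
}

# Usage on a Management Server (assumes the OperationsManager module is loaded;
# the property names below are assumptions, adjust them if they differ):
#   $agents    = Get-SCOMAgent | ForEach-Object { $_.DisplayName }
#   $pending   = Get-SCOMPendingManagement | ForEach-Object { $_.AgentName }
#   $agentless = Get-SCOMAgentlessManagedComputer | ForEach-Object { $_.ComputerName }
#   Find-DualManagedComputer -AgentNames (@($agents) + @($pending)) -AgentlessNames $agentless

# Stand-alone demonstration with made-up names (contoso.local is illustrative only):
Find-DualManagedComputer -AgentNames 'SQL2016.contoso.local', 'WEB01' -AgentlessNames 'sql2016'
```

Any name this returns is a server that is, or is about to be, both agent managed and agentless managed, so it should be cleaned up before the pending agent is approved.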