Windows Automatic Services Monitoring using SCOM


Monitoring services on Windows computers is available out of the box in SCOM through the Windows Service monitoring template. But in a large enterprise with thousands of Windows computers and hundreds of applications, it is difficult to list every service that needs to be monitored on each computer and create the monitoring with the template. Monitoring an average of 30 services on 1,000 computers would add 30,000 instances to the SCOM database. It would also create numerous classes and discoveries and bloat the instance space, making SCOM less responsive.

We also cannot create a monitor for each service and target it at all computers, because a given service may be present on some computers and absent on others. Targeting every computer would produce false alarms, and we would still need 30 or more Windows service monitors targeted at all Windows computers, which adds overhead on the agents and therefore on the computers running them.

So, what is the solution?

The optimal solution is a single rule that monitors all automatic services on each computer and alerts on those that are not running. This can be accomplished with a PowerShell script that returns property bag output.

The rule runs on each computer at a fixed interval and creates a property bag for each service that is set to automatic but is not running; an alert is generated for each property bag.

One catch in this scenario is that we should not alert on services that are stopped only for a moment. To handle this, we will use a consolidator condition, so that an alert is raised only if the service has failed for ‘n’ consecutive samples.

This solution, though optimal, poses another challenge: what if we do not want to monitor a service that is set to automatic on one or a few computers?

This can be handled with a centrally located file that lists the services and the computers to be excluded from monitoring.

We will see how to construct the Management Pack XML to accomplish this. You can also create the MP using Visual Studio, MP Studio or the Authoring Console.
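The idea of alerting only after ‘n’ consecutive failed samples can be illustrated with a few lines of PowerShell. This is only a sketch of the counting logic, not how the SCOM consolidator module is implemented; the function name Test-ConsecutiveFailure and the sample values are hypothetical.

```powershell
# Hypothetical helper: returns $true once the last $Count samples are all "Stopped".
function Test-ConsecutiveFailure {
    param (
        [string[]] $Samples,   # service states in polling order, oldest first
        [int] $Count = 2       # consecutive failed polls required before alerting
    )
    if ($Samples.Count -lt $Count) { return $false }
    $recent = $Samples[-$Count..-1]
    return @($recent | Where-Object { $_ -ne 'Stopped' }).Count -eq 0
}

# A service that was stopped for a single poll (for example during a restart) does not alert.
Test-ConsecutiveFailure -Samples 'Running', 'Stopped', 'Running' -Count 2   # False
# A service that stays stopped for two polls in a row does.
Test-ConsecutiveFailure -Samples 'Running', 'Stopped', 'Stopped' -Count 2   # True
```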

Step 1:

Add references to the Management pack.

<ManagementPack ContentReadable="true" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <Manifest>
    <Identity>
      <ID>GKLab.Windows.Automatic.Service.Monitoring</ID>
      <Version>1.0.0.0</Version>
    </Identity>
    <Name>GKLab Windows Automatic Service Monitoring</Name>
    <References>
      <Reference Alias="SC">
        <ID>Microsoft.SystemCenter.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Windows">
        <ID>Microsoft.Windows.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Health">
        <ID>System.Health.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="System">
        <ID>System.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Performance">
        <ID>System.Performance.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
    </References>
  </Manifest>

Step 2:

Now create a PowerShell property bag probe script. The script fetches the list of all services that are set to start automatically and checks their current status. For each service that is set to Automatic but is stopped, a property bag is created.

To exclude some services from monitoring, a centrally located CSV file is used, and its path is passed to the script as a parameter. The script reads the list of services to be excluded from the CSV file and compares it with the list of services on the target computer. No property bag is created for excluded services.

param (
    [string] $excludeservicelist
)

# Load the exclusion list only if a path was supplied and is reachable from this agent.
if ($excludeservicelist -and (Test-Path $excludeservicelist)) {
    Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 776 -Message "WindowsAutomaticServiceMonitoring.ps1 - Accessing Exclusion List CSV" -EntryType Information
    $contents = Import-Csv $excludeservicelist
}

$TargetComputer = hostname
$api = New-Object -ComObject 'MOM.ScriptAPI'

# All services whose start mode is Automatic.
$auto_services = Get-WmiObject -Class Win32_Service -Filter "StartMode='Auto'"

foreach ($service in $auto_services)
{
    $isExcluded = 0
    $state = $service.State
    $name = $service.DisplayName

    # Check the service against every row of the exclusion list (regex match on both columns).
    if ($contents) {
        $contents | ForEach-Object {
            $ExcludeServiceDisplayName = $_.ServiceToExclude
            $ExcludeComputerName = $_.ComputersToExclude
            if (($name -match $ExcludeServiceDisplayName) -and (($TargetComputer -match $ExcludeComputerName) -or ($ExcludeComputerName -match "ALL_COMPUTERS"))) {
                $isExcluded = 1
                #Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 777 -Message "WindowsAutomaticServiceMonitoring.ps1 - Excluded Service Name - $ExcludeServiceDisplayName, Excluded Computer Name - $ExcludeComputerName" -EntryType Information
            }
        }
    }

    # Return one property bag per automatic service that is stopped and not excluded.
    if (($isExcluded -eq 0) -and ($state -eq "Stopped")) {
        #Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 778 -Message "WindowsAutomaticServiceMonitoring.ps1 - Windows Service set to Automatic but Not Running - $name" -EntryType Information
        $bag = $api.CreatePropertyBag()
        $bag.AddValue("ServiceName", $name)
        $bag.AddValue("Status", $state)
        $bag
    }
}

Step 3:

Create a data source module that incorporates the PowerShell script written above. As discussed in the solution, we use a consolidator condition so that alerts are raised only for valid service failures.

  <TypeDefinitions>
    <ModuleTypes>
      <DataSourceModuleType ID="GKLab.Windows.Auto.Service.Monitoring.DataSource" Accessibility="Internal" Batching="false">
        <Configuration>
          <xsd:element minOccurs="1" name="ExcludeServiceList" type="xsd:string" />
          <xsd:element minOccurs="1" name="IntervalSeconds" type="xsd:integer" />
          <xsd:element minOccurs="1" name="ConsolidationInterval" type="xsd:integer" />
          <xsd:element minOccurs="1" name="Count" type="xsd:integer" />
        </Configuration>
        <OverrideableParameters>
          <OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
          <OverrideableParameter ID="Count" Selector="$Config/Count$" ParameterType="int" />
          <OverrideableParameter ID="ConsolidationInterval" Selector="$Config/ConsolidationInterval$" ParameterType="int" />
        </OverrideableParameters>
        <ModuleImplementation Isolation="Any">
          <Composite>
            <MemberModules>
              <DataSource ID="Trigger" TypeID="System!System.SimpleScheduler">
                <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
                <SyncTime>00:00</SyncTime>
              </DataSource>
              <ProbeAction ID="Probe" TypeID="Windows!Microsoft.Windows.PowerShellPropertyBagProbe">
                <ScriptName>WindowsAutomaticServicesMonitoring.ps1</ScriptName>
                <ScriptBody><![CDATA[
param (
    [string] $excludeservicelist
)

# Load the exclusion list only if a path was supplied and is reachable from this agent.
if ($excludeservicelist -and (Test-Path $excludeservicelist)) {
    Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 776 -Message "WindowsAutomaticServiceMonitoring.ps1 - Accessing Exclusion List CSV" -EntryType Information
    $contents = Import-Csv $excludeservicelist
}

$TargetComputer = hostname
$api = New-Object -ComObject 'MOM.ScriptAPI'

# All services whose start mode is Automatic.
$auto_services = Get-WmiObject -Class Win32_Service -Filter "StartMode='Auto'"

foreach ($service in $auto_services)
{
    $isExcluded = 0
    $state = $service.State
    $name = $service.DisplayName

    # Check the service against every row of the exclusion list (regex match on both columns).
    if ($contents) {
        $contents | ForEach-Object {
            $ExcludeServiceDisplayName = $_.ServiceToExclude
            $ExcludeComputerName = $_.ComputersToExclude
            if (($name -match $ExcludeServiceDisplayName) -and (($TargetComputer -match $ExcludeComputerName) -or ($ExcludeComputerName -match "ALL_COMPUTERS"))) {
                $isExcluded = 1
                Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 777 -Message "WindowsAutomaticServiceMonitoring.ps1 - Excluded Service Name - $ExcludeServiceDisplayName, Excluded Computer Name - $ExcludeComputerName" -EntryType Information
            }
        }
    }

    # Return one property bag per automatic service that is stopped and not excluded.
    if (($isExcluded -eq 0) -and ($state -eq "Stopped")) {
        Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 778 -Message "WindowsAutomaticServiceMonitoring.ps1 - Windows Service set to Automatic but Not Running - $name" -EntryType Information
        $bag = $api.CreatePropertyBag()
        $bag.AddValue("ServiceName", $name)
        $bag.AddValue("Status", $state)
        $bag
    }
}
]]></ScriptBody>
                <Parameters>
                  <Parameter>
                    <Name>ExcludeServiceList</Name>
                    <Value>$Config/ExcludeServiceList$</Value>
                  </Parameter>
                </Parameters>
                <TimeoutSeconds>300</TimeoutSeconds>
              </ProbeAction>
              <ConditionDetection ID="Consolidator" TypeID="System!System.ConsolidatorCondition">
                <Consolidator>
                  <ConsolidationProperties>
                    <PropertyXPathQuery>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</PropertyXPathQuery>
                    <PropertyXPathQuery>Property[@Name='ServiceName']</PropertyXPathQuery>
                  </ConsolidationProperties>
                  <TimeControl>
                    <WithinTimeSchedule>
                      <Interval>$Config/ConsolidationInterval$</Interval>
                    </WithinTimeSchedule>
                  </TimeControl>
                  <CountingCondition>
                    <Count>$Config/Count$</Count>
                    <CountMode>OnNewItemTestOutputRestart_OnTimerSlideByOne</CountMode>
                  </CountingCondition>
                </Consolidator>
              </ConditionDetection>
            </MemberModules>
            <Composition>
              <Node ID="Consolidator">
                <Node ID="Probe">
                  <Node ID="Trigger" />
                </Node>
              </Node>
            </Composition>
          </Composite>
        </ModuleImplementation>
        <OutputType>System!System.ConsolidatorData</OutputType>
      </DataSourceModuleType>
    </ModuleTypes>
  </TypeDefinitions>

Step 4:

Next we will create a rule using the data source. The configuration below needs to be customized according to your needs.

ExcludeServiceList – the UNC path of the excluded services list file (in CSV format). A sample CSV is provided below.

The CSV has two headers:

ServiceToExclude – display name of the service. The script compares it using -match, so the value is treated as a regular expression and a partial display name also matches.

ComputersToExclude – NetBIOS name of the computer. Two or more computers can be listed as individual entries or combined using regular expression syntax. To exclude the service on all computers, the value should be “ALL_Computers”.

ServiceToExclude,ComputersToExclude
Distributed Transaction Coordinator,SCOM2012R2
Windows Audio,Win2k12-DC
Remote Registry,ALL_Computers
Software Protection,SCOM2012R2|Win2k12-DC
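To see how the script's -match comparison treats entries like these, a few of the sample rows can be evaluated in a PowerShell session. The sketch below is an illustration only: Test-Excluded is a hypothetical helper that mirrors the condition used in the script, and AnyServer01 is a made-up computer name. Because -match is a case-insensitive regular expression match, ALL_Computers and ALL_COMPUTERS behave the same, while a display name containing parentheses is read as a regex group unless it is escaped (shown here with [regex]::Escape, which the original script does not use).

```powershell
$csv = @'
ServiceToExclude,ComputersToExclude
Distributed Transaction Coordinator,SCOM2012R2
Remote Registry,ALL_Computers
Software Protection,SCOM2012R2|Win2k12-DC
'@
$contents = $csv | ConvertFrom-Csv

# Hypothetical helper that applies the same condition as the monitoring script.
function Test-Excluded {
    param ([string] $ServiceDisplayName, [string] $ComputerName, $ExclusionList)
    foreach ($row in $ExclusionList) {
        if (($ServiceDisplayName -match $row.ServiceToExclude) -and
            (($ComputerName -match $row.ComputersToExclude) -or
             ($row.ComputersToExclude -match 'ALL_COMPUTERS'))) {
            return $true
        }
    }
    return $false
}

Test-Excluded 'Remote Registry' 'AnyServer01' $contents        # True  (ALL_Computers row)
Test-Excluded 'Software Protection' 'Win2k12-DC' $contents     # True  (regex alternation)
Test-Excluded 'Software Protection' 'AnyServer01' $contents    # False

# Parentheses are regex metacharacters, so the literal name does not match itself...
'Google Update-Service (gupdate)' -match 'Google Update-Service (gupdate)'                   # False
# ...unless the pattern is escaped first.
'Google Update-Service (gupdate)' -match [regex]::Escape('Google Update-Service (gupdate)')  # True
```

With the script as published, the simpler workaround is to list only the part of the display name before the parenthesis in the CSV.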

IntervalSeconds – polling interval in seconds.

Count – number of polls in which the service must be found stopped before an alert is raised (minimum 2).

ConsolidationInterval – the time window within which the service must fail ‘n’ times to generate an alert (minimum value = (n-1) * IntervalSeconds, where n = Count).
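The minimum ConsolidationInterval for a given polling interval and count follows directly from the formula above. The helper below is a hypothetical convenience for checking values before putting them in the rule; it is not part of the management pack.

```powershell
# Hypothetical helper implementing the formula above: (n - 1) * IntervalSeconds.
function Get-MinimumConsolidationInterval {
    param ([int] $IntervalSeconds, [int] $Count)
    if ($Count -lt 2) { throw 'Count must be at least 2.' }
    return ($Count - 1) * $IntervalSeconds
}

Get-MinimumConsolidationInterval -IntervalSeconds 300 -Count 2   # 300
Get-MinimumConsolidationInterval -IntervalSeconds 300 -Count 3   # 600
```

The rule in this post polls every 300 seconds with a Count of 2 and a ConsolidationInterval of 600, which is above the minimum of 300.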

  <Monitoring>
    <Rules>
      <Rule ID="GKLab.Windows.AutomaticService.Monitoring.Rule" Enabled="true" Target="Windows!Microsoft.Windows.Computer" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
        <Category>Alert</Category>
        <DataSources>
          <DataSource ID="DS" TypeID="GKLab.Windows.Auto.Service.Monitoring.DataSource">
            <ExcludeServiceList>\\SCOM2012R2\Configs\WindowsAutomaticServiceMonitoringExclusionList.csv</ExcludeServiceList>
            <IntervalSeconds>300</IntervalSeconds>
            <ConsolidationInterval>600</ConsolidationInterval>
            <Count>2</Count>
          </DataSource>
        </DataSources>
        <WriteActions>
          <WriteAction ID="Alert" TypeID="Health!System.Health.GenerateAlert">
            <Priority>1</Priority>
            <Severity>2</Severity>
            <AlertMessageId>$MPElement[Name="GKLab.Windows.AutomaticService.Monitoring.Rule.AlertMessage"]$</AlertMessageId>
            <AlertParameters>
              <AlertParameter1>$Data/Context/DataItem/Property[@Name='ServiceName']$</AlertParameter1>
            </AlertParameters>
            <Suppression>
              <SuppressionValue>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</SuppressionValue>
              <SuppressionValue>$Data/Context/DataItem/Property[@Name='ServiceName']$</SuppressionValue>
            </Suppression>
          </WriteAction>
        </WriteActions>
      </Rule>
    </Rules>
  </Monitoring>

Step 5:

The final step is to construct the XML for the presentation and language packs. Make sure to close the <ManagementPack> tag.

  <Presentation>
    <StringResources>
      <StringResource ID="GKLab.Windows.AutomaticService.Monitoring.Rule.AlertMessage" />
    </StringResources>
  </Presentation>
  <LanguagePacks>
    <LanguagePack ID="ENU" IsDefault="true">
      <DisplayStrings>
        <DisplayString ElementID="GKLab.Windows.Automatic.Service.Monitoring">
          <Name>GKLab Windows Automatic Service Monitoring</Name>
          <Description>GKLab Windows Automatic Service Monitoring Management Pack</Description>
        </DisplayString>
        <DisplayString ElementID="GKLab.Windows.Auto.Service.Monitoring.DataSource">
          <Name>GKLab Windows Automatic Service Monitoring Data Source</Name>
          <Description>GKLab Windows Automatic Service Monitoring Data Source</Description>
        </DisplayString>
        <DisplayString ElementID="GKLab.Windows.AutomaticService.Monitoring.Rule">
          <Name>Windows Automatic Services Monitoring Rule</Name>
          <Description>Windows Automatic Services Monitoring Rule</Description>
        </DisplayString>
        <DisplayString ElementID="GKLab.Windows.AutomaticService.Monitoring.Rule" SubElementID="Alert">
          <Name>Alert</Name>
        </DisplayString>
        <DisplayString ElementID="GKLab.Windows.AutomaticService.Monitoring.Rule" SubElementID="DS">
          <Name>GKLab Windows Automatic Service Monitoring Data Source</Name>
        </DisplayString>
        <DisplayString ElementID="GKLab.Windows.AutomaticService.Monitoring.Rule.AlertMessage">
          <Name>Windows Automatic Services Monitoring Alert</Name>
          <Description>Windows Service {0} is set to auto-start but is currently not running.</Description>
        </DisplayString>
      </DisplayStrings>
    </LanguagePack>
  </LanguagePacks>
</ManagementPack>

Step 6:

Deploy the MP in a lab and check for alerts.
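When checking a lab agent, it can help to confirm that the script actually ran and could reach the exclusion list. The script writes event 776 when it opens the CSV, and the version embedded in the data source also writes 777 for each exclusion and 778 for each stopped service. The snippet below is a sketch, assuming it is run locally on the agent computer; the limit of 20 events is an arbitrary choice.

```powershell
# Filter for the events written by the monitoring script (IDs taken from the script above).
$filter = @{
    LogName      = 'Operations Manager'
    ProviderName = 'Health Service Script'
    Id           = 776, 777, 778
}

# Get-WinEvent is Windows-only; run this on the agent computer itself.
if ($env:OS -eq 'Windows_NT') {
    Get-WinEvent -FilterHashtable $filter -MaxEvents 20 -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, Message
}
```

If no event 776 appears, the agent most likely cannot read the share that holds the CSV file.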

I have attached a copy of the XML, which you can import into any authoring tool. Customize it as per your needs and have fun.

Happy SCOMing…


Comments (26)

  1. Subramani says:

    Nice! Just what I was looking for.

  2. kris says:

    This is great. Just one issue we have, The ALL_Computers doesn’t seem to work for exclusions. Is this something you added to SCOM or should it just work. Anyway I can check this

  3. Nicolas Etienne says:

    Hello,

    thanks for your great job.
    I just have a problem to exclude all computers. the syntax ALL_Computers, doesn’t work ( I’m on scom 2K12 R2)
    – Is it possible to use * or $ or … , for exclude all servers with the name start with SRV-* ?
    – is it possible to add a group name – it will be greate.

    Thanks

    Nicolas

    1. We can use “.*” (without quotes). I tested and it works!

  4. steven says:

    format of the csv seem does not work:
    Application Experience,ALL_Computers

    i created a shared folder \\WindowsAutomaticServiceMonitoringExclusionList\WindowsAutomaticServiceMonitoringExclusionList.csv

    1. You need to have “ServiceToExclude,ComputersToExclude” without quotes as your first line in CSV. Also update the share path in XML before importing.

  5. steven says:

    the monitoring doesn’t seem working. i try to stopped a service but the alert not triggering.

    1. If the service is not in excluded list, it should alert as per configured interval and number of retries.

  6. T says:

    Will I just need to import the xml into SCOM?

    1. Yes but you have to change the shared path where you maintain the Excluded Services List.

  7. Many have reached out to me saying “ALL_COMPUTERS” exclusion does not work. I will look in to it and update the post with changes if any.

    1. lchua says:

      hope can hear the good news

      1. A H Mohammed says:

        Hello mate, did you get a chance to investigate the “All_Computers” exclusion criteria not working issue?

      2. I had time to check back and only reason I can see is the shared file not accessible from the agent machines. can you confirm if you can see events with id #776? if it is not found, you need to give permissions to “Everyone” for the share.

  8. Steve says:

    Good solution, thanks for posting this.

  9. Steven says:

    Any update on this issue of exclusions

  10. Naren says:

    Hi,

    Seems to be wonderful solution. I am still bit confused.

    It would have been provided with SCOM screen shots for the people like me to understand easily.

    I just need to import the xml file into SCOM. Ours is SCOM 2012 SP1. Do I need to change anything in the xml in particular to SCOM2012R2 to SCOM2012SP1?

    I need to create the .csv file as mentioned and place in a folder and need to provide that path in the xml, right?

    1. Yes. Nothing to be modified in specific to SCOM version.

  11. Justin says:

    For anyone having issues with excluding a service, the exclude variables are case sensitive. To exclude all computers it must be in the syntax “,ALL_COMPUTERS” (minus the quotes). Also, if you are excluding a service with parenthesis you have to drop them. ex. “Some Service Name (serv1),ALL_COMPUTERS” has to be “Some Service Name,ALL_COMPUTERS. Hope that helps for anyone else using this. It works on SCOM 2016 btw.

    1. Thanks Mate for confirming that it is working 🙂

  12. ralph says:

    hi just curious will it be auto resolve once the service is started?

    1. No. These are alerts from rules.

  13. Aneeshcpy says:

    Hi Gowdhaman,

    Could you help me to modify the script so that there is no exclusion for services or computers.

    Regards

    1. If the CSV file is not present or accessible, the script will ignore it. There will be no impact, so you can still use the MP unmodified for your requirement.

  14. Carsten says:

    Thanks for you work! It does work great in our environment.
    But, there is one exception: ‘Google Update-Service (gupdate)’
    Comparing service displayname and exclusion list entry with powershell returns ‘false’
    I guess there is something about those brackets.

    1. I have used -match criteria in PowerShell. So you can specify a part of display name (string before parenthesis) and it should pick up.
