Windows Automatic Services Monitoring using SCOM


Service monitoring on Windows computers is available out of the box in SCOM through the Windows Service monitoring template. But in a large enterprise with thousands of Windows computers and hundreds of applications, it is difficult to list every service that needs to be monitored on each computer and create the monitoring with the template. Monitoring an average of 30 services on 1,000 computers would add 30,000 instances to the SCOM database. This creates numerous classes and discoveries and bloats the instance space, which makes SCOM less responsive.

Nor can we create a monitor for each service and target it at all computers, because each service may be present on some computers and not on others. Targeting every computer indiscriminately would produce false alarms, and we would still need 30+ Windows service monitors targeted at all Windows computers, which adds overhead on the agents and therefore on the computers running them.

So, what is the solution?

The optimal solution is to create a single rule that monitors all automatic services on each computer and alerts on those that are not running. This can be accomplished with a PowerShell script that returns property bag output.

The rule runs on each computer at a specific time interval and creates a property bag for each service that is set to automatic but is not running. An alert is generated for each property bag.

One catch in this scenario is that we should not alert on services that are stopped only for a moment. To handle this, we will use a consolidator condition, so that we alert only if the service has failed for ‘n’ consecutive samples.
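The counting idea can be sketched outside SCOM as a small PowerShell function. This only illustrates "alert after n consecutive failed samples"; it is not how the consolidator module is implemented (the module counts matching items inside a time window), and the name Test-ConsecutiveFailure and its parameters are hypothetical.

```powershell
# Illustrative only: returns $true when the most recent $Count samples are all "Stopped".
function Test-ConsecutiveFailure {
    param(
        [string[]] $Samples,   # service states from successive polls, oldest first
        [int] $Count = 2       # consecutive failures required before alerting
    )

    # Not enough polls yet to reach the threshold
    if ($Samples.Count -lt $Count) { return $false }

    # Take the last $Count samples
    $recent = $Samples[($Samples.Count - $Count)..($Samples.Count - 1)]

    # Alert only if none of them is anything other than Stopped
    return @($recent | Where-Object { $_ -ne 'Stopped' }).Count -eq 0
}

Test-ConsecutiveFailure -Samples 'Running', 'Stopped', 'Stopped' -Count 2   # True
Test-ConsecutiveFailure -Samples 'Stopped', 'Running', 'Stopped' -Count 2   # False
```

A service that is stopped in one poll and running again in the next never reaches the threshold, which is the behaviour we want for momentary restarts.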

This solution, though optimal, poses another challenge: what if we do not want to monitor a service that is set to automatic on one or a few computers?

This can be handled with a centrally located file that lists the services and the computers to be excluded from monitoring.

We will see how to construct the management pack XML to accomplish this. You can also create the MP using Visual Studio, MP Studio or the Authoring Console.

Step 1:

Add references to the management pack.

<ManagementPack ContentReadable="true" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <Manifest>
    <Identity>
      <ID>GKLab.Windows.Automatic.Service.Monitoring</ID>
      <Version>1.0.0.0</Version>
    </Identity>
    <Name>GKLab Windows Automatic Service Monitoring</Name>
    <References>
      <Reference Alias="SC">
        <ID>Microsoft.SystemCenter.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Windows">
        <ID>Microsoft.Windows.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Health">
        <ID>System.Health.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="System">
        <ID>System.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Performance">
        <ID>System.Performance.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
    </References>
  </Manifest>

Step 2:

Now create a PowerShell property bag probe script. The script fetches the list of all services that are set to start automatically and checks their current status. For each service that is set to Automatic but is stopped, a property bag is created.

To exclude some services from monitoring, a centrally located CSV file is used, and the path of the file is passed to the script as a parameter. The script reads the list of services to be excluded from the CSV file and compares it with the list of services on the target computer. Property bags are not created for excluded services.

param (
    [string] $excludeservicelist
)

# Load the exclusion list only if a path was supplied and the file is reachable
if ($excludeservicelist -and (Test-Path $excludeservicelist)) {
    Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 776 -Message "WindowsAutomaticServiceMonitoring.ps1 - Accessing Exclusion List CSV" -EntryType Information
    $contents = Import-Csv $excludeservicelist
}

$TargetComputer = hostname
$api = New-Object -ComObject 'MOM.ScriptAPI'

# All services whose start mode is Automatic
$auto_services = Get-WmiObject -Class Win32_Service -Filter "StartMode='Auto'"

foreach ($service in $auto_services) {
    $isExcluded = 0
    $state = $service.State
    $name = $service.DisplayName

    # Check the service and this computer against every row of the exclusion list
    if ($contents) {
        $contents | ForEach-Object {
            $ExcludeServiceDisplayName = $_.ServiceToExclude
            $ExcludeComputerName = $_.ComputersToExclude
            if (($name -match $ExcludeServiceDisplayName) -and (($TargetComputer -match $ExcludeComputerName) -or ($ExcludeComputerName -match "ALL_COMPUTERS"))) {
                $isExcluded = 1
                #Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 777 -Message "WindowsAutomaticServiceMonitoring.ps1 - Excluded Service Name - $ExcludeServiceDisplayName, Excluded Computer Name - $ExcludeComputerName" -EntryType Information
            }
        }
    }

    # Return one property bag per automatic service that is stopped and not excluded
    if (($isExcluded -eq 0) -and ($state -eq "Stopped")) {
        #Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 778 -Message "WindowsAutomaticServiceMonitoring.ps1 - Windows Service set to Automatic but Not Running - $name" -EntryType Information
        $bag = $api.CreatePropertyBag()
        $bag.AddValue("ServiceName", $name)
        $bag.AddValue("Status", $state)
        $bag
    }
}
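One behaviour of the script is worth keeping in mind: the -match operator treats the CSV values as regular expressions, so a display name that contains metacharacters such as parentheses may not match itself. A minimal sketch of the effect follows. The display name is chosen purely for illustration, and [regex]::Escape is shown as one possible way to force a literal comparison; it is not part of the script above.

```powershell
# Display name used only to illustrate regex metacharacters
$displayName = 'Google Update Service (gupdate)'

# Raw pattern: the parentheses form a group, so the literal "(" in the name is not matched
$rawMatch = $displayName -match $displayName

# Escaped pattern: every metacharacter is matched literally
$escapedMatch = $displayName -match [regex]::Escape($displayName)

"Raw: $rawMatch, Escaped: $escapedMatch"   # Raw: False, Escaped: True
```

Escaping would only make sense for the service name column. Escaping the computer column would also neutralise the | alternation used to list several computers in one row.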

Step 3:

Create a data source module that incorporates the PowerShell script above. We will use the consolidator condition, as discussed in the solution, to alert only on valid service failures.

<TypeDefinitions>
  <ModuleTypes>
    <DataSourceModuleType ID="GKLab.Windows.Auto.Service.Monitoring.DataSource" Accessibility="Internal" Batching="false">
      <Configuration>
        <xsd:element minOccurs="1" name="ExcludeServiceList" type="xsd:string" />
        <xsd:element minOccurs="1" name="IntervalSeconds" type="xsd:integer" />
        <xsd:element minOccurs="1" name="ConsolidationInterval" type="xsd:integer" />
        <xsd:element minOccurs="1" name="Count" type="xsd:integer" />
      </Configuration>
      <OverrideableParameters>
        <OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
        <OverrideableParameter ID="Count" Selector="$Config/Count$" ParameterType="int" />
        <OverrideableParameter ID="ConsolidationInterval" Selector="$Config/ConsolidationInterval$" ParameterType="int" />
      </OverrideableParameters>
      <ModuleImplementation Isolation="Any">
        <Composite>
          <MemberModules>
            <DataSource ID="Trigger" TypeID="System!System.SimpleScheduler">
              <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
              <SyncTime>00:00</SyncTime>
            </DataSource>
            <ProbeAction ID="Probe" TypeID="Windows!Microsoft.Windows.PowerShellPropertyBagProbe">
              <ScriptName>WindowsAutomaticServicesMonitoring.ps1</ScriptName>
              <ScriptBody><![CDATA[
param (
    [string] $excludeservicelist
)

# Load the exclusion list only if a path was supplied and the file is reachable
if ($excludeservicelist -and (Test-Path $excludeservicelist)) {
    Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 776 -Message "WindowsAutomaticServiceMonitoring.ps1 - Accessing Exclusion List CSV" -EntryType Information
    $contents = Import-Csv $excludeservicelist
}

$TargetComputer = hostname
$api = New-Object -ComObject 'MOM.ScriptAPI'

# All services whose start mode is Automatic
$auto_services = Get-WmiObject -Class Win32_Service -Filter "StartMode='Auto'"

foreach ($service in $auto_services) {
    $isExcluded = 0
    $state = $service.State
    $name = $service.DisplayName

    # Check the service and this computer against every row of the exclusion list
    if ($contents) {
        $contents | ForEach-Object {
            $ExcludeServiceDisplayName = $_.ServiceToExclude
            $ExcludeComputerName = $_.ComputersToExclude
            if (($name -match $ExcludeServiceDisplayName) -and (($TargetComputer -match $ExcludeComputerName) -or ($ExcludeComputerName -match "ALL_COMPUTERS"))) {
                $isExcluded = 1
                Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 777 -Message "WindowsAutomaticServiceMonitoring.ps1 - Excluded Service Name - $ExcludeServiceDisplayName, Excluded Computer Name - $ExcludeComputerName" -EntryType Information
            }
        }
    }

    # Return one property bag per automatic service that is stopped and not excluded
    if (($isExcluded -eq 0) -and ($state -eq "Stopped")) {
        Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 778 -Message "WindowsAutomaticServiceMonitoring.ps1 - Windows Service set to Automatic but Not Running - $name" -EntryType Information
        $bag = $api.CreatePropertyBag()
        $bag.AddValue("ServiceName", $name)
        $bag.AddValue("Status", $state)
        $bag
    }
}
]]></ScriptBody>
              <Parameters>
                <Parameter>
                  <Name>ExcludeServiceList</Name>
                  <Value>$Config/ExcludeServiceList$</Value>
                </Parameter>
              </Parameters>
              <TimeoutSeconds>300</TimeoutSeconds>
            </ProbeAction>
            <ConditionDetection ID="Consolidator" TypeID="System!System.ConsolidatorCondition">
              <Consolidator>
                <ConsolidationProperties>
                  <PropertyXPathQuery>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</PropertyXPathQuery>
                  <PropertyXPathQuery>Property[@Name='ServiceName']</PropertyXPathQuery>
                </ConsolidationProperties>
                <TimeControl>
                  <WithinTimeSchedule>
                    <Interval>$Config/ConsolidationInterval$</Interval>
                  </WithinTimeSchedule>
                </TimeControl>
                <CountingCondition>
                  <Count>$Config/Count$</Count>
                  <CountMode>OnNewItemTestOutputRestart_OnTimerSlideByOne</CountMode>
                </CountingCondition>
              </Consolidator>
            </ConditionDetection>
          </MemberModules>
          <Composition>
            <Node ID="Consolidator">
              <Node ID="Probe">
                <Node ID="Trigger" />
              </Node>
            </Node>
          </Composition>
        </Composite>
      </ModuleImplementation>
      <OutputType>System!System.ConsolidatorData</OutputType>
    </DataSourceModuleType>
  </ModuleTypes>
</TypeDefinitions>

Step 4:

Next, we will create a rule that uses the data source. The configuration below needs to be customized to suit your environment.

ExcludeServiceList – the UNC path of the excluded services list file (in CSV format). The file must be readable by the account under which the agent runs the script. A sample CSV is provided below. The CSV has two columns:

ServiceToExclude – the display name of the service.

ComputersToExclude – the NetBIOS name of the computer. To exclude a service on two or more computers, either add one row per computer or use regular expression syntax (for example, SCOM2012R2|Win2k12-DC). To exclude a service on all computers, set the value to “ALL_Computers”.

ServiceToExclude,ComputersToExclude
Distributed Transaction Coordinator,SCOM2012R2
Windows Audio,Win2k12-DC
Remote Registry,ALL_Computers
Software Protection,SCOM2012R2|Win2k12-DC
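To see how the script would interpret these rows, the exclusion test can be tried in isolation. The sketch below parses the sample rows with ConvertFrom-Csv and applies the same -match comparisons as the script; the helper name Test-ServiceExcluded and the computer name FILESRV01 are hypothetical and used only for illustration.

```powershell
# Sample rows from the exclusion list, parsed into objects with the two CSV columns
$rows = @'
ServiceToExclude,ComputersToExclude
Distributed Transaction Coordinator,SCOM2012R2
Windows Audio,Win2k12-DC
Remote Registry,ALL_Computers
Software Protection,SCOM2012R2|Win2k12-DC
'@ | ConvertFrom-Csv

# Hypothetical helper that mirrors the exclusion test used in the script
function Test-ServiceExcluded {
    param(
        [string]   $ServiceName,
        [string]   $ComputerName,
        [object[]] $Rows
    )
    foreach ($row in $Rows) {
        # Both comparisons are case-insensitive regular expression matches
        $serviceMatch  = $ServiceName -match $row.ServiceToExclude
        $computerMatch = ($ComputerName -match $row.ComputersToExclude) -or
                         ($row.ComputersToExclude -match 'ALL_COMPUTERS')
        if ($serviceMatch -and $computerMatch) { return $true }
    }
    return $false
}

Test-ServiceExcluded -ServiceName 'Remote Registry' -ComputerName 'FILESRV01' -Rows $rows      # True
Test-ServiceExcluded -ServiceName 'Windows Audio' -ComputerName 'FILESRV01' -Rows $rows        # False
Test-ServiceExcluded -ServiceName 'Software Protection' -ComputerName 'SCOM2012R2' -Rows $rows # True
Test-ServiceExcluded -ServiceName 'Windows Audio Endpoint Builder' -ComputerName 'Win2k12-DC' -Rows $rows # True
```

The last call shows that -match is a partial match: a row for Windows Audio also excludes any service whose display name contains that text.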

IntervalSeconds – the polling interval, in seconds.

Count – the number of polls in which the service must be found stopped before an alert is raised (minimum 2).

ConsolidationInterval – the time window, in seconds, within which the service must fail ‘n’ times to generate an alert (minimum value = (n-1) * IntervalSeconds, where n = Count).
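The minimum ConsolidationInterval can be worked out from the other two settings. A minimal sketch of that arithmetic, with a hypothetical function name:

```powershell
# Hypothetical helper: smallest window that can contain $Count polls taken $IntervalSeconds apart
function Get-MinimumConsolidationInterval {
    param(
        [int] $IntervalSeconds,   # polling interval of the rule
        [int] $Count              # number of failed polls required for an alert
    )
    if ($Count -lt 2) { throw 'Count must be at least 2.' }
    return ($Count - 1) * $IntervalSeconds
}

Get-MinimumConsolidationInterval -IntervalSeconds 300 -Count 2   # 300
Get-MinimumConsolidationInterval -IntervalSeconds 300 -Count 3   # 600
```

With a 300-second poll and Count = 2, the minimum is 300 seconds; the rule in this post uses 600, which leaves some margin above that minimum.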

<Monitoring>
  <Rules>
    <Rule ID="GKLab.Windows.AutomaticService.Monitoring.Rule" Enabled="true" Target="Windows!Microsoft.Windows.Computer" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
      <Category>Alert</Category>
      <DataSources>
        <DataSource ID="DS" TypeID="GKLab.Windows.Auto.Service.Monitoring.DataSource">
          <ExcludeServiceList>\\SCOM2012R2\Configs\WindowsAutomaticServiceMonitoringExclusionList.csv</ExcludeServiceList>
          <IntervalSeconds>300</IntervalSeconds>
          <ConsolidationInterval>600</ConsolidationInterval>
          <Count>2</Count>
        </DataSource>
      </DataSources>
      <WriteActions>
        <WriteAction ID="Alert" TypeID="Health!System.Health.GenerateAlert">
          <Priority>1</Priority>
          <Severity>2</Severity>
          <AlertMessageId>$MPElement[Name="GKLab.Windows.AutomaticService.Monitoring.Rule.AlertMessage"]$</AlertMessageId>
          <AlertParameters>
            <AlertParameter1>$Data/Context/DataItem/Property[@Name='ServiceName']$</AlertParameter1>
          </AlertParameters>
          <Suppression>
            <SuppressionValue>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</SuppressionValue>
            <SuppressionValue>$Data/Context/DataItem/Property[@Name='ServiceName']$</SuppressionValue>
          </Suppression>
        </WriteAction>
      </WriteActions>
    </Rule>
  </Rules>
</Monitoring>

Step 5:

The final step is to construct the XML for the presentation and language packs. Make sure to close the <ManagementPack> tag.

  <Presentation>
    <StringResources>
      <StringResource ID="GKLab.Windows.AutomaticService.Monitoring.Rule.AlertMessage" />
    </StringResources>
  </Presentation>
  <LanguagePacks>
    <LanguagePack ID="ENU" IsDefault="true">
      <DisplayStrings>
        <DisplayString ElementID="GKLab.Windows.Automatic.Service.Monitoring">
          <Name>GKLab Windows Automatic Service Monitoring</Name>
          <Description>GKLab Windows Automatic Service Monitoring Management Pack</Description>
        </DisplayString>
        <DisplayString ElementID="GKLab.Windows.Auto.Service.Monitoring.DataSource">
          <Name>GKLab Windows Automatic Service Monitoring Data Source</Name>
          <Description>GKLab Windows Automatic Service Monitoring Data Source</Description>
        </DisplayString>
        <DisplayString ElementID="GKLab.Windows.AutomaticService.Monitoring.Rule">
          <Name>Windows Automatic Services Monitoring Rule</Name>
          <Description>Windows Automatic Services Monitoring Rule</Description>
        </DisplayString>
        <DisplayString ElementID="GKLab.Windows.AutomaticService.Monitoring.Rule" SubElementID="Alert">
          <Name>Alert</Name>
        </DisplayString>
        <DisplayString ElementID="GKLab.Windows.AutomaticService.Monitoring.Rule" SubElementID="DS">
          <Name>GKLab Windows Automatic Service Monitoring Data Source</Name>
        </DisplayString>
        <DisplayString ElementID="GKLab.Windows.AutomaticService.Monitoring.Rule.AlertMessage">
          <Name>Windows Automatic Services Monitoring Alert</Name>
          <Description>Windows Service {0} is set to auto-start but is currently not running.</Description>
        </DisplayString>
      </DisplayStrings>
    </LanguagePack>
  </LanguagePacks>
</ManagementPack>

Step 6:

Deploy the MP in a lab and check for alerts.


I have attached a copy of the XML, which you can import into any authoring tool. Customize it as per your needs and have fun.

Happy SCOMing…


Comments (10)

  1. Subramani says:

    Nice! Just what I was looking for.

  2. kris says:

    This is great. Just one issue we have, The ALL_Computers doesn’t seem to work for exclusions. Is this something you added to SCOM or should it just work. Anyway I can check this

  3. Nicolas Etienne says:

    Hello,

    thanks for your great job.
    I just have a problem to exclude all computers. the syntax ALL_Computers, doesn’t work ( I’m on scom 2K12 R2)
    – Is it possible to use * or $ or … , for exclude all servers with the name start with SRV-* ?
    – is it possible to add a group name – it will be greate.

    Thanks

    Nicolas

  4. steven says:

    format of the csv seem does not work:
    Application Experience,ALL_Computers

    i created a shared folder \\WindowsAutomaticServiceMonitoringExclusionList\WindowsAutomaticServiceMonitoringExclusionList.csv

  5. steven says:

    the monitoring doesn’t seem working. i try to stopped a service but the alert not triggering.

    1. If the service is not in excluded list, it should alert as per configured interval and number of retries.

  6. T says:

    Will I just need to import the xml into SCOM?

    1. Yes but you have to change the shared path where you maintain the Excluded Services List.

  7. Many have reached out to me saying “ALL_COMPUTERS” exclusion does not work. I will look in to it and update the post with changes if any.

    1. lchua says:

      hope can hear the good news
