Monitor the operations database grooming procedure

UPDATE: Monitoring the operational database grooming procedure is now included in the SCOM management pack since version 6.1.7672.0. I will keep this post up as an example of how to create a monitor based on a script that queries a database.

Often times we’re not aware of, or concerned about grooming in Operations Manager.  After all, it’s an internal process that is managed by the RMS and we might expect an alert if anything goes awry with the operations database.  Right?

The System Center Core Monitoring management pack does have the Operational Database Space Free (%) monitor, which will generate an alert if the database is running low on space; 40%=Warning and 20%=Critical, by default.  This is our first indicator that something is not right, because we should size the operations database accordingly to accommodate a steady operational state.  This is the main reason we recommend not enabling auto growth on the operations database.

Is it good enough to know when the operations database is running out of space?  Not for me.  Because running out of disk space is usually a belated indication and symptom of another issue.

 

Common causes of database growth

The only time the operations database will grow beyond what is expected during steady operational state, is when we introduce some noise.  The issue is usually one or more of the following:

* Poorly written monitor was introduced
* Too much data collected by new rule
* New management pack(s) imported
* Adding agents to management group 
* Transaction Log is not sized accordingly

At the end of the day it’s the grooming procedure that isn’t running successfully.  Unless there was some issue not directly related to the grooming process (performance or availability problem), if the grooming procedure fails one time it is likely that it will continue failing until an administrator intervenes.

The last bullet above, Transaction Log is not sized accordingly, is the reason grooming will continue to fail after the first failure.  Because when grooming fails, it’s almost always because the transaction log ran out of space.

If the transaction log runs out of free space during the first failed groom interval, it certainly will not have enough space during subsequent intervals to successfully groom the data that created the large drop in the first place.

 

So why not monitor the grooming procedure?

Out of the box, there is no monitoring of the grooming procedures that keeps the operational database clean and performing well.  I hope this is included in core monitoring at some point, but for now we’ll need to create a simple workflow to monitor grooming.

I could create some sort of extended management pack, which would properly discover the operations database server and include the associated monitor, but I’d rather show you how to do it so you can create this in your own custom management pack.  Most customers already have some sort of extended core monitoring management pack, and who needs another MP with a single workflow?

Before going further, please see the improved monitor instructions here.

Create a unit monitor

Timed Script Two State Monitor
image

Configure general properties as shown, and uncheck Monitor is enabled.
image

 

Configure schedule as shown.
image

Configure as shown, then click Parameters. (see script at end of post)
image

 

Configure parameters as shown.  If you have a custom name for your database, replace the parameter with your custom database name.
image

Create Unhealthy Expression filter as shown.
image

Create Healthy Expression filter as shown.
image

Configure Health Mapping as shown.
image

 

Configure Alert Settings as shown.
image

 

Enable the new monitor

Scope the Authoring space to Windows Server Operating System and find your new monitor.

Override for a specific object of class: Windows Server Operating System
image

Find the server hosting the operations database and select it.
image

Set the Enabled parameter to True.
image

 

The Results

Grooming failed

alert
image

state
image

state change event
image 

Grooming succeeded

alert auto-resolved
image

state
image

state change event
image

 

The Script

 

 

 

'OperationalDatabaseGroomingProcedureMonitor.vbs

Option Explicit
'Declarations
Dim objCN,objRS,strQuery,strInternalJobHistoryId,strTimeStarted,strStatusCode,strCommand
Dim oArgs,oAPI,oBag
Dim strDatabase,strDBServer

'Define local event constants
Const EVENT_TYPE_ERROR = 1
Const EVENT_TYPE_WARNING = 2
Const EVENT_TYPE_INFORMATION = 4

'Create objects
Set oAPI = CreateObject("MOM.ScriptAPI")
Set oArgs = WScript.Arguments
Set oBag = oAPI.CreatePropertyBag()

'Define parameters
strDBServer = "."
strDatabase = oArgs(0)

'Set DB connection
Set objCN = CreateObject("ADODB.Connection")
objCN.Open "Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=" & strDatabase & ";Data Source=" & strDBServer & ""

strQuery = "SELECT InternalJobHistoryId, TimeStarted, StatusCode, Command FROM InternalJobHistory " & _
"WHERE (CONVERT(varchar, TimeStarted, 101) = CONVERT(varchar, GETDATE(), 101)) and Command = 'Exec dbo.p_GroomPartitionedObjects and dbo.p_Grooming' " & _
"ORDER BY InternalJobHistoryId DESC"

'Query DB
Set objRS = objCN.Execute(strQuery)

'Set variables
strInternalJobHistoryId = objRS ("InternalJobHistoryId")
strTimeStarted = objRS ("TimeStarted")
strStatusCode = objRS ("StatusCode")
strCommand = objRS ("Command")

'Submit Property Bag
Call oBag.AddValue("DBServer",strDBServer)
Call oBag.AddValue("Database",strDatabase)
Call oBag.AddValue("InternalJobHistoryId",strInternalJobHistoryId)
Call oBag.AddValue("TimeStarted",strTimeStarted)
Call oBag.AddValue("StatusCode",strStatusCode)
Call oBag.AddValue("Command",strCommand)

'Healthy state
If strStatusCode = 1 Then
Call oBag.AddValue("State","Healthy")
'Call oAPI.LogScriptEvent("OperationalDatabaseGroomingProcedureMonitor.vbs",100,EVENT_TYPE_INFORMATION,"Healthy")

'Critical state
Else
Call oBag.AddValue("State","Critical")
Call oBag.AddValue("Details","Operational database grooming return status code " & strStatusCode & " for procedure " & strCommand & ", " & _
"which was started at " & strTimeStarted & ". Check Internal Job History Id " & strInternalJobHistoryId & ".")
'Call oAPI.LogScriptEvent("OperationalDatabaseGroomingProcedureMonitor.vbs",100,EVENT_TYPE_INFORMATION,"Critical")
End If

'Return property values
Call oAPI.Return(oBag)

'Log event to Operations Manager log. For testing only.
'Call oAPI.LogScriptEvent("OperationalDatabaseGroomingProcedureMonitor.vbs",100,EVENT_TYPE_INFORMATION,"Operational database grooming return status code " & strStatusCode & " for procedure " & strCommand & ", " & _
' "which was started at " & strTimeStarted & ". Check Internal Job History Id " & strInternalJobHistoryId & ".")