Grooming in SCOM

Operation Manger Database

How Operations manager grooming works and how to check

The grooming settings are stored inthe databasein the PartitionandGroomingSettings table. Each table uses a different stored procedure to groom. You can run the following command which can give you the jobs and the groomingruntime to check when it was last run.

select  ObjectName,GroomingSproc, DaysToKeep,GroomingRunTime from dbo.PartitionAndGroomingSettings

You can run the following query select * from dbo.InternalJobHistory

Which can let you know when the groom job has been executed and on failure the Status Code = 0 and for success Status Code = 1. The comments can also give use the clue why the same failed.

 

Operation Manger Database DW

How Operations manager DW grooming works and how to check

In the data warehouse the dataset maintenance task for each dataset (Event ,alert, performance, state). This is run every minute as a part of this the stage data is processed and then you can see that the groom job will be triggered. To find the GUID you can run the following command. select * from StandardDataset once you get the GUID how the processing is done is the stored proc StandardDatasetMaintenance. (exec StandardDatasetMaintenance @DatasetId=' 6AF799BD-2CCC-41CF-97FD-058E2CBF9248'. This GUID was available for the alert data when we ran the query select * from StandardDataset ).

Now how to check if grooming is working or not. First run the query to check the retention dates.

select ds.datasetDefaultName AS 'Dataset Name', sda.AggregationTypeId AS 'Agg Type 0=raw, 20=Hourly, 30=Daily', sda.MaxDataAgeDays AS 'Retention Time in Days'

from dataset ds, StandardDatasetAggregation sda

WHERE ds.datasetid = sda.datasetid ORDER by ds.datasetDefaultName

Now to check how much data for each data is there you can use this query

SELECT DATEDIFF(d, MIN(DWCreatedDateTime), GETDATE()) AS [Current] FROM Alert.vAlert

SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM Event.vEvent

SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM Perf.vPerfRaw

SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM Perf.vPerfHourly

SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM Perf.vPerfDaily

SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM State.vStateRaw

SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM State.vStateHourly

SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM State.vStateDaily

Once you are able to find that a particular dataset has number data that what you have configured. You need to troubleshoot like this.

The first thing you should do if you suspect the grooming jobs are not running properly is check the event logs on the RMS.
If you don't see any errors, you can increase the logging debuglevel for the grooming jobs by running the following query:

Update StandardDataSet

Set DebugLevel = 3

Where DebugLevel != 3

Note: The available levels for grooming are 0,1,2,3 with each level increasing the verbosity of information logged. The default value is 0 for all datasets. Once debuglevel is increased to 3 all grooming and aggregation jobs for the specified dataset will be logged to the DebugMessage table. And for the same you can run this

select * from DebugMessage

The last column in this table, OperationDurationms is the duration it took for the grooming job to complete. Look for any really high duration, it could mean the grooming job is timing out. Please check if there are continuous NULL values as that also means timeouts.

Many a times the failure can arise when the standarddatabase maintenance dataset is not running correctly. This can happen when lot of data is there on staging tables and you can use this link for more details https://blogs.technet.com/b/sudheesn/archive/2009/07/28/getting-31552-exception-sqlexception-timeout-expired-very-frequently-in-scom-server.aspx

Operation Manger Audit collection Database

How Operations manager Audit collection grooming works and how to check

Every 198 seconds, a “Checkpoint” is initiated on the database. At this point, we also check if any partitions in a status of “2” are older than Number of Partitions * 86400 (both configurable). If they are, dbdeletepartition.sql is initiated against the partition GUID.

To check this you can run command SELECT * FROM dtconfig.

The "Partition Close Time" field in the dtpartition table to determine if a partition is elegable for grooming. select min(Partitionclosetime) from dtPartition And this should be not graeter than 17 days if you had set 15 days data period.

 ==============================================================================================

Sudheesh Narayanaswamy | Support Engineer | Microsoft