Grooming of the OpsDB in OpsMgr 2012 is very similar to OpsMgr 2007. Grooming is called once per day at 12:00am…. by the rule: “Partitioning and Grooming” You can search for this rule in the Authoring space of the console, under Rules. It is targeted to the “All Management Servers Resource Pool” and is part of the System Center Internal Library.
It calls the “p_PartitioningAndGrooming” stored procedure. This SP calls two other SP’s: p_Partitioning and then p_Grooming
p_Partitioning inspects the table PartitionAndGroomingSettings, and then calls the SP p_PartitionObject for each object in the PartitionAndGroomingSettings table where "IsPartitioned = 1" (note – we partition event and perf into 61 daily tables – just like MOM 2005/SCOM 2007)
The PartitionAndGroomingSettings table:
The p_PartitionObject SP first identifies the next partition in the sequence, truncates it to make sure it is empty, and then updates the PartitionTables table in the database, to update the IsCurrent field to the next numeric table for events and perf. It also sets the current time as the partition end time in the previous “is current” row, and sets the current time in the partition start time of the new “is current” row. Then it calls the p_PartitionAlterInsertView sproc, to make new data start writing to the “new” current event and perf table.
To review which tables you are writing to – execute the following query: select * from partitiontables where IsCurrent = ‘1’
A select * from partitiontables will show you all 61 event and perf tables, and when they were used. You should see a PartitionStartTime updated every day – around midnight (time is stored in UTC in the database). If partitioning is failing to run, then we wont see this date changing every day.
Ok – that’s the first step of the p_PartitioningAndGrooming sproc – Partitioning. Now – if that is all successful, we will start grooming!
The p_Grooming is called after partitioning is successful. One of the first things it does – is to update the InternalJobHistory table. In this table – we keep a record of all partitioning and grooming jobs. It is a good spot check to see what’s going on with grooming. To have a peek at this table – execute a select * from InternalJobHistory order by InternalJobHistoryId DESC
The p_Grooming sproc then calls p_GroomPartitionedObjects
p_GroomPartitionedObjects will first examine the PartitionAndGroomingSettings and compare the “days to keep” column value, against the current date, to figure out how many partitions to keep vs groom. It will then inspect the partitions (tables) to ensure they have data, and then truncate the partition, by calling p_PartitionTruncate. A truncate command is just a VERY fast and efficient way to delete all data from a table without issuing a highly transactional DELETE command. The p_GroomPartitionedObjects sproc will then update the PartitionAndGroomingSettings table with the current time, under the GroomingRunTime column, to reflect when grooming last ran.
Next – the p_Grooming sproc continues, by calling p_GroomNonPartitionedObjects.
p_GroomNonPartitionedObjects is a short, but complex sproc – in that is calls all the individual sprocs listed in the PartitionAndGroomingSettings table where IsPartitioned = 0. The following stored procedures are present in my database as non-partitioned data:
Now, for the above sprocs, each one could potentially return a success or failure. They will also likely call additional sprocs, for specific tasks. You can see, the rabbit hole is deep. This is just an example of the complexity involved in self-maintenance and grooming. If you are experiencing a grooming failure of any kind, and the error messages involve any of the above stored procedures when you execute p_PartitioningAndGrooming manually, you should open a support case with Microsoft for troubleshooting and resolution. The theory is, that each of the above procedures grooms a specific non-partitioned dataset. Under NORMAL circumstances, each should be able to complete in a reasonable time frame. The challenge becomes evident when you have something go wrong, like alert storms, state change even storms from monitors flip-flop, lots of performance signature data from using self-tuning threshold monitors, huge amounts of pending SDK datasource data from large Exchange 2010 environments, or from other MP’s that might leverage this. Grooming non-partitioned data is slow, and highly resource intensive and transactional. These are specific delete statements, from tables directly, often combined with creating temp tables in TempDB. Having a good presized high performance TempDB can help, as will ensuring you have plenty of transaction log space for the database, and having the disk subsystem offer as many IOPS as possible. http://technet.microsoft.com/en-us/library/ms175527(v=SQL.105).aspx
Next – the p_Grooming sproc continues, by updating the InternalJobHistory table, to give it a status of success (StatusCode of 1 = success, 2= failed, 0 appears to be never completed?)
If you ever have a problem with grooming – or need to get your OpsDB database size under control – simply reduce the data retention days, in the console, under Administration, Settings, Database Grooming. To start with – I recommend setting all these to just 2 days, from the default of 7. This keeps your OpsDB under control until you have time to tune all the noise from the MP’s you import. So just reduce this number, then open up query analyzer, and execute EXEC p_PartitioningAndGrooming When it is done, check the job status by executing select * from InternalJobHistory order by InternalJobHistoryId DESC The last groom job should be present, and successful. The OpsDB size should be smaller, with more free space. And to validate, you can always run my large table query, found at: Useful Operations Manager 2007 SQL queries