This is an updated article specific to SCOM 2012 SQL maintenance. I had a previous article that focused more on SCOM 2007, and some of the data structures, and therefore recommendations have changed.
This question comes up a lot. The answer is really - not what maintenance you should be performing... but what maintenance you should be *excluding*.... or when. Here is why:
Most SQL DBA's will set up some pretty basic default maintenance on all SQL DB's they support. This often includes, but is not limited to:
- DBCC CHECKDB (to look for DB errors and report on them)
- UPDATE STATISTICS (to boost query performance)
- DBCC DBREINDEX (to rebuild the table indexes to boost performance)
SQL DBA's might schedule these to run via the SQL Agent to execute nightly, weekly, or some combination of the above depending on DB size and requirements.
On the other side of the coin.... in some companies, the MOM/OpsMgr team installs and owns the SQL server.... and they don't do ANY default maintenance to SQL. Because of this - a focus in OpsMgr was to have the Ops DB and Datawarehouse DB to be somewhat self-maintaining.... providing a good level of SQL performance whether or not any default maintenance was being done.
Reindexing is already taking place against the OperationsManager database for some of the tables. This is built into the product. What we need to ensure - is that any default DBA maintenance tasks are not redundant nor conflicting with our built-in maintenance, and our built-in schedules:
There is a rule in OpsMgr that is targeted at the All Management Servers Resource Pool:
The rule executes the "p_OptimizeIndexes" stored procedure, every day at 2:30AM:
This rule cannot be changed or modified. Therefore - we need to ensure there is not other SQL maintenance (including backups) running at 2:30AM, or performance will be impacted.
If you want to view the built-in UPDATE STATISTICS and DBCC DBREINDEX jobs history - just run the following queries:
from DomainTable dt
inner join DomainTableIndexOptimizationHistory dti
on dt.domaintablerowID = dti.domaintableindexrowID
ORDER BY optimizationdurationseconds DESC
from DomainTable dt
inner join DomainTableStatisticsUpdateHistory dti
on dt.domaintablerowID = dti.domaintablerowID
ORDER BY UpdateDurationSeconds DESC
Take note of the update/optimization duration seconds column. This will show you how long your maintenance is typically running. In a healthy environment these should not take very long.
If you want to view the fragmentation levels of the current tables in the database, run:
DBCC SHOWCONTIG WITH FAST
Here is some sample output:
DBCC SHOWCONTIG scanning 'Alert' table...
Table: 'Alert' (1771153355); index ID: 1, database ID: 5
TABLE level scan performed.
- Pages Scanned................................: 936
- Extent Switches..............................: 427
- Scan Density [Best Count:Actual Count].......: 27.34% [117:428]
- Logical Scan Fragmentation ..................: 60.90%
In general - we would like the "Scan density" to be high (Above 80%), and the "Logical Scan Fragmentation" to be low (below 30%). What you might find... is that *some* of the tables are more fragmented than others, because our built-in maintenance does not reindex all tables. Especially tables like the raw perf, event, and localizedtext tables.
That said - there is nothing wrong with running a DBA's default maintenance against the Operational database..... reindexing these tables in the database might also help console performance. We just don't want to run any DBA maintenance during the same time that we run our own internal maintenance, so try not to conflict with this schedule. Care should also be taken in any default DBA maintenance, that it does not run too long, or impact normal operations of OpsMgr. Maintenance jobs should be monitored, and should not conflict with the backup schedules as well.
Here is a reindex job you can schedule with SQL agent.... for the OpsDB. I recommend running this nightly or weekly, or using a smarter reindex script that is standardized from your DBA’s. They key thing is not to run your custom SQL index maintenance at the same time as the automated maintenance or backups.
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
EXEC SP_MSForEachTable "Print 'Reindexing '+'?' DBCC DBREINDEX ('?')"
Other important jobs and their start times:
- Partitioning and Grooming: 12:00 AM
- Discovery data grooming: 2:00 AM
- Alert Auto Resolve Execute All: 4:00 AM
Data Warehouse Database:
The data warehouse DB is also self-maintaining. This is called out by a rule "Standard Data Warehouse Data Set maintenance rule" which is targeted to the "Standard Data Set" object type. This stored procedure is called on the data warehouse every 60 seconds. It performs many, many tasks, of which Index optimization is but one.
This SP calls the StandardDatasetOptimize stored procedure, which handles any index operations.
To examine the index and statistics history - run the following query for the Alert, Event, Perf, and State tables:
select basetablename, optimizationstartdatetime, optimizationdurationseconds,
from StandardDatasetOptimizationHistory sdoh
inner join StandardDatasetAggregationStorageIndex sdasi
on sdoh.StandardDatasetAggregationStorageIndexRowId = sdasi.StandardDatasetAggregationStorageIndexRowId
inner join StandardDatasetAggregationStorage sdas
on sdasi.StandardDatasetAggregationStorageRowId = sdas.StandardDatasetAggregationStorageRowId
ORDER BY optimizationdurationseconds DESC
In the data warehouse - we can see that all the necessary tables are being updated and reindexed as needed. When a table is 10% fragmented - we reorganize. When it is 30% or more, we rebuild the index.
Therefore - there is no “general” need for a DBA to execute any UPDATE STATISTICS or DBCC DBREINDEX maintenance against this database. Furthermore, since we run our maintenance every 60 seconds, and only execute maintenance when necessary, there is no "set window" where we will run our maintenance jobs. This means that if a DBA team also sets up a UPDATE STATISTICS or DBCC DBREINDEX job - it can conflict with our jobs and execute concurrently.
I will caveat the above statement with from findings from the field. We have some new visualization tables and management type tables that do not get optimized, and this can lead to degraded performance. An example of that is http://www.theneverendingjourney.com/scom-2012-poor-performance-executing-sdk-microsoft_systemcenter_visualization_library_getaggregatedperformanceseries/ They found that running Update Statistics every hour was beneficial to reducing the CPU consumption of the warehouse. If you manage a very large SCOM environment, this might be worth investigating. I have seen many support cases which resulted in a manual run of Update Statistics to resolve an issue with performance.
For the above reasons, I would recommend against any maintenance jobs on the Data Warehouse DB, beyond a CHECKDB (only if DBA's mandate it) and a good backup schedule. UNLESS – you are going to analyze the data, determine which areas aren't getting index maintenance, or determine how out of date your statistics get. Then ensure any custom maintenance wont conflict with built-in maintenance
For the OpsDB: any standard maintenance is fine, as long as it does not conflict with the built-in maintenance, or impact production by taking too long, or having an impact on I/O. This includes a very regular Update Statics and then a DBreindex job.
Lastly - I'd like to discuss the recovery model of the SQL database. We default to "simple" for all our DB's. This should be left alone.... unless you have *very* specific reasons to change this. Some SQL teams automatically assume all databases should be set to "full" recovery model. This requires that they back up the transaction logs on a very regular basis, but give the added advantage of restoring up to the time of the last t-log backup. For OpsMgr, this is of very little value, as the data changing on an hourly basis is of little value compared to the complexity added by moving from simple to full. Also, changing to full will mean that your transaction logs will only checkpoint once a t-log backup is performed. What I have seen, is that many companies aren't prepared for the amount of data written to these databases.... and their standard transaction log backups (often hourly) are not frequent enough to keep them from filling. The only valid reason to change to FULL, in my opinion, is when you are using an advanced replication strategy, like SQL Always On, or log shipping, which requires full recovery model. When in doubt - keep it simple. 🙂
P.S.... The Operations Database needs 50% free space at all times. This is for growth, and for re-index operations to be successful. This is a general supportability recommendation, but the OpsDB will alert when this falls below 40%.
For the Data warehouse.... we do not require the same 50% free space. This would be a tremendous requirement if we had a multiple-terabyte database!
Think of the data warehouse to have 2 stages... a "growth" stage (while it is adding data and not yet grooming much (haven't hit the default 400 days retention) and a "maturity stage" where agent count is steady, MP's are not changing, and the grooming is happening because we are at 400 days retention. During "growth" we need to watch and maintain free space, and monitor for available disk space. In "maturity" we only need enough free space to handle our index operations. when you start talking 1 Terabyte of data.... that means 500GB of free space, which is expensive, and. If you cannot allocate it.... then just allow auto-grow and monitor the database.... but always plan for it from a volume size perspective.
For transaction log sizing - we don't have any hard rules. A good rule of thumb for the OpsDB is ~20% to 50% of the database size.... this all depends on your environment. For the Data warehouse, it depends on how large the warehouse is - but you will probably find steady state to require somewhere around 10% of the warehouse size or less. When we are doing any additional grooming of an alert/event/perf storm.... or changing grooming from 400 days to 300 days - this will require a LOT more transaction log space - so keep that in mind as your databases grow.