What SQL maintenance should I perform on my OpsMgr databases?


 

This article is for SCOM 2007. It does not apply to SCOM 2012; see the updated article for SCOM 2012.

 

This question comes up a lot.  The answer is really not what maintenance you should be performing… but what maintenance you should be *excluding*… or when.  Here is why:

Most SQL DBAs will set up some pretty basic default maintenance on all the SQL DBs they support.  This often includes, but is not limited to:

DBCC CHECKDB  (to look for DB errors and report on them)

UPDATE STATISTICS  (to boost query performance)

DBCC DBREINDEX  (to rebuild the table indexes to boost performance)

BACKUP

SQL DBAs might schedule these to run via the SQL Agent nightly, weekly, or some combination of the two, depending on DB size and requirements.
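For illustration, here is a minimal sketch of what such a default job might look like; the database name and backup path are placeholders, not recommendations, and the reindex step is covered by the job shown later in this article:

-- A minimal sketch of a typical "default" DBA maintenance job (names/paths are examples):
USE OperationsManager
GO
DBCC CHECKDB                       -- look for DB errors and report on them
EXEC sp_updatestats                -- update statistics to boost query performance
BACKUP DATABASE OperationsManager
   TO DISK = 'D:\Backup\OperationsManager.bak'   -- example backup path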

On the other side of the coin… in some companies, the MOM/OpsMgr team installs and owns the SQL Server… and they don't do ANY default maintenance to SQL.  Because of this, a focus in OpsMgr was to have the Ops DB and Data Warehouse DB be fully self-maintaining… providing a good level of SQL performance whether or not any default maintenance is being done.

Operational Database:

Reindexing is already taking place against the OperationsManager database for some of the tables; this is built into the product.  What we need to ensure is that any default DBA maintenance tasks are neither redundant with, nor in conflict with, our built-in maintenance and its schedules.

There is a rule in OpsMgr that is targeted at the Root Management Server.


The rule executes the "p_OptimizeIndexes" stored procedure every day at 2:30AM.


This rule cannot be changed or modified.  Therefore, we need to ensure there is no other SQL maintenance (including backups) running at 2:30AM, or performance will be impacted.
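If your DBA maintenance runs as SQL Agent jobs, one quick way to check for overlap is to list every job schedule on the instance.  A sketch (active_start_time is stored as an HHMMSS integer, so 2:30AM appears as 23000):

-- List all SQL Agent jobs and their scheduled start times:
SELECT j.name AS JobName,
       s.name AS ScheduleName,
       s.active_start_time      -- HHMMSS integer; 23000 = 2:30:00 AM
FROM msdb.dbo.sysjobs j
INNER JOIN msdb.dbo.sysjobschedules js ON j.job_id = js.job_id
INNER JOIN msdb.dbo.sysschedules s ON js.schedule_id = s.schedule_id
ORDER BY s.active_start_time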

If you want to view the history of the built-in UPDATE STATISTICS and DBCC DBREINDEX jobs, just run the following queries:

select *
from DomainTable dt
inner join DomainTableIndexOptimizationHistory dti
on dt.domaintablerowID = dti.domaintableindexrowID
ORDER BY optimizationdurationseconds DESC

select *
from DomainTable dt
inner join DomainTableStatisticsUpdateHistory dti
on dt.domaintablerowID = dti.domaintablerowID
ORDER BY UpdateDurationSeconds DESC

Take note of the update/optimization duration seconds columns.  These show how long your maintenance typically runs; in a healthy environment it should not take very long.

 

If you want to view the fragmentation levels of the current tables in the database, run:

DBCC SHOWCONTIG WITH FAST

Here is some sample output:

DBCC SHOWCONTIG scanning 'Alert' table...
Table: 'Alert' (1771153355); index ID: 1, database ID: 5
TABLE level scan performed.
- Pages Scanned................................: 936
- Extent Switches..............................: 427
- Scan Density [Best Count:Actual Count].......: 27.34% [117:428]
- Logical Scan Fragmentation ..................: 60.90%

In general, we would like the "Scan Density" to be high (above 80%), and the "Logical Scan Fragmentation" to be low (below 30%).  What you might find is that *some* of the tables are more fragmented than others, because our built-in maintenance does not reindex all tables; this is especially true of the raw perf, event, and LocalizedText tables.
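On SQL Server 2005 and later, the same fragmentation information is also available from the sys.dm_db_index_physical_stats DMV, the modern replacement for DBCC SHOWCONTIG.  A sketch, run in the context of the OperationsManager database:

-- Per-index fragmentation for the current database, worst first:
SELECT OBJECT_NAME(ips.object_id) AS TableName,
       i.name AS IndexName,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
INNER JOIN sys.indexes i
   ON ips.object_id = i.object_id AND ips.index_id = i.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC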

That said, there is nothing wrong with running a DBA's default maintenance against the Operational database; reindexing these other tables might also help console performance.  We just don't want any DBA maintenance to run at the same time as our own internal maintenance, so try not to conflict with that schedule.  Care should also be taken that any default DBA maintenance does not run too long or impact normal operations of OpsMgr.  Maintenance jobs should be monitored, and should not conflict with the backup schedule either.

Here is a reindex job you can schedule with the SQL Agent for the OpsDB:

USE OperationsManager
GO
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
-- Print each table name, then rebuild all of its indexes:
EXEC sp_MSforeachtable 'PRINT ''Reindexing '' + ''?'' DBCC DBREINDEX (''?'')'
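One note on this job: DBCC DBREINDEX also rebuilds the statistics on each index it rebuilds (with a full scan), so a separate UPDATE STATISTICS pass for those indexes afterward is generally unnecessary.  And as above, schedule it well away from the built-in 2:30AM optimization window.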

 

Data Warehouse Database:

The Data Warehouse DB is also fully self-maintaining.  This is handled by a rule, "Standard Data Warehouse Data Set maintenance rule", which is targeted at the "Standard Data Set" object type.  The maintenance stored procedure this rule invokes is called on the data warehouse every 60 seconds, and it performs many, many tasks, of which index optimization is but one.


This SP calls the StandardDatasetOptimize stored procedure, which handles any index operations.

To examine the index and statistics history – run the following query for the Alert, Event, Perf, and State tables:

 

select basetablename, optimizationstartdatetime, optimizationdurationseconds,
      beforeavgfragmentationinpercent, afteravgfragmentationinpercent,
      optimizationmethod, onlinerebuildlastperformeddatetime
from StandardDatasetOptimizationHistory sdoh
inner join StandardDatasetAggregationStorageIndex sdasi
on sdoh.StandardDatasetAggregationStorageIndexRowId = sdasi.StandardDatasetAggregationStorageIndexRowId
inner join StandardDatasetAggregationStorage sdas
on sdasi.StandardDatasetAggregationStorageRowId = sdas.StandardDatasetAggregationStorageRowId
ORDER BY optimizationdurationseconds DESC

 

Then examine the default domain tables' optimization history by running the same two queries listed above for the OperationsDB.

In the data warehouse – we can see that all the necessary tables are being updated and reindexed as needed.  When a table is 10% fragmented – we reorganize.  When it is 30% or more, we rebuild the index.
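For illustration only, here is that reorganize-vs-rebuild decision expressed in T-SQL.  This is a sketch, not the warehouse's actual StandardDatasetOptimize code, and the table/index names are placeholders:

-- Placeholder names; illustrates the 10% / 30% thresholds described above.
DECLARE @frag float
SELECT @frag = avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.SomeTable'), 1, NULL, 'LIMITED')

IF @frag >= 30.0
    ALTER INDEX IX_SomeIndex ON dbo.SomeTable REBUILD      -- heavily fragmented: rebuild
ELSE IF @frag >= 10.0
    ALTER INDEX IX_SomeIndex ON dbo.SomeTable REORGANIZE   -- moderately fragmented: reorganize in place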

Therefore, there is no need for a DBA to execute any UPDATE STATISTICS or DBCC DBREINDEX maintenance against this database.  Furthermore, since we run our maintenance every 60 seconds and only execute it when necessary, there is no "set window" in which our maintenance jobs run.  This means that if a DBA team also sets up an UPDATE STATISTICS or DBCC DBREINDEX job, it can conflict with our jobs and execute concurrently.  This should be avoided.

 

For the above reasons, I would recommend against any maintenance jobs on the Data Warehouse DB beyond a CHECKDB (only if your DBAs mandate it) and a good backup schedule.

 

For the OpsDB: any standard maintenance is fine, as long as it does not conflict with the built-in maintenance, run too long, or impact production I/O.

 

Lastly, I'd like to discuss the recovery model of the SQL databases.  We default to "simple" for all our DBs.  This should be left alone unless you have *very* specific reasons to change it.  Some SQL teams automatically assume all databases should be set to the "full" recovery model.  That requires backing up the transaction logs on a very regular basis, but gives the added advantage of restoring up to the time of the last t-log backup.  For OpsMgr, this is of very little value: the data changing on an hourly basis is of little value compared to the complexity added by moving from simple to full.  Also, under the full model the transaction log is only truncated when a t-log backup is performed.  What I have seen is that many companies aren't prepared for the amount of data written to these databases, and their standard transaction log backups (often hourly) are not frequent enough to keep the logs from filling.  The only valid reason to change to FULL, in my opinion, is when you are using an advanced replication strategy, like log shipping, which requires the full recovery model.  When in doubt, keep it simple.  🙂
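To verify (or restore) the recovery model, a quick sketch; 'OperationsManagerDW' is the default warehouse database name, so adjust if yours differs:

-- Check the current recovery model of the OpsMgr databases:
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name IN ('OperationsManager', 'OperationsManagerDW')

-- Switch back to simple if a blanket DBA policy flipped it to full:
ALTER DATABASE OperationsManager SET RECOVERY SIMPLE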

 

 

P.S.  The Operations Database needs 50% free space at all times, both for growth and for reindex operations to be successful.  This is a general supportability recommendation; the OpsDB will alert when free space falls below 40%.
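A quick way to check free space from within the OperationsManager database (a sketch using FILEPROPERTY; size is stored in 8KB pages, so dividing by 128 yields MB):

-- File size and free space, in MB, for the current database:
SELECT name AS FileName,
       size/128.0 AS FileSizeMB,
       size/128.0 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS int)/128.0 AS FreeSpaceMB
FROM sys.database_files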

For the Data Warehouse, we do not require the same 50% free space.  That would be a tremendous requirement if we had a multi-terabyte database!

Think of the data warehouse as having two stages: a "growth" stage, while it is adding data and not yet grooming much (it hasn't hit the default 400-day retention), and a "maturity" stage, where agent count is steady, MPs are not changing, and grooming is happening because we are at 400 days of retention.  During "growth" we need to watch and maintain free space, and monitor available disk space.  In "maturity" we only need enough free space to handle our index operations.  When you start talking about 1 terabyte of data, that means 500GB of free space, which is expensive.  If you cannot allocate it, then just allow auto-grow and monitor the database… but always plan for it from a volume-size perspective.

For transaction log sizing, we don't have any hard rules.  A good rule of thumb for the OpsDB is ~20% to 50% of the database size; this all depends on your environment.  For the Data Warehouse, it depends on how large the warehouse is, but you will probably find steady state to require somewhere around 10% to 20% of the warehouse size.  Any time we are doing additional grooming after an alert/event/perf storm, or changing grooming from 400 days to 300 days, a LOT more transaction log space will be required, so keep that in mind as your databases grow.
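To see how large your transaction logs are, and how full they actually run, DBCC SQLPERF gives a quick instance-wide view:

-- Log size (MB) and percent used for every database on the instance:
DBCC SQLPERF(LOGSPACE)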

Comments (31)

  1. Anonymous says:

    The best practice would be to not leverage autogrow for the OpsMgr DB – for two reasons:

    1.  Performance hit during growths.  This is discussed in the article.  However, for customers who MUST use autogrow, I typically recommend a 500MB DB growth increment, and 100MB on logs.  These settings allow for ample space and quick, short growths, without too much fragmentation caused by constant smaller growth activities.

    2.  Database explosion.  If something goes VERY badly, such as a poorly written rule or monitor suddenly filling the DB with state, alert, event, or performance data, or grooming failures caused by transaction log issues, the database could grow rapidly, and having autogrowth enabled will let the DB grow to an unmanageable size.  Once you correct the issue, you are left with a DB size potentially in the several-hundred-gig range.  There is NO simple way to shrink the database back down when this happens without causing severe fragmentation inside the DB, which cannot be resolved by a simple reindex.  Recovering from this condition requires creating a new DB file, moving data into that file, and then removing the original file, which can be somewhat complex.  For this reason, I don't recommend auto-grow, but if you must use autogrow, then I recommend setting a maximum database size to something reasonable that you would not be compelled to shrink, if it grew to that max size.

  2. Anonymous says:

    Kevin

    A couple of questions. I understand about the hit with using autogrow. Presumably using "alter database" tsql will have a similar impact, if the growth setting is quite large? Would it be best practice to increase the size by smaller increments – say in high-tens to low-hundreds of megabytes, rather than gigabytes? Would reducing the autogrowth setting from a gigabyte of growth to – say – 100 megabytes of space make sense to prevent a performance impact? I suppose I'm looking for a sweet spot here?

    Thanks

  3. Anonymous says:

    ACS uses a SQL partition (table) for each day's worth of new data.  This partition is closed and then indexed every night, table by table.  If everything works perfectly, there is no need to do anything to an ACS DB other than back it up.

    However, it is possible that the indexing can fail on an ACS DB – especially if the table is very large.  If this occurs, a reindex operation on the DB or table would be beneficial for reporting/query performance.

    That being said – this should only be done if the ACS DB insert rate is not very high…. because reindex operations greatly tax the SQL server CPU/Memory/Disk I/O, and will slow down the available insert rate for ACS.

    If you want to see how your partitions look today, run a DBCC SHOWCONTIG WITH FAST as described above.

    In most scenarios – a backup is the only maintenance that should be performed on an ACS DB.

  4. Anonymous says:

    Hi,

    I installed ACS and I can't find any comments about a maintenance plan for the ACS DB.

    Any ideas?

  5. Anonymous says:

    Jarrad – it has not… however it is pretty much the same advice.

  6. Anonymous says:

    Well Joe, that's certainly one person's opinion.  I bet you make a lot of friends with that approach.  🙂

    Let me give you some historical and technical perspective – something the typical DBA won't know out of the box:

    When our previous product shipped – MOM 2005 – one of the most common scenarios we would see is that many MOM 2005 customers deployed their own SQL servers in the monitoring teams.  There was no DBA involved, and the monitoring teams were not SQL experts.  What we found is many performance issues caused by the fact that NO maintenance was being done.  There were no index jobs at all set up on a huge percentage of servers.  It made sense to include SQL maintenance rules in the product – to make it totally self-maintaining for that large percentage of customers whose SQL servers were not managed by an experienced DBA team familiar with index maintenance.

    Secondly – DBAs often have "standard maintenance" jobs that they run on all their servers/databases.  This doesn't make sense in 100% of cases, because the DBA will not understand all the nuances and schema of the OpsMgr databases.  Standardized DBA maintenance, especially on SQL Standard edition (which does not support online indexing), will CAUSE blocking to occur.  Our maintenance scripts for index operations have the knowledge of which tables should be indexed, which shouldn't, and which will cause issues on inserts during maintenance operations.  It makes MORE sense for the developers of the product to add the maintenance jobs.

    Furthermore – it is VERY common for data warehouse products to include their own maintenance.  Applying a standard indexing job to a huge warehouse will result in performance issues.  You have to understand the warehouse design, and how the application inserts data, in order to schedule and design maintenance.  The OpsMgr warehouse has a very specialized maintenance procedure which runs on a constant schedule – keeping track of how fragmented the indexes are, and of when each job and index operation last ran.  This keeps the warehouse performing well, with all the knowledge behind the design of the application.  This just makes sense.

    Why on earth would any DBA care, once they understand this?  Simply exclude this DB from their index maintenance, understanding that their de facto maintenance does not apply.  It is not like anyone is asking you to do anything "extra".  Just the opposite…

    I look forward to your (hopefully more tactful) response.  🙂

  7. Anonymous says:

    I’ve been getting a few questions lately about slow performance with the console so I thought a post

  8. Anonymous says:

    Derek – that is an excellent question – I am trying to find out the recommendation there.

  9. Anonymous says:

    Thanks Kevin.

  10. Anonymous says:

    My colleague, Kevin Holman, has a blog post with great discussion about SQL maintenance for OpsMgr databases.

  11. Anonymous says:

    @Z –

    PerformanceData_## tables are where all the daily perf data is stored.  We don't reindex this data by design; it is transient in nature, and most customers only keep 2-4 days of this data.  The console views accessing this data on demand are fairly efficient.

  12. Anonymous says:

    Hi Kevin,

    Thanks for the quick reply. This is very helpful.

  13. Anonymous says:

    Great Post, many thanks.

  14. Derek says:

    Kevin,

    With multiple management groups reporting to a single data warehouse is there a chance that the Standard Data Warehouse Data Set maintenance rule will run on each management group and cause issues on the data warehouse?

  15. Michael says:

    Hi, is there a reason to update the statistics before rebuilding the indexes? Wouldn't it be a better idea to do it the other way around?

    Kind regards,

    Mc

  16. Joe says:

    You guys are really stupid to reinvent the wheel! This product is a nightmare with all this blocking and deadlocking when you do the "self maintenance". Let the DBAs do their work.

  17. Dominique says:

    I will have to agree with Kevin.

    Our DBA team (we have one) takes care of Enterprise DBs, but 90% of the time their response to issues is that it is an Application problem, which sends the monitoring back to the Monitoring Team and not the DBA team. Also, as Kevin said, they do "standard" maintenance tasks following their own constraints and don't want to hear any "specifics"; they don't want to learn about applications. So I need to have at least one mind covering both SQL and the application to review the conflicts, issues, problems, and constraints within this "mixed" environment.

    Thanks,

    Dom

  18. Sunil says:

    I am not able to see any reports from Reporting. I went through the article "How to troubleshoot blank reports", but no use.

    Are there any commands to check whether the data is written to the DB or not?

    I am able to see the performance reports on the Monitoring console; each time I have to copy to clipboard and paste it into a PDF file.

  19. Z says:

    Hi,

    I was looking at the PerformanceData_## tables and noticed the indices were fragmented to over 98%.  All the other tables were below 15%.  I don't see any errors or warnings, and we don't have any overrides to stop the maintenance jobs.  So either my timing must be bad every time I check the indices for these tables, or something is wrong with the maintenance job and I cannot find it.  The more I think about it, the performancedata_## tables probably do nothing but get insert statements, so maybe they don't need reindexing?  And if they don't need reindexing, why have them there?

    Thoughts?

  20. Jarrad says:

    Hi Kevin,

    Has this been updated for Ops Mgr 2012?

    Thanks

    Jarrad

  21. SQLGeordie says:

    I'm sorry but the latter part of this statement is horrific: "beyond a CHECKDB (only if your DBAs mandate it)". CheckDB should be run against ALL databases as frequently as the size/system will allow. The question I have for you is: if you're not taking tlog backups and you find that your DB is corrupt, are you (or your business) OK with potentially losing all the data captured up to the point of the last backup? If you're not running CheckDB then at least have some alerts set up – 823/824/825 and even 832 – to tell you when you do have corruption.

    "It makes MORE sense for the developers of the product to add the maintenance jobs." – the last thing I'd every do is allow a developer permission to write a maintenance routine. The fact you believe that you cannot defrag an index in standard edition online is one of the reasons I wouldn't allow it. Ever heard of REORGANIZE? What you have to understand is that both this and the online rebuild (Entperprise only) do take a lock on the base table, an explanation can be found at: http://www.brentozar.com/…/index-maintenance-sql-server-rebuild-reorganize. in short: "Even with an online rebuild, a schema modification lock (SCH-M) is needed at the time the fresh new index is put in place. This is an exclusive lock and in highly concurrent environments, getting it can be a big (blocking) problem."

    "The OpsMgr Warehouse have a very specialized maintenance procedure" – can you explain how this is better than highly configurable maintenance routine by Ola Hallengren's (ola.hallengren.com/sql-server-index-and-statistics-maintenance.html) which has won many awards within the SQL Server community?

    What I would suggest is that the scom developers spend less time on maintenance routines and spend more time working on writing tsql which performs well, I'm specifically talking about SDK.Microsoft_SystemCenter_Visualization_Library_GetTopEntitiesByAlerts, ultimately calling SDK.Microsoft_SystemCenter_Visualization_Library_GetCurrentManagedEntityState

    To end this, I'd advise you to get updated with the latest DMVs within SQL Server, as a lot of what you have mentioned is from SQL 2000 (e.g. DBREINDEX, SHOWCONTIG, etc.). Take a look at sys.dm_db_index_physical_stats as an example.

  22. Dan says:

    Hi Kevin

    Thanks for the great post.  Does all of this still apply to SCOM 2012 R2?

    Thanks!

  23. Kevin Holman says:

    @Dan – yep – it sure does.  Nothing much has changed since 2007 with respect to SQL maintenance.

    @SQLGeordie – you make some great points.  The fact remains the majority of customers out there don't run their SQL like your shop does.  Self-maintenance was critical because we saw a majority of customers deploy dedicated SQL servers for MOM/SCOM, and NO maintenance was performed, which can be catastrophic.  To answer your question about data loss: yes, I covered that in the article.  The assumption for operational data is that a nightly backup is fine, because service restoration is what's most critical for a monitoring tool, not less-than-one-day data restoration.  However, all customers can make their own decisions there.  I cannot tell you how many customers I have worked with who changed these settings, then failed to respond to transaction log backup failures, and caused significant outages because of it.  Your mileage may vary.  🙂

  24. SQLDude says:

    I agree with @SQLGeordie; there is a reason why DBAs do what they do, and often what is done in the realm of a DBA is far better than what 'builtin' maintenance has to offer. I have even found developer code, shipped as part of the builtin maintenance in a certain product, which tried to shrink the TempDB. That is why DBAs are loath to allow this.

    I must also take you to task over 'Most SQL DBAs will set up some pretty basic default maintenance on all the SQL DBs they support'. I would say that a DBA offers a lot more than 'basic'.

  25. Willy says:

    And I would disagree with SQLDude. Not all DBAs are alike. I have one here who claims to be a previous SCOM admin, yet shrunk the TempDB to 2GB from 30GB (my initial setting of 50% of the OpsDB) to free up disk space in my production environment. He didn't ask,
    didn't tell me; I just happened to get my SCOM DB report the following day and noticed the change. I was not happy with him… I resized the file, extended the virtual disk, and told him to never do that again.

  26. SQLDude says:

    Then he is not a DBA

  27. SQLDude says:

    I would also suggest that if you are not a DBA then you should not be tampering with TEMPDB sizes, and MOST SQL DBAs would not allow you to do that either, or would get you to submit a ticket to have it increased. I'd be telling YOU never to do it again 😉

  28. Owais Mohammed says:

    Hi Kevin,

    Thank you for the great post. It has everything I was searching for.

    As SCOM admins, we are managing our own DB, which we are not familiar with, and we need posts like these to know what needs to be done.

    Thanks again

  29. Naeblis says:

    Why not make the internal maintenance a rule? Then it could be turned off when the SQL DBAs are maintaining the DBs.

  30. Don says:

    Where is the updated article for SCOM 2012? There is no link on the page.