Understanding and modifying Data Warehouse retention and grooming


You will likely find that the default retention in the OpsMgr data warehouse will need to be adjusted for your environment.  I often find customers are reluctant to adjust these – because they don’t know what they want to keep.  So – they assume the defaults are good – and they just keep EVERYTHING. 

This is a bad idea. 

A data warehouse will often be one of the largest databases supported by a company.  Large databases cost money.  They cost money to support.  They are more difficult to maintain.  They cost more to back up in time, tape capacity, network impact, etc.  They take longer to restore in the case of a disaster.  The larger they get, the more they cost in hardware (disk space) to support.  The larger they get, the longer reports can take to complete.

For these reasons – you should give STRONG consideration to reducing your warehouse retention to your reporting REQUIREMENTS.  If you don’t have any – MAKE SOME!

Originally – when the product was first released – you had to edit SQL tables directly to adjust this.  Then – a command line tool was released to adjust these values – making the process easier and safer.  This post is a walkthrough of that process – to better understand how to use the tool, and what each dataset actually means.

Here is the link to the command line tool: 

http://blogs.technet.com/momteam/archive/2008/05/14/data-warehouse-data-retention-policy-dwdatarp-exe.aspx

 

Different data types are kept in the Data Warehouse in unique “Datasets”.  Each dataset represents a different data type (events, alerts, performance, etc.) and an aggregation type (raw, hourly, daily).

Not every customer will have exactly the same datasets.  This is because some management packs will add their own dataset – if that MP collects something unique that does not fit into the default “buckets” that already exist.

 

So – first – we need to understand the different datasets available – and what they mean.  All the datasets for an environment are kept in the “Dataset” table in the Warehouse database.

select * from dataset
order by DataSetDefaultName

This will show us the available datasets.  Common datasets are:

Alert data set
Client Monitoring data set
Event data set
Microsoft.Windows.Client.Vista.Dataset.ClientPerf
Microsoft.Windows.Client.Vista.Dataset.DiskFailure
Microsoft.Windows.Client.Vista.Dataset.Memory
Microsoft.Windows.Client.Vista.Dataset.ShellPerf
Performance data set
State data set

Alert, Event, Performance, and State are the most common ones we look at.

 

However – in the warehouse – we also keep different aggregations of some of the datasets – where it makes sense.  The most common datasets that we aggregate are Performance data, State data, and Client Monitoring data (AEM).  The reason we have raw, hourly, and daily aggregations is to be able to keep data for longer periods of time – but still have very good performance when running reports.

In MOM 2005 – we used to stick ALL the raw performance data into a single table in the Warehouse.  After a year of data had accumulated – the perf table would grow to a HUGE size – and queries against this table could not complete with acceptable performance.  It also meant grooming this table would take forever, and would be prone to timeouts and failures.

In OpsMgr – now we aggregate this data into hourly and daily aggregations.  These aggregations allow us to “summarize” the performance, or state data, into MUCH smaller table sizes.  This means we can keep data for a MUCH longer period of time than ever before.  We also optimized this by splitting these into multiple tables.  When a table reaches a pre-determined size, or number of records – we will start a new table for inserting.  This allows grooming to be incredibly efficient – because now we can simply drop the old tables when all of the data in a table is older than the grooming retention setting.
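If you are curious, you can see this mapping of datasets to their underlying tables for yourself.  The warehouse tracks it in the StandardDatasetTableMap table.  The query below is only a sketch, and I am writing the column list from memory, so run a quick select top 10 * against that table first and adjust the names if yours differ:

-- Sketch: list the tables behind each dataset and the date range each one covers.
-- The StandardDatasetTableMap column names here are from memory and may vary by version.
SELECT sds.SchemaName,
sdtm.AggregationTypeId,
sdtm.StartDateTime,
sdtm.EndDateTime
FROM StandardDatasetTableMap sdtm
INNER JOIN StandardDataset sds ON sds.DatasetId = sdtm.DatasetId
ORDER BY sds.SchemaName, sdtm.StartDateTime

Once every row in one of these tables is older than the retention setting, dataset maintenance can simply drop the whole table rather than deleting row by row.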

 

Ok – that’s the background on aggregations.  To see this information – we will need to look at the StandardDatasetAggregation table.

select * from StandardDatasetAggregation

That table contains all the datasets and their aggregation settings.  To help make more sense of this – I will join the dataset and StandardDatasetAggregation tables in a single query – to show only what you need to look at:

SELECT DataSetDefaultName,
AggregationTypeId,
MaxDataAgeDays
FROM StandardDatasetAggregation sda
INNER JOIN dataset ds on ds.datasetid = sda.datasetid
ORDER BY DataSetDefaultName

This query will give us the common dataset name, the aggregation type, and the current maximum retention setting.

For the AggregationTypeId:

0 = Raw

20 = Hourly

30 = Daily

Here is my output:

DataSetDefaultName AggregationTypeId MaxDataAgeDays
Alert data set 0 400
Client Monitoring data set 0 30
Client Monitoring data set 30 400
Event data set 0 100
Microsoft.Windows.Client.Vista.Dataset.ClientPerf 0 7
Microsoft.Windows.Client.Vista.Dataset.ClientPerf 30 91
Microsoft.Windows.Client.Vista.Dataset.DiskFailure 0 7
Microsoft.Windows.Client.Vista.Dataset.DiskFailure 30 182
Microsoft.Windows.Client.Vista.Dataset.Memory 0 7
Microsoft.Windows.Client.Vista.Dataset.Memory 30 91
Microsoft.Windows.Client.Vista.Dataset.ShellPerf 0 7
Microsoft.Windows.Client.Vista.Dataset.ShellPerf 30 91
Performance data set 0 10
Performance data set 20 400
Performance data set 30 400
State data set 0 180
State data set 20 400
State data set 30 400
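If you would rather have the query translate those aggregation type IDs for you, the same join with a simple CASE expression added (nothing here beyond the tables already shown above) does the work:

SELECT DataSetDefaultName,
CASE sda.AggregationTypeId
WHEN 0 THEN 'Raw'
WHEN 20 THEN 'Hourly'
WHEN 30 THEN 'Daily'
END AS AggregationTypeName,
MaxDataAgeDays
FROM StandardDatasetAggregation sda
INNER JOIN dataset ds on ds.datasetid = sda.datasetid
ORDER BY DataSetDefaultName, sda.AggregationTypeId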

 

You will probably notice – that we only keep 10 days of RAW Performance data by default.  Generally – you don’t want to mess with this.  This is simply to keep a short amount of raw data – to build our hourly and daily aggregations from.  All built-in performance reports in SCOM run from Hourly or Daily aggregations by default.
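If you ever want to sanity-check how much raw performance data you actually have on hand, you can look at the oldest and newest raw samples.  This is just a sketch, and it assumes the standard Perf.vPerfRaw view in the warehouse, so adjust the view name if your version differs:

-- Sketch: confirm the window of raw perf data currently in the warehouse.
-- Assumes the standard Perf.vPerfRaw view with a DateTime column.
SELECT MIN(DateTime) AS OldestRawSample,
MAX(DateTime) AS NewestRawSample
FROM Perf.vPerfRaw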

 

Now we are cooking!

Fortunately – there is a command line tool published that will help make changes to these retention periods, and provide more information about how much data we currently have.  This tool is called DWDATARP.EXE.  It is available for download at the link above.

This gives us a nice way to view the current settings.  Download this to your tools machine, your RMS, or directly to your warehouse server.  Run it from a command line.

Run just the tool with no parameters to get help:    

C:\>dwdatarp.exe

To get our current settings – run the tool with ONLY the -s (server\instance) and -d (database) parameters.  This will output the current settings.  However – it does not format well to the screen – so output it to a TXT file and open it:

C:\>dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW > c:\dwoutput.txt

Here is my output (I removed some of the vista/client garbage for brevity)

 

Dataset name Aggregation name Max Age Current Size, Kb
Alert data set Raw data 400 18,560 ( 1%)
Client Monitoring data set Raw data 30 0 ( 0%)
Client Monitoring data set Daily aggregations 400 16 ( 0%)
Configuration dataset Raw data 400 153,016 ( 4%)
Event data set Raw data 100 1,348,168 ( 37%)
Performance data set Raw data 10 467,552 ( 13%)
Performance data set Hourly aggregations 400 1,265,160 ( 35%)
Performance data set Daily aggregations 400 61,176 ( 2%)
State data set Raw data 180 13,024 ( 0%)
State data set Hourly aggregations 400 305,120 ( 8%)
State data set Daily aggregations 400 20,112 ( 1%)

 

Right off the bat – I can see how little space daily performance aggregations actually consume.  I can see how much space only 10 days of RAW perf data consumes.  I also see a surprising amount of event data consuming space in the database.  Typically – you will see that perf hourly consumes the most space in a warehouse.

 

So – with this information in hand – I can do two things….

  • I know what is using up most of the space in my warehouse.
  • I know the Dataset name and Aggregation name… to input to the command line tool to adjust it!

 

Now – on to the retention adjustments.

 

First thing – I will need to gather my Reporting service level agreement from management.  This is my requirement for how long I need to keep data for reports.  I also need to know “what kind” of reports they want to be able to run for this period.

From this discussion with management – we determined:

  • We require detailed performance reports for 90 days (hourly aggregations)
  • We require less detailed performance reports (daily aggregations) for 1 year for trending and capacity planning.
  • We want to keep a record of all ALERTS for 6 months.
  • We don’t use any event reports, so we can reduce this retention from 100 days to 30 days.
  • We don’t use AEM (Client Monitoring Dataset) so we will leave this unchanged.
  • We don’t report on state changes much (if any) so we will set all of these to 90 days.

Now I will use the DWDATARP.EXE tool – to adjust these values based on my company reporting SLA:

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Performance data set" -a "Hourly aggregations" -m 90

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Performance data set" -a "Daily aggregations" -m 365

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Alert data set" -a "Raw data" -m 180

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Event data set" -a "Raw data" -m 30

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "State data set" -a "Raw data" -m 90

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "State data set" -a "Hourly aggregations" -m 90

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "State data set" -a "Daily aggregations" -m 90

 

Now my table reflects my reporting SLA – and my actual space needed in the warehouse will be much reduced in the long term:

 

Dataset name Aggregation name Max Age Current Size, Kb
Alert data set Raw data 180 18,560 ( 1%)
Client Monitoring data set Raw data 30 0 ( 0%)
Client Monitoring data set Daily aggregations 400 16 ( 0%)
Configuration dataset Raw data 400 152,944 ( 4%)
Event data set Raw data 30 1,348,552 ( 37%)
Performance data set Raw data 10 468,960 ( 13%)
Performance data set Hourly aggregations 90 1,265,992 ( 35%)
Performance data set Daily aggregations 365 61,176 ( 2%)
State data set Raw data 90 13,024 ( 0%)
State data set Hourly aggregations 90 305,120 ( 8%)
State data set Daily aggregations 90 20,112 ( 1%)

 

Here are some general rules of thumb (these might differ if your environment is unique):

  • Only keep the maximum retention of data in the warehouse per your reporting requirements.
  • Do not modify the performance RAW dataset.
  • Most performance reports are run against Perf Hourly data for detailed performance throughout the day.  For reports that span long periods of time (weeks/months) you should generally use Daily aggregation.
  • Daily aggregations should generally be kept for the same retention as hourly – or longer.
  • Hourly datasets use up much more space than daily aggregations.
  • Most people don’t use events in reports – and these can often be groomed much sooner than the default of 100 days.
  • Most people don’t do a lot of state reporting beyond 30 days, and these can be groomed much sooner as well if desired.
  • Don’t modify a setting if you don’t use it.  There is no need.
  • The Configuration dataset generally should not be modified.  It keeps the data about the objects you report on in the warehouse.  It should be set to at LEAST the longest retention of any perf, alert, event, or state dataset that you use for reporting.
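One last tip.  After making changes like the ones above, you can confirm the new retention stuck, and watch grooming catch up over the next day or so, by querying the retention and the last grooming time together.  MaxDataAgeDays and LastGroomingDateTime both live in the StandardDatasetAggregation table.  Treat this as a sketch and adjust as needed:

SELECT DataSetDefaultName,
sda.AggregationTypeId,      -- 0 = Raw, 20 = Hourly, 30 = Daily
sda.MaxDataAgeDays,         -- should now reflect your reporting SLA
sda.LastGroomingDateTime    -- advances as dataset maintenance grooms each dataset
FROM StandardDatasetAggregation sda
INNER JOIN dataset ds on ds.datasetid = sda.datasetid
ORDER BY DataSetDefaultName, sda.AggregationTypeId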
Comments (76)

  1. Anonymous says:

    Hi Kevin.. is there any documentation which throws light on how data aggregation happens.. what algorithm exactly it follows in different aggregation type (raw, hourly, daily)?

  2. Anonymous says:

    @JohnDoe321 –

    The DW maintenance operations will begin grooming anything older than the retention dates.  Some will drop quickly, some will take some time (days).  This will manifest itself as free space in the existing database file.

    You should NEVER EVER EVER "shrink" a database file that is hosting a SCOM database.  This will cause MAJOR performance issues.  Forcing a SQL shrink operation causes the DB file to shrink in size and recover free space in the database, reducing the file size and freeing up space on disk.  While this sounds good – it is BAD.  It causes fragmentation of the DB that cannot be resolved by a simple reindex operation.

    Again – NEVER shrink an OpsMgr database.  Whatever size it has grown to – leave it that size.  Free space in a DB file is a GOOD thing – backups only back up used space – this doesn't hurt anything.  If you MUST recover the unused DB file space, then get with a SQL expert who understands how to create a new file on a new disk and move ALL the data from one DB file to a new DB file, to keep the DB from fragmenting.

  3. Anonymous says:

    @John –

    1.  I don't see how the number of agents is relevant?  This isn't a sizing conversation as much as it is a retention conversation based on retention SLAs.

    2.  The article above was written for OpsMgr 2007.  Network device monitoring wasn't really lit up until OpsMgr 2012.  This article is based on datasets – such as alert, performance, events, availability.  Network devices store data in those datasets just like Windows computers or Unix/Linux machines.  So I am not sure how that would be relevant?

  4. Anonymous says:

    I have set the DW parameters following your instructions.  So will the DW DB automatically shrink itself down now?  Or do I need to shrink it and make it smaller?

  5. Anonymous says:

    Eric – when this is really big – it generally means that you have, or had at one time, a MP that was filling the configuration dataset with discovered instance change information.  The older Dell and IBM MP's were very bad about this, as are the Microsoft SQL, AD, DNS, etc…

    If you want to see what is using all the space… you can run the following query against the warehouse DB:

    select distinct top 50 mep.ManagedEntityRowId, me.FullName, Count(mep.managedEntityPropertyRowId) as 'Total'

    from ManagedEntityProperty mep WITH(NOLOCK)

    LEFT JOIN ManagedEntity me WITH(NOLOCK) on mep.ManagedEntityRowId = me.ManagedEntityRowId

    group by mep.ManagedEntityRowId,me.FullName

    order by Total desc

    1. Bryan says:

      How do you remove the data that is no longer relevant?

  6. Anonymous says:

    @Oscar –

    Why do you feel you need to modify it?

  7. Anonymous says:

    Ouch… Paul are you on R2 or SP1?

    You might be impacted by a lack of grooming due to a future event issue…   You might have to find these in the DB and modify them manually… (very carefully) or open a PSS support case for this issue.

  8. Anonymous says:

    @Prashant –

    1.  Yes you can revert back whenever you wish

    2.  Once you change retention, the data will be groomed over time, at the next dataset maintenance operation that runs for that dataset.  You cannot say exactly how long it will take to complete grooming of your adjusted dataset; it depends on many factors.  Normally, if the data in a dataset is large and in multiple tables, those tables will be dropped quickly, and you should be able to check for this by running the command line tool and looking at the % used, or by querying/reporting on the DB itself directly.

  9. Kevin Holman says:

    @Praveen – Your grooming will occur during standarddatasetmaintenance. There are limits to how much will groom out per job, and this varies per dataset you changed from 400 to 90. So it will start "immediately" however it might take 24 hours to recognize
    the bulk of the grooming, or even longer if you have much to groom.

  10. Kevin Holman says:

    @Mark –

    Because of the way we store data in tables, and drop entire tables for grooming, it "should" be ok to do it all at once. You might see some data warehouse timeouts on the maintenance, but keep an eye on it. You might have to modify the registry setting to allow
    a longer timeout to let it finish, as documented:
    http://blogs.technet.com/b/kevinholman/archive/2014/06/25/tweaking-scom-2012-management-servers-for-large-environments.aspx

  11. Anonymous says:

    How many event tables do you have present?

    When there is more than one table – we groom by waiting until ALL the data in the table is older than the grooming retention, and then dropping the table.

    When there is only one table (smaller agent counts or well tuned event collection rules) then we use something like "delete from where date > retention" which is FAR less efficient.

    Just looking at your oldest and newest events doesn't tell the story.  It could be totally normal.

    There is no warehouse upgrade from MOM 2005.  Since you have events inserted before SCOM even existed, and future events from 2036 – I would say you have time synch maintenance issues in your environment – and this will wreak havoc on all kinds of applications.  You need to find out the source of your time synch issues and get those resolved.  I would not worry about your event dataset…. set it to something very short, like 30 days or so – and give it time to groom out (assuming a single event_(guid) table).

    Only worry about the event dataset when it is the largest dataset in your warehouse – and then don't assume grooming is broken – focus on reducing the retention, and reducing the event collection rules by tuning (another blog post of mine).  There were some issues with grooming events in the warehouse if I recall correctly – when future dates were inserted.  I am thinking this was fixed in SP1 or R2 – cannot remember – you should research that.

  12. Anonymous says:

    Thx Kevin. The explanations and examples are very clear.

    Much appreciated,

    John Bradshaw

  13. Anonymous says:

    thank you for you reply

    I looked at blogs.technet.com/…/what-is-config-churn.aspx to evaluate my config churn, and I will come back if I have another problem.

    For information, I am on SCOM R2 with currently 1000 agents.

  14. Anonymous says:

    @Kelly –

    I'd run the top tables query posted on my SQL blog, or run the report in SQL Studio.  If alertstage is your biggest – I'd wonder if you have a maintenance issue going on…. alerts are synchronized from the opsDB to the staging table in the warehouse, where subsequent jobs manage the data (new and changed) and insert it into alert tables.  I'd say you are experiencing some issues with getting WAY too many alerts (from some sort of flip-flopping monitors).  I'd query the alertstaging tables and see if you can find similarities and focus on the top offending item.  I'd also look at the events on all management servers for warehouse-specific events to see if there are issues that are known to the MS.

  15. Anonymous says:

    Possibly.

    For the Hourly and Daily – that is fine – if you never need to report on longer than 90 days.

    We don't recommend modifying the raw data from 10 days to 90 days.  For two reasons:  First – raw data is HUGE, and no built-in reports access it.  Second – it could potentially affect the aggregation jobs if it gets too large.

    Normally – my recommendation is to leave raw alone at 10 days – then set the hourly and daily based on your business requirements – understanding that you can keep daily data for a very long time with minimal storage requirements.

  16. Anonymous says:

    The configuration dataset contains all the discovered properties of all your managed entities, groups, etc.

    This can be quite large, especially if you are not on OpsMgr 2007 R2 – as there was a bug in SP1, fixed in R2, where we don't groom the config data.  It can also be large, even in R2 – if you have a lot of poorly written MPs which have discoveries that change frequently – the warehouse keeps a record of the changes necessary for historical reports.  This is another reason config churn is such a bad thing.  The data will mostly be contained in the ManagedEntityProperty table.

    I would just live with the size issue, and work to ensure you are fixing any config churn issues in your environment.  

    Are you SP1 or R2?  Have you evaluated your config churn?

  17. Anonymous says:

    Kevin,

    Hoping you can help us with our constantly growing OpsMgrDW db.  Consuming about 1GB per day.  Installation has been in place for a year.

    Used your article regarding dwdatarp and found that performance data set was killing us.  Adjusted raw data retention to 1 day, currently using 370 GB.  Adjusted hourly aggregations to 2 days, using 255 GB.  Adjusted daily aggregations to 30 days, using 26 GB.

    Not content with having that much raw data and not using that data currently, I wanted to trim that back.  I ran the following query, which I found in your common SQL queries article.  (I hope this was the correct step to take next?)

    select top 20 pcv.ObjectName, pcv.CounterName, count (pcv.countername) as Total

    from performancedataallview as pdv, performancecounterview as pcv

    where (pdv.performancesourceinternalid = pcv.performancesourceinternalid)

    group by pcv.objectname, pcv.countername

    order by count (pcv.countername) desc

    Found that my top 3 counters were:

    Memory Pages/sec = 2687369

    Health Service agent processor utilization = 2508399

    Network Interface Bytes Total/sec = 2164085

    Decided we did not care about "Memory Pages/sec" perf data for our environment.  So I disabled Memory Pages Per Second for all objects of class Windows Operating System.  I have waited a couple weeks now, and when I run the query above, Memory Pages/sec is still the top counter listed.

    Am I investigating the top offenders for large Raw Performance data set correctly to reduce that?  Am I not turning off the counter correctly?

    MUCH APPRECIATED, your articles are phenomenal.

    Brian

  18. Anonymous says:

    @Paul –

    This is unfortunately normal, especially for small databases like yours.

    The Event collection rules out of the box are quite noisy, and provide very little value.  Also – using old converted MPs like SMS, SCCM, and the old Exchange 2007 causes even more bloat.

    I recommend setting the event retention in the warehouse to 30 days max – as these provide very little value.  Then – I recommend tuning the worst offenders of event rules:  blogs.technet.com/…/tuning-tip-turning-off-some-over-collection-of-events.aspx

  19. Anonymous says:

    @Karl –

    What is the problem?  It is totally normal for the Perf Hourly tables to consume the largest percentage of a warehouse.  32GB of perf hourly data is not really all that much.  As to tuning this – you should look at this in two dimensions:

    1.  Retention – set the retention to what is in your SLA for reporting.  You are at the default – which is 400 days.

    2.  Tuning.  Run the performance counter queries at blogs.technet.com/…/useful-operations-manager-2007-sql-queries.aspx and determine which ones you don't need – find the rules that collect them – and disable them.  Collecting perf data for all objects – when it is a counter you don't care about – will just fill your warehouse with noise.

  20. Anonymous says:

    A bit more information on the issue. It seems to be AlertStage that is taking up all the space.

  21. Sameer says:

    We have changed all the settings to a MaxAge of 90 days, including the Raw Performance data set.

    Is this a bad idea?

  22. Bryce says:

    Great article.  Just an FYI to anyone out there, I got an exception error that my remote connection may not be enabled when I first tried to run the tool.  

    I removed the default instance name from the command line, tool ran fine.  

    dwdatarp.exe -s mySQLServerName -d OperationsManagerDW > c:\dwoutput.txt

  23. Bryce says:

    FYI – I was running the tool locally on my SQL server.  

  24. Eric says:

    Hello Kevin,

    the dwdatarp.exe tool returns a configuration dataset at 46% (the current size is 103,508,496 KB)

    what is the data in the configuration dataset?

    Thanks

  25. Oscar says:

    How can I change the value set for the configuration dataset using a query?

    Thanks

  26. Oscar says:

    Hi Kevin,

    It's a requirement from a client; they have another SCOM MG and want to replicate the same configuration that they have in the other MG.  I was trying to use the DWDATARP.EXE tool, but it fails, so I made the changes with some queries; I just need to modify the last one, the configuration dataset.

    Thanks

    I use SCOM 2008 R2 on Windows 2008 R2 and SQL 2008 SP1

  27. Oscar says:

    Sorry, it is SCOM 2007 R2 on Windows 2008 R2 and SQL 2008 SP1.

    Every time I try to use the DWDATARP.EXE tool, I get this:

    EXCEPTION : System.Data.SqlClient.SqlException: Cannot open database "Operations

    ManagerDW – ds "Client Monitoring data set" -a "Daily aggregations" -m 120" requ

    Login failed for user 'RESTusr-ogarza'.

      at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception

    , Boolean breakConnection)

    And the account that I'm using has administrative rights and SA rights in SQL, so I don't know why the tool is not working.

    Thanks.

  28. karl says:

    kevin

    that dwdatarp.exe tool is giving me this for my hourly performance data set.

    Performance data set           Hourly aggregations      400    32,945,816 ( 70%)

    Is there a way for me to find what's causing it?

    thanks

  29. Ted says:

    Thanks for the info.  As always useful and timely.

    How can I tell how long the grooming procedure for a particular dataset is taking?  Also, how can I tell if it is not completing within the configured repeat time?  It seems like a bad thing to have one grooming procedure still running when another one either starts or is supposed to start.  It seems like in that case, the number of records per run is too big.

    Is there a log of when the grooming procedures run and how much they do?

    Thanks,

    Ted

  30. Paul says:

    Kevin-

    Our DW database is about 160 GB. Looking at the information the dwdatarp.exe tool gives us, the Event data set is 80%.

    Event data set                 Raw data                  30    93,663,928 ( 80%)

    Is this normal in large DW databases? If not, what could we be doing wrong?

    Thanks!

    Paul

  31. Paul says:

    Thanks Kevin. This helped a lot.

    I think we have another issue. The grooming for the Event data set is not running. After reading many articles on how the grooming process works, I found an article to manually run the grooming process.

    I am running the following query:

    DECLARE @DataSet uniqueidentifier

    SET @DataSet = (SELECT DatasetId FROM StandardDataset WHERE SchemaName = 'event')

    EXEC StandardDatasetMaintenance @DataSet

    The query completes successfully but nothing happens. I receive the following output:

    Caution: Changing any part of an object name could break scripts and stored procedures.

    Cannot grant, deny, or revoke permissions to sa, dbo, entity owner, information_schema, sys, or yourself.

    Cannot grant, deny, or revoke permissions to sa, dbo, entity owner, information_schema, sys, or yourself.

    Cannot grant, deny, or revoke permissions to sa, dbo, entity owner, information_schema, sys, or yourself.

    My oldest event is from 2005-04-05 13:07:22.000 and my newest is 2036-03-12 11:11:50.000.

    I believe this was an upgrade from MOM but that was before my time.

    Any Ideas?

    Thanks,

    Paul

  32. Paul says:

    There are 243 Event_ tables.

    My event data set by far is the largest data set.

    Dataset name Aggregation name Max Age    Current Size, Kb

    —————————— ——————– ——- ——————–

    Alert data set Raw data                     180      381,560 (  0%)

    Client Monitoring data set Raw data 30            0 (  0%)

    Client Monitoring data set Daily aggregations 100           16 (  0%)

    Configuration dataset Raw data 400    8,002,632 (  7%)

    Event data set Raw data 30   95,394,944 ( 80%)

    Performance data set Raw data 10    2,175,408 (  2%)

    Performance data set Hourly aggregations 60    9,939,264 (  8%)

    Performance data set Daily aggregations 365    3,124,392 (  3%)

    State data set Raw data 60       76,216 (  0%)

    State data set Hourly aggregations 60      444,848 (  0%)

    State data set Daily aggregations 60       63,160 (  0%)

    I will look into the grooming events issue with future dates.

    Thanks for your quick response.

    Paul

  33. Paul says:

    You were spot on. There is a table 'StandardDatasetTableMap' in the DW that has a column 'EndDateTime'. The grooming stored procedure looks at this column and any date > then your retention date will be skipped. I am not sure how the DW partitions the tables but it looks like there are only 40 'active' tables in the StandardDatasetTableMap table. This caused us to have 196 Event.Event_"GUID" tables with no correlating Event.EventDetails_, Event.EventParameter_ and Event.EventRule_ tables. To resolve this we dropped all Event.Event_ tables with no other correlating Event tables. Then we deleted all rows with future events, updated the EndDateTime with the MAX(datetime) from the active event tables, kicked off the grooming procedure and our DW database went from 180 GB to 30 GB. I just wanted to thank you for all your help.

    Thanks,

    Paul

  34. Andrew says:

    I have been advised that you can groom out data for a specific Management Pack only.  The guy who told me this "can't remember how" but said that he did it 🙂 Is this possible, and if so, are there any examples on how to target data from certain packs when grooming?

    Thanks

  35. Prashant Garad says:

    Hello Kevin,

    Thanks a lot for such a detailed description…. But I have 2 queries on it:

    1. Can we revert the retention period back if we change it for any of the datasets (like Event)?

    2. Once we change the retention period, when will the data be groomed?

    Basically I am planning to change the retention period for the Event dataset and, once the data is groomed, change it back to its original setting. So is that possible?

  36. Prashant Garad says:

    Thanks a lot for your quick response on it….

    It would be great if you could provide/suggest the command line tool.

  37. Karthick kesavan says:

    Hi Kevin,

    I'm a newbie. If I want to learn reporting and T-SQL in depth, where should I start? Please help me! Thanks in advance.

  38.  I've run into a rather odd issue. We're building out a new SCOM 2012 SP1 system and all was fine until we added around 400 new clients in a week. We're now at 715 total. The next week our Data Warehouse size exploded. The strange part about it is the space is being used up by the Configuration Dataset. It's currently taking 85% of the space at nearly 300 GB. Any idea on what might be causing this or if we can tame it down?

  39. John Nurden says:

    Great article Kevin. A couple of things from me…

    1) How many clients were you basing your sizes on?

    2) Is this purely for servers? i.e. any network devices included? If not, how would you expect the network devices to affect the overall sizing required?

  40. John Nurden says:

    Kevin, thanks for the very quick reply. Sorry, I have probably taken this article out of context with my questions. Basically I have followed the MS sizing tool and worked out the overall total but it's a very general, vague total. What I am far more interested in is real world examples of DBs that are groomed and the sizes of them dependent upon what's being monitored, datasets used, number of devices you are collecting data from etc. You published a table which shows actual real world values of current sizes. I was curious as to how many agents have collected that data in your environment.

    My second question was more a hopeful punt as to whether you had experience of the kind of increase in data size that would occur if you were also monitoring x amount of network devices.

  41. Cameron says:

    Thanks Kevin! Got my DW at a reasonable level to continue my deployment of OpsMgr Clients. Also, the help you gave me @ the Hyper-V training was tremendous. Once I ran Remove-SCOMDisabledClassInstance all my print server client issues went away!

    Hope to see you at other MS Events!

  42. Chris K says:

    How soon after making changes to the data sets do the DB sizes actually shrink?  Is there a DW Maintenance job in SCOM that runs on a schedule that needs to happen first before you see any changes?

  43. Praveen says:

    Thanks Kevin. Great article. The problem I am facing is that the DW was about to get full. So I created an .ndf file on a different drive, but did not stop autogrowth on the original .mdf file. Now the drive where the .mdf file lies has 0 MB of free space left. I have stopped ‘Auto Growth’ now. I also set the retention for the datasets that were at 400 days down to 90 days. My question is: now that I have reduced the dataset retention days, will the DW delete the records which are older than 90 days? Regards, Praveen

  44. Anonymous says:

    I had a problem with the disk space that holds the OperationsManagerDW database. Indeed

  45. Samuel Dubrul says:

    Based upon the queries you have provided, I created a small DatawareHouse Query Report MP that lists some important values.
    http://dynamicdatacenter.wordpress.com/2014/02/14/scom-data-warehouse-query-reports-management-pack/

  46. Amy says:

    Hi Kevin, my client wants to store raw performance data for one year, whether in the ops DB or the DW, so that there would be one year of raw data showing on dashboards and reports. Is that possible? Would there be any limitations or restrictions?

  47. Techno says:

    Kevin, I must thank you for such a really nice blog; you covered pretty much everything in SQL. I have been doing some research on SCOM DW database grooming. I have configured the grooming settings for the different dataset retentions. I am looking for a query to check when grooming last ran and what the status was.

    I run the query select * from maintenancesetting and I get 3 values: instance optimization, instance grooming and config optimization time. Could you please shed some light on what each of these means and which one I should look at to check for grooming of datasets.

    Thanks for your help again !!

  48. Tom says:

    @Techno

    Use OperationsManagerDW;
    With AggregationInfo As (Select AggregationType = Case
    When AggregationTypeId = 0 Then 'Raw'
    When AggregationTypeId = 20 Then 'Hourly'
    When AggregationTypeId = 30 Then 'Daily'
    Else NULL End,AggregationTypeId,MIN(AggregationDateTime) As 'TimeUTC_NextToAggregate'
    ,SUM(Cast (DirtyInd As Int)) As 'Count_OutstandingAggregations'
    ,DatasetId From StandardDatasetAggregationHistory
    Group By DatasetId, AggregationTypeId)
    Select
    SDS.SchemaName
    ,AI.AggregationType
    ,AI.TimeUTC_NextToAggregate
    ,Count_OutstandingAggregations,SDA.MaxDataAgeDays
    ,SDA.LastGroomingDateTime
    ,SDS.DebugLevel
    ,AI.DataSetId
    From StandardDataSet As SDS WITH(NOLOCK)
    Join AggregationInfo As AI WITH(NOLOCK) On SDS.DatasetId = AI.DatasetId
    Join dbo.StandardDatasetAggregation As SDA WITH(NOLOCK) On SDA.DatasetId = SDS.DatasetId And SDA.AggregationTypeID = AI.AggregationTypeID Order By SchemaName Desc

  49. Mark Ryan says:

    Kevin,..
    Is it ok to reduce it down from 400 to 90 days in one step if you are dealing with a lot of data? Or should it be done in increments (370 days… 300 days… etc.) so as not to overwhelm the grooming procedure?

  50. Sunil_03 says:

    Hi Kevin

    We have changed the grooming settings; now what should be done to free up the space that was occupied?  Can we go ahead and run a shrink on the SCOM DW database, keeping the free space at 40% on the DW (as recommended)?

    Is that recommended?

    Regards
    Sunil

  51. Morgan says:

    I’ve referenced a lot of your stuff and it has helped, but I’m stuck now. My State Hourly and State Daily are backed up. The last time they ran successfully was 12-22. Hourly has 406 outstanding aggregations and Daily has 18. Is there a way to force it up to date? I thought I had read about a way to reset it but I can’t find it anymore. Any ideas? FYI: I had the Hourly set at 365 but set it back because it was huge and we aren’t using the data. Thanks.

  52. Kevin Holman says:

    @Morgan –

    I assume you figured out what monitor went crazy on 12/22 and resolved it?

    There is no supported way to just "reset" that I am aware of.

    Why can't you just use the solution I provide here:
    http://blogs.technet.com/b/kevinholman/archive/2010/08/30/the-31552-event-or-why-is-my-data-warehouse-server-consuming-so-much-cpu.aspx

  53. Morgan says:

    Kevin,

    I tried that but the maintenance script ran and ran and ran………and ran. 22 hours later it was still running. So I killed it. At this point every hour sees an increase in the Outstanding Aggregations. I’m still new to this.

  54. Kevin Holman says:

    @Morgan –

    Why did you kill it?  The fact that it was still running shows it is working.
     
    It might take DAYS to complete…. especially if your SQL server is undersized or if you wrote a REALLY bad monitor that flooded you with state changes. You must process these. Why not just let it run. Once you get past the big flood of data, the rest go really fast.

    I'll repeat my other question: did you figure out what monitor went crazy on 12/22 and resolve it?

  55. Morgan says:

    Didn’t think 22 hours was correct. We had a couple of monitors that I have since decided were poisoning our system. One being Cisco UCS. I can start it again and see. I wasn’t expecting it to run so long. Is there any way to monitor the progress? A query?
    Thanks again.

  56. Morgan says:

    Okay, 2 days 17 hours 36 minutes later and it is still running…… Outstanding hourly aggregations increasing.

  57. Morgan says:

    3 days 16 hours and 57 minutes……and counting

  58. Kevin Holman says:

    At some point you might consider opening a support case with Microsoft. You might have just flooded too much data, or your sql server might not have enough IOPS to churn through this. But if you stop it, you gain nothing.

  59. Morgan says:

    Thanks Kevin. I doubt it is the SQL server. It is pretty hefty. I’m calling MS today. It's causing issues with alerts at this point and that is my main concern right now. I would suspect that if the maintenance was actually working I would see some of the older dates being removed, and I have not.

  60. Morgan says:

    Interestingly enough, when I canceled the maintenance and went to close the query I was asked if I wanted to commit the transactions which I did. After that I checked the outstanding aggregations again and it had gone down by 111. So it was working….slowly.
    I have restarted the maintenance again and will let it go for a while more.

  61. Kevin Holman says:

    No – you will NOT see any older dates removed – that is normal and by design. Each run of maintenance handles ONE hourly transaction. The reality is you had a flood of state changes so bad in a single hour (or group of hours – or a day) that processing this hour is killing the maintenance. All this time it is working on a single hour of statechanges. Until it finishes calculating that hour – you will not see any improvement in the numbers. This is how it works. Once you process through the flood time, the subsequent maintenance runs will run quickly, still only processing a single hour at a time, but you can run it in a loop and catch up quickly.

  62. Kevin Holman says:

    What command are you running in SQL exactly?

  63. Morgan says:

    This is the loop I’m using to run maint.

    WHILE (
    (SELECT COUNT(AggregationDateTime)
    FROM StandardDatasetAggregationHistory AS ah WITH(NOLOCK)
    INNER JOIN StandardDataSet AS ds WITH(NOLOCK) ON ah.DatasetId = ds.DatasetId
    WHERE ds.SchemaName = @DataSetName
    AND LastAggregationDurationSeconds IS NULL) > 1
    )
    --PRINT N'Looping.'

    BEGIN
    BEGIN TRANSACTION;
    USE [OperationsManagerDW]
    DECLARE @DataSet uniqueidentifier
    SET @DataSet = (SELECT DatasetId FROM StandardDataset WHERE SchemaName = @DataSetName)

    EXEC StandardDatasetMaintenance @DataSet
    COMMIT TRANSACTION;
    END

    And this I’m running to see all of the outstanding aggregations by date and time. This is how I could tell some of them had processed.

    DECLARE @DatasetId uniqueidentifier
    SELECT
    @DatasetId = DatasetId
    FROM Dataset d
    WHERE (d.DatasetDefaultName = 'State data set')
    Select AggregationDateTime, AggregationTypeId
    From StandardDatasetAggregationHistory
    Where DatasetId = @DatasetId
    And
    DirtyInd = 1
    Order by AggregationDateTime

  64. Morgan says:

    This belongs to the top of the loop but I missed copying it.

    USE OperationsManagerDW
    DECLARE @DataSetName varchar(50)
    -- Set this string to the "SchemaName" value that is behind
    SET @DataSetName = 'State'

  65. Kevin Holman says:

    DON'T use a loop when getting started with this – that is a terrible idea! You didn't tell me you were doing a loop.

    Loops are for when you are basically through the mess – and you want to catch up the quick stuff quickly. You need to run this manually at first – one at a time – to understand how long a SINGLE maintenance run takes. Then – when these runs complete rather quickly you can kick off a loop. This is why yours was never ending. Details like this are kind of important to share. 🙂

    That said – now that you are 111 runs in – you can go back to running the loop.  However – with ZERO understanding of how long maintenance is actually taking now – you still haven't learned anything.  Like – is the runaway monitor still present?  What are the most common noisy monitors?  You haven't done any of the necessary research on that yet, so it is likely you might find yourself back in the same problem soon.

  66. Morgan says:

    Cisco UCS and Citrix were making the most noise. I was able to uninstall the Cisco because we have other monitors in place, but not the Citrix yet. It will be going away soon so I’ll just have to stay on top of it for now.

  67. Morgan says:

    Kevin,

    The query finally finished. Today. A month later! Actually longer than that, because I killed it a few times before the last restart. Patience is a virtue? So…..anyone else reading this: Run it. Keep an eye on it and wait. What a mess I had, but the errors in my log appear to have gone away. Thanks for your help.

  68. Morgan says:

    One last question (I hope). I have one remaining error:Creation of module with CLSID "{C6410789-C1BB-4AF1-B818-D01A5367781D}" failed with error "Illegal operation attempted on a registry key that has been marked for deletion." in rule "CISCO.UCS.R2.f5309e9f6228464d8f1ceea6390e120e.PowerSupplyUnit.General.UnitMonitor"
    running for instance "PSU 1 (Chassis)" with id:"{2699EA96-43EE-8906-A507-5DCFC6125A9D}" in management group "All Computers".

    I know what it is from (Stupid Cisco UCS). Which is what was flapping to begin with. But I uninstalled the MP and it looks like something stuck.

  69. Hi Kevin,
    why do you say: "Do not modify the performance RAW dataset."?

    Got a customer that requires 2 years of raw data… I know, big DWH. But besides that, there should be no constraints, right?

    Thank you,
    Patrick

  70. Kevin Holman says:

    @Patrick –

    More just a general recommendation. Most I have seen is modifying to 100 days. Only thing you’d have to keep an eye on, is their aggregations. I don’t know if our queries are optimized to sift through that much data, so porting it into aggregation jobs might
    take longer or time out…. but I don’t know of any problems or scenarios where that happened. I cannot imagine trying to actually query that much raw data. I doubt it would be terribly successful. 🙂

  71. Tabish says:

    In my case:

    After changing the retention period of the hourly performance data set from 400 to 100 days, how long will it take to groom and free up space from the DW?

    1. Kevin Holman says:

      @Tabish –

      You should see some performance table cleanup within a few hours – it is based on an algorithm for datasetmaintenance. It could be as soon as a few minutes, but within 24 hours you should see significant impact IF you have a large number of perfhourly tables, AND all of the data in each table is older than the retention setting. Running an EXEC sp_updatestats never hurts either after making a big change.

  72. Sergio says:

    Hi, I am trying to run the tool and I am always getting empty results back (just the headers). Any idea what might be causing it?

    1. Kevin Holman says:

      Most likely – you don’t have enough rights in SQL. Get an SA to run it.