Troubleshooting MOM 2005 DTS Issues and Database Full Issues

An issue that can occur with the Onepoint database is that the database begins to grow rapidly for no apparent reason, and may suffer a performance hit or stop functioning when the disk is near-full. This can occur for several reasons:

1. You just installed a lot of new management packs. In this case, a lot of new rules are collecting a lot of new Event or Performance counter data and we are seeing a natural increase in the database size as a result. You can mitigate this by disabling any rules or rule groups in the MOM 2005 Administrator console that you don't need.

Important: Make sure you have installed the latest and greatest management packs from the MOM web site! They often contain fixes. Also, search the Knowledge Base for KB articles about the management pack; some fixes may be available through Microsoft Support that have not yet been published in a management pack update.

2. You created custom rules that may be improperly configured and are "noisy." You can check for the Top 10 most common Events or Alerts using SQL scripts that are supplied in the MOM 2005 Resource kit, downloadable from https://www.microsoft.com/mom/downloads/2005/reskit/default.mspx

The resource kit installs by default to \Program Files\Microsoft Operations Manager Resource Kit and the SQL acripts are in the "Scripts" sub-folder. Simply copy the text of the MostCommonAlerts or MostCommonEvents files into SQL Query Analyzer to see the list.

3. And now, the Big Kahuna of reasons the Onepoint DB gets bigger: The MOM Reporting DTS job fails to run.

When you install MOM 2005 Reporting, a scheduled task called SystemCenterDTSPackageTask is created on the Reporting server to run a Data Transfer Services (DTS) job nightly at 1 AM. This copies any new data that has been written to the Onepoint database since the last DTS job ran to the SystemCenterReporting database (referred to as the Data Warehouse) on the MOM Reporting server. Once this data has been copied, MOM is able to groom out this data at its next grooming interval.

So how does MOM know to groom the data? If the DTS job runs successfully it updates the TimeDTSLastRan value in the ReportingSettings table in the Onepoint database with the time of completion (displayed as GMT). When the MOM 2005 grooming job runs, it uses a stored procedure which first looks at the grooming value stored in GroomingSettings.dataRetentionDays. This is the value configured in Global Settings > Database Grooming > "Groom data older than...". It then looks for a value in ReportingSettings. timeDTSLastRan and compares them. If the TimeDTSLastRan value is older than the grooming dataRetentionDays value, grooming stops to prevent data loss. And the Onepoint database grows. And grows...

The maximum supported size of the Onepoint database is 30 GB. This must hold both the database file EeaData.mdf and the Transaction log file, EeaLog.ldf. A rule of thumb is that you should leave 25 to 40 percent free space in the database to make sure the transaction log has enough space.

Bottom line: If the DTS job is failing, the Onepoint database will eventually fill and stop functioning because it is out of transaction log space.

So how to fix this?

All is revealed in the newly updated article

899158 MOM 2005 OnePoint database increases and Reports are empty or incomplete in MOM 2005 Reporting.
https://support.microsoft.com/default.aspx?scid=kb;EN-US;899158

(Soon to be renamed "How to Troubleshoot DTS and Database Sizing Issues in MOM 2005 Reporting") Which includes the same troubleshooting methods our engineers are using when working on these issues.

There is a wealth of information in this article on this and other issues with the OnePoint and SystemCenterReporting databases and how to fix them. For this issue the series of tasks is something like this:

1. Fix the problem that is causing the DTS job to fail. This may involve resizing the databases, moving data in smaller chunks in the DTS job using the /latency switch, and upping the remote query timeout values for the DTS job on both servers, or any combination of the above. It could also be as simple as someone having disabled the SystemCenterDTSPackageTask scheduled task!

2. Once the DTS job succeeds, run the MOMX Partitioning and Grooming sql job manually to delete the data from the Onepoint database that has been now transferred to the reporting database. This job is not transactional, and the transaction log should have rolled back to its original size after the DTS job failure, so this should run without a problem.

Once MOM is able to groom the database and the DTS job is running successfully, you can cofigure a MOM alert to notify you if the DTS job fails. There is an event rule "MOM Reporting DTS Job failed to complete successfully" with criteria event ID = 1001, source = MOM.Datawarehousing.DTSPackageGenerator.exe. This will create an Error alert if the DTS fails.

Regards,

Jarrett Renshaw
Manageability Knowledge Engineer