FCS with MOM 2005 Database Guidance

 

All information provided here is in regards to Forefront Client Security 1.0. There are no guarantees on the information provided.



Default DB sizes during FCS Setup

Database sizing

Transaction log file sizing

Hard Disk Spindles

Troubleshooting

Appendix


Default DB sizes during FCS Setup

During the FCS install process, the default setting for the Collection Database (aka OnePoint DB) and the Reporting Database (aka SystemCenterReporting DB) are incorrectly set. From here on forward we will refer to them by the DB name, not the role name for FCS. SystemCenterReporting DB should be the larger of the two because it will contain the historical data, hence the table below under Database Sizing. By default, the OnePoint DB only maintains 72 hours worth of client data, while the SystemCenterReporting DB holds 395 days worth of data, so it is easy to see why the SystemCenterReporting DB needs to be larger.

Note – The max allowed size for the OnePoint DB is 30gb. The SystemCenterReporting DB does not have a specific limitation on size. Autogrow is not supported for the OnePoint DB and the SystemCenterReporting DB. This also includes their respective Transaction logs

It is recommended that once FCS is installed, the SystemCenterReporting DB Data Retention is reduced to a lower number, unless it is needed to have that many days of historical data. To reduce the number of days, see KB887016. More details on Data Retention is located in the Troubleshooting Section.


Database Sizing

1. OnePoint DB and SystemCenterReporting DB sizing

a. The following table(also located here) gives the details –

 

 


Transaction Log Sizing

1. OnePoint DB transaction log file sizing

a. OnePoint DB transaction log file size generally needs to only be about 1/3 the size of the OnePoint DB itself. For example, a 30gb OnePoint DB would need a 10gb OnePoint transaction log file size.

2. SystemCenterReporting DB transaction log file sizing

a. The scheduled task called SystemCenterDTSPackageTask copies data from the OnePoint DB to the SystemCenterReporting DB after the data is 72 hours or older.

b. The SystemCenterReporting DB transaction log file is used in the data transfer and needs to be approximately 5x (based on real world example) the size of the OnePoint DB. For example, if your OnePoint DB is 16gb in size, then the SystemCenterReporting DB transaction log file size needs to be 5x this or 80gb.

c. Why does the transaction log need to be so much bigger than the DB. See KB110139 for all details, but here is the snippet that explains why –

"For a narrow row, the amount of log space consumed for a particular UPDATE , DELETE or INSERT could be ten times the data space consumed. For wider rows, the amount of log space consumed will be proportionately less. Log space consumption is an unavoidable consequence of providing transactional integrity."

You notice we stated 5x above and the KB says could be 10x. This is a value judgement, it is possible to hit 10x, but 5x is more reasonable. The transaction log size does vary depending on the amount data, so the size may still need to be increased over the 5x suggestion.

3. Autogrow

a. It is not supported that Autogrow is set for the transaction logs for these DB’s. Autogrow is not supported on either the DB or Transaction log. This is due to a limitation in MOM 2005, not with the FCS product.

                                                               i. However, if you are trying to recover from a large Reporting DB size due to the Data Retention number that is set, you may need to temporarily enable Autogrow on the transaction log for the Reporting DB. See the Data Retention information in the Troubleshooting Section below for more details.

 


Hard Disk Spindles

Due to the transactional intensity of MOM\SQL with FCS, just having enough disk space is not the only item to consider.

1. Based on real world estimates, you will need a separate spindle for each DB and transaction log or you will need to span across multiple spindles to have the speed, efficiency and redundancy that is recommended. Depending on the size of the DB, a RAID set or SAN is recommended. The table below is recommendations that we have come up with based on support cases with FCS customers and based on the largest numbers provided in the table above for DB sizing. Note, these are recommendations not absolutes. Depending on the hardware that is in use, more spindles or less spindles may be needed. This number of spindles below is in addition to the spindle that the operating system is installed on.

 

 

Note: With 250 or less machines it is recommended to use 4 spindles. However, if 4 spindles are not available, then 1 or 2 spindles can be used. When using 1 or 2 spindles, DB performance will be impacted. It is recommended when using 2 spindles to put the OnePoint DB and associated Transaction Log on one spindle and the SystemCenterReporting DB and associated Transaction log onto the other spindle.

 


Troubleshooting

1. DTS job failures

a. SQL Agent or Scheduled Task permissions

                                                               i. Verify the service that SQL Agent runs under has permissions to the DB that the DTS job is running against

                                                             ii. Verify that the Schedule Task is configured to use an account that has permissions to the DB that the DTS job is running against

b. Database or Transaction log file size

                                                               i. Check the Event logs, you should have an event about the job failing at a certain Step and it will give the specific DB or Transaction Log that ran out of space.

                                                             ii. Once determined what item does not have enough size, grow that item to a larger size.

                                                            iii. This can be worked around by setting autogrow, but that is not supported for the OnePoint or SystemCenterReporting DB’s. When Autogrow is set, SQL stops processing while it grows the DB or Transaction log which can cause other performance issues and potential data loss.

2. Data Retention issues

a. To change the data retention for the SystemCenterReporting DB, see KB887016

b. This is useful when there is not enough disk space to retain the default 395 days worth of data in the SystemCenterReporting DB.

c. It can also improve performance when looking at historical information as the SQL stored procedures that are being called are not parsing 395 days worth of data.

d. Unless 395 days worth of retained data is needed, it is recommended that the customer lower the data retention number when related to MOM running in conjunction with FCS.

3. Reducing DB sizes

a. As stated above, you can reduce the SystemCenterReporting DB size by reducing the Data Retention.

b. However, if the SystemCenterReporting DB grows very large and the SCDWGroomJob fails then changing the Data Retention may not be a feasible method to resolve the issue. This is because the SCDWGroomJob may timeout or fill up the transaction log due to the amount of data being groomed out. There is a script that might be able to help. It reduces the number of retention days by 5 and runs the groom job, this cycles through until the number of days you have specified is reached. This is detailed in the Appendix.

c. A similar issue can happen with the OnePoint DB. The problem is when the scheduled task called SystemCenterDTSPackageTask fails to run and copy data from the OnePoint DB to the SystemCenterReporting DB. The OnePoint DB grows large and can run out of space.

d. You can work around the issues with both the OnePoint and SystemCenterReporting DB by following the information below. When doing this, the customer will lose data, so be sure to explain this to the customer.

e. For the OnePoint DB see Reducing large DB size in the Appendix below.

f. There is another method that can be used for the SystemCenterReporting DB, see this Blog entry for more details. This is a more manual method than the script provided below.

 


Appendix

How to shrink large Databases

SystemCenterReporting DB

There are times when the SCDWGroomJob may fail due to the SystemCenterReporting DB Transaction log filling up and/or running out of physical disk space or other reasons. This generally occurs when the SCDWGroomJob has been failing for some other reasons and now there is more than the typical 1 day to groom out of the SystemCenterReporting DB. The problem can also occur when you are trying to reduce the amount of days for Data Retention that you want the SystemCenterReporting DB to maintain. The script below can be run against the SystemCenterReporting DB to reduce the amount of data that is stored and therefore reduce the size. Here are the steps –

1. Disable the following SQL jobs

a. SCDWGroomJob

b. Microsoft Forefront Client Security

c. MOMX Partitioning and Grooming

2. Confirm there are no open transactions to the SystemCenterReporting DB

DBCC OpenTran

3. If there are any open transactions, wait until they are finished

4. Make sure the Backup\Recovery Mode for the DB is set to Simple

5. Change SQL to max degree = 1

exec sp_configure 'max degree', 1

reconfigure with override

go

6. Now run the SQL query to reduce the size of the DB. Note that the following settings must be changed before running this.

set @daysdiff

set @dayskeep

@daysdiff is the number of days it will prune each time, we recommend trying 5 to start with this, if this timesout or fill the disk with the Transaction Log, then lower it.

@dayskeep is the number of days you want to keep for Data Retention. Generally 90 days is enough but it depends on the requirements for the environment.

Here is the SQL query to run -

Declare @tempdays int

Declare @daysdiff int

Declare @dayskeep int

set @daysdiff = x

set @dayskeep = xxx

Groom_Again:

select @tempdays = datediff(dd, min(timestored), getdate()) from sdkeventview

set @tempdays = @tempdays - 1

set @tempdays = @tempdays - @daysdiff

select @tempdays

IF (@tempdays < @dayskeep)

BEGIN

select 'Finished Grooming'

return

END

select 'start grooming in a new loop'

select @tempdays

IF (@tempdays > @dayskeep)

BEGIN

Exec p_updategroomdays 'SC_AlertFact_Table', @tempdays

Exec p_updategroomdays 'SC_AlertHistoryFact_Table', @tempdays

Exec p_updategroomdays 'SC_AlertToEventFact_Table', @tempdays

Exec p_updategroomdays 'SC_EventFact_Table', @tempdays

Exec p_updategroomdays 'SC_EventParameterFact_Table', @tempdays

Exec p_updategroomdays 'SC_SampledNumericDataFact_Table', @tempdays

END

exec dbo.p_GroomDatawarehouseTables

select getdate()

DBCC opentran

goto Groom_Again

OnePoint DB

There are times when the OnePoint DB when used with FCS will grow dramatically due to large number of events coming from various clients. This is usually due a missing FCS override for something like VNC or Dameware that is used in the environment and the FCS agent flags and sends an Event 3004 to the MOM server over and over for each client. This causes the Event tables in the OnePoint DB to grow to millions of rows rapidly and usually triggers the Event Flood Detection Alert. Corresponding with the Event Flood Detection Alert, the Event Rule called Run Flood Detection will be changed so that the Auto-Approve Pending Computers will be set to False. Once the flooding Events are mitigated, the Run Flood Detection Event Rule will need to be changed back to True, otherwise systems that get the FCS client package will not be automatically approved in MOM going forward.

Note - using the following procedures will remove data from the OnePoint DB and the
data will not be recoverable unless a backup is taken of the OnePoint DB.
Method

There are multiple things that need to be done. Here is a summary, with details to
follow.

Summary

1. Identify the offending client or clients and mitigate the issue, whether an overrride needs to be implemented for something like Dameware or the system is being reinfected due to missing security updates, poor user awareness and so on.

2. After the floods of events to the MOM server have been mitigated, then the events need to be cleaned out the Event Tables in the OnePoint DB. If the policy in the specific environment requires that data to be retained, then backup the OnePoint DB so that events could be extracted at a later data. Data loss will occur if there is no backup, but in general, this is being done because nothing is processing and the data is effectively useless at this point.

3. After cleaning out the various Event Tables, check the TimeDTSLastRan entry in the Report Settings table and confirm it is not more than a couple of days old. If it is, then set it to the current day.

4. Next, run the SystemCenterDTSPackageTask in Scheduled Tasks and confirm that it completes successfully.

5. Fix the Run Flood Detection Event Rule.

6. At that point, everything should be running as expected.

Detailed Steps

Stop the MOM service before performing any of the following steps. This way there is no contention with the MOM agents trying to upload events and causing these queries to take a very long time.

1. Confirm what event is filling up the Event tables by running the following SQL query against the OnePoint DB.

select eventno, count(*) as total

from eventview

group by eventno

order by total desc

2. Based on the event, use the filter in the MOM Operators console to see what systems are throwing those events and the details of the event. As an example, the event may be a 3004 and due to TightVNC being flagged. If the item being flagged as malware is something that is used in your environment then set an override in the FCS Policy for the malware name that is being flagged, in this example it would be RemoteControl:Win32\TightVNC. There are many different types of scenarios and the mitigation may be straight forward as this one is, or not. It may require a call into Microsoft Product Support if the way to mitigate the problem is not clear.

3. Now reduce the size of the OnePoint DB by reducing the size of the Event tables. Run the following SQL query to determine which Event tables have the largest number of rows. In FCS, it is usually about 3-5 tables total that start with Event_XX where XX is the number of the table.

SET NOCOUNT ON

CREATE TABLE #TBLSize

(Tblname varchar(80),

TblRows int,

TblReserved varchar(80),

TblData varchar(80),

TblIndex_Size varchar(80),

TblUnused varchar(80))

DECLARE @DBname varchar(80)

DECLARE @tablename varchar(80)

SELECT @DBname = DB_NAME(DB_ID())

PRINT 'User Table size Report for (Server / Database): ' + @@ServerName + ' / ' +

@DBName

PRINT ''

PRINT 'By Size Descending'

DECLARE TblName_cursor CURSOR FOR

SELECT NAME

FROM sysobjects

WHERE xType = 'U'

OPEN TblName_cursor

FETCH NEXT FROM TblName_cursor

INTO @tablename

WHILE @@FETCH_STATUS = 0

BEGIN

INSERT INTO #tblSize(Tblname, TblRows, TblReserved, TblData, TblIndex_Size, TblUnused)

EXEC Sp_SpaceUsed @tablename

-- Get the next author.

FETCH NEXT FROM TblName_cursor

INTO @tablename

END

CLOSE TblName_cursor

DEALLOCATE TblName_cursor

SELECT CAST(Tblname as Varchar(30)) 'Table',

CAST(TblRows as Varchar(14)) 'Row Count',

CAST(LEFT(TblReserved, CHARINDEX(' KB', TblReserved)) as int) 'Total Space (KB)',

CAST(TblData as Varchar(14)) 'Data Space',

CAST(TblIndex_Size as Varchar(14)) 'Index Space',

CAST(TblUnused as Varchar(14)) 'Unused Space'

FROM #tblSize

Order by 'Total Space (KB)' Desc

DROP TABLE #TblSize

4. The above will indicate which Event_XX tables have any data in them. Use the information about what Event_XX tables have data in them and plug it into the following SQL query which truncates the tables so they will be empty after the script runs.

Remember, this will cause the loss of any data in the OnePoint DB regarding events.

exec dbo.MOMXDropEventFKs XX

truncate table event_XX

exec MOMXRecreateEventFKs XX

truncate table eventparam_XX

truncate table eventconsolidated_XX

Remember XX is the number listed, so if the TableSize query in step 3 shows Event_08 as being very large, then the SQL query would look like such -

exec dbo.MOMXDropEventFKs 08

truncate table event_08

exec MOMXRecreateEventFKs 08

truncate table eventparam_08

truncate table eventconsolidated_08

The script above needs to be run for each Event_XX table that is listed as having any data in it. If there are any Event_XX tables that have data left in them, then there is potential for data correlation errors in the OnePoint DB as related tables will have different information.

5. Now check the TimeDTSLastRan to make sure it has a date of the last few days. Run the following SQL query -

select timedtslastran from reportingsettings

6.  If it is more than a few days ago, then run the following SQL query to set it to today –

update reportingsettings

set timedtslastran ='YYYY-MM-DD'

Where YYYY is the four digit year, MM is the month and DD is the day.

7. Confirm that the table was updated using the following SQL query -

select timedtslastran from reportingsettings

8.  Now run the SystemCenterDTSPackageTask in Scheduled Tasks and confirm it completes successfully. It will show a 0x0 in the Scheduled Tasks UI if successful. This scheduled task can take many hours to run.

9. Fix up the Run Flood Detection Rule so that computers will be Auto-approved again using the following steps.

a. Open the MOM Administrator Console

b. In the Tree, Expand to Management Packs\Rule Groups\Microsoft Forefront Client Security\Server Behaviors\Event Rules

c. Double-Click on the Run Flood Detection rule listed on the right windows.

d. Click on the Responses Tab

e. Highlight the script listed in the box and click Edit

f. Under Script Parameters, double click on Auto-Approve Pending Computers

g. Change the value from false to true.

h. Click OK

i. Click OK again

10. Now it should be resolved.

Please post any comments or questions.

 

Thanks,

Shain Wray

Security Technical Lead