Getting 31552 ->Exception ‘SqlException’: Timeout expired very frequently in SCOM server


In the SCOM server we  might  start Getting 31552 ->Exception 'SqlException': Timeout expired very frequently and  We will also see that the SQL server hosting the database role of the SCOM server going high on utilization of resources  and remaining for a long time.


 


Event Type:        Error


Event Source:    Health Service Modules


Event Category:                Data Warehouse


Event ID:              31552


Date:                     7/20/2009


Time:                     1:33:04 AM


User:                     N/A


Computer:          <RMS SERVER NAME>


Description:


Failed to store data in the Data Warehouse.


Exception 'SqlException': Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.


One or more workflows were affected by this. 


Workflow name: Microsoft.SystemCenter.DataWarehouse.StandardDataSetMaintenance


Instance name: Client Monitoring data set


Instance ID: {ED9228A4-9CAD-5782-CA20-7067D119EAFA}


Management group: <Management GROUPNAME>


For more information, see Help and Support Center at <http://go.microsoft.com/fwlink/events.asp>.


 


This can happen if the alert are not processed and is on the staging table. In order to overcome this issue we can do the following.


IMP Note: As there are database changes make sure that you have a valid database backup before you go ahead and perform these actions.


Run the following quries


select count(*) from Alert.AlertStage


select count (*) from Event.eventstage


select count (*) from Perf.PerformanceStage


select count (*) from state.statestage


 


The count on these tables will have count increasing and decresing. IF you find that any particular table is having huge count in lakhs like the one I had for the alert stage table. Follow the following.


 


In my case I had around 16 lakh + rows in alert stage table which will never get processed. So what I did is backup of the table and cleared the backlogs. For the same used the following


 


SELECT count(*) from ALert.AlertStage


SELECT * INTO ALert.AlertStage_backup FROM ALert.AlertStage


SELECT count(*) from ALert.AlertStage_backup


TRUNCATE TABLE ALert.AlertStage


SELECT count(*) from ALert.AlertStage


 


 


Then to make sure that the alert data is processing we ran the query


exec StandardDatasetMaintenance @DatasetId=' 6AF799BD-2CCC-41CF-97FD-058E2CBF9248'. This GUID was available for the alert data when we ran the query select * from StandardDataset


 


Query Result.


============


6AF799BD-2CCC-41CF-97FD-058E2CBF9248          Alert      0              0              1              AlertProcessStaging        01:00     240                2009-07-20 11:17:19.690


6F9FBA1B-EAE3-4071-AFA9-2101A81AA463         Event    0              0              1              EventProcessStaging      01:00     240                2009-07-20 11:17:16.033


06268CB1-F9C4-4195-AA67-7BB2EFD5224D          State     0              48           1              StateProcessStaging       01:00     240                2009-07-20 11:16:29.797


7EAE1679-17F5-4BA6-B685-9867A88C00E1            CM         0              28           1              AemProcessStaging        00:00     240                2009-07-20 11:17:25.017


C7E7D06C-6D00-46F2-819D-CB7D206EC361          Perf       0              48           1              PerformanceProcessStaging       01:00     240                2009-07-20 11:17:25.203


 


If we have any error when we run  exec StandardDatasetMaintenance @DatasetId=' 6AF799BD-2CCC-41CF-97FD-058E2CBF9248'. Then need to troubleshoot on that error which the results in the query windows displays. Else we can try running the count query (SELECT count(*) from ALert.AlertStage) and make sure that the data is coming in and going out as well.


 


While running the exec StandardDatasetMaintenance @DatasetId=' 6AF799BD-2CCC-41CF-97FD-058E2CBF9248', check if we are getting errors . If we have are getting error
“Sql execution failed. Error 8115, Level 16, State 1, Procedure -, Line 1, Message: Arithmetic overflow error converting IDENTITY to data type int.”
Then follow these steps of backing up the alert staging table


Modify  the IDENTITY values for the Alert_GUID table by the running the following query, since it was well beyond 800 million
For correcting this


 


Select max(AlertRowId) from Alert_GUID


 


This will give you a value . Please note this value.


 


DBCC CHECKIDENT(‘Alert.Alert_GUID’,RESEED,<maximum value of the result in the previous query + 1>)



Again try to run the stored procedure and ensure its success.


 


Once the same is done you can see that the CPU utilization will also come down as well as the frequency of 31552 errors.Now if you want to put back this data you can do it bit by bit using the following quires.


 


WHILE (SELECT COUNT(*) FROM ALert.AlertStage_backup) > 0


BEGIN


INSERT TOP(1000) INTO ALert.AlertStage


([AlertGuid],[AlertProblemGuid],[ManagedEntityRowId]


,[AlertName],[AlertDescription],[Severity],[Priority]


,[Category],[WorkflowRowId],[MonitorAlertInd],[DateTime]


,[RaisedDateTime],[SiteName],[RepeatCount] ,[AlertStringGuid]


,[ParameterHash],[DBCreatedDateTime],[DWCreatedDateTime],[DWLastModifiedDateTime])


SELECT


[AlertGuid],[AlertProblemGuid],[ManagedEntityRowId]


,[AlertName],[AlertDescription],[Severity],[Priority]


,[Category],[WorkflowRowId],[MonitorAlertInd],[DateTime]


,[RaisedDateTime],[SiteName],[RepeatCount] ,[AlertStringGuid]


,[ParameterHash],[DBCreatedDateTime],[DWCreatedDateTime],[DWLastModifiedDateTime]


FROM ALert.AlertStage_backup


 


DELETE TOP(1000) FROM ALert.AlertStage_backup


This will take a huge time depending on the number of rows.Once processed please drop table FROM ALert.AlertStage_backup.


 


What this will do it will put 1000 alerts at a time to the backup table to the staging table to process. And the completion of this will take time depending on the number of backlogs.


If alert data is not so important for you can drop this table ALert.AlertStage_backup.


The same can be checked for other (event, performance) staging tables as well.


 

Comments (2)

  1. I am not finding Alert_GUID table in Datawarehouse. Could you please let me know where I need to run this query.

  2. Walter Chomak says:

    Excellent KB.

Skip to main content