Recommended registry tweaks for SCOM 2016 management servers


 


I will start with what people want most – the “list”:

 

These are the most common settings I recommend adjusting on SCOM management servers.

Simply run these from an elevated command prompt on all your management servers.

 

reg add "HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters" /v "State Queue Items" /t REG_DWORD /d 20480 /f reg add "HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters" /v "Persistence Checkpoint Depth Maximum" /t REG_DWORD /d 104857600 /f reg add "HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL" /v "DALInitiateClearPool" /t REG_DWORD /d 1 /f reg add "HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL" /v "DALInitiateClearPoolSeconds" /t REG_DWORD /d 60 /f reg add "HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0" /v "GroupCalcPollingIntervalMilliseconds" /t REG_DWORD /d 900000 /f reg add "HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse" /v "Command Timeout Seconds" /t REG_DWORD /d 1800 /f reg add "HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse" /v "Deployment Command Timeout Seconds" /t REG_DWORD /d 86400 /f

 

I will explain each setting in detail below:

 

1.  HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\
REG_DWORD Decimal Value:        State Queue Items = 20480

SCOM 2016 default existing registry value:   (not present) 

SCOM 2016 default value in code:   10240

Description:  This sets the maximum size of the healthservice internal state queue.  It should be equal to or larger than the number of monitor-based workflows running in a healthservice.  Too small a value, or too many workflows, will cause state change loss.  http://blogs.msdn.com/b/rslaten/archive/2008/08/27/event-5206.aspx
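
If you suspect you are already losing state changes, the linked post covers event 5206, which is logged when this queue overflows.  A quick way to look for recent occurrences is a wevtutil query (a sketch; I am assuming the events land in the standard Operations Manager event log):

wevtutil qe "Operations Manager" /q:"*[System[(EventID=5206)]]" /c:10 /rd:true /f:text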

 

2.  HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\
REG_DWORD Decimal Value:  Persistence Checkpoint Depth Maximum = 104857600

SCOM 2016 default existing registry value = 20971520

Description:  This is for management servers that host a large number of agentless objects (network devices, URLs, Linux, 3rd party, VEEAM), which results in the MS running a large number of workflows.  This is an ESE database setting which controls how often ESE writes to disk.  A larger value will decrease the disk I/O caused by the SCOM healthservice, but will increase ESE recovery time in the case of a healthservice crash.  The recommended value is 100 MB (104857600), up from the 20 MB (20971520) default.

 

3.  HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL\
REG_DWORD Decimal Value:
  DALInitiateClearPool = 1
  DALInitiateClearPoolSeconds = 60

SCOM 2016 existing registry value:   not present

Description:  This is a critical setting on ALL management servers in ANY management group.  It configures the SDK service to regularly attempt reconnection to the SQL server after a disconnection.  Without these settings, an extended SQL outage can leave a management server unable to reconnect once SQL comes back online.   Per:  http://support.microsoft.com/kb/2913046/en-us   All management servers in a management group should get this registry change, as in the sketch below.
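
Since every management server needs these values, one option is to push them from a single elevated prompt using the remote registry.  A sketch, where MS01/MS02/MS03 are placeholder server names (the Remote Registry service must be running on each target, and you would double the % signs if you put this in a batch file):

for %s in (MS01 MS02 MS03) do (reg add "\\%s\HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL" /v "DALInitiateClearPool" /t REG_DWORD /d 1 /f & reg add "\\%s\HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL" /v "DALInitiateClearPoolSeconds" /t REG_DWORD /d 60 /f)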

 

4.  HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\
REG_DWORD Decimal Value:       GroupCalcPollingIntervalMilliseconds = 900000

SCOM 2016 existing registry value:  (not present)

SCOM 2016 default code value:  30000 (30 seconds)

Description:  This setting slows down how often group calculation runs to find changes in group memberships.  Group calculation can be very expensive, especially with a large number of groups, a large agent count, or complex group membership expressions.  Slowing this down helps keep groupcalc from consuming all the healthservice and database I/O.  900000 milliseconds is every 15 minutes.

 

5.  HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse\
REG_DWORD Decimal Value:    Command Timeout Seconds = 1800

SCOM 2016 existing registry value:  (not present)

SCOM 2016 default code value:  600

Description:  This helps with dataset maintenance, as the default timeout of 10 minutes is often too short.  Setting this to a longer value helps reduce the 31552 events you might see with standard database maintenance.  This is a very common issue.   http://blogs.technet.com/b/kevinholman/archive/2010/08/30/the-31552-event-or-why-is-my-data-warehouse-server-consuming-so-much-cpu.aspx   This should be adjusted to however long aggregations or other maintenance take to run in your environment.  Maintenance needs to complete in less than one hour, so if it takes more than 30 minutes you really need to investigate why it is so slow, whether from too much data or from SQL performance issues.
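
To see whether you are hitting this timeout today, check your management servers for recent 31552 events.  A sketch with wevtutil (assuming, as with the 5206 check above, that the events are written to the Operations Manager log):

wevtutil qe "Operations Manager" /q:"*[System[(EventID=31552)]]" /c:10 /rd:true /f:text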

 

6.  HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse\
REG_DWORD Decimal Value:    Deployment Command Timeout Seconds = 86400

SCOM 2016 existing registry value:  (not present)

SCOM 2016 default code value:  10800 (3 hours)

Description:  This helps with the deployment of heavyweight scripts that are applied during version upgrades and cumulative updates.  Customers often see blocking on the DW database while indexes are created, which keeps the script from deploying within the default of 3 hours.  Setting this value to allow one full day (86400 seconds) to deploy the script resolves most customer issues.  A longer value also helps reduce the 31552 events you might see after a version upgrade or UR deployment.  This is a very common issue in large environments or with very large warehouse databases.

 

 

Ok, that covers the “standard” stuff.

 

I will cover one other registry modification that is RARELY needed.  You should ONLY change this one if directed to by Microsoft support.

WARNING:

If you make changes to this setting, the same change must be made on ALL management servers, otherwise the resource pools will constantly fail.  All management servers must have identical settings here.  If you add a management server in the future and you have modified this setting on the others, apply it to the new server immediately, or you will see your resource pools constantly committing suicide and failing over to other management servers, reinitializing all workflows in a loop.   All the other settings in this article are generally beneficial.  This specific one for PoolManager should receive great scrutiny before changing, due to the risks.  It is NOT included in my reg-add list above for good reason.

 

HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager\
REG_DWORD Decimal Values:
  PoolLeaseRequestPeriodSeconds = 600
  PoolNetworkLatencySeconds = 120

SCOM 2016 existing registry value:  not present (you must create the PoolManager key and both values)   Default code values:  120 seconds (PoolLeaseRequestPeriodSeconds) and 30 seconds (PoolNetworkLatencySeconds)

This is VERY RARE to change, and in general I only recommend changing it under advisement from a support case.  The resource pools work quite well on their own, and I have worked with very large environments that did not need these modified.  Changing them is more common when you are dealing with a rare condition, such as a management group spread across datacenters with high-latency links, DR sites, a MASSIVE number of workflows running on management servers, etc.
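
If support does direct you to change these, the commands follow the same pattern as the list at the top of this article; reg add will create the PoolManager key if it does not exist.  A sketch using the values above (again: it must be run identically on EVERY management server):

reg add "HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager" /v "PoolLeaseRequestPeriodSeconds" /t REG_DWORD /d 600 /f
reg add "HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager" /v "PoolNetworkLatencySeconds" /t REG_DWORD /d 120 /f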


Comments (14)

  1. stephen lisko says:

    Kevin,

    Thanks for posting this article. Looks like the recommendations are the same as for SCOM 2012 management servers. So if I am planning to upgrade my existing SCOM configuration from 2012 to 2016 will the registry tweaks that I currently have set on my management servers remain, or will I have to tweak them?

    Thanks

    1. Kevin Holman says:

      Historically – the “upgrades” actually do an uninstall/reinstall…. so it is possible registry entries will get wiped out. I’d absolutely go back and re-verify these after an upgrade.

  2. Ronnie says:

    Thanks for the writeup.

    Do you know why these are not set by default?

  3. Breezer says:

    Thnx for the info! Helps alot!

  4. Birdal says:

    Hi Kevin,
    we have the issues Event ID 15002 and 15004 on both Gateway Servers.
    Are these registry keys to be set only on Gateway Servers, only on Management Servers, or on both Management Servers AND Gateway Servers?
    Best Regards
    Birdal

    Event IDs:
    Event ID: 15002
    Task Category: Pool Manager
    Level: Error
    Keywords: Classic
    User: N/A
    Computer:
    Description:
    The pool member cannot send a lease request to acquire ownership of managed objects assigned to the pool because half or fewer members of the pool acknowledged the most recent initialization check request. The pool member will continue to send an initialization check request.

    Management Group:
    Management Group ID: {C601BF31-FBEC-4CD4-12F9-814C98AFF83E}
    Pool Name:
    Pool ID: {9EE78DB3-4D6C-DA05-608F-3B79294E3AFB}
    Pool Version: 3075036988681890219
    Number of Pool Members: 3
    Number of Observer Only Pool Members: 1
    Number of Instances: 2

    Log Name: Operations Manager
    Source: HealthService
    Date: 19.07.2017 17:22:02
    Event ID: 15004
    Task Category: Pool Manager
    Level: Error
    Keywords: Classic
    User: N/A
    Computer:
    Description:
    The pool member no longer owns any managed objects assigned to the pool because half or fewer members of the pool have acknowledged the most recent lease request. The pool member has unloaded the workflows for managed objects it previously owned.

    Management Group:
    Management Group ID: {C601BF31-FBEC-4CD4-12F9-814C98AFF83E}
    Pool Name:
    Pool ID: {9EE78DB3-4D6C-DA05-608F-3B79294E3AFB}
    Pool Version: 3075036988681890219
    Number of Pool Members: 3
    Number of Observer Only Pool Members: 1
    Number of Instances: 2

  5. venkatesh says:

    I have a Data Warehouse issue in my new SCOM 2016 environment, where the RMS is a 2016 server datacenter edition. The reg key path you mentioned to modify the DW command timeout is not present. I have both DB and DW on a single server, and all my reg key path shows is HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Setup, where all DB and DW details are present. Can you let me know where I should add Command Timeout Seconds to get rid of the 31551 and 31552 events?

  6. jpriver says:

    Thank you Kevin! great information as always!

  7. Manny Kang says:

    Hi Kevin, I hope all is well. We are seeing a number of 29181 events on the Management Servers, all related to SnapshotSynchronizationWorkItem timeout issues. The article (https://blogs.technet.microsoft.com/silvana/2014/09/04/eventid-29181-snapshotsynchronization-not-taking-place/) suggests implementing the key HKLM\Software\Microsoft\Microsoft Operations Manager\3.0\Config Service

    ->New DWORD CommandTimeoutSeconds – is this required in 2016?

    Thanks

    1. Kevin Holman says:

      I generally don’t recommend extending snapshot timeout. Most of the time extending a timeout is like placing a band aid… on a gushing wound that really needs stitches.

      I would first try and understand what’s unique about your environment that is causing snapshot to fail.

      First – how long does it run before it fails?
      Does it fail often then complete with success?
      Is this SCOM 2016?
      What OS?

      Snapshot runs once per day – and should complete with success. It is normal for it to fail a few times every night, but it should have a successful completion, once per day.

      SELECT * FROM cs.workitem
      WHERE WorkItemName like '%snap%'
      ORDER BY WorkItemRowId DESC

      1. Manny Kang says:

        Hi Kevin,

        We see the issue on a random basis, running the query we have 2 out of 11 Management Server reporting the issue:

        Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.DataAccessException: Snapshot data transfer operation failed batch write at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.SnapshotSynchronizationWorkItem.CheckBatchWriteErrors() at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.SnapshotSynchronizationWorkItem.TransferData(SnapshotProcessWatermark initialWatermark) at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.SnapshotSynchronizationWorkItem.ExecuteSharedWorkItem() at Microsoft.EnterpriseManagement.ManagementConfiguration.Interop.SharedWorkItem.ExecuteWorkItem() ———————————– Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.DataAccessException: Data access operation failed Server stack trace: at Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.DataAccessOperation.ExecuteSynchronously(Int32 timeoutSeconds, WaitHandle stopWaitHandle) at Microsoft.EnterpriseManagement.ManagementConfiguration.SqlConfigurationStore.ConfigurationStore.ExecuteOperationSynchronously(IDataAccessConnectedOperation operation, String operationName) at Microsoft.EnterpriseManagement.ManagementConfiguration.SqlConfigurationStore.ConfigurationStore.WriteConfigurationSnapshot(IConfigurationSnapshotDataSet dataSet) at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Object[]& outArgs) at System.Runtime.Remoting.Messaging.StackBuilderSink.AsyncProcessMessage(IMessage msg, IMessageSink replySink) Exception rethrown at [0]: at System.Runtime.Remoting.Proxies.RealProxy.EndInvokeHelper(Message reqMsg, Boolean bProxyCase) at System.Runtime.Remoting.Proxies.RemotingProxy.Invoke(Object NotUsed, MessageData& msgData) at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.WriteConfigurationSnapshotDelegate.EndInvoke(IAsyncResult result) at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.SnapshotSynchronizationWorkItem.SnapshotBatchWritten(IAsyncResult asyncResult) ———————————– System.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. 
—> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out Server stack trace: at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error) at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync() at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket() at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer() at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value) at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj) at System.Data.SqlClient.TdsParser.ProcessAttention(TdsParserStateObject stateObj) at System.Data.SqlClient.TdsParserStateObject.WriteSni(Boolean canAccumulate) at System.Data.SqlClient.TdsParserStateObject.WritePacket(Byte flushMode, Boolean canAccumulate) at System.Data.SqlClient.TdsParserStateObject.WriteByteArray(Byte[] b, Int32 len, Int32 offsetBuffer, Boolean canAccumulate, TaskCompletionSource`1 completion) at System.Data.SqlClient.TdsParser.WriteUnterminatedValue(Object value, MetaType type, Byte scale, Int32 actualLength, Int32 encodingByteSize, Int32 offset, TdsParserStateObject stateObj, Int32 paramSize, Boolean isDataFeed) at System.Data.SqlClient.TdsParser.WriteBulkCopyValue(Object value, SqlMetaDataPriv metadata, TdsParserStateObject stateObj, Boolean isSqlType, Boolean isDataFeed, Boolean isNull) at System.Data.SqlClient.SqlBulkCopy.ReadWriteColumnValueAsync(Int32 col) at System.Data.SqlClient.SqlBulkCopy.CopyColumnsAsync(Int32 col, TaskCompletionSource`1 source) at System.Data.SqlClient.SqlBulkCopy.CopyRowsAsync(Int32 rowsSoFar, Int32 totalRows, CancellationToken cts, TaskCompletionSource`1 source) at System.Data.SqlClient.SqlBulkCopy.CopyBatchesAsyncContinued(BulkCopySimpleResultSet internalResults, String updateBulkCommandText, CancellationToken cts, TaskCompletionSource`1 source) at System.Data.SqlClient.SqlBulkCopy.CopyBatchesAsync(BulkCopySimpleResultSet internalResults, String updateBulkCommandText, CancellationToken cts, TaskCompletionSource`1 source) at System.Data.SqlClient.SqlBulkCopy.WriteToServerInternalRestContinuedAsync(BulkCopySimpleResultSet internalResults, CancellationToken cts, TaskCompletionSource`1 source) at System.Data.SqlClient.SqlBulkCopy.WriteToServerInternalRestAsync(CancellationToken cts, TaskCompletionSource`1 source) at System.Data.SqlClient.SqlBulkCopy.WriteToServerInternalAsync(CancellationToken ctoken) at System.Data.SqlClient.SqlBulkCopy.WriteRowSourceToServerAsync(Int32 columnCount, CancellationToken ctoken) at System.Data.SqlClient.SqlBulkCopy.WriteToServer(IDataReader reader) at Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.SqlBulkInsertOperation.ExecuteSynchronously(IDataReader reader) at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Object[]& outArgs) at 
System.Runtime.Remoting.Messaging.StackBuilderSink.AsyncProcessMessage(IMessage msg, IMessageSink replySink) Exception rethrown at [0]: at System.Runtime.Remoting.Proxies.RealProxy.EndInvokeHelper(Message reqMsg, Boolean bProxyCase) at System.Runtime.Remoting.Proxies.RemotingProxy.Invoke(Object NotUsed, MessageData& msgData) at Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.SqlBulkInsertOperation.AsyncExecute.EndInvoke(IAsyncResult result) at Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.SqlBulkInsertOperation.CommandCompleted(IAsyncResult asyncResult) ClientConnectionId:c9685527-804f-437c-959a-059a2ddc2fce Error Number:-2,State:0,Class:11

        The WorkItemStateID is 10 and the duration is under 70 seconds.

        It's SCOM 2016 UR2, on OS Server 2016, with a SQL 2016 RTM back-end cluster (AlwaysOn).

        It's all very random; sometimes we get multiple Management Servers having the issue, other times just a couple.

        Could it be network related? I.e., are the SQL timeouts network-side blips?

        Thanks

        1. Kevin Holman says:

          Snapshot only runs once a day. Random failures are FINE as long is it completes every day.

          What was the output of the SQL query?

          1. Manny Kang says:

            Hi Kevin,

            So the query returns WorkItemStateID 20 (Succeeded) for 2 Management Servers, and 10 (Failed) for 3 Management Servers.

            But the servers with a value of 10 (Failed) look as though they do complete, as the CompletedDateTime field is populated.

            The error is:

            Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.DataAccessException: Snapshot data transfer operation failed batch write at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.SnapshotSynchronizationWorkItem.CheckBatchWriteErrors() at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.SnapshotSynchronizationWorkItem.TransferData(SnapshotProcessWatermark initialWatermark) at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.SnapshotSynchronizationWorkItem.ExecuteSharedWorkItem() at Microsoft.EnterpriseManagement.ManagementConfiguration.Interop.SharedWorkItem.ExecuteWorkItem() ———————————– Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.DataAccessException: Data access operation failed Server stack trace: at Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.DataAccessOperation.ExecuteSynchronously(Int32 timeoutSeconds, WaitHandle stopWaitHandle) at Microsoft.EnterpriseManagement.ManagementConfiguration.SqlConfigurationStore.ConfigurationStore.ExecuteOperationSynchronously(IDataAccessConnectedOperation operation, String operationName) at Microsoft.EnterpriseManagement.ManagementConfiguration.SqlConfigurationStore.ConfigurationStore.WriteConfigurationSnapshot(IConfigurationSnapshotDataSet dataSet) at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Object[]& outArgs) at System.Runtime.Remoting.Messaging.StackBuilderSink.AsyncProcessMessage(IMessage msg, IMessageSink replySink) Exception rethrown at [0]: at System.Runtime.Remoting.Proxies.RealProxy.EndInvokeHelper(Message reqMsg, Boolean bProxyCase) at System.Runtime.Remoting.Proxies.RemotingProxy.Invoke(Object NotUsed, MessageData& msgData) at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.WriteConfigurationSnapshotDelegate.EndInvoke(IAsyncResult result) at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.SnapshotSynchronizationWorkItem.SnapshotBatchWritten(IAsyncResult asyncResult) ———————————– System.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. 
—> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out Server stack trace: at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error) at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync() at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket() at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer() at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value) at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj) at System.Data.SqlClient.TdsParser.ProcessAttention(TdsParserStateObject stateObj) at System.Data.SqlClient.TdsParserStateObject.WriteSni(Boolean canAccumulate) at System.Data.SqlClient.TdsParserStateObject.WritePacket(Byte flushMode, Boolean canAccumulate) at System.Data.SqlClient.TdsParserStateObject.WriteByteArray(Byte[] b, Int32 len, Int32 offsetBuffer, Boolean canAccumulate, TaskCompletionSource`1 completion) at System.Data.SqlClient.TdsParser.WriteUnterminatedValue(Object value, MetaType type, Byte scale, Int32 actualLength, Int32 encodingByteSize, Int32 offset, TdsParserStateObject stateObj, Int32 paramSize, Boolean isDataFeed) at System.Data.SqlClient.TdsParser.WriteBulkCopyValue(Object value, SqlMetaDataPriv metadata, TdsParserStateObject stateObj, Boolean isSqlType, Boolean isDataFeed, Boolean isNull) at System.Data.SqlClient.SqlBulkCopy.ReadWriteColumnValueAsync(Int32 col) at System.Data.SqlClient.SqlBulkCopy.CopyColumnsAsync(Int32 col, TaskCompletionSource`1 source) at System.Data.SqlClient.SqlBulkCopy.CopyRowsAsync(Int32 rowsSoFar, Int32 totalRows, CancellationToken cts, TaskCompletionSource`1 source) at System.Data.SqlClient.SqlBulkCopy.CopyBatchesAsyncContinued(BulkCopySimpleResultSet internalResults, String updateBulkCommandText, CancellationToken cts, TaskCompletionSource`1 source) at System.Data.SqlClient.SqlBulkCopy.CopyBatchesAsync(BulkCopySimpleResultSet internalResults, String updateBulkCommandText, CancellationToken cts, TaskCompletionSource`1 source) at System.Data.SqlClient.SqlBulkCopy.WriteToServerInternalRestContinuedAsync(BulkCopySimpleResultSet internalResults, CancellationToken cts, TaskCompletionSource`1 source) at System.Data.SqlClient.SqlBulkCopy.WriteToServerInternalRestAsync(CancellationToken cts, TaskCompletionSource`1 source) at System.Data.SqlClient.SqlBulkCopy.WriteToServerInternalAsync(CancellationToken ctoken) at System.Data.SqlClient.SqlBulkCopy.WriteRowSourceToServerAsync(Int32 columnCount, CancellationToken ctoken) at System.Data.SqlClient.SqlBulkCopy.WriteToServer(IDataReader reader) at Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.SqlBulkInsertOperation.ExecuteSynchronously(IDataReader reader) at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Object[]& outArgs) at 
System.Runtime.Remoting.Messaging.StackBuilderSink.AsyncProcessMessage(IMessage msg, IMessageSink replySink) Exception rethrown at [0]: at System.Runtime.Remoting.Proxies.RealProxy.EndInvokeHelper(Message reqMsg, Boolean bProxyCase) at System.Runtime.Remoting.Proxies.RemotingProxy.Invoke(Object NotUsed, MessageData& msgData) at Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.SqlBulkInsertOperation.AsyncExecute.EndInvoke(IAsyncResult result) at Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.SqlBulkInsertOperation.CommandCompleted(IAsyncResult asyncResult) ClientConnectionId:ae009e7a-0ce8-449b-a092-fd27ed6d52c3 Error Number:-2,State:0,Class:11

            So is it safe to say we can ignore any occurrence where WorkItemStateID is 10 and CompletedDateTime is populated, and only really act on WorkItemStateIDs 12 (Abandoned) and 15 (Timeout)?

            Thanks

            1. Kevin Holman says:

              No, it is not safe to say that. 10’s are bad. But again – it is ok to have snapshot job failures, AS LONG AS you get at least one “20” per day, which means that it was able to complete with success, once per day.

  8. Manny Kang says:

    Thanks. The issue we are seeing is that the SnapshotSynchronization engine work item never seems to recover after the 25hr period, so, for example, a 29180 does not get generated. Other work flows, such as the GetNextWorkItem engine work item, will generate a 29181 and recover with a 29180.

    What I am unclear about is when we get the alert Microsoft.SystemCenter.ManagementConfigurationService.SnapshotWorkItemMonitor (generated by the monitor Snapshot Sync state) – I see WorkItemStateId 10 for the server (when running the query). Will the SnapshotSynchronization engine work item automatically rerun? Even after restarting the Config Service I don’t see a 29180 logged for the SnapshotSynchronization engine work item – is this normal behavior?

    It’s random why we see this on some servers and not others.

    Thanks again
