Recommended registry tweaks for SCOM 2016 management servers


 


I will start with what people want most – the “list”:

 

These are the most common settings I recommend adjusting on SCOM management servers.

Simply run these from an elevated command prompt on all your management servers.

 

reg add "HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters" /v "State Queue Items" /t REG_DWORD /d 20480 /f
reg add "HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters" /v "Persistence Checkpoint Depth Maximum" /t REG_DWORD /d 104857600 /f
reg add "HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL" /v "DALInitiateClearPool" /t REG_DWORD /d 1 /f
reg add "HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL" /v "DALInitiateClearPoolSeconds" /t REG_DWORD /d 60 /f
reg add "HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0" /v "GroupCalcPollingIntervalMilliseconds" /t REG_DWORD /d 900000 /f
reg add "HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse" /v "Command Timeout Seconds" /t REG_DWORD /d 1800 /f
reg add "HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse" /v "Deployment Command Timeout Seconds" /t REG_DWORD /d 86400 /f
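If you would rather keep the list in one table and generate the command lines from it, something like the following Python sketch could work. The keys, value names, and data are the ones from the list above; the `SETTINGS` table and the `build_reg_add` helper are hypothetical names introduced only for this illustration and are not part of SCOM.

```python
# Hypothetical helper: builds the same "reg add" command lines from one table.
HEALTH = r"HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters"
DAL = r"HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL"
OPSMGR = r"HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0"
DW = OPSMGR + r"\Data Warehouse"

# (registry key, value name, decimal REG_DWORD data)
SETTINGS = [
    (HEALTH, "State Queue Items", 20480),
    (HEALTH, "Persistence Checkpoint Depth Maximum", 104857600),
    (DAL, "DALInitiateClearPool", 1),
    (DAL, "DALInitiateClearPoolSeconds", 60),
    (OPSMGR, "GroupCalcPollingIntervalMilliseconds", 900000),
    (DW, "Command Timeout Seconds", 1800),
    (DW, "Deployment Command Timeout Seconds", 86400),
]


def build_reg_add(key, name, data):
    """Return one 'reg add' command line for a REG_DWORD value."""
    return f'reg add "{key}" /v "{name}" /t REG_DWORD /d {data} /f'


if __name__ == "__main__":
    # Print the commands; paste them into an elevated command prompt.
    for key, name, data in SETTINGS:
        print(build_reg_add(key, name, data))
```

The script only prints text. It does not touch the registry itself, so the output can be reviewed before anything is run on a management server.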

 

I will explain each setting in detail below:

 

1.  HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\
REG_DWORD Decimal Value:        State Queue Items = 20480

SCOM 2016 default existing registry value:   (not present) 

SCOM 2016 default value in code:   10240

Description:  This sets the maximum size of the healthservice internal state queue.  It should be equal to or larger than the number of monitor-based workflows running in a healthservice.  Too small a value, or too many workflows, will cause state change loss.  http://blogs.msdn.com/b/rslaten/archive/2008/08/27/event-5206.aspx

 

2.  HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\
REG_DWORD Decimal Value:  Persistence Checkpoint Depth Maximum = 104857600

SCOM 2016 default existing registry value = 20971520

Description:  This is for management servers that host a large number of agentless objects (network/URL/Linux/3rd party/VEEAM), which results in the MS running a large number of workflows.  This is an ESE DB setting which controls how often ESE writes to disk.  A larger value will decrease disk IO caused by the SCOM healthservice but increase ESE recovery time in the case of a healthservice crash.

 

3.  HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL\
REG_DWORD Decimal Value:
  DALInitiateClearPool = 1
  DALInitiateClearPoolSeconds = 60

SCOM 2016 existing registry value:   not present

Description:  This is a critical setting on ALL management servers in ANY management group.  It configures the SDK service to attempt to reconnect to SQL Server on a regular basis after a disconnection.  Without these settings, an extended SQL outage can leave a management server unable to ever reconnect once SQL comes back online.  Per:  http://support.microsoft.com/kb/2913046/en-us  All management servers in a management group should get the registry change.

 

4.  HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\
REG_DWORD Decimal Value:       GroupCalcPollingIntervalMilliseconds = 900000

SCOM 2016 existing registry value:  (not present)

SCOM 2016 default code value:  30000 (30 seconds)

Description:  This setting will slow down how often group calculation runs to find changes in group memberships.  Group calculation can be very expensive, especially with a large number of groups, large agent count, or complex group membership expressions.  Slowing this down will help keep groupcalc from consuming all the healthservice and database I/O.  900000 is every 15 minutes.
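As a quick check on the units, the interval is stored in milliseconds. A small Python sketch (the `ms_to_minutes` helper is a hypothetical name used only here) confirms the conversion for the recommended and default values:

```python
def ms_to_minutes(milliseconds):
    """Convert a polling interval in milliseconds to minutes."""
    return milliseconds / 1000 / 60


print(ms_to_minutes(900000))  # recommended value -> 15.0
print(ms_to_minutes(30000))   # default code value -> 0.5
```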

 

5.  HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse\
REG_DWORD Decimal Value:    Command Timeout Seconds = 1800

SCOM 2016 existing registry value:  (not present)

SCOM 2016 default code value:  600

Description:  This helps with dataset maintenance as the default timeout of 10 minutes is often too short.  Setting this to a longer value helps reduce the 31552 events you might see with standard database maintenance.  This is a very common issue.   http://blogs.technet.com/b/kevinholman/archive/2010/08/30/the-31552-event-or-why-is-my-data-warehouse-server-consuming-so-much-cpu.aspx  This should be adjusted to however long it takes aggregations or other maintenance to run in your environment.  We need this to complete in less than one hour, so if it takes more than 30 minutes to complete, you really need to investigate why it is so slow, either from too much data or SQL performance issues.

 

6.  HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse\
REG_DWORD Decimal Value:    Deployment Command Timeout Seconds = 86400

SCOM 2016 existing registry value:  (not present)

SCOM 2016 default code value:  10800 (3 hours)

Description:  This helps with deployment of heavy-handed scripts that are applied during version upgrades and cumulative updates.  Customers often see blocking on the DW database while indexes are being created, and this keeps the script from being deployed within the default of 3 hours.  Setting this value to allow one full day to deploy the script resolves most customer issues.  Setting this to a longer value helps reduce the 31552 events you might see with standard database maintenance after a version upgrade or UR deployment.  This is a very common issue in large environments with very large warehouse databases.
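With this many values to track across servers, it can help to compare what a server actually has against the recommended list. Below is a minimal Python sketch of that comparison. It assumes you have already collected the current values into a dictionary by whatever means you prefer; the `find_drift` function and the sample data are hypothetical and exist only for illustration.

```python
# Recommended values from this article, keyed by value name.
RECOMMENDED = {
    "State Queue Items": 20480,
    "Persistence Checkpoint Depth Maximum": 104857600,
    "DALInitiateClearPool": 1,
    "DALInitiateClearPoolSeconds": 60,
    "GroupCalcPollingIntervalMilliseconds": 900000,
    "Command Timeout Seconds": 1800,
    "Deployment Command Timeout Seconds": 86400,
}


def find_drift(actual):
    """Return {name: (expected, found)} for values that are missing or differ.

    'found' is None when the value is not present on the server.
    """
    drift = {}
    for name, expected in RECOMMENDED.items():
        found = actual.get(name)
        if found != expected:
            drift[name] = (expected, found)
    return drift


# Hypothetical sample: one value already set, one still at its default,
# and the rest not present.
sample = {
    "State Queue Items": 20480,
    "Persistence Checkpoint Depth Maximum": 20971520,
}

for name, (expected, found) in find_drift(sample).items():
    print(f"{name}: expected {expected}, found {found}")
```

An empty result means the server matches the list; anything reported is a candidate for re-applying the corresponding `reg add` command.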

 

 

Ok, that covers the “standard” stuff.

 

I will cover one other registry modification that is RARELY needed.  You should ONLY change this one if directed to by Microsoft support.

WARNING:

If you make changes to this setting, the same change must be made on ALL management servers, otherwise the resource pools will constantly fail.  All management servers must have identical settings here.  If you add a management server in the future, this setting must be applied immediately if you modified it on other management servers, or you will see your resource pools constantly committing suicide and failing over to other management servers, reinitializing all workflows in a loop.   All the other settings in this article are generally beneficial.  This specific one for PoolManager should receive great scrutiny before changing, due to the risks.  It is NOT included in my reg-add list above for good reason.

 

HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager\
REG_DWORD Decimal Value:
  PoolLeaseRequestPeriodSeconds = 600
  PoolNetworkLatencySeconds = 120

SCOM 2016 existing registry value:  not present (you must create the PoolManager key and both values)

SCOM 2016 default code value:  120 / 30 seconds, respectively

This is VERY RARE to change, and in general I only recommend changing this under advisement from a support case.  The resource pools work quite well on their own, and I have worked with very large environments that did not need these to be modified.  This is more common when you are dealing with a rare condition, such as management group spread across datacenters with high latency links, DR sites, MASSIVE number of workflows running on management servers, etc.
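Since every management server must carry identical PoolManager values, a consistency check is worth having if you ever do change them. The Python sketch below groups servers by the settings they report. It assumes the values have already been gathered per server; `group_by_settings` and the server names `MS01` to `MS03` are hypothetical.

```python
def group_by_settings(pool_settings):
    """Group servers that share exactly the same PoolManager values.

    pool_settings maps server name -> {value name: data}. An empty dict
    means the PoolManager values are not present on that server.
    More than one group in the result means the servers are NOT identical.
    """
    groups = {}
    for server, values in pool_settings.items():
        key = tuple(sorted(values.items()))
        groups.setdefault(key, []).append(server)
    return list(groups.values())


# Hypothetical example: MS03 was added later and never received the change.
collected = {
    "MS01": {"PoolLeaseRequestPeriodSeconds": 600, "PoolNetworkLatencySeconds": 120},
    "MS02": {"PoolLeaseRequestPeriodSeconds": 600, "PoolNetworkLatencySeconds": 120},
    "MS03": {},
}

groups = group_by_settings(collected)
if len(groups) > 1:
    print("Mismatch between management servers:", groups)
```

A single group in the result means all servers agree, whether that is on the modified values or on the defaults.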


Comments (4)

  1. stephen lisko says:

    Kevin,

    Thanks for posting this article. Looks like the recommendations are the same as for SCOM 2012 management servers. So if I am planning to upgrade my existing SCOM configuration from 2012 to 2016 will the registry tweaks that I currently have set on my management servers remain, or will I have to tweak them?

    Thanks

    1. Kevin Holman says:

      Historically – the “upgrades” actually do an uninstall/reinstall…. so it is possible registry entries will get wiped out. I’d absolutely go back and re-verify these after an upgrade.

  2. Ronnie says:

    Thanks for the writeup.

    Do you know why these are not set by default?

  3. Breezer says:

    Thnx for the info! Helps alot!
