[SCOM] Health Service Private Bytes and Handle Count leak on physical Management Server


If you experience sporadic restarts of the HealthService on physical Management Servers after installing .NET 4.5.2, you should check how high the Health Service Private Bytes and Handle Count counters actually get.

Chances are that even if you double the threshold values for these two counters, you will still get the issue.
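To see how high the two counters really go, it can help to export their samples and compare the peaks against the thresholds you plan to set. The following is only a minimal sketch in Python: the CSV layout, the column names `private_bytes` and `handle_count`, the sample values and the base thresholds are all made up for illustration and do not come from SCOM itself.

```python
import csv
import io


def peak_values(csv_text):
    """Return the highest numeric value seen in each counter column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    peaks = {}
    for row in reader:
        for name, raw in row.items():
            if name == "timestamp":
                continue
            try:
                value = float(raw)
            except (TypeError, ValueError):
                continue  # skip blank or non-numeric samples
            if name not in peaks or value > peaks[name]:
                peaks[name] = value
    return peaks


def exceeded(peaks, thresholds):
    """List the counters whose observed peak is above the given threshold."""
    return sorted(name for name, limit in thresholds.items()
                  if peaks.get(name, 0.0) > limit)


# Hypothetical samples (assumption): replace with your own exported data.
SAMPLE = """timestamp,private_bytes,handle_count
2017-01-01 00:00,900000000,21000
2017-01-01 06:00,2100000000,48000
2017-01-01 12:00,3900000000,71000
"""

peaks = peak_values(SAMPLE)

# Hypothetical base thresholds (assumption), doubled as described above.
doubled = {"private_bytes": 2 * 1600000000, "handle_count": 2 * 30000}

print(peaks)
print(exceeded(peaks, doubled))
```

With these invented samples, both peaks are still above the doubled thresholds, which is the situation described above: doubling alone may not be enough.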

This situation seems to happen only on physical servers that also have a lot of RAM (>10 GB).

It seems that with .NET 4.5.2 the garbage collector algorithm changed, and a collection is no longer triggered until a process has consumed enough of the machine's free memory. Since these servers usually have a lot of memory, that trigger point gets quite high before garbage collection occurs on its own.

We are, however, expecting a further release of the .NET Framework that will make garbage collection trigger at more sensitive thresholds.

In the meantime, there are some workarounds:

- virtualize your management servers

- keep raising the thresholds for Health Service Private Bytes and Handle Count (think very big here). Since you have a lot of memory anyway, this should not be a problem.

Comments (2)

  1. Does this apply to SCOM 2016 also? We have a fresh install on Windows Server 2016, with 6 management servers and no agents installed yet. All OS patches are installed, SCOM 2016 UR2 is applied, and all Management Pack updates are in place. It takes 12 to 24 hours to go through the health states, then eventually the Health Service resets. The threshold is at 30,000, which is the default at install. Increasing the threshold to 60,000 or 100,000 puts the state back to healthy… for now. Also, is there a dashboard to watch this counter?
    Thank you,
    Peter C.
    Aramark Corp.

    1. Silvana Deac says:

      Yes, this can also happen with SCOM 2016.
      The counters can be viewed in the default OS performance view.
      Sometimes this behavior is triggered by third-party MPs, for example the Veeam MP or Dell MPs; it is hard to tell without a more in-depth analysis.
