Understanding SCOM Resource Pools




Comments (26)

  1. Tommy says:

    Thank you for the info 🙂 Very useful

  2. M.Mathew says:

    Gr8 Article.!!Thx for the post!

  3. Hi,
    I used the following command to create resource pool in a new SCOM 2016 installation:
    New-SCOMResourcePool -DisplayName “Displayname of the pool” -Member (Get-SCOMManagementServer | ? {expression}) -Description “Description of the pool”
    I checked both of them and the $_.UseDefaultObserver value is “False” by default. I did not change it. Maybe this is true for SCOM 2016 only?
    BTW, this is a good article as we got used to it from Kevin. Thank you for it again.
    Sandor

    1. Kevin Holman says:

      Thanks for the catch. Pools created in PowerShell are apparently different from pools created in the UI. I will update this.

  4. Ravi says:

    Hi Kevin,

    When you say “Pool Suicides” (when less than 50% of members are available), do you mean that all agents will lose communication with the resource pool and turn into greyed-out agents?

    Ravi

    1. Kevin Holman says:

      No – I don’t mean that at all. Agent communication has NOTHING to do with resource pools.

      NOTHING. Resource pools are for workflows. Agents communicate directly to management servers, and have their own mechanism for failover, which has not changed from the SCOM 2007 design. Customers often get confused and think that resource pools handle agent failover. They do not, and there is no relation.

      When a pool suicides, this means the pool unloads itself from all members, and all workflows that were hosted by the pool are not initialized, and therefore do not run.

      1. a.elfimov says:

        Hi Kevin,
        Do I understand correctly that there is no relationship between “Pool Suicides” and the SCOM alert “The resource pool failed to heartbeat”, and that these are two different problems with two different causes?

        1. Kevin Holman says:

          Those are related. A resource pool failing to heartbeat means the pool isn’t healthy and stable. This could be due to pool suicides, database connectivity, database blocking, load, bad workflows, all kinds of reasons. If this is common, start by looking at what you have placed on the pool, and at what other events are being logged in the OpsMgr event log on the management server.

  5. Asger Nissen says:

    Awesome post. Just a quick note on a scenario where we use the Observer role.
    We monitor different SNMP-enabled devices (receiving traps) as Network Devices in SCOM via a resource pool that consists of two gateway servers. Some of these network devices only allow two trap destinations. Since we want the redundancy but cannot use more than two servers in the pool (for the reason explained above), we use a SCOM agent as an observer for the pool.

    1. Kevin Holman says:

      Asger – THANKS! That is a perfect reason for observers!

      SNMP traps for a device can only be processed by the pool member that currently hosts that specific device. Therefore, when a device hosted by a pool sends SNMP traps, as you have figured out, it must send them to ALL members of the pool to ensure the trap will be processed.

      So by only allowing two hosting members of a pool, but adding an observer, you get the high availability without impacting trap reception.

      Excellent feedback!

      1. Scott Brown says:

        I’d add to this that Unix/Linux Resource Pools are a good candidate for an observer as well. The same failover mechanisms don’t apply to Unix/Linux agents as to Windows Agents, and management of certs on gateways is complicated enough without trying to cross-import them across three RP members.
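
The observer discussion above comes down to simple vote counting. The following is a minimal sketch of that arithmetic, not SCOM code: it assumes a pool stays loaded only while strictly more than half of its voting participants (hosting members plus observers) are available, which is how the two-server-plus-observer examples in this thread behave. The function names are hypothetical.

```python
def pool_has_quorum(voters_total: int, voters_available: int) -> bool:
    """Return True when strictly more than half of the voters are available.

    Assumption: "voters" are hosting members plus observers, and the pool
    needs a strict majority to stay loaded.
    """
    if voters_total <= 0:
        raise ValueError("a pool needs at least one voter")
    if not 0 <= voters_available <= voters_total:
        raise ValueError("available voters must be between 0 and the total")
    return 2 * voters_available > voters_total


def survives_single_failure(members: int, observers: int) -> bool:
    """Check whether the pool keeps quorum after losing any one voter."""
    total = members + observers
    return pool_has_quorum(total, total - 1)


if __name__ == "__main__":
    # Two hosting members and no observer: one failure leaves 1 of 2 voters.
    print(survives_single_failure(members=2, observers=0))  # False
    # Two hosting members plus one observer: one failure leaves 2 of 3 voters.
    print(survives_single_failure(members=2, observers=1))  # True
```

Under that assumption, adding an observer to a two-member pool adds a vote without adding a hosting member, which is why the trap destinations stay at two while the pool still tolerates one failure.
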

  6. Kevin, hi.
    As always a great and very helpful post. Thanks to you and Mihai.

    I created two ps1 that might help to show the config and to set the observers accordingly:
    https://gallery.technet.microsoft.com/PoSh-Show-Resource-Pool-40d9b18f
    https://gallery.technet.microsoft.com/PoSh-Set-Resource-Pool-aea4e7be

    Best regards,
    Patrick

    1. carsten anker says:

      Patrick, that’s not accurate.
      If your resource pool has the default observer and consists of 2 MS, then it IS highly available.

  7. JVD says:

    Kevin, I am having an issue with a customer who uses 2 MS + 1 OBS (failover SQL cluster DB). Fairly frequently we experience issues with the resource pools (All Management Servers Resource Pool unavailable), almost always at night.
    As I read in your post, you advise using an odd number of management servers. However, this customer has 2 datacenters, and ideally I would have an even number of management servers on each side to cover the load. Would it be advisable to move the DO to another server in this case?

    1. JVD says:

      Forgot to mention, the DB is under a significant load at night due to backups and maintenance.

      1. Kevin Holman says:

        Are the management servers split across multiple datacenters? In general, we don’t recommend or support that configuration, and this is a very common misunderstanding with customers.

        Management servers need to be within 5 ms of network latency of each other AND of the databases. In most cases where a customer has multiple datacenters, the network connection between DCs is more than 5 ms at all times, or they cannot guarantee it will remain under 5 ms 24×7, for example during times of high network saturation from backups.

        This will cause resource pool failures.

        If that is your case, you have to consider some design changes, or you have to edit the registry to change the resource pool timeout and failure settings, from my blog article on tweaking management servers for large environments.
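
As a rough illustration of the latency requirement quoted above, the sketch below flags round-trip samples that are not under 5 ms. It assumes the samples have already been collected by some other tool; the sample values and the function name are hypothetical, and nothing here measures the network or talks to SCOM.

```python
MAX_LATENCY_MS = 5.0  # limit quoted in the reply above


def latency_violations(samples_ms, limit_ms=MAX_LATENCY_MS):
    """Return the samples that are not strictly below the limit."""
    return [sample for sample in samples_ms if sample >= limit_ms]


if __name__ == "__main__":
    # Hypothetical round-trip times in milliseconds, e.g. taken overnight.
    samples = [1.2, 0.9, 4.8, 7.5, 2.1]
    violations = latency_violations(samples)
    if violations:
        print(f"{len(violations)} of {len(samples)} samples reached {MAX_LATENCY_MS} ms or more")
    else:
        print("all samples stayed under the limit")
```

Checking every sample rather than the average matters here, because the requirement is that latency stays under the limit at all times, including during backup windows.
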

        1. JVD says:

          Hey Kevin,

          The management servers reside in the same datacenter, but they do reside in two different physical rooms. So latency is not an issue.
          The problem I seem to be having is that the pool seems to be under heavy load at night due to backups and SQL maintenance, which I assume causes pool instability.
          Would it be better to use an agent as an observer in this case, and remove the DO from the SQL server?

          1. Kevin Holman says:

            If this is happening at night – and you think it is backup related – then it is much more likely that the pool failures are caused by DB connectivity issues, not the DO.

            If the DO fails, this won’t cause pool instability, because the two MS in the pool will work just fine. I’d look at the disk I/O on the SQL server, and at other events in the MS event logs around this time, for clues. So to answer your question: no, I would not recommend moving the DO to an agent; for a pool of that size I would leave the database as the DO. I’d focus on the root cause of the pool failures, which is likely SQL connectivity.

          2. JVD says:

            Hey Kevin,

            The system backup of the SQL server seems to be correlating with the pool issues. Have disabled the system backup for now. Thanks for the advice.

  8. Birdal says:

    Hi Kevin,
    I know that my question is not directly related to your article, but it is a design question. I am not very familiar with SCOM. We are planning a completely new monitoring system based on SCOM 2016. Our environment is:

    – 2 Active Directory domains (different forests). There are no trusts between the ADs.
    – Objects: Windows Servers (800), Linux Servers (200), network components (200), and some specific applications / services in both Active Directory domains.
    – We have 6 virtual servers for SCOM environment based on VMware.

    I would prefer to place all SCOM servers (Management Servers, console, SQL Servers, Gateway Servers, etc.) in the first AD, and open the necessary ports between the Gateway Servers and the second AD.

    What is the best placement and design of the SCOM servers for this environment?
    Which ports should be opened between the SCOM Gateway Servers and the Domain Controllers in the second AD?

    Thanks in advance.
    Birdal

  9. Ashish says:

    Hi Kevin,
    Agents report to gateways, and that doesn’t have anything to do with resource pools. We have a scenario where two resource pools are configured in two data centers, and each resource pool has a single gateway server. Since the gateways are part of different resource pools, can we do agent failover between these two gateway servers?

    Thanks
    Ashish

    1. Kevin Holman says:

      Agent assignment has no relationship to Resource pools, so leaving Resource pool out of the Agent Assignment question would be best.

      Can you reform the question?

  10. Adriana says:

    Hi Kevin,
    How do I reference a resource pool that has been created through PowerShell? The source of the pool is ‘administrator’ instead of a management pack. So how do I use it here: Target=”SC!Microsoft.SystemCenter.AllManagementServersPool”?

  11. Gautam says:

    Thanks for the valuable information.

  12. HS Brown says:

    How can you get the name of any currently configured Observer when it is not the default observer via PoSh?
