How to multihome a large number of agents in SCOM


 

Quick download:  https://gallery.technet.microsoft.com/SCOM-MultiHome-management-557aba93

 

I have written solutions that include tasks to add and remove management group assignments to SCOM agents before:

https://blogs.technet.microsoft.com/kevinholman/2017/05/09/agent-management-pack-making-a-scom-admins-life-a-little-easier/

 

But, what if you are doing a side by side SCOM migration to a new management group, and you have thousands of agents to move?  There are a lot of challenges with that:

 

1.  Moving them manually with a task would be very time consuming.

2.  Agents that are down or in maintenance mode are not available to be multi-homed.

3.  If you move all the agents at once, you will overwhelm the destination management group.

 

I have written a Management Pack called “SCOM.MultiHome” that will manage these issues more gracefully.

 

It contains one (disabled) rule, which will multihome your agents to your intended ManagementGroup and ManagementServer.  This is also override-able so you can specify different management servers initially if you wish:

 

image

 

This rule is special in how it runs.  It is configured to check once per day (86400 seconds) to see if it needs to multi-home the agent.  If the agent is already multi-homed, it will do nothing.  If it is not multi-homed to the desired management group, it will add the new management group and management server.
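The check-then-add behavior described above is idempotent: running it every day is safe because a second run changes nothing. Here is a minimal sketch of that logic in Python. It is not the script that ships in the MP; the `AgentConfig` class, the `ensure_multihomed` function, and the group and server names are all hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    """Hypothetical in-memory stand-in for an agent's management group list."""
    # Maps management group name -> management server the agent reports to.
    management_groups: dict = field(default_factory=dict)


def ensure_multihomed(agent: AgentConfig, group: str, server: str) -> bool:
    """Add the group only if it is missing. Returns True when a change was made."""
    if group in agent.management_groups:
        return False  # Already multi-homed to this group: do nothing.
    agent.management_groups[group] = server
    return True


# Hypothetical names, for illustration only.
agent = AgentConfig({"OldMG": "oldms01.contoso.com"})
first_run = ensure_multihomed(agent, "NewMG", "newms01.contoso.com")
second_run = ensure_multihomed(agent, "NewMG", "newms01.contoso.com")
print(first_run, second_run)  # True False
```

The first run adds the new management group, and every later run is a no-op.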

But what is most special is the timing.  Once enabled, the rule uses a scheduler datasource parameter called SpreadInitializationOverInterval.  This is very powerful:

  <DataSource ID="Scheduler" TypeID="System!System.Scheduler">
    <Scheduler>
      <SimpleReccuringSchedule>
        <Interval Unit="Seconds">86400</Interval>
        <SpreadInitializationOverInterval Unit="Seconds">14400</SpreadInitializationOverInterval>
      </SimpleReccuringSchedule>
      <ExcludeDates />
    </Scheduler>
  </DataSource>

 

This runs the workflow once per day, but the workflow does not initialize immediately.  Instead, it initializes at a random point within the time window provided.  In the example above, this is 14400 seconds, or 4 hours.  If I enable the rule for all agents, they will not all run it immediately; each one randomly picks a time between now and 4 hours from now to run the multi-home script.  This keeps us from overwhelming the new environment with hundreds or thousands of agents all at once.  You can make this window bigger or smaller by editing the value in the XML.
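To get a feel for what the spread window does to the load, here is a rough simulation in Python. It assumes each agent picks a uniformly random whole-second offset inside the window, which is my simplification and not a statement about how the agent actually chooses its start time. The agent count of 4000 is just an example figure.

```python
import random
from collections import Counter

SPREAD_SECONDS = 14400  # SpreadInitializationOverInterval from the XML above
AGENT_COUNT = 4000      # assumed agent count, for illustration only


def simulate_start_offsets(agent_count, spread_seconds, seed=0):
    """Return one random start offset (whole seconds) per agent.

    Assumes a uniform distribution over the window, which is a simplification.
    """
    rng = random.Random(seed)
    return [rng.randrange(spread_seconds) for _ in range(agent_count)]


offsets = simulate_start_offsets(AGENT_COUNT, SPREAD_SECONDS)

# Count how many agents start in each hour of the window.
per_hour = Counter(offset // 3600 for offset in offsets)
for hour in sorted(per_hour):
    print(f"hour {hour}: {per_hour[hour]} agents")

# Average arrival rate if the load is spread evenly across the window.
average_per_minute = AGENT_COUNT / (SPREAD_SECONDS / 60)
print(f"about {average_per_minute:.1f} agents per minute on average")
```

With these numbers the new management group sees roughly 17 agents per minute on average instead of 4000 at once. Doubling the window halves that rate.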

 

Next up – the Groups.  This MP contains 8 Groups.

 

image

Let’s say you have a management group with 4000 agents.  If you multi-homed all of these to a new management group at once, it would overwhelm the new management group and take a very long time to catch up.  You will see terrible SQL blocking on your OpsMgr database and 2115 events about binding on discovery data while this happens. 

The idea is to break up your agents into groups, then override the multi-home rule using these groups in a phased approach.  You can start with 500 agents over a 4 hour period, and see how that works and how long it takes to catch up.  Then add more and more groups until all agents are multi-homed.

These groups will self-populate, dividing up the number of agents you have per group.  They query the SCOM database and use an integer to do this.  By default each group contains 500 agents, but you will need to adjust this for your total agent count.


  <DataSource ID="DS" TypeID="SCOM.MultiHome.SQLBased.Group.Discovery.DataSource">
    <IntervalSeconds>86400</IntervalSeconds>
    <SyncTime>20:00</SyncTime>
    <GroupID>Group1</GroupID>
    <StartNumber>1</StartNumber>
    <EndNumber>500</EndNumber>
         
    <TimeoutSeconds>300</TimeoutSeconds>
  </DataSource>
</Discovery>

Also note there is a sync time set on each group, about 5 minutes apart.  This keeps all the groups from populating at once.  You will need to set this to your desired time, or wait until 10pm local time for them to start populating.
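If you need to resize the groups for your own agent count, the arithmetic can be sketched in Python as below. This is only a planning aid, not part of the MP. It assumes the groups are named Group1 through Group8, that each group takes a contiguous numbered range like the Group1 example above, and that sync times step forward 5 minutes from the first one. The `plan_groups` function and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def plan_groups(total_agents, group_count=8, first_sync="20:00", stagger_minutes=5):
    """Split agents 1..total_agents into contiguous numbered ranges, one per group.

    If there are fewer agents than groups, trailing groups get a StartNumber
    greater than their EndNumber, meaning an empty range.
    """
    size = -(-total_agents // group_count)  # ceiling division
    sync = datetime.strptime(first_sync, "%H:%M")
    plan = []
    for index in range(group_count):
        start = index * size + 1
        end = min((index + 1) * size, total_agents)
        sync_time = sync + timedelta(minutes=stagger_minutes * index)
        plan.append({
            "GroupID": f"Group{index + 1}",  # assumed naming pattern
            "StartNumber": start,
            "EndNumber": end,
            "SyncTime": sync_time.strftime("%H:%M"),
        })
    return plan


plan = plan_groups(4000)
for row in plan:
    print(row["GroupID"], row["StartNumber"], row["EndNumber"], row["SyncTime"])
```

For 4000 agents this yields eight ranges of 500, from Group1 (1 to 500) through Group8 (3501 to 4000), each with a sync time 5 minutes after the previous one. The values would then be copied into each group's discovery configuration.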

 

Wrap up:

Using this MP, we resolve the biggest issues with side by side migrations:

 

1.  No manual multi-homing is required.

2.  Agents that are down or in maintenance mode will gracefully multi-home once they come back up.

3.  Using the groups, you can control the load placed on the new management group and test the migration in phases.

4.  Using the groups, you can load balance the destination management group across different management servers easily.


Comments (3)

  1. M.Mathew says:

    Another great article!! .Thanks @Kevin!

  2. msviborg says:

    Hi Kevin
    Another great article making life as a SCOM admin a lot easier – THANK YOU!
    We have a regional domains and dedicated Management Servers per region. Would it be possible to control that via this script?
    I’m thinking like controlling which group the servers are added to, via a suffix or something, and then making sure the server in that group is added to a specific Management Server?

    Thanks in advance
    Michael

    1. Kevin Holman says:

      Yes, sure you could do this…. for the initial multi-home to a second management group.

      Why do you have regional management servers? Do you mean multiple management servers, in the same SCOM management group, but in different locations? If that’s the case, that’s a really bad design. Management servers should all be in the same physical location/network and so should the SCOM DB’s. Gateways can be location dependent if really required.
