SCOM Maintenance Windows

Caution
Test the scripts, processes, and data files thoroughly in a test environment, and customize them to meet the requirements of your organization before attempting to use them in production. (See the legal notice here)

 

Note: The workflow sample mentioned in this article can be downloaded from the Opalis project on CodePlex:  https://opalis.codeplex.com

 

Overview

The SCOM Maintenance Windows sample demonstrates a basic process where Opalis orchestrates the patching of a server with alert monitoring in SCOM. The activities in Opalis Integration Pack for SCOM that deal with maintenance windows enable Opalis workflows to orchestrate processes on monitored systems in such a way as to minimize false alarms. The basic manual process this workflow replaces would be:

  1. At 9:00 pm, get a list of servers in a given collection.

  2. Enable Maintenance Mode for these servers in MS SCOM.

  3. Create an advertisement to distribute a patch to the collection.

  4. Wait 5 minutes.

  5. Working from the same list of collection members, test whether each server is up and, once it is, turn off Maintenance Mode for that server.
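
The five manual steps above can be sketched in Python. This is only an illustration of the sequence, not how Opalis implements it: every function passed in (get_members, set_maintenance_mode, create_advertisement, is_up) is a hypothetical stand-in for the real SCOM or software-distribution call, and the max_polls limit is an added assumption so the loop cannot run forever.

```python
import time
from typing import Callable, Iterable, List


def run_patch_window(
    get_members: Callable[[], Iterable[str]],
    set_maintenance_mode: Callable[[str, bool], None],
    create_advertisement: Callable[[List[str]], None],
    is_up: Callable[[str], bool],
    wait_seconds: float = 300,
    poll_seconds: float = 10,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run one patch window; return the servers taken out of maintenance mode.

    All callables are hypothetical placeholders supplied by the caller.
    """
    # Steps 1-2: list the collection, then enable maintenance mode per server.
    servers = list(get_members())
    for server in servers:
        set_maintenance_mode(server, True)

    # Step 3: a single advertisement covers the whole collection.
    create_advertisement(servers)

    # Step 4: fixed delay (5 minutes by default).
    sleep(wait_seconds)

    # Step 5: poll each server; leave maintenance mode only once it responds.
    restored = []
    for server in servers:
        for _ in range(max_polls):
            if is_up(server):
                set_maintenance_mode(server, False)
                restored.append(server)
                break
            sleep(poll_seconds)
    return restored
```

In this sketch, a server that never responds within max_polls attempts stays in maintenance mode and is left out of the returned list, so an operator can follow up on it.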

The sample highlights a few key features associated with Orchestration of such a process:

  1. The workflow is a classic example of a “Run Book Automation” in that it takes operations procedures that would normally drive the behaviors of human beings and replaces this work with automation and integration.

  2. It shows how a remediation process can interact with Operations Manager to provide line-of-sight remediation. This means that it updates Operations Manager so that people looking at the Operations Manager console can recognize that Opalis has initiated a remediation process and allow that process to complete before taking additional action.

  3. Verification of the alert is a key first step in the remediation process since it guarantees that the remediation is acting on a valid condition before it initiates.

Workflow Walk-Through

The workflow itself is very simple and, with a moderate amount of tweaking, should work in most environments. Some key things to note in the workflow:

  1. The workflow is scheduled to run at 9:00 pm, in this case every day of the week. Other mechanisms for triggering patching may be preferable. For example, the workflow could be initiated via a Custom Start or a Monitor activity that detects when to run.

  2. The list of hosts in a collection is queried. The Opalis multi-instance databus will call the next activity (Maintenance Mode On) once for each host in the collection.

  3. “Maintenance Mode On” is run once per host before proceeding to the Junction. The output of this activity is “flattened” so that the next activity (which creates an advertisement) is called only once.

  4. The advertisement is created. Again, because “flatten” was used on the prior activity, only one advertisement is created for all the servers in the collection.

  5. A link (labeled “Wait 5 Minutes”) is used to create a delay. In the link's properties one can configure the amount of time (in seconds) to wait before triggering the next activity. The workflow could be modified to use some other mechanism to verify that the patch was deployed properly.

  6. The collection is re-queried. Although the data already exists on the databus (from the second activity in the workflow), it’s easier to re-query the collection than to create the structure needed to pipe it to this point in the workflow.

  7. Each server is tested with a “ping” command to make sure it is up and running. Other checks could be used to validate that a system is up. For example, one could query the disk space on the system, which verifies a bit more than a ping does. Because the databus processes

  8. Maintenance mode is turned off once the system can be pinged.
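
The difference between the per-host (multi-instance) behavior in step 2 and the “flatten” behavior in steps 3 and 4 can be illustrated with a toy model in Python. This is not the Opalis databus itself: run_for_each, flatten, and maintenance_mode_on are hypothetical names, and the comma separator is an assumption made for the example.

```python
from typing import Callable, Iterable, List


def run_for_each(activity: Callable[[str], str], items: Iterable[str]) -> List[str]:
    """Multi-instance behavior: call the activity once per incoming record."""
    return [activity(item) for item in items]


def flatten(outputs: Iterable[str], separator: str = ",") -> List[str]:
    """Collapse many records into one, so the next activity runs only once."""
    return [separator.join(outputs)]


def maintenance_mode_on(host: str) -> str:
    # Placeholder for the real activity; it simply passes the host name along.
    return host


hosts = ["srv1", "srv2", "srv3"]
per_host = run_for_each(maintenance_mode_on, hosts)  # one record per host
flattened = flatten(per_host)                        # a single combined record
advertisements = run_for_each(
    lambda record: f"advertisement for {record}", flattened
)
print(len(per_host), len(flattened), advertisements)
# prints: 3 1 ['advertisement for srv1,srv2,srv3']
```

Without the flatten step, the advertisement activity in this model would be called three times, once per host, which is the situation the workflow avoids.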


 

 
