Updating a SQL Cluster using System Center Configuration Manager and System Center Orchestrator

Current Runbooks and recorded MMS session can be found here:

https://blogs.technet.com/b/neilp/archive/2013/04/15/mms2013-session-now-on-channel-9-patching-a-hyper-v-cluster-with-orchestrator-configuration-manager-including-downloadable-runbook-exports.aspx


Common Question - How can I use the Configuration Manager Software Update capabilities to elegantly update my active/passive SQL clusters? To raise the bar a bit, what if there is an additional requirement to at run time, dynamically determine the cluster state (who is active, who is passive) and execute the updates accordingly.

This process will take some engineering, logic, scripting, etc. Can Configuration Manager itself act as the fabric that would hold the custom logic, custom code, etc. together? Probably, but this would take complex collection creation, complex task sequencing, you get the idea. We can however not only simplify this process but also create a repeatable automated process using Opalis / Orchestrator integration.

In this System Center PFE Blog Post I will show a sample solution that integrates System Center Configuration Manager 2007, System Center Orchestrator Beta 1, and multiple pieces of related technologies to easily handle the updating of two node Active Passive SQL cluster. This blog post includes;

  • An explanation of the included sample Runbooks.
  • Process specific orchestration / code samples.
  • A demonstration video of the process in action.

I will not be including an overview of how to use Opalis / Orchestrator, nor detailing any Workflow / Runbook best practices. For these items a great starting point would be Charles Joys Blog.

Pre-Work -

The first step of automating any process is to define the process. So for this example the process may look like -

  • Prepare Configuration Manager for the update process.
    • Create a Collection.
    • Create the update deployment (update management).
  • Determine SQL State.
    • Identify all nodes.
    • Identify active node.
    • Identify all non-active nodes.
  • SCOM maintenance Mode.
    • Before updating any node place it into SCOM maintenance mode.
    • After patching any node remove it from SCOM maintenance mode.
  • Update and Reboot all non-active nodes.
  • Move activity from the active node to one of the just patched non-active nodes.
  • Update and reboot remaining node.
  • Validate cluster services are online.
  • Clean up the update jobs/collection.
  • Send email notification indicating that the process has completed.

With the process nailed down, we will need to determine the pieces of technology that will be required in order to complete the process.

For this sample update scenario, the following technology will be utilized or managed in some way.

  • System Center Orchestrator 2012 Beta 1 (note, this is beta software and is subject to change) –
  • System Center Configuration Manager 2007 R3.
  • System Center Operations Manager 2007.
  • Windows 2008 R2 Fail over Cluster services.
  • SQL 2008 R2.
  • PowerShell.
  • SMTP Server.

Orchestration Overview –

This sample solution consists of eight Runbooks. I have attempted to make these Runbooks self-documenting, so as we look at each, refer to the screen shot for the step by step process.

Main Worker Runbook – this Runbook consumes some user defined data (Collection Name) and handles the execution of all process specific sub Runbooks. I could have easily added each step in the cluster patching process to one large Runbook, however breaking them up into smaller sub process creates small reusable parts. For example, below you will see reference to a Runbook ‘Move Cluster State’; this task uses some power shell to move the cluster activity from one node to another. Segregating this activity into its own isolated Runbook allows us to use it during any process that may require such an activity.

 

Configure Updates – Our first called sub-process creates a collection in System Center Configuration Manager, and then uses a pre-existing deployment List and Template to create and ‘advertise’ an update deployment to the newly created collection. To create this run book you would simply need to integrate the System Center Configuration Manager Integration Pack and use the Activates native to that IP (Create Collection and Deploye Software Updates).

 

Determine Cluster State – next we have a Runbook that will determine the active node of the cluster, determine all nodes of the cluster, and finally compare the values arriving at a list of the non-active nodes. So the full purpose of this process is to gather a list of Passive Nodes, the current active node, and then publish this data for consumption back up top (in the main process run book). This is achieved through the use of one Query Database Activity (Determine Active Node), two Run .Net Script activates (Get All Nodes and Determine Non Active Nodes), and finally a Return Data Activity.

 

Apply Updates and handle Reboots – This Runbook will be executed multiple times; once for each node requiring updates. With this sub process we will be placing the computer into maintenance mode in System Center Operations Manager, adding the computer to the collection created during the main process, refreshing the collection, refreshing the client policy, and then triggering a Software Updates Deployment Evaluation Cycle. Finally this Runbook will wait for the client to reboot, and take it out of Operations Manager maintenance mode. To reiterate, this will happen once for each passive node, and then once again after the cluster state has been moved for the remaining node.

Here we are once again using mostly native Activities to the Configuration Manager IP (Add Computer to Collection, Refresh Collection, Refresh Client, Initiate Update Evaluation cycle). We also have two ‘Get Computer/IP Status’ activities. We use these to observer the machine reboot.

Take note of the link logic on between the ‘Wait for Reboot’ and ‘Wait for Post Boot’ activities, as we want to move on once the computer has been rebooted, we are actually waiting for a failure of the get computer status. This failure will indicate that the computer is gone offline (has been rebooted). We will then immediately start looping on a second activity waiting for the compute status check to be successful (indicating that the machine is back online).

In addition to the work being done, we are also calling two additional Runbooks, one at the beginning, and one at the end that will put the currently patching machine into SCOM maintenance mode, and then taking is out of maintenance mode.

 

Move Cluster Activity from Active to non-Active node – With the passive nodes all patched, this sub-process simply moves the cluster from the currently active node to a patched passive node. The bulk of this process consists of only one Run .Net Script activity (script attached).

Once completed the remaining unpatched node (that is the once was active node), will run through the patching process as detailed in the last Runbook.

 

Clean up and Send completion notification – and finally we delete the collection (and updates deployment), and send out an email stating that the process has been completed. The activities in this Runbook are self-explanatory and consist of a native Configuration Manager IP activity and a Send Email activity.

 

Start and Stop SCOM maintenance – tucked away inside this process is two sub –sub Runbooks that handle the Operations maintenance mode stuff.  Here is an example of those. The Maintenance mode activities come from the Operations Manager Integration Pack.

 

 

Video Demonstration - So let’s take a look at this process in action.

[View:https://www.youtube.com/watch?v=vd8B4-Llq_A]

 

Hopefully this post has helped to not only key in some ideas on how to elegantly handle the update process across you clustered services, but to also conceptually highlight through this real work scenario the capabilities of Opalis / Orchestrator and how these tools can provide for a tight integration between all systems.

I will be working with this example in future blog post, so be on the lookout for those. Please feel free to comment if you would like more information or more clarification on anything discussed in this post.