Opalis 6.3: Automating Hyper-V Cluster Patching using the Configuration Manager IP (Part 1)

In the Getting Started with the Configuration Manager IP article, I talked about the basic scenarios we thought about when creating the Configuration Manager Integration Pack. Now it's time to go deeper and address some specific datacenter scenarios in detail. One thing we hear from customers is that ConfigMgr makes it a lot easier to patch computers through WSUS and the Software Updates Management features, and it lets you assign maintenance windows to avoid taking down critical server resources. Patching is more challenging for Hyper-V hosts and clustered machines, though, because installing software and rebooting has more implications there than on individual machines. These related tasks are a perfect fit for automation with Opalis using a series of workflows.

By using Virtual Machine Manager 2008 R2, you can start maintenance mode for a Windows-based host any time you have to perform maintenance tasks on the host, such as applying updates or replacing a physical component. When you start maintenance mode on a host in a Windows Server 2008 R2 cluster with highly available virtual machines, you can use live migration to evacuate all virtual machines to other hosts on the same cluster or place all virtual machines on the host into a saved state. When you start maintenance mode on a stand-alone Windows-based host, on a host in a Windows Server 2008 cluster, or on a Windows Server 2008 R2 host that has any non-highly available virtual machines, VMM automatically places all virtual machines into a saved state. Once the host has been patched (and any necessary reboots have been done), the host is taken out of maintenance mode and the VMs are moved back onto the host as needed.

In this article, I will walk through the process of solving this scenario by building the foundation of a workflow to address the problem in the simplest way possible, then adding on to provide greater and greater functionality. In this scenario, we'll make the following assumptions:

  • The hosts that need to be updated are already contained within a Configuration Manager collection
  • An update list exists containing all of the updates that should be installed on the hosts
  • A Deployment Template exists defining the general deployment settings

To solve the basics of this scenario we need to do the following:

  • Locate the VM hosts that need to be updated, then put them into maintenance mode to drain the VMs from them
  • Assign the updates to the host and monitor their installation state to completion
  • Take the host out of maintenance mode

First, make sure your Opalis and VMM servers are configured appropriately to allow PowerShell-based management. For instructions on configuring these settings, see the "Windows Management Framework" section of Integration Pack for System Center Virtual Machine Manager on TechNet.

Step 1: Locate and "Drain" the Host

Going by the first assumption above, I can get a list of VM hosts that need to be patched from an existing ConfigMgr collection. This collection might be as broad as the "All Windows Server Systems" collection, or you might have a more specific one like "VM Hosts in Building 44". In either case, we can use the "Get Collection Member" activity to read the collection's members and then process them one by one.

Here's a quick example of how this process starts:

image

Using the Custom Start activity, we define parameters for the workflow. This includes the name for the collection where we will be putting the computers to be updated (and where the software updates deployment package will be assigned) and the collection where we will find all of our servers to be patched. Just to make sure the collection exists, we have a "Create Collection" activity here to create it. If it fails, it's ok because we will continue on to the next object assuming that if it succeeded, the collection was created, and if it failed, the collection was already there. I'll show you how to add more detailed error-checking here later. For now this will do fine.

Next we assign the deployment package to the collection, and then we get a list of all the computers to be patched. To make sure they're processed sequentially (so we don't end up taking out all nodes of a cluster at once with a patching process), we use a "Trigger Policy" activity and make sure "Wait for Completion" is checked.

Now on to the "Drain and Patch" policy. For putting the hosts into maintenance mode, I utilize the "Run .NET Script" activity a couple of times with a few simple PowerShell commands. The only caveat applies when you are running Opalis on a 64-bit OS: VMM installs the 64-bit version of its DLLs and registers the PowerShell snap-in in the 64-bit PowerShell environment, while Opalis uses the 32-bit PowerShell environment to run the script. So in order for your script running in the 32-bit environment to interact with the VMM snap-in in the 64-bit environment, you have to put it in an Invoke-Command script block. For example, the PowerShell code below checks whether the VMM snap-in is already loaded in the session and, if not, loads it:

Invoke-Command $env:computername {
    # Load the VMM snap-in only if it is not already loaded in this session
    if (!(Get-PSSnapin | Where-Object {$_.Name -eq "Microsoft.SystemCenter.VirtualMachineManager"}))
    {
        Add-PSSnapin Microsoft.SystemCenter.VirtualMachineManager
    }
} -ConfigurationName Microsoft.PowerShell

Using PowerShell remoting (the "Invoke-Command" cmdlet), we can access the 64-bit environment to run the command. Keep in mind you will have to do this for every script that needs to access 64-bit snap-ins.

In my PowerShell scripts within the Run .NET Script activity, I also like to add error handling to surface errors back up to Opalis. Just like any other PowerShell script, I use "try – catch" blocks to trap any exceptions, and I also use the command $ErrorActionPreference = "Stop" to force any errors to become exceptions I can trap and then throw to Opalis. Opalis sees the thrown exception and stops the workflow with a failure. Here's a perfect example:

- Without the $ErrorActionPreference = "Stop" at the beginning of the script, my object completes successfully but nothing happens.

- With the $ErrorActionPreference = "Stop" at the beginning of the script, my object fails with the error shown here:

image

Adding this error checking makes a BIG difference in the supportability of the workflow.
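To see the difference outside of Opalis, here is a minimal sketch of the pattern. The function name Get-RequiredItem and the use of Get-Item are only illustrative stand-ins (they are not part of the workflow); in a Run .NET Script activity the same lines would simply sit at the top level of the script.

```powershell
# Hypothetical helper, used only to demonstrate the pattern.
function Get-RequiredItem {
    param([string]$Path)

    # Turn non-terminating errors into exceptions that try/catch can trap.
    $ErrorActionPreference = "Stop"

    try {
        # Without "Stop", a missing path only writes an error record
        # and the script keeps going as if nothing went wrong.
        Get-Item -Path $Path
    }
    catch {
        # Rethrow so the caller (Opalis, in the workflow) sees a failure.
        throw ("Could not read '" + $Path + "': " + $_.Exception.Message)
    }
}
```

Calling Get-RequiredItem with a path that does not exist throws instead of quietly returning nothing, which is the behavior that lets Opalis mark the activity as failed.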

Another potential side effect of having to use PowerShell remoting is something called "double-hop". If you are remoting back to the Opalis computer and then running commands that connect to a separate VMM Server, you're essentially going through three computers (even though the Opalis computer is two of them), and your credentials won't be passed beyond the second computer. So if you use the script I provided above and then add on a command like "Get-VMHost", you will get an error like the one below:

image

To solve this, you can do one of two things:

  • Enable and use CredSSP during the remote sessions
  • Remote directly to the VMM server instead of looping back to the Opalis machine first

Obviously, the second method requires fewer additional security modifications and less scripting, so that's the one I will choose. If you want to find out more about the CredSSP topic, you can check out these links:

So far, my script looks like this:

image

(Note that in this screen shot I am using variables instead of Custom Start parameters – this is just for speed in testing)
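As a rough sketch of a script along these lines (not a copy of the one in the screenshot), the remoting call can target the VMM server directly. The function name Invoke-HostDrain and its parameter names are hypothetical; the cmdlets Get-VMMServer, Get-VMHost, and Disable-VMHost with -MoveWithinCluster come from the VMM 2008 R2 snap-in, and I am assuming remoting is enabled on the VMM server.

```powershell
# Script block that runs on the VMM server itself, so no second hop is needed.
$drainHost = {
    param($VmmServerName, $VmHostName)

    $ErrorActionPreference = "Stop"
    if (!(Get-PSSnapin | Where-Object { $_.Name -eq "Microsoft.SystemCenter.VirtualMachineManager" })) {
        Add-PSSnapin Microsoft.SystemCenter.VirtualMachineManager
    }

    # Connect to VMM and look up the host to drain.
    Get-VMMServer -ComputerName $VmmServerName | Out-Null
    $vmHost = Get-VMHost -ComputerName $VmHostName

    # Start maintenance mode and migrate the VMs to other nodes in the cluster.
    Disable-VMHost -VMHost $vmHost -MoveWithinCluster | Out-Null
}

# Hypothetical wrapper: remotes straight to the VMM server and surfaces failures.
function Invoke-HostDrain {
    param(
        [string]$VmmServerName,
        [string]$VmHostName
    )

    $ErrorActionPreference = "Stop"
    try {
        Invoke-Command -ComputerName $VmmServerName -ConfigurationName Microsoft.PowerShell `
            -ScriptBlock $drainHost -ArgumentList $VmmServerName, $VmHostName
    }
    catch {
        throw ("Failed to drain " + $VmHostName + ": " + $_.Exception.Message)
    }
}
```

In the workflow, the two values would come from Opalis variables or published data, for example Invoke-HostDrain -VmmServerName "VMMSERVER01" -VmHostName "HYPERVHOST01" (both names are placeholders).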

And, when I run the script, I get this:

image

Luckily the error message is self-explanatory: my VMM Server is not on a cluster, so I can't use the "-MoveWithinCluster" switch. If I put my VM Host into maintenance mode without it, all of the guest VMs on that host go into a saved state, which disconnects all users from those VMs until they come back up. This may not be what we want to happen (at least during working hours), so handling this case by simply removing the command line switch isn't enough. We want to take these hosts and move them to a different collection so they can be processed separately.

To do this, I will just check the "HostCluster" property of the VM Host object. If it's empty, I know the host is not in a cluster. If it is in a cluster, I can apply the command. To do these things I need branching in the script, and I also need to set a property so Opalis can branch the workflow. I do this by adding a $HostCluster variable and then setting it to an empty string. If the host is in a cluster, then I simply add the cluster name to the string. Here's the final script:

image
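The branching described above can be sketched roughly as follows; this is not the exact script from the screenshot. The function name Get-HostClusterName is hypothetical, and I am assuming the HostCluster object exposes a Name property. The two stand-in objects exist only so the logic can be tried without VMM installed.

```powershell
# Returns the cluster name for a VM host object, or "" for a stand-alone host.
function Get-HostClusterName {
    param($VMHost)

    $HostCluster = ""    # empty string means "not in a cluster"
    if ($VMHost.HostCluster) {
        # Assumption: the HostCluster object has a Name property.
        $HostCluster = [string]$VMHost.HostCluster.Name
    }
    return $HostCluster
}

# Stand-in objects that mimic a clustered and a stand-alone VM host.
$cluster    = New-Object PSObject -Property @{ Name = "CLUSTER01" }
$clustered  = New-Object PSObject -Property @{ HostCluster = $cluster }
$standAlone = New-Object PSObject -Property @{ HostCluster = $null }

Get-HostClusterName $clustered     # CLUSTER01
Get-HostClusterName $standAlone    # empty string
```

In the real script, the same check would run against the object returned by Get-VMHost inside the remote script block, with Disable-VMHost -MoveWithinCluster called only when the name is not empty. The result would then be assigned to $HostCluster at the top level of the Run .NET Script activity so Opalis can publish it.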

I then add a Published Data property that picks up the value of $HostCluster from the script:

image

Looking back at the workflow, I branch off the next step in the process based on the value of that published data property. If the value is empty, I go to "Add Computer to Collection" to put it in a holding bin. If the property has a value, then I go to the next step, which is checking to make sure the guest VMs have all been migrated before patching the server.

image

Once the VMs are drained from the host, I can assign it to the collection (where the update deployment has already been assigned), refresh the collection and the client to speed up the process of the client determining it actually needs the new updates, and then I monitor the software updates installation process until complete.

In the next post, I will talk about this last part in a little more detail and how you can add automated remediation to the status checking in case the server is stuck waiting for a reboot, or a failure occurs, or something that might need manual intervention.