How to run a script when a Resource Group fails over to the passive node of a Windows Server 2003 based MSCS Cluster

Having recently scripted the Cluster WMI Namespace (for fun) and blogged about it with my Cluster WMI Notifications post, I saw an opportunity to put that experience into practice here. We had a two-node Active-Passive MSCS Cluster, and the requirement was to run a custom script that did some work when a Resource Group failed over to the passive node. My plan was to kick off the custom script when we received a notification that the Resource Group had come online on the passive node.

A quick assessment of the proposed solution exposed its main flaw: the script would be external to the cluster and would need to be managed separately. Any number of things could go wrong and prevent the custom script from running, which is a big no-no in a production environment.

So while I did learn a lot from testing WMI Notifications for the Cluster Namespace, it wasn’t suited to this scenario.

On then, to the next out of the box idea. Or not so out of the box. Remember a resource type called “Generic Script”? Sounds exactly like what is required here.

However, you can’t just write a script, create a “Generic Script” resource and point it to the script. If you try, as I did while doing my research, you’ll find that the entire Cluster Administrator hangs just as you hit the Finish button on the “New Resource” wizard. This is because, as you’ll soon find out, the “Generic Script” resource DLL is looking for the Open entry point.

Scouring the internet got me this MSDN link:

Using the Generic Script Resource Type
https://msdn.microsoft.com/en-us/library/aa373089(v=VS.85).aspx

Any script that is to be configured as a “Generic Script” resource will need to contain functions named Open, Online, LooksAlive, IsAlive, Offline, Close and Terminate. These are the functions that the Cluster Resource Monitor will call to perform the relevant operations. Such a script differs from a typical script in much the same way a DLL differs from an executable. They both contain runnable code, but one can run by itself, whereas the other can’t.

For our purpose, we’ve got to call a custom script when the resource comes online on the passive node, which happens when the active node fails. This means that you either put the custom script code directly into the Online function or call the custom script from a line of script in the Online function. Once the “Generic Script” resource comes online, it stays online even after the script has run. This is good because you wouldn’t want the resource to go offline after it has finished its work. If it did, it would raise all kinds of alarms in monitoring software, such as “Resource Group ABC is partially offline” and so on.
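
If you would rather keep the custom script in its own file, here is a rough sketch of how the Online function might call it. Treat it as an illustration rather than tested production code: the script path is hypothetical, and I’m assuming the WScript.Shell COM object can be created from inside the Generic Script host in your environment.

```vbscript
' Sketch only. CUSTOM_SCRIPT is a hypothetical path; use the same local
' path on every node.
Const CUSTOM_SCRIPT = "C:\Scripts\FailoverTasks.vbs"

Function Online( )
    Dim shell, exitCode

    ' Assumption: WScript.Shell is creatable from the Generic Script host.
    Set shell = CreateObject("WScript.Shell")

    ' Run(command, windowStyle, waitOnReturn):
    ' 0 hides the window; True waits for the child process and
    ' returns its exit code.
    exitCode = shell.Run("cscript.exe //NoLogo //B """ & CUSTOM_SCRIPT & """", 0, True)

    Resource.LogInformation "Custom script exited with code " & exitCode

    ' Same return value the Online function uses in the full script
    ' later in this post.
    Online = 0
End Function
```

Because the call waits for the custom script to finish, a long-running script will hold the resource in its pending state for that long, so keep the work short or pass False as the last argument to Run if you don’t need the exit code.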

Another interesting discovery for me on this case was the Resource object and its LogInformation method. You can use this to log entries to the Cluster log, from the script, so you can see which entry points are called and when. This will give you an even clearer idea of where to place your custom script, if you need to call it more often (to check for any relevant condition).

Also, the precondition to running the custom script is that we’re on the passive node. This is a basic task for VBScript. What you want to do is get the computer name from WMI, compare it to the passive node name and run the custom script if we’re on the passive node or do nothing if we’re on the active node.

Here’s the script. After it, you’ll find some more information about how to configure it and some pitfalls to avoid. 

'======================================================
' File name: GenericScriptRes.vbs
'
'Run custom script only on a particular node when group
'comes online.
'======================================================

Function Open( )
    Open = 0
End Function

Function Online( )
    Resource.LogInformation "Entering Script Online"

    ' Keep going if a WMI call fails so the resource can still come online.
    On Error Resume Next

    ' Ask WMI on the local machine for this node's computer name.
    strComputer = "."
    Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")
    Set nodeInfo = objWMIService.ExecQuery("Select Name from Win32_ComputerSystem")

    For Each objNode in nodeInfo

        ' WIN2K3NODE2 is the name of the passive node.
        If objNode.Name = "WIN2K3NODE2" Then
            Resource.LogInformation "Custom Script will run here. "

            ' PLACE CUSTOM SCRIPT HERE

        Else
            Resource.LogInformation "Custom Script will not run here. "
        End If

    Next

    Online = 0
End Function

Function Offline( )
    Resource.LogInformation "Entering Script Offline"
    Offline = 0
End Function

Function Close( )
    Close = 0
End Function

Function Terminate( )
    Terminate = 0
End Function

' LooksAlive and IsAlive are polled repeatedly, so logging stays
' commented out except while testing.
Function LooksAlive( )
    'Resource.LogInformation "Entering Script LooksAlive"
    LooksAlive = True
End Function

Function IsAlive( )
    'Resource.LogInformation "Entering Script IsAlive"
    IsAlive = True
End Function

'======================================================

I’ve commented out the LooksAlive() and IsAlive() Resource.LogInformation calls because, by default, these functions are called every 5 seconds and every 1 minute respectively. They should only be uncommented when testing, or else your Cluster.log file will be flooded with these entries. Also, these functions return “true” rather than “0” like the others, as recommended by the MSDN article that details the Scripting Entry Points.
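
If commenting and uncommenting lines feels error-prone, one alternative is a single switch at the top of the script. This is only a sketch: LOG_POLLING is a name I made up for illustration and is not part of the Generic Script interface.

```vbscript
' Hypothetical flag: set to True only while testing, and leave it
' False in production so Cluster.log is not flooded.
Const LOG_POLLING = False

Function LooksAlive( )
    If LOG_POLLING Then Resource.LogInformation "Entering Script LooksAlive"
    LooksAlive = True
End Function

Function IsAlive( )
    If LOG_POLLING Then Resource.LogInformation "Entering Script IsAlive"
    IsAlive = True
End Function
```

These two functions would replace the LooksAlive and IsAlive functions in the script; remember to make the same change to the copy on each node.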

This article also mentions that the scripts should be stored locally rather than on a shared disk. Since the resource should come online on both nodes (but only run on the passive node), you can drop the script into “C:\Windows\Cluster” or another location on both nodes so long as the path to the script file remains the same.

Another point worth noting is that the resource can go through an offline-online cycle on the passive node (most likely through manual intervention), which means that the custom script may run more than once. This is not a problem if the custom script is safe to repeat, such as one that updates the node name in a file referenced by another application. However, if it performs an activity that must occur only once after the initial failover, additional checks are needed to prevent it from running again on the same node. One such check is to create a tag or placeholder file on the shared storage before the Online function exits, and to test for that file at the beginning of the same function, skipping the custom script if it is present. To clear this file when the Resource Group fails back to the active node, you could add a snippet of code, again in the Online function, that checks whether the node is the active node and then deletes the file.
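
Here is a minimal sketch of that tag-file idea using the Scripting.FileSystemObject. The marker path is hypothetical, and the sketch assumes the “Generic Script” resource depends on the disk resource so that the shared drive is already available when Online is called.

```vbscript
Const PASSIVE_NODE = "WIN2K3NODE2"              ' passive node name used in this post
Const MARKER_FILE  = "S:\Cluster\failover.tag"  ' hypothetical path on the shared disk

Function Online( )
    Dim fso, wmi, systems, sys, marker

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set wmi = GetObject("winmgmts:\\.\root\cimv2")
    Set systems = wmi.ExecQuery("Select Name from Win32_ComputerSystem")

    For Each sys In systems
        If UCase(sys.Name) = PASSIVE_NODE Then
            If fso.FileExists(MARKER_FILE) Then
                ' The one-time work was already done since the last failover.
                Resource.LogInformation "Marker file found. Skipping custom script."
            Else
                ' PLACE CUSTOM SCRIPT HERE

                ' Record that the one-time work is done.
                Set marker = fso.CreateTextFile(MARKER_FILE, True)
                marker.WriteLine "Ran on " & sys.Name & " at " & Now
                marker.Close
            End If
        Else
            ' Back on the active node: clear the marker so that the
            ' next failover runs the custom script again.
            If fso.FileExists(MARKER_FILE) Then fso.DeleteFile MARKER_FILE
        End If
    Next

    Online = 0
End Function
```

I have left out error handling to keep the sketch short. In practice you would want to decide what Online should return if the marker file cannot be created or deleted.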

That’s it then. The MSDN articles I’ve linked to will give you all the information you need to have about the “Generic Script” resource.