I got a question posed to me in the form of a design change request for Orchestrator. And, true to fashion for Orchestrator, I pointed out that what was being asked for as a feature could be solved by the existing product by just architecting the runbook in a certain way. The scenario posed was this:
I have a parent runbook that gets a list of servers on which I will perform some actions. The actions are performed in a child runbook. However, if something happens and an error occurs in one of the iterations of the child runbook, I want to cause any further invocations of child runbooks to stop and I want the parent runbook to stop as well.
My years as a program manager have taught me to look at the scenario first: examine the details of what the person is asking for, whether the solution described (the new feature to be added) is the right way to go, and whether there are other ways to solve the same problem. Orchestrator is a great prototyping tool, so I can quickly create solutions and test them out to see if they work. In about 10 minutes, I had a solution to the scenario that works with the existing product. And while it’s not exactly what the user requested, it still accomplishes the same goals and they don’t have to wait for the next release to get it.
Here it is in a nutshell – you can use a Counter (a global resource object) to act as a flag. You set the flag if you have an error, and you check the flag when you’re about to do something, and break off if the flag is set. The following parent and child runbook diagrams illustrate the point.
Let’s look at the child runbook first. It starts with an Initialize Data activity to get parameter inputs from the parent runbook. Then it uses a Get Counter activity to look up the value of our flag. By default, this value is zero. If the value is zero, the runbook continues on its merry way to go do a bunch of stuff. If the value is not zero, then the runbook immediately exits. During the processing of the “stuff” in the middle, if any error occurs, then the counter (flag) value is set to 1 and the runbook exits. If no errors occur, then after everything is done, the runbook ends without setting the counter value.
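To make the child runbook's logic concrete, here is a minimal Python sketch of the same check-then-set pattern. It is only a model, not Orchestrator code: the Counter class, the child_runbook function, the flaky_work action, and the server names are all hypothetical stand-ins for the Counter resource and the activities described above.

```python
class Counter:
    """Stand-in for an Orchestrator Counter (a shared, global value)."""

    def __init__(self):
        self.value = 0  # the flag defaults to zero, as described in the text


def child_runbook(server, counter, work):
    """Model of the child runbook: check the flag, do the work, set the flag on error."""
    if counter.value != 0:
        # Flag already set by an earlier child: exit immediately.
        return "skipped"
    try:
        work(server)  # the "bunch of stuff" in the middle
    except Exception:
        counter.value = 1  # raise the flag so later children stop
        return "failed"
    return "ok"  # no error, so the flag is left untouched


def flaky_work(server):
    # Hypothetical action that fails on one particular server.
    if server == "srv2":
        raise RuntimeError("simulated failure")


flag = Counter()
results = [child_runbook(s, flag, flaky_work) for s in ["srv1", "srv2", "srv3"]]
print(results)  # ['ok', 'failed', 'skipped']
```

The third server is skipped because the second one set the flag before the third child started, which is exactly the behavior the scenario asked for.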
Back up in the parent runbook, there is the Invoke Runbook activity that calls the child runbook. You might ask why we don’t check the counter value between the “Get a list of values” and the Invoke. Well, if you happened to read my previous post, Understanding Sequential vs. Parallel Processing of Runbook Activities, you’d know that after getting, say, a list of 20 computers from the first activity, the runbook would run the “Get Counter” activity 20 times, and only then move on to the Invoke activity and run it 20 times. And since the counter is set inside the child runbook when an error occurs, you’d be setting the value after you’ve already checked it. So there really isn’t a way to do a check between each invocation of the child runbook, which is why we check the value at the beginning of each child runbook.
In case we have a bunch of stuff to do after the Invoke Runbook activity, we can check the counter value here and shortcut the flow to the end of the runbook if needed.
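The ordering problem is easier to see in a small simulation. The Python sketch below is an assumption-laden model, not Orchestrator behavior captured in code: it simply treats each activity as a stage that runs for every item in the list before the next stage begins, and all function and server names are made up for illustration.

```python
def check_in_parent(servers, fail_on):
    """Flag check placed in the parent, between the list and the Invoke."""
    flag = {"value": 0}
    # Stage 1: the check runs once per server, before any child has started.
    passed_check = [s for s in servers if flag["value"] == 0]
    # Stage 2: the Invoke runs for every item that passed the check.
    invoked = []
    for s in passed_check:
        invoked.append(s)
        if s == fail_on:
            flag["value"] = 1  # set too late: every check has already happened
    return invoked


def check_in_child(servers, fail_on):
    """Flag check placed at the start of each child, run one after another."""
    flag = {"value": 0}
    did_work = []
    for s in servers:
        if flag["value"] != 0:
            continue  # this child exits immediately
        did_work.append(s)
        if s == fail_on:
            flag["value"] = 1
    return did_work


servers = ["srv1", "srv2", "srv3", "srv4"]
print(check_in_parent(servers, "srv2"))  # ['srv1', 'srv2', 'srv3', 'srv4']
print(check_in_child(servers, "srv2"))   # ['srv1', 'srv2']
```

With the check in the parent, all four servers are processed even though the second one failed. With the check in the child, processing stops doing real work after the failure.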
Important: For this process to work, you must check the “Wait for completion” box in the Invoke Runbook activity. Think about it: if you start all the child runbooks in parallel, they all check the error flag at the beginning, before any of them has had a chance to set it, and you have the same issue as checking in the parent runbook. Depending on how long the child runbooks take to complete and where you do your error flag checking in the parent, you might also miss the opportunity to stop the parent runbook.
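Here is a rough Python sketch of why waiting matters. It uses threads as a stand-in for child runbooks started in parallel, and a threading.Barrier to force the worst case in which every child checks the flag before any child finishes. The run_children function and its parameters are hypothetical, and real timing in Orchestrator will vary.

```python
import threading


def run_children(servers, fail_on, wait_for_completion):
    """Return the servers on which real work was done."""
    flag = {"value": 0}
    did_work = []
    lock = threading.Lock()

    def child(server, all_started=None):
        flag_seen = flag["value"]  # the check at the start of the child
        if all_started is not None:
            # Worst case for parallel runs: nobody proceeds until every
            # child has already performed its check.
            all_started.wait()
        if flag_seen != 0:
            return
        with lock:
            did_work.append(server)
            if server == fail_on:
                flag["value"] = 1  # error: set the flag

    if wait_for_completion:
        for s in servers:  # each child finishes before the next one starts
            child(s)
    else:
        barrier = threading.Barrier(len(servers))
        threads = [threading.Thread(target=child, args=(s, barrier)) for s in servers]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return sorted(did_work)


servers = ["srv1", "srv2", "srv3"]
print(run_children(servers, "srv1", wait_for_completion=True))   # ['srv1']
print(run_children(servers, "srv1", wait_for_completion=False))  # ['srv1', 'srv2', 'srv3']
```

When each child waits for the previous one, the failure on the first server stops the rest. When they all start together, every child has already seen a zero flag by the time the failure is recorded.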
Using this methodology, you might decide to do processing of child runbooks in waves or stages to ensure that each part is complete before the next part begins.
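As a sketch of that idea, the Python model below runs the work in named waves and has the parent check the flag before starting each one. The wave names, server names, and the run_waves function are invented for illustration, and each child is assumed to run to completion before the next begins.

```python
def run_waves(waves, fail_on):
    """Run waves in order; fail_on is a (wave_name, server) pair that errors."""
    flag = {"value": 0}
    started = []
    for wave_name, servers in waves:
        if flag["value"] != 0:
            break  # parent-level check: do not start the next wave
        started.append(wave_name)
        for server in servers:
            if flag["value"] != 0:
                continue  # child-level check: exit immediately
            if (wave_name, server) == fail_on:
                flag["value"] = 1  # error in this child: set the flag
    return started


waves = [
    ("prepare", ["srv1", "srv2"]),
    ("patch", ["srv1", "srv2"]),
    ("reboot", ["srv1", "srv2"]),
]
print(run_waves(waves, ("patch", "srv1")))  # ['prepare', 'patch']
```

The failure during the second wave keeps the third wave from ever starting, so each stage is known to be complete and clean before the next one begins.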
I hope this explains a little more about using Counters as flags and how to “bubble up” errors to parent runbooks and across child runbooks, and you understand a little more about error handling within your runbooks. Until next time!