Orchestrator and Activity data size limits – a case of re-executing Runbooks

I am going to start by mentioning this great article, which explains the maximum size of the parameters we can use in Orchestrator: http://blogs.technet.com/b/orchestrator/archive/2012/05/08/orchestrator-quick-tip-what-s-the-maximum-size-of-parameters.aspx

As the article states, data passed onto the databus can be as large as 2GB - *but*, and this is the catch, individual Activities may have their own, lower limits. Let's take the Query XML Activity as an example and see what can happen if we give it "too much to do".

The Query XML Activity was designed to take an XML as input, run an XPath Query against it, and pass each result it finds to the databus, so that the next Activity runs once per result in a looping manner. As XPath queries are generally used, a single result (a string or an XML block) should not be very large. In Orchestrator, the Query XML Activity can return a result of at most around 90KB, as far as I could tell from testing. As said, this is in line with how XPath is generally used and what it was designed for: http://en.wikipedia.org/wiki/XPath.


Let us now go through a scenario where we have a big XML - let's say around 1.5MB. We get this XML from somewhere with a PowerShell script, or generate it directly in the script, and pass it over to our Query XML Activity (we could of course also give this Activity an XML file from a share or local hard drive).

         NOTE: It is best practice to have the whitespace stripped from the XML - if we don't do this, we might get errors or unexpected results. PowerShell should do that by default if we use a standard XML object variable: http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.preservewhitespace.aspx
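To make the note above concrete, here is a minimal sketch of what "whitespace stripped" means in practice. It is written in Python purely as an illustration (in Orchestrator this would normally be left to PowerShell's XML object), and the helper name strip_whitespace and the tiny sample document are assumptions, not part of the original scenario.

```python
import xml.etree.ElementTree as ET

def strip_whitespace(xml_text: str) -> str:
    """Drop whitespace-only text between elements and re-serialize (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Indentation and line breaks show up as whitespace-only text/tail values.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return ET.tostring(root, encoding="unicode")

# Assumed sample: a pretty-printed document with an authors root and author children.
pretty = """<authors>
    <author>NAME_1</author>
    <author>NAME_2</author>
</authors>"""

compact = strip_whitespace(pretty)
print(compact)
print(len(pretty), "->", len(compact))
```

On a document with many small nodes, the indentation alone can make up a noticeable share of the total size, which is one more reason to strip it before handing the XML to an Activity.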

Now let's say that our XML has this format:

 <?xml version="1.0" encoding="iso-8859-1"?>


So we only have 2 node "levels" in this example - the root node authors and the child nodes author - and we know there are A LOT of child nodes because our XML is about 1.5MB in size ... Now let's say that in the Query XML Activity we configure the XPath Query to be: /authors/author

Well ... this takes about 40 minutes to complete, with the Query XML Activity passing each result of the XPath Query to the next Activity in a looping fashion: NAME_1, NAME_2, NAME_3, etc. So this takes a while, but it works great!


Hmm ... now, what if we change the XPath Query to something the Activity was not designed to handle? Let's change it to this: /authors

This should return only 1 result: the XML block containing all the author nodes. That is almost the entire XML, so its size is close to that of the XML file, about 1.5MB. When we start this Runbook, it seems to "hang" when it gets to the Query XML Activity. After some time the Runbook is re-started and we see another instance of it ... then after a while again, and again, until we manually stop this Runbook.
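The difference between the two queries can be sketched outside Orchestrator. The following Python snippet is only an illustration under stated assumptions: the generated XML (an authors root with author children whose text is NAME_1, NAME_2, ...) is a hypothetical stand-in for the real file, and the 90KB figure is the approximate limit observed in testing above, not a documented value.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the big XML: roughly 1.3MB of small <author> nodes.
AUTHOR_COUNT = 50_000
xml_text = "<authors>" + "".join(
    f"<author>NAME_{i}</author>" for i in range(1, AUTHOR_COUNT + 1)
) + "</authors>"

root = ET.fromstring(xml_text)

def result_sizes(nodes):
    """Serialized size in bytes of each query result."""
    return [len(ET.tostring(node, encoding="utf-8")) for node in nodes]

# ElementTree paths are relative to an element, so "./author" evaluated on
# the root corresponds to the XPath /authors/author.
per_author = result_sizes(root.findall("./author"))

# The XPath /authors selects the root element itself: one single result.
whole_block = result_sizes([root])

LIMIT = 90 * 1024  # approximate per-result limit observed in testing (assumption)

print(f"/authors/author: {len(per_author)} results, largest {max(per_author)} bytes")
print(f"/authors:        {len(whole_block)} result, {whole_block[0]} bytes")
print("single result exceeds ~90KB:", whole_block[0] > LIMIT)
```

The first query produces many tiny results, each far below the limit, while the second produces one result that is about as large as the whole document.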

Seems like we are giving the Query XML Activity something it cannot handle ... but what really happens, and why doesn't it simply fail? Let's see, shall we? First we activate level 7 trace logging for the RunbookService and PolicyModule on the Runbook Server that will be running this Runbook: http://technet.microsoft.com/en-us/library/hh508839.aspx

In the PolicyModule log we will look for any errors from the Activity, and in the RunbookService log we will look at what the service does and why it restarts the Runbook. Now we start the Runbook again, let it restart itself 2-3 times, then stop it, and first of all change the trace logging level back to 1. We should never leave the trace logging level at 7, because it generates a lot of data and can have a heavy effect on performance - so we should only activate it while troubleshooting and always change it back to 1 afterwards.

First we check the PolicyModule log, where in this case we cannot see any errors, but not much relevant activity either ...

However, when we check the RunbookService log, we see the following relevant lines:

 4 AutoTrace : << ActionServerStorageDB::updatePolicyHeartbeat (6594 ms)
4 AutoTrace : >> StorageSuspendedModeManager::onSuccess
4 AutoTrace : << StorageSuspendedModeManager::onSuccess (194 ms)
4 PolicyHeartbeatThread is running
4 AutoTrace : >> ActionServerStorageDB::updatePolicyHeartbeat
4 AutoTrace : >> CODBDataStore::PingConnection
4 AutoTrace : << CODBDataStore::PingConnection (797 ms)
4 AutoTrace : >> CODBDataStore::getDbEncrypter
4 AutoTrace : >> `anonymous-namespace'::ensureWorkingConnection
4 AutoTrace : >> `anonymous-namespace'::throwIfConnectionIsNotWorking
4 AutoTrace : << `anonymous-namespace'::throwIfConnectionIsNotWorking (504 ms)
4 AutoTrace : << `anonymous-namespace'::ensureWorkingConnection (1506 ms)
4 AutoTrace : << CODBDataStore::getDbEncrypter (1578 ms)
4 Open recordset: "UPDATE POLICY_PUBLISH_QUEUE SET [Heartbeat]=getutcdate() WHERE ([SeqNumber]= ? ) AND ([AssignedActionServer]= ? )"
4 AutoTrace : << ActionServerStorageDB::updatePolicyHeartbeat (3669 ms)
4 AutoTrace : >> StorageSuspendedModeManager::onSuccess
4 AutoTrace : << StorageSuspendedModeManager::onSuccess (277 ms)
4 PolicyHeartbeatThread is running
4 Workflow 1012 (policy {BF5C3F4F-863A-4E39-BB0F-3216575D525F}) appears to be dead: remove
4 AutoTrace : >> WorkflowStoppersManager::isIn
4 AutoTrace : << WorkflowStoppersManager::isIn (753 ms)
4 AutoTrace : >> WorkflowRunner::removeWorkflow
4 WorkflowControlMultiplexor::removeWorkflowControl(1012)
4 AutoTrace : >> WorkflowControlComProxy::~WorkflowControlComProxy
4 AutoTrace : >> WorkflowControlComProxy::disconnectWorkflowContext
4 AutoTrace : >> WorkflowControlComProxy::getConnectionPoint
4 AutoTrace : << WorkflowControlComProxy::getConnectionPoint (1284 ms)
2 WorkflowControlComProxy::disconnectWorkflowContext: RPC call failed
4 AutoTrace : << WorkflowControlComProxy::disconnectWorkflowContext (3711 ms)
4 AutoTrace : << WorkflowControlComProxy::~WorkflowControlComProxy (6231 ms)
4 AutoTrace : >> WorkflowStoppersManager::remove
4 AutoTrace : << WorkflowStoppersManager::remove (1273 ms)
4 AutoTrace : << WorkflowRunner::removeWorkflow (11176 ms)


What we can see in the log above is that the Query XML Activity becomes unresponsive and fails to send heartbeats, so the workflow is considered "dead" and gets removed. However, further down in the log we can see that, because it is unresponsive, it does not answer the remove request (the error "RPC call failed") and thus remains orphaned. The Runbook Server then just restarts the whole Runbook, and this keeps happening until someone manually stops the Runbook.
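Level 7 logs get large quickly, so it can help to pull out just the two telltale messages. Below is a minimal Python sketch that does this; the function name summarize is hypothetical, and the patterns are based only on the log lines quoted above, so treat the exact message format as an assumption for other Orchestrator versions.

```python
import re

# Patterns taken from the RunbookService log excerpt above.
DEAD_RE = re.compile(
    r"Workflow (\d+) \(policy (\{[0-9A-Fa-f-]+\})\) appears to be dead"
)
RPC_FAILED = "RPC call failed"

def summarize(lines):
    """Return (list of (workflow id, policy GUID) marked dead, count of RPC failures)."""
    dead = []
    rpc_failures = 0
    for line in lines:
        match = DEAD_RE.search(line)
        if match:
            dead.append((int(match.group(1)), match.group(2)))
        elif RPC_FAILED in line:
            rpc_failures += 1
    return dead, rpc_failures

sample = [
    "4 PolicyHeartbeatThread is running",
    "4 Workflow 1012 (policy {BF5C3F4F-863A-4E39-BB0F-3216575D525F}) appears to be dead: remove",
    "2 WorkflowControlComProxy::disconnectWorkflowContext: RPC call failed",
]

dead, rpc_failures = summarize(sample)
print(dead, rpc_failures)
```

To use it on a real trace, pass the lines of the RunbookService log file instead of the sample list. A "dead" workflow followed by an RPC failure is the combination that preceded the Runbook restart in this case.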

So keep in mind when designing your Runbooks that different Activities may have different limitations and should only be used in the intended way - and - test, test, test!

Just for fun's sake, we can make this scenario work by redesigning our Runbook and replacing the Query XML Activity with another PowerShell script (Run .NET Script Activity) that filters the big XML and manipulates the data, or does whatever else we may need to do with it.
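The idea of doing the filtering in a script can be sketched as follows. In Orchestrator this would be a PowerShell script in the Run .NET Script Activity; the Python version here only illustrates the approach, and the function name iter_author_names, the wanted_prefix parameter and the sample data are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def iter_author_names(xml_bytes, wanted_prefix=""):
    """Stream through the XML and yield only the author names we care about."""
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "author":
            name = (elem.text or "").strip()
            if name.startswith(wanted_prefix):
                yield name
            # Release the node's content so memory use stays flat on big files.
            elem.clear()

# Assumed sample with the same authors/author layout as the scenario.
sample = (
    b"<authors><author>NAME_1</author><author>NAME_2</author>"
    b"<author>NAME_10</author></authors>"
)

names = list(iter_author_names(sample, wanted_prefix="NAME_1"))
print(names)
```

The point of the design is that only small, already filtered values leave the script, so nothing close to the size of the whole XML block ever has to pass through a single Activity result.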


Have fun automating! 😉



