I’ve been doing a lot of troubleshooting of Workflows in Service Manager lately as I have been working on some of these extension projects. It’s not exactly trivial to figure out what is going wrong with a workflow, so I wanted to write a blog post explaining the process. I’ll also point out a few common causes of workflow failures and the resolutions.
The easiest way to figure out the status of a workflow is to check the Service Manager console. Go to the Administration workspace (you must be a Service Manager administrator to do this), expand the Workflows node and select the Status view.
You’ll see a list of all of the workflows in the system, including the workflow extensions that you may have made with the authoring console and imported into your environment.
Here you can get a good idea of which workflows are in your environment, whether each one is enabled, when it was created, and when it was last modified.
If you select a workflow item in the list, down at the bottom you will see the history of all the times that workflow has run. There are two tabs – one which shows only Failures and another which shows All.
In the example above, you can see that my workflow has failed twice and it shows the start and end time. I can also see a few different actions:
- View log: This will open a dialog which will show you the log output from that execution of the workflow. More on that in a minute.
- View related object: This will open the form for the object that the workflow is running for. Typically this would be a change request, incident, computer, etc.
- Retry: Allows you to resubmit the workflow with the exact same configuration. Be careful with this, though, because in some cases rerunning a workflow may not be appropriate given your business logic. For example, consider this timeline:
  - A workflow that processes an incident fails to run. Let’s say it was supposed to change the urgency to high.
  - Between that failure and now, some other workflow has run, or a user has modified the incident manually, and the urgency has been changed to medium.
  - At this point, what is the right urgency value: high or medium? If you rerun the workflow, it will change the value to high. If you don’t, the incident keeps the ‘medium’ value applied by the other workflow/user.
  - Make sure you understand what your workflow will actually do before you choose to rerun it.
- Ignore: Dismisses the failure. You will not be able to retry the workflow after it has been marked as ignored.
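The Retry caution above boils down to one question: did the target object change after the workflow failed? Here is a minimal Python sketch of that check. The function name and both timestamps are hypothetical; the console does not do this for you, so you would look the two times up yourself (the failure time from the Status view, the last-modified time from the object's form or history).

```python
from datetime import datetime


def needs_review_before_retry(failed_at: datetime, object_last_modified: datetime) -> bool:
    """Return True when the target object changed after the workflow failed.

    A True result means someone (or another workflow) touched the object
    in the meantime, so a blind Retry could overwrite a newer value.
    """
    return object_last_modified > failed_at


# Hypothetical timestamps, both in UTC.
failed_at = datetime(2010, 6, 1, 21, 30)
print(needs_review_before_retry(failed_at, datetime(2010, 6, 1, 22, 5)))
print(needs_review_before_retry(failed_at, datetime(2010, 6, 1, 20, 0)))
```

A True result does not mean the retry is wrong, only that it is worth opening the related object first and deciding which value should win.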
Now, let’s say you unfortunately have a failure. Click on the View log link above for the failure you are interested in looking at further and you’ll see a dialog like this:
It’s a little hard to notice, but if you expand the Failure details section at the bottom you can see more information about the failure:
OK, so that is not super helpful, but in some cases you’ll get better error information here.
This particular error (-2130771925 ; 0x80FF002B) in my experience occurs because the following workflow support assemblies are not copied into the Service Manager folder (%ProgramFiles%\Microsoft System Center\Service Manager 2010):
These assemblies ship with the authoring console and must be copied manually to the Service Manager folder on the management server. Without them many of the workflows you design in the Service Manager authoring console will not work.
So, the dialog you see above is what you will see when the workflow fails outright. In this case the workflow failed because the supporting assemblies were not present.
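The error code quoted above appears in two forms, a signed decimal (-2130771925) and a hex value (0x80FF002B). They are the same 32-bit number, which is worth knowing when you search for an error, because different tools print one form or the other. A small Python sketch converts between them; the helper names are made up for illustration and are not part of Service Manager.

```python
def to_signed32(value: int) -> int:
    """Interpret a 32-bit value as a signed (two's complement) integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def to_hex32(value: int) -> str:
    """Render a signed or unsigned 32-bit value as an 8-digit hex string."""
    return f"0x{value & 0xFFFFFFFF:08X}"


print(to_hex32(-2130771925))    # 0x80FF002B
print(to_signed32(0x80FF002B))  # -2130771925
```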
If your workflow does happen to run successfully it may not necessarily do what you want it to do. Take this example:
Here, my workflow is running successfully (meaning that it at least completes), but I am not getting the results I expected. If I take a look at the log in this case I can get a better idea of what is going on.
So – this shows me that the workflow started, ran my powerShellScript1 workflow activity and then finished. Still not enough info!
Time to go digging in the database!
select SubmittedBy, RunningAs, Status, convert(xml,Output), ErrorCode,ErrorMessage, TimeScheduled, TimeStarted, TimeFinished
order by TimeFinished desc
Find the error you are interested in by adding WHERE and ORDER BY clauses to the query. For example, filter on a time range so that fewer rows come back in the results. Remember: all times are stored in the database in GMT/UTC.
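Because the stored times are in UTC, a failure you saw at a given local time has to be converted before you can filter on it. The following Python sketch turns a local wall-clock time into a UTC window that could be pasted into a WHERE clause on a column such as TimeFinished. The function name, the 15-minute padding, and the example offset are all assumptions for illustration; the offset is supplied by hand, so daylight saving time is your responsibility.

```python
from datetime import datetime, timedelta, timezone


def utc_window(local_time: datetime, utc_offset_hours: float, minutes: int = 15):
    """Return (start, end) UTC strings bracketing a local wall-clock time."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # Attach the local offset, then convert the instant to UTC.
    as_utc = local_time.replace(tzinfo=local_tz).astimezone(timezone.utc)
    fmt = "%Y-%m-%d %H:%M:%S"
    pad = timedelta(minutes=minutes)
    return (as_utc - pad).strftime(fmt), (as_utc + pad).strftime(fmt)


# Hypothetical example: a failure noticed at 2:30 PM local time, UTC-7.
start, end = utc_window(datetime(2010, 6, 1, 14, 30), utc_offset_hours=-7)
print(start, end)  # 2010-06-01 21:15:00 2010-06-01 21:45:00
```

The two strings can then be used as the lower and upper bounds of the time filter.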
Then click on the Output link in the results:
You’ll see a bunch of XML that looks something like this:
It looks like a bunch of complicated data, but it can be really invaluable for figuring out where in the process a given workflow is failing. For example, this particular error pops right out to me:
In this case, I had provided a computer name for the parameter on the workflow activity which couldn’t be resolved (my error). That one detail was what I needed to figure out what was going wrong in my workflow.
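When the Output XML is long, scanning it by eye gets tedious. As a rough sketch, you could copy the XML out of the query results and let a short Python script flag any element whose text or attributes mention an error-like keyword. Everything here is an assumption: the keyword list is a guess, and the sample XML is invented purely to make the script runnable. The real Output column has its own schema, which this code does not depend on.

```python
import xml.etree.ElementTree as ET

# Guessed keywords; adjust to whatever your failures actually say.
KEYWORDS = ("error", "exception", "failed", "could not")


def find_suspect_nodes(xml_text: str, keywords=KEYWORDS):
    """Return (tag, text) pairs for elements that mention any keyword."""
    hits = []
    for element in ET.fromstring(xml_text).iter():
        # Look at the element's own text plus all of its attribute values.
        parts = [element.text or ""] + list(element.attrib.values())
        combined = " ".join(parts).strip()
        if any(word in combined.lower() for word in keywords):
            hits.append((element.tag, combined))
    return hits


# Invented sample, NOT the real Service Manager output format.
sample = """
<Output>
  <Step name="start">Workflow started</Step>
  <Step name="script">Could not resolve computer name 'SERVER01'</Step>
</Output>
"""
for tag, text in find_suspect_nodes(sample):
    print(tag, "->", text)
```

This is only a first-pass filter. It narrows down where to look, and you still need to read the surrounding XML to understand the failure.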
Especially for workflows which use the PowerShell script activity, I recommend writing progress messages to standard output (the Write-Host cmdlet). Anything written to standard output will be captured and stored in the job status table, which makes it much easier to see how far a script got before something went wrong.
Here is an example of the output written by one of my PowerShell activities:
Another way to check on workflows is to check the Operations Manager (yes, it’s called Operations Manager – it’s a long story) event log on the management server. In this case, I was getting events with Event ID 4000 from the HealthService source.
Hope that helps!
Follow me on Twitter! (http://www.twitter.com/radtravis)