~ Brian McDermott | Escalation Engineer
Hi everyone, Brian McDermott here. When we released the System Center Operations Manager 2007 R2 Authoring Resource Kit, one of the tools included was the Workflow Analyzer. Not only was this tool very useful in debugging your workflows whilst developing them, it was also very useful for us guys in support who had to then debug them once they hit a runtime issue somewhere in the wild.
Unfortunately, we often found situations where it wouldn’t work. It was only developed for servers running the US-English locale, which left a good number of machines out of its reach. We then found that in many larger customer environments there were too many workflows for it to handle, and it failed in those circumstances too. As (bad) luck would have it, these sorts of customers were often the ones who had the unusual workflow problems that were crying out for this kind of tool.
Recently, I was working on a very unusual workflow failure case and decided that the Workflow Analyzer was the right tool for the job. Unfortunately, the tool decided it was not the right tool for the job and crashed when we tried to use it. I wouldn’t normally spend time troubleshooting troubleshooting tools, but in this case I was all out of suitable options for furthering the investigation and it did seem like the best idea.
As standard OpsMgr tracing and testing had not helped us get to the bottom of the problem I decided we should manually enable workflow tracing to help progress things. Here’s the Workflow Analyzer crash message:
[EventType clr20r3, P1 wfanalyzer.exe, P2 6.0.4900.0, P3 4ad83063, P4 system.windows.forms, P5 184.108.40.206, P6 4f681deb, P7 1521, P8 18, P9 system.invalidoperationexception, P10 NIL.]
This indicated a problem within System.Windows.Forms (a UI part of the application) so I was confident a manual approach would work. But if we are going to do it manually we need to know how the Workflow Analyzer does it automatically and copy it, so what does it do?
First of all, it puts an override in place for the rule/monitor you wish to trace. This override is called TraceEnabled, but unfortunately it isn’t exposed in the OpsMgr console, so we have a bit of manual override creation/editing to do. To make this easy, search the OpsMgr console for the rule/monitor you wish to trace, making sure you choose the correct target for the machine it is failing on, and then create an override for that object that sets the Enabled property to true. This has no effect on the workflow if it is already enabled; we are only using this technique to generate an override in the MP’s XML that we can then edit.
Once you have created the override, export the MP and open it in your favorite XML editor (Notepad will do). You will then see an override like this for your rule:
<RulePropertyOverride ID="OverrideForRuleMomUIGeneratedRule06afb11860a042f9b0f5c7f25205431eForContextMicrosoftWindowsComputerc5241357d8304af5b9bdc54795a4a8c9" Context="MicrosoftWindowsLibrary6172210!Microsoft.Windows.Computer" ContextInstance="82c098da-8958-25ac-47b9-468c1c815fb0" Enforced="false" Rule="MomUIGeneratedRule06afb11860a042f9b0f5c7f25205431e" Property="Enabled">
To turn this into an override that enables workflow tracing, change the property name from Enabled to TraceEnabled. Everything else stays the same:
<RulePropertyOverride ID="OverrideForRuleMomUIGeneratedRule06afb11860a042f9b0f5c7f25205431eForContextMicrosoftWindowsComputerc5241357d8304af5b9bdc54795a4a8c9" Context="MicrosoftWindowsLibrary6172210!Microsoft.Windows.Computer" ContextInstance="82c098da-8958-25ac-47b9-468c1c815fb0" Enforced="false" Rule="MomUIGeneratedRule06afb11860a042f9b0f5c7f25205431e" Property="TraceEnabled">
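If you would rather not hand-edit the exported XML, the same change can be scripted. The following is only a minimal sketch, not how the Workflow Analyzer does it: the function name enable_trace_override is made up for illustration, it only handles rule overrides of the form shown above, and it assumes the MP elements carry no XML namespace. Note that Python's ElementTree drops the XML declaration, comments and unused namespace declarations when it writes the document back out, so compare the result with the original before importing it.

```python
import xml.etree.ElementTree as ET


def enable_trace_override(mp_xml: str, rule_id: str) -> str:
    """Return the MP XML with the Enabled override for rule_id
    switched to a TraceEnabled override (hypothetical helper)."""
    root = ET.fromstring(mp_xml)
    changed = 0
    # Look at every RulePropertyOverride, wherever it sits in the MP.
    for override in root.iter("RulePropertyOverride"):
        if (override.get("Rule") == rule_id
                and override.get("Property") == "Enabled"):
            # Only the property name changes; everything else stays.
            override.set("Property", "TraceEnabled")
            changed += 1
    if changed == 0:
        raise ValueError(f"no Enabled override found for rule {rule_id!r}")
    return ET.tostring(root, encoding="unicode")
```

You would read the exported MP file into a string, pass it through this function along with the Rule value from your override, and write the result back out before importing.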
Save this MP, import it back into your OpsMgr environment and then wait for it to arrive at the agent you wish to trace. You can track this by watching for a 1201 event for that MP to be logged in the OpsMgr event log. Once it has arrived and you have seen the subsequent 1210 event indicating a new configuration is active, you are ready to move to the next stage.
Once the new config is loaded, OpsMgr will start sending out trace messages to anything that will listen. But we don’t have anyone listening yet. We start a listener by running the following command:
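If you prefer checking for the event from a script rather than Event Viewer, something like the sketch below could work. It shells out to the built-in wevtutil tool and assumes the OpsMgr event log is the one named "Operations Manager"; the helper names build_event_query and recent_events are invented for this example.

```python
import subprocess


def build_event_query(event_id, count=5):
    """Build a wevtutil command line that returns the newest
    `count` events with the given ID from the OpsMgr log."""
    return [
        "wevtutil", "qe", "Operations Manager",
        f"/q:*[System[(EventID={event_id})]]",  # XPath filter on event ID
        f"/c:{count}",                          # maximum events to return
        "/rd:true",                             # newest first
        "/f:text",                              # plain-text output
    ]


def recent_events(event_id, count=5, run=subprocess.run):
    """Run the query and return wevtutil's text output."""
    result = run(build_event_query(event_id, count),
                 capture_output=True, text=True, check=True)
    return result.stdout
```

Calling recent_events(1210) on the agent should show whether a new configuration has become active recently; check the timestamps against the time you imported the MP.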
tracelogsm.exe -start "WorkflowTrace" -flag 0xFF -level 5 -ft 1 -rt -guid "#c85ab4ed-7f0f-42c7-8421-995da9810fdd" -b 1024 -f %windir%\temp\opsmgrtrace\WorkFlowTrace.etl
The location of tracelogsm.exe will vary by version and installation of OpsMgr. The defaults are shown below:
OpsMgr 2007 R2 – %ProgramFiles%\System Center Operations Manager 2007\Tools\
OpsMgr 2012 Server – %ProgramFiles%\System Center 2012\Operations Manager\Server\Tools\
OpsMgr 2012 Agent – %ProgramFiles%\System Center 2012\Operations Manager\Agent\Tools\
OpsMgr 2012 R2 Server – %ProgramFiles%\Microsoft System Center 2012 R2\Operations Manager\Server\Tools\
OpsMgr 2012 R2 Agent – %ProgramFiles%\Microsoft Monitoring Agent\Agent\Tools\
Also note that the output trace file location I chose above (%windir%\temp\opsmgrtrace\WorkFlowTrace.etl) is the location for the standard OpsMgr event trace logs on OpsMgr 2007 R2 in order to make the trace formatting more straightforward. If you are doing this on an OpsMgr 2012 or an OpsMgr 2012 R2 server then depending on the OS you may need to use the path %windir%\Logs\OpsMgrTrace instead.
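If you are not sure which of the default Tools directories listed above applies to a given machine, a small script can simply probe them in turn. This is a rough sketch that only knows about the default locations from this post; a custom install path would not be found, and find_tracelogsm is a name made up for this example.

```python
import ntpath
import os

# Default Tools directories from the list above, relative to %ProgramFiles%.
DEFAULT_TOOLS_DIRS = [
    r"System Center Operations Manager 2007\Tools",
    r"System Center 2012\Operations Manager\Server\Tools",
    r"System Center 2012\Operations Manager\Agent\Tools",
    r"Microsoft System Center 2012 R2\Operations Manager\Server\Tools",
    r"Microsoft Monitoring Agent\Agent\Tools",
]


def find_tracelogsm(program_files, exists=os.path.isfile):
    """Return the first default path where tracelogsm.exe exists,
    or None if it is in none of the default locations."""
    for rel in DEFAULT_TOOLS_DIRS:
        # ntpath builds Windows-style paths regardless of host OS.
        candidate = ntpath.join(program_files, rel, "tracelogsm.exe")
        if exists(candidate):
            return candidate
    return None
```

On the machine itself you would call find_tracelogsm(os.environ["ProgramFiles"]) and fall back to a manual search if it returns None.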
That’s it. You now have workflow tracing in place and are ready to reproduce the problem and test your workflow. Each module will now log to the .etl file you specified above (e.g. %windir%\temp\opsmgrtrace\WorkFlowTrace.etl). Once you have repeated the tests you need, convert the .etl file to a readable log in the standard way by running formattracing.cmd from the same OpsMgr Tools directory you ran tracelogsm.exe from. This creates WorkFlowTrace.log, which hopefully has all the details you were after to help you solve your problem.
One final thing you need to do once you are done is stop the tracing by running tracelogsm.exe -stop "WorkflowTrace", and then delete the override from your MP.
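It is easy to forget the stop step and leave a trace session running. One way to guard against that, sketched below, is to wrap the start and stop commands from this post around whatever you do to reproduce the problem. The function run_trace_session and its reproduce callback are hypothetical names for this example; the tracelogsm.exe arguments are copied from the commands above. Removing the override from the MP is still a manual step.

```python
import subprocess


def run_trace_session(tracelogsm, etl_path, reproduce, run=subprocess.run):
    """Start the WorkflowTrace session, call reproduce(), and always
    stop the session afterwards, even if reproduce() raises."""
    start = [
        tracelogsm, "-start", "WorkflowTrace",
        "-flag", "0xFF", "-level", "5", "-ft", "1", "-rt",
        "-guid", "#c85ab4ed-7f0f-42c7-8421-995da9810fdd",
        "-b", "1024", "-f", etl_path,
    ]
    stop = [tracelogsm, "-stop", "WorkflowTrace"]

    run(start, check=True)      # begin listening for trace messages
    try:
        reproduce()             # your steps to trigger the workflow
    finally:
        run(stop, check=True)   # never leave the session running
```

Here reproduce could be as simple as a function that waits for you to press Enter once you have triggered the failing workflow.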
Maybe one day this will prove a useful technique for you when troubleshooting a trickier than usual workflow problem.
Brian McDermott | Escalation Engineer | Microsoft GBS Management and Security Division