SharePoint Workflow Architecture – Part 2

This blog post is a contribution from Andy Li, an Escalation Engineer with the SharePoint Developer Support team.

This series of posts on workflows is his contribution to help the community better understand the internals of the workflow runtime and how it interacts with SharePoint.

Refer to the previous part of this series, or jump to the next and last part:

SharePoint Workflow Architecture – Part 1

SharePoint Workflow Architecture – Part 3

Workflow Event Processing Pipeline

Almost every workflow requires some sort of user interaction.  For example, your workflow may create a couple of tasks and assign them to a group of users for approval.  When the users get their tasks, they go to the site and submit their feedback, which is sent back to the workflow for further processing.  A lot happens behind the scenes after the user submits the task.  As we mentioned earlier, when the user submits the task, an event receiver responds and tries to deliver the task changes to the workflow.  We typically refer to this process as event delivery; in the workflow runtime it is called Data Exchange.  Both terms mean the same thing.  In this section, we’ll talk about the high-level actions that take place during this process.

Workflow Data Exchange Service

Workflow host processes can communicate with workflows by exchanging data through custom local communication services.  These local communication services implement user-defined interfaces that define the methods and events passed between the workflow and the host process.  Events are used to send data to a workflow, whereas methods are used by the workflow to send data to the host application.

The following diagram shows how a local communication service communicates with its host application.  You can read Using Local Services in Workflows for more information.

image

For example, SharePoint defines ITaskService to exchange task information between SharePoint and the workflow that created the tasks.  ITaskService.CreateTask is used by the workflow instance to create the task item in the tasks list.  The ITaskService.OnTaskChanged event is used by SharePoint (the host process) to send an “event” to the workflow instance.

A service class that implements these interfaces is required to actually accomplish the data exchange.  For example, Microsoft.SharePoint.Workflow.SPWinOeTaskServices implements the ITaskService interface.  The service class has the actual implementation for creating the task in the SharePoint list as well as raising the event when it comes back from the host process.
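The split between methods (workflow to host) and events (host to workflow) can be sketched with a toy model.  This is a minimal Python illustration, not the SharePoint API: TaskServiceContract, InMemoryTaskService and all of their members are hypothetical names invented for this example, and a plain dictionary stands in for the tasks list.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class TaskServiceContract(ABC):
    """Hypothetical stand-in for a data exchange interface such as ITaskService."""

    @abstractmethod
    def create_task(self, task_id: str, title: str) -> None:
        """Method: called by the workflow to send data to the host."""

    @abstractmethod
    def subscribe_task_changed(self, handler: Callable[[str, dict], None]) -> None:
        """Event: lets the workflow receive data from the host."""


class InMemoryTaskService(TaskServiceContract):
    """Toy service class; a dict plays the role of the tasks list."""

    def __init__(self) -> None:
        self.tasks: Dict[str, dict] = {}
        self._handlers: List[Callable[[str, dict], None]] = []

    def create_task(self, task_id: str, title: str) -> None:
        self.tasks[task_id] = {"title": title, "status": "Not Started"}

    def subscribe_task_changed(self, handler: Callable[[str, dict], None]) -> None:
        self._handlers.append(handler)

    def raise_task_changed(self, task_id: str, changes: dict) -> None:
        # Host side: apply the change, then raise the event to the workflow.
        self.tasks[task_id].update(changes)
        for handler in self._handlers:
            handler(task_id, dict(self.tasks[task_id]))


received = []
service = InMemoryTaskService()
service.subscribe_task_changed(
    lambda task_id, task: received.append((task_id, task["status"]))
)
service.create_task("task-1", "Approve document")               # workflow -> host
service.raise_task_changed("task-1", {"status": "Completed"})   # host -> workflow
```

The contract class plays the role of the interface, and the in-memory class plays the role of the service class that does the real work on both sides of the exchange.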

The last thing is to hook up the data exchange service to your workflow; this is where HandleExternalEventActivity and CallExternalMethodActivity come into the picture.  The SharePoint CreateTask activity is essentially a CallExternalMethodActivity that binds to ITaskService.CreateTask.

ITaskService

Take a look at the ITaskService definition in ILSpy.

image

Now, let’s explain how correlation works.  Simply put, correlation parameters are used by the workflow runtime to find the correct destination for the event it needs to deliver.  You may remember that every task activity in SharePoint requires a TaskId parameter (see the screenshot above).  Notice that the CreateTask method is marked with CorrelationInitializer; this is where you create a new TaskId, and this value is used by the rest of the activities on the same task, for example, the OnTaskChanged activity.  Later on, when a user submits a change on the task, the workflow runtime uses the TaskId to find the OnTaskChanged activity that the event should go to (remember that there can be multiple workflow instances in the system, each with multiple OnTaskChanged activities).

How does correlation work?

CorrelationParameterAttribute defines the identifier for the data exchange conversation.  Each method or event on the interface is then declared with a formal parameter of that name, for example, taskid, as shown in the ITaskService interface example.  You can also use other attributes to describe more complex correlation mapping.

Any operation, method or event that starts a new conversation must be attributed with the CorrelationInitializerAttribute.  For example, ITaskService.CreateTask and ITaskService.CreateTaskWithContentType are the start of a conversation.  When there is a call to a method that has CorrelationInitializerAttribute, the service class knows that a new conversation is being initialized with this call.  The workflow conversation lifetime is dictated by the lifetime of the correlation reference.

The following screenshot shows the properties of a CreateTask activity in Visual Studio.  Notice that we bind the TaskId property to a field in the class called Task1_TaskId1.  In MethodInvoking (createTask1_Invoking()) we initialize the TaskId with a new GUID.

image

Do not confuse the correlation parameter with the “CorrelationToken”.  In the above screenshot, notice that the CorrelationToken has a value of “taskToken”.  The OnTaskChanged activity in the same workflow (not shown) uses the same correlation token, which means that they share the same conversation.

Once the “taskId” is initialized in either the CreateTask or the CreateTaskWithContentType activity, it is shared with all the following task activities that use the same CorrelationToken, so that they know which task they are working on.  When execution reaches the OnTaskChanged activity, which listens on the ITaskService.OnTaskChanged event, the activity registers an event sink with the workflow runtime.  Later on, when the workflow runtime receives an OnTaskChanged event from the host application (meaning the SharePoint process, w3wp or owstimer), it checks the taskId value on the EventArgs and then routes the event to the OnTaskChanged activity that has the same TaskId correlation.
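The routing idea can be illustrated with a small Python sketch.  It is a toy model under stated assumptions, not how the workflow runtime is implemented: ToyRuntime, register_sink and deliver are hypothetical names, and the lookup key is simply the pair of workflow instance ID and task ID.

```python
import uuid
from typing import Callable, Dict, Tuple


class ToyRuntime:
    """Hypothetical runtime that routes events by (instance id, task id)."""

    def __init__(self) -> None:
        self._sinks: Dict[Tuple[str, str], Callable[[dict], None]] = {}

    def register_sink(self, instance_id: str, task_id: str,
                      handler: Callable[[dict], None]) -> None:
        # What a waiting OnTaskChanged-style activity would do.
        self._sinks[(instance_id, task_id)] = handler

    def deliver(self, instance_id: str, task_id: str, event_args: dict) -> bool:
        # Find the one sink whose correlation values match the event.
        handler = self._sinks.get((instance_id, task_id))
        if handler is None:
            return False
        handler(event_args)
        return True


runtime = ToyRuntime()
log = []

# Each task id is created once, when the conversation starts.
task_a = str(uuid.uuid4())
task_b = str(uuid.uuid4())

runtime.register_sink("instance-1", task_a, lambda args: log.append(("A", args["status"])))
runtime.register_sink("instance-2", task_b, lambda args: log.append(("B", args["status"])))

delivered = runtime.deliver("instance-2", task_b, {"status": "Approved"})
unmatched = runtime.deliver("instance-1", task_b, {"status": "Approved"})
```

Only the sink registered for the matching instance and task receives the event; an event whose values match no sink is not delivered to anyone.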

High Level Event Processing

Most SharePoint workflows are event-driven workflows.  When a workflow reaches an activity that requires input from outside, the workflow runtime calls the SharePoint persistence service to save the instance data into the SharePoint content database.  The following diagram describes the high-level data flow of how a SharePoint workflow responds to an external event.

image

1. Event receiver responds to user’s action.

As we mentioned earlier, there are several workflow event receivers that respond to user actions.  SPWinOEItemEventReceiver is the one that responds to task and list item events, such as ItemUpdated and ItemDeleted.  These events contain the following information:

- The task or list item that is associated with the event.

- BeforeProperties and AfterProperties on the changed task or list item.

- Event type.  This can be ItemAdded, ItemUpdated, ItemDeleted, or any other kind of event that a SharePoint event receiver handles.

The event receiver sends the event data to SPWorkflowManager.

2. En-queuing event

SPWorkflowManager will try to deliver the event to the corresponding workflow instance (each task has a field called “WorkflowInstanceID”; that’s how SharePoint knows which workflow instance to deliver the event to).

SPWorkflowManager queues the event in the content database (ScheduledWorkItems table) as a WorkItem before starting to deliver the event.  This is because event delivery is a long-running process and can fail, so the event is queued purely for fail-over purposes.  If delivery fails, we can always pick up the event later and try to re-deliver it.  We’ll explain how to read this database to troubleshoot issues with delivering workflow events.

3. Delivering event

SPWorkflowManager continues the event delivery by sending the event to the SharePoint workflow runtime (SPWinOeHostService).  The host service is responsible for loading the workflow instance into the runtime and raising the event to the workflow instance.  At this point, the Invoked handler on the OnTaskChanged activity will be called.

4. Dequeuing event

If the workflow instance successfully processes the event, we then remove the event from the ScheduledWorkItems table, which marks the end of the event pipeline.
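The four steps above follow a common "queue first, remove on success" pattern.  The Python sketch below is a toy model of that pattern only: ToyWorkItemQueue and process_event are hypothetical names, and a dictionary stands in for the work item table in the content database.

```python
import itertools
from typing import Callable, Dict


class ToyWorkItemQueue:
    """Dict standing in for the work item table in the content database."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.items: Dict[int, dict] = {}

    def enqueue(self, event: dict) -> int:
        item_id = next(self._ids)
        self.items[item_id] = event
        return item_id

    def dequeue(self, item_id: int) -> None:
        del self.items[item_id]


def process_event(queue: ToyWorkItemQueue, event: dict,
                  deliver: Callable[[dict], None]) -> bool:
    item_id = queue.enqueue(event)    # step 2: record the event before delivery
    try:
        deliver(event)                # step 3: hand the event to the workflow
    except Exception:
        return False                  # the record stays queued for a later retry
    queue.dequeue(item_id)            # step 4: remove the record on success
    return True


queue = ToyWorkItemQueue()
handled = []


def failing_delivery(event: dict) -> None:
    raise RuntimeError("workflow instance could not be loaded")


ok = process_event(queue, {"type": "ItemUpdated", "task": 7}, handled.append)
failed = process_event(queue, {"type": "ItemUpdated", "task": 8}, failing_delivery)
```

Because the record is written before delivery starts, a failed delivery leaves something behind that can be picked up and retried later.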

Event Pipeline

To better understand how these components work together, let’s take a look at a typical scenario of how SharePoint handles task-related activities.  The process starts when the workflow executes the CreateTask activity and ends when the workflow completes the CompleteTask activity.

image

1. The CreateTask activity inherits from CallExternalMethodActivity.   This activity is where the data exchange conversation between the workflow instance and SharePoint begins.  TaskId is initialized at this point.

2. The workflow runtime calls SPWinOeTaskService.CreateTask to create the actual task item.  Remember that an “in-memory” SPListItem object is created for the task and it hasn’t been committed to the database yet.  The actual commit is done by the WorkBatch service right before the workflow is persisted.

3. The next activity in the workflow is the OnTaskChanged activity.  The workflow runtime calls the Subscription Service, which sets up an event receiver for handling the ItemUpdated event on the task we just created.

4. An event sink is set up with the workflow runtime to respond to the OnTaskChanged event.

5. Now there’s no more work for the workflow instance to do because it’s waiting for the task to be submitted by the user.  The persistence service is called to save the workflow instance to the content database.  Remember that the “in-memory” SPListItem for the task is also committed at this moment.

6. The user submits the task change through the task form.  The task form is essentially an ASPX page that calls the object model (OM) to update the task item.  The task form calls the SPWorkflowTask.AlterTask() API to commit the changes on the task item.  This fires the ItemUpdated event on the event receiver that was registered earlier.

NOTE: You may notice that each workflow task item has a special field called “WorkflowVersion”.  SPWorkflowTask.AlterTask() sets this column to a value greater than 1 (meaning the task is locked), which indicates that an update has occurred on the task and requires further action by the corresponding workflow instance.  The task remains “locked” until the change has been processed by the workflow instance.

Question: If I call AlterTask() API multiple times, I receive an exception saying that the task is currently locked – why?

Answer: The AlterTask API internally calls SPListItem.Update().  If it determines that the task belongs to a running workflow instance, it sets the “WorkflowVersion” column to a value greater than “1”.  If you then try to call SPListItem.Update() again, it checks the “WorkflowVersion” column.  If the value is greater than “1”, we stop the update and throw this exception.  This way, we prevent any change to the item before the workflow runtime processes the OnTaskChanged event.  Only after the OnTaskChanged activity has been processed by the workflow instance is the “WorkflowVersion” on the item reset to “1”, which means the task is “unlocked”.

7. Now, let’s continue.  The event receiver will respond to the SPListItem.Update() and try to deliver the event to the workflow runtime by calling SPWorkflowManager.RunWorkflow.

8. SPWorkflowManager generates a WorkItem and puts it into the content database (enqueuing, ScheduledWorkItems table).  This WorkItem represents the pending work for the task’s ItemUpdated event.  Until the workflow instance processes the event, we keep a record in the database for fail-over purposes.  If for any reason the event cannot be delivered, the workflow timer job can pick up the WorkItem from the database and continue processing it.

9. SPWorkflowManager continues to deliver the event to the SharePoint workflow runtime (SPWinOeHostService).  SPWinOeHostService examines the event and extracts two pieces of data from it: WorkflowInstanceID and TaskID.  It loads the workflow instance into the workflow runtime and then raises the OnTaskChanged event on the workflow instance.  Your custom code in OnTaskChanged.OnInvoked() is called at this time.

Question: Where is TaskID stored?

Answer: Every workflow task has a column called “GUID”; that’s the TaskID.

10. After the workflow runtime finishes executing the OnTaskChanged activity, it removes the WorkItem from the database.  The “WorkflowVersion” column on the task is updated to “1” (unlocked), and finally the event receiver is deleted from the task list.
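The task locking described in the NOTE and the AlterTask question above can be modeled with a few lines of Python.  This is a toy illustration, not SharePoint code: ToyWorkflowTask, TaskLockedError and their members are hypothetical, and incrementing the version by one is an assumption (the text only says the locked value is greater than 1).

```python
class TaskLockedError(Exception):
    """Raised when a locked task is updated again."""


class ToyWorkflowTask:
    UNLOCKED = 1

    def __init__(self) -> None:
        self.workflow_version = self.UNLOCKED
        self.fields = {}

    def alter_task(self, changes: dict) -> None:
        # Reject the update while an earlier change is still unprocessed.
        if self.workflow_version > self.UNLOCKED:
            raise TaskLockedError("task is locked by a running workflow")
        self.fields.update(changes)
        # Assumption: any value greater than 1 marks the task as locked.
        self.workflow_version += 1

    def on_change_processed(self) -> None:
        # The workflow handled the change event, so unlock the task.
        self.workflow_version = self.UNLOCKED


task = ToyWorkflowTask()
task.alter_task({"Status": "Completed"})

try:
    task.alter_task({"Status": "In Progress"})
    second_update_rejected = False
except TaskLockedError:
    second_update_rejected = True

task.on_change_processed()
task.alter_task({"Comments": "Looks good"})
```

The second update fails because the first one has not been processed yet; once the workflow has handled the change, updates are accepted again.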

What about Workflow Timer Jobs?

As we discussed earlier, when SPWorkflowManager tries to deliver the event to the host service (SPWinOEHostService), it needs to check several server conditions.  One of these conditions is that the workflow is not currently locked or running anywhere else.  A SharePoint farm can have more than one web front-end server, and any of these servers can host the workflow runtime.  We need to make sure that a workflow instance is processed by only one workflow runtime at any given time.  The mechanism SharePoint uses is to set a flag in the database to indicate the “lock” state, which makes it easy for SPWorkflowManager to determine whether the workflow is locked.  If the workflow is locked, SPWorkflowManager puts the WorkItem into the database queue and the timer job processes it asynchronously.

The workflow timer job is responsible for processing the queued WorkItems.  In that case, the timer service is the host for the workflow runtime.  There are three timer jobs related to workflow, and each has a different purpose.  The table below lists their main functions.
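The "deliver now, or queue if locked" decision can be sketched as follows.  This Python example is a simplified model under stated assumptions: ToyInstanceLocks and run_workflow are hypothetical names, an in-memory set stands in for the lock flag in the database, and a list stands in for the database queue.

```python
from typing import Callable, List, Set, Tuple


class ToyInstanceLocks:
    """In-memory set standing in for the lock flag kept in the database."""

    def __init__(self) -> None:
        self._locked: Set[str] = set()

    def try_lock(self, instance_id: str) -> bool:
        if instance_id in self._locked:
            return False
        self._locked.add(instance_id)
        return True

    def unlock(self, instance_id: str) -> None:
        self._locked.discard(instance_id)


def run_workflow(locks: ToyInstanceLocks, deferred: List[Tuple[str, str]],
                 instance_id: str, event: str,
                 deliver: Callable[[str, str], None]) -> str:
    if not locks.try_lock(instance_id):
        # Another runtime owns the instance: leave the work for the timer job.
        deferred.append((instance_id, event))
        return "queued"
    try:
        deliver(instance_id, event)
    finally:
        locks.unlock(instance_id)
    return "delivered"


locks = ToyInstanceLocks()
deferred = []
delivered = []


def deliver(instance_id: str, event: str) -> None:
    delivered.append((instance_id, event))


locks.try_lock("wf-1")   # simulate another front end already running wf-1
first = run_workflow(locks, deferred, "wf-1", "ItemUpdated", deliver)
second = run_workflow(locks, deferred, "wf-2", "ItemUpdated", deliver)
```

The event for the locked instance is deferred rather than lost, while the unlocked instance receives its event immediately.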

| Job | Description |
| --- | --- |
| SPWorkflowFailOverJobDefinition | A workflow can fail for multiple reasons.  If it fails halfway through, the workflow remains locked and cannot be restarted.  The fail-over timer job unlocks workflows that failed unexpectedly so that they can continue running. |
| SPWorkflowJobDefinition | Processes WorkItems in the ScheduledWorkItems queue.  The w3wp.exe process always tries to deliver the event to the workflow first, but if the workflow is locked by another process, it puts the event in the database queue.  This timer job processes these events every 5 minutes by default. |
| SPWorkflowAutoCleanJobDefinition | Cleans up old workflow instances in the database.  By default, this job cleans up finished workflows that were created 60 days ago.  You can modify the default value by changing the SPWorkflowAssociation.AutoCleanupDays property. |

Every 5 minutes, SPWorkflowJobDefinition wakes up and picks up all the WorkItems from the database.  Basically, the WorkItem records in the database contain all the pending work that needs to be done, such as the workflow instance ID, task item ID, and event type.  For each WorkItem, the job reconstructs the SPWorkflowEvents object from the WorkItem record, calls SPWorkflowManager.RunWorkflow(), and passes the event to the appropriate workflow instance.
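A single pass of such a job can be sketched in Python.  This is a toy model, not the timer job implementation: WorkItem, run_timer_job and fake_run_workflow are hypothetical names, the event is rebuilt as a plain dictionary, and scheduling (the 5-minute interval) is left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WorkItem:
    instance_id: str
    task_item_id: int
    event_type: str


def run_timer_job(pending: List[WorkItem],
                  run_workflow: Callable[[dict], None]) -> List[WorkItem]:
    """Process each queued work item once; return the ones that still failed."""
    still_pending = []
    for item in pending:
        # Rebuild the event from the stored record before delivering it.
        event = {
            "instance": item.instance_id,
            "task": item.task_item_id,
            "type": item.event_type,
        }
        try:
            run_workflow(event)
        except Exception:
            still_pending.append(item)   # keep it for the next run
    return still_pending


processed = []


def fake_run_workflow(event: dict) -> None:
    if event["instance"] == "wf-2":
        raise RuntimeError("instance still locked")
    processed.append(event["task"])


pending = [WorkItem("wf-1", 7, "ItemUpdated"), WorkItem("wf-2", 8, "ItemUpdated")]
pending = run_timer_job(pending, fake_run_workflow)
```

Work items that are delivered successfully drop out of the list, and anything that fails again simply waits for the next run of the job.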

In the next part of this post, we’ll discuss Troubleshooting Workflow Issues.