One of the most powerful features of the Windows Workflow Foundation is its ability
to automate state management of long-running processes using a persistence service.
Microsoft supply a persistence service out of the box - the SqlWorkflowPersistenceService.
Unsurprisingly, this persists state in SQL Server (or SQL Express).
Once you have a persistence service, more possibilities open up: different applications
can progress the same workflow instance over time, and multiple hosts can process workflows
in a scale-out model. This second feature needs a little investigation - there is
a gotcha hiding in it that you need to be aware of.
The issue is how to stop more than one host picking up the same instance of a workflow
and processing it at the same time. Imagine the workflow transferred $10,000,000 from
one account to another - you'd hardly want this happening twice. So if the possibility
exists for multiple hosts to see the same persistence store, the persistence service
must be able to ensure only one host is executing a given workflow instance at any one time.
The SqlWorkflowPersistenceService handles multiple concurrent hosts by using the concept
of workflow ownership - the GUID of a host (created randomly by the persistence service
constructor) is stamped against any workflow that the host is actively executing (that is,
one not in an idle or completed state). That raises the question: what if the host dies
while executing a workflow? This is what the ownership timeout is for. You set the
ownership timeout in the constructor of the SqlWorkflowPersistenceService.
    SqlWorkflowPersistenceService sql =
        new SqlWorkflowPersistenceService(CONN,
                                          true,
                                          TimeSpan.FromSeconds(10),
                                          TimeSpan.FromSeconds(5));
    workflowRuntime.AddService(sql);
Here the third parameter is the ownership timeout: it specifies how long a host is
allowed to run a workflow before persistence occurs. If the host takes longer than this,
it will get an error when it attempts to persist.
The fourth parameter is the polling interval: how often the persistence service
checks for expired timers.
Now the idea is that if a host dies then its lock will
time out and another host can pick up the work. There is a problem, however, in the
implementation. The persistence service only looks for expired ownership locks when
it first starts - not when it polls for expired timers. Therefore, a workflow
instance whose host has died mid-processing will only be recovered if a new host instance
starts after the timeout has expired.
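To make the locking rule concrete, here is a minimal in-memory sketch of ownership with a timeout. It is a toy model only: the OwnershipTable class and its TryAcquire and Release methods are hypothetical names invented for illustration, and this is not how the SqlWorkflowPersistenceService or its stored procedures are actually implemented. The point it shows is that an expired lock does nothing by itself - the instance only changes hands when some host actually tries to take it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the ownership data in the persistence store.
public sealed class OwnershipTable
{
    private sealed class OwnershipLock
    {
        public Guid Owner;
        public DateTime OwnedUntilUtc;
    }

    private readonly Dictionary<Guid, OwnershipLock> locks =
        new Dictionary<Guid, OwnershipLock>();
    private readonly object gate = new object();

    // Returns true if 'host' owns 'instanceId' after the call. The attempt fails
    // only when a different host holds a lock that has not yet expired.
    public bool TryAcquire(Guid instanceId, Guid host, TimeSpan duration, DateTime nowUtc)
    {
        lock (gate)
        {
            OwnershipLock current;
            bool heldByOther = locks.TryGetValue(instanceId, out current)
                && current.Owner != host
                && current.OwnedUntilUtc > nowUtc;
            if (heldByOther)
            {
                return false;
            }

            locks[instanceId] = new OwnershipLock
            {
                Owner = host,
                OwnedUntilUtc = nowUtc + duration
            };
            return true;
        }
    }

    // Models a host giving up ownership, e.g. after persisting an idle instance.
    public void Release(Guid instanceId, Guid host)
    {
        lock (gate)
        {
            OwnershipLock current;
            if (locks.TryGetValue(instanceId, out current) && current.Owner == host)
            {
                locks.Remove(instanceId);
            }
        }
    }
}

public static class OwnershipDemo
{
    public static void Main()
    {
        OwnershipTable table = new OwnershipTable();
        Guid instance = Guid.NewGuid();
        Guid hostA = Guid.NewGuid();
        Guid hostB = Guid.NewGuid();
        DateTime start = new DateTime(2000, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        TimeSpan timeout = TimeSpan.FromSeconds(10);

        // Host A takes the instance, then "dies" without releasing it.
        Console.WriteLine(table.TryAcquire(instance, hostA, timeout, start));                // True
        // Host B tries before the timeout and is refused.
        Console.WriteLine(table.TryAcquire(instance, hostB, timeout, start.AddSeconds(5)));  // False
        // Host B only gets the instance because it tries again after the timeout.
        Console.WriteLine(table.TryAcquire(instance, hostB, timeout, start.AddSeconds(11))); // True
    }
}
```

In this model, if host B never makes that third call, the instance stays orphaned indefinitely even though the lock expired long ago, which is the situation described above.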
So how can you make this more robust? Well, we need a way
to explicitly load the workflows whose ownership has expired - unfortunately
there is no exposed method to do this on the SqlWorkflowPersistenceService.
Instead we have to get all the workflows, catch the exception if we load a locked
one, and unload any that aren't ready to run. Here is an example:
    TimerCallback cb = delegate
    {
        // Get all the persisted workflows
        foreach (var item in sql.GetAllWorkflows())
        {
            try
            {
                // Load the workflow - this will throw a WorkflowOwnershipException
                // if the workflow is currently owned
                WorkflowInstance inst = workflowRuntime.GetWorkflow(item.WorkflowInstanceId);

                // Unload the workflow if it's still idle on a timer
                DateTime timerExpiry = (DateTime)item.NextTimerExpiration;
                if (timerExpiry > DateTime.Now)
                {
                    inst.Unload();
                }
            }
            catch (WorkflowOwnershipException)
            {
                // Tried to load a workflow locked by another host
            }
        }
    };
    Timer t = new Timer(cb, null, 0, 1000);
So this code will
attempt to load workflow instances with expired locks every second. Is it a hack?
Yes. But without one of two changes to the SqlWorkflowPersistenceService, it's
the sort of code you have to write to pick up unlocked workflow instances robustly.
The workflow team could:
- Check for expired ownership locks in the stored procedure that checks for timer expiration
- Provide a method on the persistence service that explicitly allows the loading of unlocked workflow instances
Maybe in the next version :-)