SharePoint workflows are really powerful, but debugging any workflow can be a very challenging exercise. You have to drive through the process, pushing an item from one step to the next. That can be difficult if there are mandatory delays, retries, etc. Most of those things are painful but, generally speaking, pretty solvable. However, some workflow problems are really hard to track down, and I recently ran into another one.
While doing some stress testing, we managed to break one of my workflows. After some careful work we realized that it happened any time the workflow got two events at the same time. The second event would disappear. Poof. It just wouldn’t get handled at all. The nasty part is that if the second event was for a task, SharePoint would already have locked that task, so it was no longer possible to modify it. Because of this, the workflow could never be moved to completion.
I’m happy to say that the fix for this issue is in the August 2010 Cumulative Update for Windows SharePoint Services 3.0. I tested it with some really abusive situations. In my validation test, I put the thread to sleep for 30 seconds after getting an event, and I can say that the events are eventually delivered. However, I should caution that events aren’t always processed in the order in which they arrived. With three events (1, 2, and 3), I’ve seen event 1 processed first; then the workflow waits for the timer job and event 3 is processed; and only after the workflow goes back to sleep again is event 2 processed.
The net of this is that, even after you’ve applied this fix, you should try to avoid scenarios where events have to arrive in a prescribed order. However, at least your workflows won’t break if two events come in at the same time, for instance when you are asking for approvals from many people, as was the case in our situation.
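To make the "don't depend on order" advice a little more concrete, here is a minimal sketch of an approval tally that gives the same result no matter which order the task events are processed in. It is plain Python rather than actual workflow code, and everything in it (the `ApprovalTracker` class, the `on_task_changed` method, the approver names, and the `"Approved"` outcome string) is a hypothetical illustration, not part of the SharePoint API or of our real workflow.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalTracker:
    """Hypothetical helper: tallies approvals without relying on arrival order."""

    required: frozenset                       # approvers we are waiting on (assumed names)
    approved: set = field(default_factory=set)

    def on_task_changed(self, approver: str, outcome: str) -> bool:
        """Record one task event and report whether all approvals are in."""
        # Only count approvals from people we actually asked.
        if approver in self.required and outcome == "Approved":
            # A set makes duplicate or re-delivered events harmless.
            self.approved.add(approver)
        return self.is_complete()

    def is_complete(self) -> bool:
        # Completion depends only on WHO has approved, never on the sequence.
        return self.approved == set(self.required)
```

The idea is that the workflow's state is a set of who has responded, so event 3 being handled before event 2 changes nothing about the final outcome.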