One of the things I like (mostly) about working with SharePoint is that I get the opportunity to learn stuff almost every day. I say mostly because although I love to learn, sometimes I have to learn by making mistakes – and that’s never fun.
About a month ago Dave Wollerman learned something about the way workflows work and blogged about it in a post titled “Huge MOSS Workflow Issue… What is Microsoft Thinking!!!!” The post lays out how workflows based on the out-of-the-box workflows (Approval is the one he calls out) lose their workflow history after 60 days. In his post he’s frustrated (as in fact I might be) because he expects workflow history to stay around as long as the item does. He mentions that there’s a performance reason for doing this but doesn’t dig into what is going on behind the scenes.
In a recent conversation with some folks on the product team I got to mention Dave’s post and talk through it with them. Out of that conversation I was struck by a few things.
First, there is a real performance issue that they were trying to address. Because of the way that lists are implemented there’s a soft limit of around 2,000 items per “view”, beyond which performance starts to decrease. It’s a long story about the database table that items are stored in, how SQL treats the query, etc. The short version is that you need to be cognizant of how big lists get. Steve Peschka wrote a white paper on this, “Working with large lists in Office SharePoint Server 2007”. It’s good reading if you want to know more. But what does this have to do with Workflow History?
Well, each workflow history entry is placed into a big list. That list holds all of the workflow history items for all of the workflow instances of every workflow association that uses that workflow history list. If you figure on five workflow history entries per workflow, it’s easy to see that even a few hundred items with workflows can quickly reach that 2,000 item mark. As a result, deleting items, so that the list doesn’t get really, really huge, is an important part of maintaining performance on the system.
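To put rough numbers on that, here is a minimal sketch of the arithmetic in Python. It only multiplies and divides the figures mentioned above (about 2,000 items per view, five history entries per workflow); the function names are made up for illustration and have nothing to do with the SharePoint object model.

```python
import math

# Approximate soft limit per view mentioned above; not a hard SharePoint constant.
SOFT_LIMIT = 2000


def history_list_size(workflow_runs, entries_per_workflow):
    """Total history entries written by the given number of workflow runs."""
    return workflow_runs * entries_per_workflow


def workflows_to_reach_limit(entries_per_workflow, limit=SOFT_LIMIT):
    """Smallest number of workflow runs whose history entries reach the limit."""
    if entries_per_workflow <= 0:
        raise ValueError("entries_per_workflow must be positive")
    return math.ceil(limit / entries_per_workflow)


if __name__ == "__main__":
    # Show how quickly the shared history list fills for a few entry counts.
    for entries in (3, 5, 10):
        runs = workflows_to_reach_limit(entries)
        print(f"{entries} entries per workflow: {runs} workflow runs to reach {SOFT_LIMIT}")
```

At five entries per workflow the list reaches the mark after 400 workflow runs, and chattier workflows get there sooner. Remember that every association sharing the same history list adds to the same total.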
Second, Dave’s thinking about the workflow history list as a permanent audit record. MOSS provides an audit mechanism – but the workflow history list isn’t it. I have to admit that I’ve fallen into this trap. I have some workflows where I treat the workflow history list as part of the auditing, so I need to go back and revisit them. A better way to think about the workflow history list is as a log. The log files that MOSS creates – the ULS logs – are automatically recycled. Based on the schedule you set, new logs are created and old ones are eventually deleted. Workflow histories are designed to work the same way.
Third, there’s a gap in the documentation. It’s not precisely clear how this automatic recycling of the list takes place. The object model reference lets you know that SPWorkflowAssociation.AutoCleanupDays exists; however, it doesn’t offer any additional information about how it works or why it exists. I’ve found only a few references to this property in the Blogosphere. I haven’t found an article or book that covers it (including my own chapter in Real World SharePoint 2007 and my articles). That means that it’s not going to jump out at you unless you go looking for it because you already know it’s there.
Fourth, and finally, I’ve not been able to find any place that documents how to set AutoCleanupDays when the workflow is associated. However, a buddy on the program team showed me where this can be set so it will apply to every new association of the workflow – it won’t fix the built-in workflows but it can help you with your own workflows. Note the addition of an node in the following sample Element Manifest Xml:
That’s pretty cool and not unlike the idea of feature stapling – where you can handle all of the new instances created even if you have to go back and update the existing associations.
[Update: 2007-10-16: It turns out that autocleanup doesn't actually clean out the workflow history. It does, however, remove the entry from the workflow instances that connects the workflow history to the item that the workflow ran on. It also turns out it deletes any tasks that were created by the workflow. The net effect of this is that the actual auditing is OK -- the user interface just doesn't look so good.]