One of the topics that came up recently was the idea of writing a protocol handler for SharePoint in C#. Unfortunately I don’t have much context on the request because it reached me secondhand, through a series of steps. One of the things that struck me, though, was that in many cases a protocol handler isn’t needed. Explaining why requires a moment on what a protocol handler is.
In SharePoint search (and most of the Microsoft Search technologies) there are two main extensibility points. The first is the IFilter. The IFilter is responsible for processing the contents of a file, so there’s an IFilter for PDF files, one for Office files, and so on. The second is the protocol handler. The protocol handler’s responsibility is getting the content from the end point to the gatherer so that it can be handed off to the correct IFilter for processing. There are protocol handlers for file shares, web sites, etc.
Invariably folks come along with a desire to index content that isn’t in a repository available via an out-of-the-box protocol handler. One of the obvious things to do about this is to write your own protocol handler to get to the content. While this may be obvious, it may not be the right answer. Protocol handlers are multi-threaded, and because of that they require a bit of care to write. As a general statement, they’re harder than most folks really want to deal with. So if writing the protocol handler isn’t always the right answer, then what is?
A few years ago I wrote an article for DevX.com titled “Using SharePoint Portal Server to Index Your Custom Application.” In that article I show you how to quickly develop a web application to surface data from your custom applications. The same approach can be used for third-party applications. The fact that it was written for SharePoint Portal Server 2003 shouldn’t scare you; all of the same pieces work today.
The net is that all of the content is made accessible via a web interface that the search crawler is pointed to, so the content becomes available via HTTP, which SharePoint can index out of the box.
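To make the idea concrete, here’s a minimal sketch of that kind of crawlable web surface. It isn’t the code from the article; it’s an illustration in Python using only the standard library, and the sample records, the `/item/<id>` URL scheme, the port, and every function name are assumptions made up for the example. The point is simply that an index page links to one plain HTML page per record, which is all an HTTP crawler needs in order to discover and index the content.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sample data standing in for rows from a custom application.
RECORDS = {
    1: {"title": "Pump maintenance schedule", "body": "Inspect seals every 90 days."},
    2: {"title": "Vendor onboarding checklist", "body": "Collect tax forms before the first order."},
}


def render_index(records):
    """Return an HTML page linking to every record so a crawler can discover them."""
    links = "\n".join(
        f'<li><a href="/item/{rid}">{escape(rec["title"])}</a></li>'
        for rid, rec in sorted(records.items())
    )
    return f"<html><head><title>Index</title></head><body><ul>\n{links}\n</ul></body></html>"


def render_item(record):
    """Return one plain HTML page for a single record, with its text escaped."""
    title = escape(record["title"])
    body = escape(record["body"])
    return (
        f"<html><head><title>{title}</title></head>"
        f"<body><h1>{title}</h1><p>{body}</p></body></html>"
    )


def route(path, records):
    """Map a request path to a (status code, HTML) pair."""
    if path == "/":
        return 200, render_index(records)
    if path.startswith("/item/"):
        key = path[len("/item/"):]
        if key.isdecimal() and int(key) in records:
            return 200, render_item(records[int(key)])
    return 404, "<html><body>Not found</body></html>"


class CrawlHandler(BaseHTTPRequestHandler):
    """Serves the index and item pages over HTTP GET."""

    def do_GET(self):
        status, html = route(self.path, RECORDS)
        payload = html.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Run the server until interrupted (the port number is an arbitrary choice)."""
    HTTPServer(("", port), CrawlHandler).serve_forever()
```

Calling `serve()` starts the site; you would then point a web site content source at the root URL and let the crawler follow the links. In a real application the `RECORDS` dictionary would be replaced by queries against your application’s data store.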
What are the limitations? Primarily, you can’t pick up access control on the content, so it’s not the best fit for sensitive information or, said another way, information that is secured to individual users. However, for most kinds of information that an organization might want to make available, it’s quick and easy.
It should be noted that the Business Data Catalog (BDC) is another good way to reach into other custom applications, if the data is relational in nature. While the strategy shown in the article works well for documents and data, the BDC is more data focused.
Hopefully, you’ll save yourself some time on writing a protocol handler and have a chance to do something more fun. (Writing protocol handlers isn’t fun.)