Posts

forge

Why You May Not Need a Protocol Handler for SharePoint

One of the topics that came up recently was the idea of writing a protocol handler for SharePoint in C#. Unfortunately, I don’t have much context on the request because it reached me through a series of intermediaries. What struck me, though, was that in many cases a protocol handler isn’t needed. Explaining why requires a moment of background on what a protocol handler is.

In SharePoint search (and most of the Microsoft Search technologies) there are two main extensibility points. First, you have the IFilter. The IFilter is responsible for processing the contents of a file. So there’s an IFilter for PDF files, one for Office files, and so on. Second, you have the protocol handler. The protocol handler’s responsibility is getting the content from the end point to the gatherer so that it can be handed off to the correct IFilter for processing. There are protocol handlers for file shares, web sites, etc.
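To make that division of labor concrete, here is a toy sketch in Python. It is not the real gatherer and does not use the real IFilter or protocol handler interfaces; every name in it (`PROTOCOL_HANDLERS`, `FILTERS`, `crawl`, the `demo://` scheme) is hypothetical and exists only to show who does what: the handler is chosen by the address and fetches raw bytes, and the filter is chosen by the file type and turns those bytes into text.

```python
import os
from urllib.parse import urlparse

# Hypothetical in-memory "repository" standing in for a real content source.
FAKE_STORE = {
    "demo://repo/notes.txt": b"protocol handlers fetch content",
    "demo://repo/data.csv": b"filters,extract,text",
}


def fetch_from_memory(url):
    """Stand-in protocol handler: returns the raw bytes behind a URL."""
    return FAKE_STORE[url]


def filter_plain_text(raw):
    """Stand-in filter for .txt files: decode the bytes as text."""
    return raw.decode("utf-8")


def filter_csv(raw):
    """Stand-in filter for .csv files: turn the separators into spaces."""
    return raw.decode("utf-8").replace(",", " ")


# One handler per URL scheme, one filter per file extension.
PROTOCOL_HANDLERS = {"demo": fetch_from_memory}
FILTERS = {".txt": filter_plain_text, ".csv": filter_csv}


def crawl(url):
    """Fetch content with the handler for its scheme, then filter it by type."""
    parsed = urlparse(url)
    handler = PROTOCOL_HANDLERS[parsed.scheme]
    raw = handler(url)
    extension = os.path.splitext(parsed.path)[1].lower()
    text_filter = FILTERS[extension]
    return text_filter(raw)
```

Calling `crawl("demo://repo/data.csv")` picks the `demo` handler to get the bytes and the `.csv` filter to extract the text. Supporting a new repository means adding a handler; supporting a new file type means adding a filter.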

Invariably folks come along with a desire to index content that isn’t in a repository available via an out of the box protocol handler. One of the obvious things to do about this is to write your own protocol handler to get to the content. While this may be obvious, it may not be the right answer. Protocol Handlers are multi-threaded and because of that they require a bit of care to write. As a general statement, they’re harder than most folks really want to deal with. So if writing the protocol handler isn’t always the right answer, then what is?

A few years ago I wrote an article for DevX.com titled “Using SharePoint Portal Server to Index Your Custom Application.” In that article I show you how to quickly develop a web application to surface data from your custom applications; the same approach can be used for third-party applications. The fact that it was written for SharePoint Portal Server 2003 shouldn’t scare you: all of the same pieces work today.

The net is that all of the content is made accessible via a web interface that the search crawler is pointed to and so the content becomes available via HTTP — which SharePoint can index out of the box.
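As a rough illustration of the idea (not the code from the DevX article), the sketch below uses Python’s standard library to expose a few records as plain HTML pages: an index page that links to every item, and one page per item. The `RECORDS` data, the `/item/<id>` URL layout, and the function names are all assumptions made for this example. A crawler pointed at the index page would follow the links and index each item.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical records standing in for rows from a custom application.
RECORDS = {
    1: {"title": "Widget spec", "body": "Dimensions and tolerances for the widget."},
    2: {"title": "Return policy", "body": "Items may be returned within 30 days."},
}


def render_index(records):
    """Build the start page: one link per record for the crawler to follow."""
    links = "\n".join(
        '<li><a href="/item/{0}">{1}</a></li>'.format(key, escape(rec["title"]))
        for key, rec in sorted(records.items())
    )
    return (
        "<html><head><title>All items</title></head>"
        "<body><ul>\n{0}\n</ul></body></html>".format(links)
    )


def render_item(record):
    """Build the page for a single record; the title becomes the page title."""
    return (
        "<html><head><title>{0}</title></head>"
        "<body><p>{1}</p></body></html>".format(
            escape(record["title"]), escape(record["body"])
        )
    )


class SurfaceHandler(BaseHTTPRequestHandler):
    """Serves the index at / and each record at /item/<id>."""

    def do_GET(self):
        if self.path == "/":
            self._send(200, render_index(RECORDS))
            return
        if self.path.startswith("/item/"):
            try:
                key = int(self.path[len("/item/"):])
            except ValueError:
                key = None
            if key in RECORDS:
                self._send(200, render_item(RECORDS[key]))
                return
        self._send(404, "<html><body>Not found</body></html>")

    def _send(self, status, html):
        payload = html.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Run the demo site; point the crawler's start address at it."""
    HTTPServer(("127.0.0.1", port), SurfaceHandler).serve_forever()
```

Calling `serve()` starts the site on port 8080. In a real deployment the records would come from the application’s database, and the site would need to be reachable by the crawler account.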

What are the limitations? Well, primarily, the limitation is that the crawler can’t pick up the access control on the content, so it’s not the best fit for sensitive information, or said another way, information that is secured to individual users. However, for most kinds of information that an organization might want to make available it’s quick and easy.

It should be noted that the BDC is another good way to reach into other custom applications — if the data is relational in nature. While the strategy shown in the article works well for documents and data, the BDC is more data focused.

Hopefully, you’ll save yourself some time on writing a protocol handler and have a chance to do something more fun. (Writing protocol handlers isn’t fun.)

forge

SharePoint Search Operational Role (Job Responsibilities)

I was recently asked about what sort of things should be in a job description for a person who manages search in a SharePoint environment. I say all of the time that search isn’t a product — it’s a process. What I mean by that is the product will only get you so far. A human will have to be involved to make the tool really valuable. Here’s what I sent to the client as a final set of activities/skills/responsibilities:

  • Review and Resolve Crawl Logs for Errors and Warnings
  • Review Crawl Logs for Performance and Heartbeat
  • Periodically review performance data for the search indexer to identify impending performance issues
  • Manage the indexing process
    • Review requests for new content sources
    • Develop, monitor, and tune content crawling schedules
    • Implement appropriate crawler impact rules
    • Implement and maintain crawl rules to control what content is in the index
    • Work with network operations to control permissions for the crawler account to manage what is in the index
    • Manage search scopes
  • Review usage reports and work with the organization to improve relevancy by leveraging out of the box tuning parameters including:
    • Changes to the noise words files
    • Changes to the thesaurus file
    • Changes to the authoritative sites list
    • Changes to keywords and best bets

Thanks to Spencer Harbar and Ben Curry for their contributions to this.

forge

Windows SharePoint Services Search – No Results and Errors

I had a client running WSS who wasn’t seeing any search results. There were also some errors being logged on the server, the most prevalent of which was:

 

Event Type:      Warning
Event Source:    Windows SharePoint Services 3 Search
Event Category:  Gatherer
Event ID:        2436
Date:            11/5/2007
Time:            10:00:03 AM
User:            N/A
Computer:        SERVER
Description:
The start address <sts3://server/contentdbid={70792d37-74fc-430d-8939-d55afebdb795}> cannot be crawled.

Context: Application ‘Search index file on the search server’, Catalog ‘Search’

Details:
The object was not found.   (0x80041201)

 

The reason for the error was that the server couldn’t reach the web application on its default URL. Once I changed the alternate access mappings so that the default URL was reachable from the server and issued a command to start a full crawl, it finally indexed things correctly. The command to force a full crawl is:

STSADM -o spsearch -action fullcrawlstart
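Before forcing the crawl, it can help to confirm from the server itself that the default URL answers at all. The following is a minimal sketch of such a check in Python, run on the SharePoint server; the function name and the timeout value are my own choices, not part of any SharePoint tooling. It treats any HTTP response, even an error status such as 401, as "reachable", because the problem described above was that the server could not get to the address in the first place.

```python
import urllib.error
import urllib.request


def default_url_is_reachable(url, timeout=5):
    """Return True if anything answers at `url`, even with an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The host resolved and a web server replied (for example 401 or 404).
        return True
    except (urllib.error.URLError, OSError):
        # Name resolution failed, the connection was refused, or it timed out.
        return False
```

If this returns False for the web application’s default URL, fix the alternate access mappings (or name resolution on the server) first, and then start the full crawl.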

forge

Mondosoft SharePoint Search Workshops — Including Seattle

As I’ve mentioned on my blog before, I’ve been doing some of the Mondosoft SharePoint Search workshops where we demonstrate MOSS search, explain how it works, and show off some of the Mondosoft Ontolica product — which makes SharePoint search even better.  In a last minute change, I agreed to do the Seattle event next week on the 16th.   That means I’ll be doing the events in these cities:

  • Seattle, October 16th
  • Chicago, October 24th
  • Indianapolis, October 25th

Go register at www.ontolica.com so you can come introduce yourself to me at the event.

forge

SharePoint Search Workshops from Mondosoft

I have the great pleasure of having been invited to help Mondosoft deliver its SharePoint Search Workshops in some cities across the US. In fact, both Bob Mixon and I will be doing select events for Mondosoft. We haven’t yet finalized which events Bob or I will be doing; however, I can say that I’ll be attending (not presenting) in Chicago this Thursday, August 16th. It’s my opportunity to work with Mondosoft’s presenters to learn the content they’ve put together in order to help registrants better understand how Microsoft Office SharePoint Server (MOSS) Search works.

In my opinion, search is one of the most misunderstood features of MOSS. There’s so much power in the engine and, relatively speaking, very little of that power makes it to the surface in the user interface. That is why these events are compelling. You can learn more about how SharePoint works and what the out-of-the-box user interface doesn’t surface.

If you want to see if there’s an event coming to a city near you – or if you want to register for one of the events go to http://www.ontolica.com/Services/workshops.aspx

I look forward to meeting you at one of these events soon.

forge

WebCast: Using SharePoint Search to Find Information in Your Enterprise

I’ve been giving a presentation live for 18 months or so. That presentation explains how SharePoint Portal Search works, shows you how to set up a search for content on your network, how to customize search results, and even shows you how to use SharePoint to search information in your custom applications.

There’s a lot of power in SharePoint Portal Server Search that most folks don’t leverage. Although the web cast focuses on SharePoint Portal Server 2003, you’ll find that many of the same concepts apply directly to Microsoft Office SharePoint Server 2007.

Since I can only be in one place at a time, I’m making the presentation available as a web cast. For now you can get the web cast here (link removed). If you are going to link to the content, please link to this blog post as I may move the web cast in the future. (If I do, I’ll update this post to reflect the new location.)

I hope you enjoy the presentation and I’d appreciate your feedback.

forge

SharePoint Gets Search Analytics

Microsoft’s SharePoint Portal Server 2003 was sold into a large number of organizations based solely on the strength of the search tool. Organizations hungered for a way to find the data they had generated.

Structured data such as invoices, products, and shipments may have been easy to find in the applications designed for that data, but the growing mountain of documents seemed to make the unstructured information that you were looking for perpetually out of reach.

Search in SharePoint made significant progress in its ability to connect users with the unstructured information that they were seeking. But the effectiveness of searches depended upon the skill of the searcher and the alignment of the terms that the searcher used to the terms in the documents. The world of search analytics was still very foreign to most organizations. Thankfully, the next version of SharePoint Search with its focus on relevancy will also include reports that allow you to see the effectiveness of the searches users are executing.

Microsoft Office SharePoint Server 2007 is a part of the Office System and is set to debut sometime in late 2006 or early 2007. Microsoft Office SharePoint Server 2007 includes numerous enhancements designed to improve search relevance, Internet usage, content management scenarios, and many other features, which were shared this week at the SharePoint Conference in Bellevue, Wash.

In this article you’ll learn about the basics of search analytics, what you can do today to improve your search results, and what to expect in Microsoft Office SharePoint Server 2007.

http://www.intranetjournal.com/articles/200605/ij_05_18_06a.html

forge

Harnessing Properties in SharePoint Search

Most users of SharePoint Portal Server rapidly become enamored with the ability to add new fields (containing meta data) to documents in the document library. All of a sudden it becomes possible to associate information with a file beyond the file name that we’ve been limited to since the beginning of the computing era.

Few users, however, have the opportunity to understand how this meta data is used by SharePoint for searching. This leads to problems when users decide that it’s necessary to use SharePoint Portal Server Search to search on information contained in a field that they have added. In this article you’ll learn how SharePoint uses document library fields to create properties that are searchable and how to enable searching on those properties.

Article reposted here: Retro: Harness Properties in SharePoint Search

forge

No Searching SharePoint Portal Pages — by default

Something I ran across the other day is that the default content index for Portal Content has an exclusion in it so that it won’t search the actual pages of the Portal. If you have content editor web parts, or other content that you want to make sure that SharePoint does index, you’ll want to go into SharePoint and remove the exclusion for pages. You can follow these steps to remove the exclusion:

  1. Open the SharePoint Portal Server site
  2. Click on Site Settings
  3. Click Configure search and indexing
  4. Click Manage Content Indexes
  5. Hover over Portal Content, drop down the menu arrow that appears on the right, and select Edit.
  6. Click Manage rules to exclude and include content
  7. Find the entry in the list which is http://yourserver/*.aspx, hover over it, drop down the menu arrow that appears on the right, and select Delete.

The next time the portal content index is refreshed you’ll start getting content from the web part pages as well.
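To see why that single rule hides every web part page, here is a rough illustration in Python. It assumes the rule’s wildcard behaves like a shell-style glob, which approximates but may not exactly match how SharePoint evaluates its rules; the `is_excluded` helper and the sample URLs are hypothetical.

```python
from fnmatch import fnmatchcase

# The default exclusion rule from step 7 (server name is a placeholder).
EXCLUDE_RULES = ["http://yourserver/*.aspx"]


def is_excluded(url, rules=EXCLUDE_RULES):
    """Return True if any exclusion rule matches the URL (case-insensitive)."""
    lowered = url.lower()
    return any(fnmatchcase(lowered, rule.lower()) for rule in rules)
```

With the rule in place, `is_excluded("http://yourserver/default.aspx")` is True while a document such as `http://yourserver/Docs/plan.doc` is not matched. Passing `rules=[]`, which mirrors deleting the rule, lets the page through.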

forge

REST for Search.aspx

SharePoint’s support for REST (query string parameters) on the search.aspx (Portal Server Search/Search Results Page) is very powerful.  However, once you get past the basics of passing it a search string (k … short for keywords) the documentation gets a bit fuzzy.  Here’s one thing that I was able to do with search.aspx…

First, a few parameters…

Parameter  Description
k          Keywords: general search
tp         Type of document (I’ve found Person and Document very handy.)
s          Scope: think Search Scopes from the search drop-down list
pt         Property: any of the custom properties that SharePoint indexes. IFilters put these in.
d          Date: apparently required when you’re searching for properties

So the next bit is how to encode the properties. Here are a few ASCII/Hex codes you may need:

Code  Punctuation
%3a   : (Colon)
%2e   . (Period)
%23   # (Hash, Pound, etc.)
%2c   , (Comma)

So if you want to search for the word ‘spacey’ in any text in a document in a search scope named fuzzy you would have a URL like: http://server/search.aspx?tp=Document&s=fuzzy&k=spacey

It gets a tad more complicated when you want to search only in a property, not in the full text. First, you have to find the property you want in Manage Properties, and then you have to set up the search. When you add the property to the search, you have to add a single-character specifier indicating the type of the property before its name; mostly you’re dealing with strings, so ‘S’ is appropriate. After the property name you put a comma, the keyword Contains, another comma, and the value you’re looking for, all encoded of course. Next, once I search for a property I have to add a date specifier to the query. The title property is really urn:schemas.microsoft.com:fulltextqueryinfo:displaytitle. So if I wanted to find the same thing in only the title property I’d do this: http://server/search.aspx?tp=Document&s=fuzzy&pt=Surn%3aschemas%2emicrosoft%2ecom%3afulltextqueryinfo%3adisplaytitle%2cContains%2cSpacey%2cAnd%2c&d=All

Clear as mud, right? It’s not that bad; describing the property and URL-encoding all of the special characters just makes it look ugly. If it weren’t encoded it would look like this, which is still not pretty but certainly much more readable: http://server/search.aspx?tp=Document&s=fuzzy&pt=Surn:schemas.microsoft.com:fulltextqueryinfo:displaytitle,Contains,Spacey,And,&d=All
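If you build these links in code, the encoding is easy to get wrong by hand, so here is a small Python sketch that assembles both kinds of URL. The helper names are my own, and the parameter layout simply follows what is described in this post rather than any official documentation. It encodes every character that isn’t a letter or digit, using lowercase hex to match the codes in the table above (percent-encoding is case-insensitive, so uppercase would work too).

```python
import string

# Characters left as-is; everything else becomes %xx.
SAFE = set(string.ascii_letters + string.digits)


def encode_component(text):
    """Percent-encode every byte that is not an ASCII letter or digit."""
    return "".join(
        chr(b) if chr(b) in SAFE else "%{:02x}".format(b)
        for b in text.encode("utf-8")
    )


def keyword_search_url(server, scope, keywords, doc_type="Document"):
    """Full-text search: tp (type), s (scope), and k (keywords)."""
    return "http://{0}/search.aspx?tp={1}&s={2}&k={3}".format(
        server,
        encode_component(doc_type),
        encode_component(scope),
        encode_component(keywords),
    )


def property_search_url(server, scope, prop, value,
                        doc_type="Document", type_code="S"):
    """Property search: pt is <type code><property>,Contains,<value>,And,"""
    pt = ",".join([type_code + prop, "Contains", value, "And", ""])
    return "http://{0}/search.aspx?tp={1}&s={2}&pt={3}&d=All".format(
        server,
        encode_component(doc_type),
        encode_component(scope),
        encode_component(pt),
    )
```

For example, `property_search_url("server", "fuzzy", "urn:schemas.microsoft.com:fulltextqueryinfo:displaytitle", "Spacey")` builds the encoded title search, and `keyword_search_url("server", "fuzzy", "spacey")` builds the simple keyword search.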

I hope this helps when you want to be able to create links to property searches.