It’s familiar and convenient to browse for files – until it isn’t. Our human brain has limits; even with the familiarity of having created the files ourselves, our ability to remember, recognize, and retrieve the right file at the right time is limited. So how do we know when we should stop trying to browse for files and should lean on search? Perhaps more importantly, how are our small decisions pushing us towards a browsability approach when we need to be more focused on how search can help us find the documents we need?
The Magic Number 7 +/- 2
It was 1956, and George Miller at Harvard had been researching our working memory. He concluded that we can hold about seven items in our heads at once; more specifically, seven plus or minus two. What this means for the way we process lists is that we’re capable of processing about seven items in a list. Later replication of Miller’s findings in European languages seemed to suggest a slightly lower number (five), which was later attributed to the fact that we have about a two-second audio buffer in our heads, and European languages take slightly longer to convey the same information. This is close to Gary Klein’s later work as documented in Sources of Power, where he explains that we can simulate a system with three factors and six states.
The short version is that the human brain seems effective at processing small lists but is inefficient at processing large lists. In fact, what appears to happen is that when we encounter a large list, we start to scan subsets of the list to look for the items that we’re interested in.
As the world wide web was growing, there was a great deal of interest in Hick’s Law and the need for everything to be accessible within three clicks. Hick’s Law says, in brief, that in well-ordered lists, one large list is more efficient than two smaller lists. Many people used this to justify very long lists of menu items and countless links on navigation pages.
However, the important caveat to Hick’s Law is that the listing must be well-ordered. The argument has been made that an alphabetical listing is a well-ordered list, because it follows a very predictable pattern. In that sense, it’s true; but there’s a larger problem, and that’s the problem of synonyms. (See Pervasive Information Architecture for more about Hick’s Law.)
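To make the "one large list versus two smaller lists" comparison concrete, here is a minimal sketch. It assumes the commonly cited Hick-Hyman form of the law, T = b * log2(n + 1), which is not spelled out above. The coefficient b and the list sizes are illustrative values, not measurements.

```python
import math


def hick_time(n_choices: int, b: float = 1.0) -> float:
    """Estimated decision time for one well-ordered list of n_choices items.

    Assumes the Hick-Hyman form T = b * log2(n + 1). The coefficient b is a
    placeholder; real values are fitted from experiments.
    """
    if n_choices < 0:
        raise ValueError("n_choices must be non-negative")
    return b * math.log2(n_choices + 1)


# One well-ordered list of 64 items.
one_large_list = hick_time(64)

# Two smaller lists: pick 1 of 8 groups, then 1 of 8 items in that group.
two_small_lists = hick_time(8) + hick_time(8)

print(f"one list of 64: {one_large_list:.2f}")
print(f"two lists of 8: {two_small_lists:.2f}")
```

Under this model the single list comes out slightly ahead, which is the argument people leaned on. The model only holds when the list is well-ordered for the person reading it.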
The reality of our mental processing is that we think in concepts and then later apply words to those concepts. Is it a shirt or a blouse? Conceptually it’s a “top” or covering for the top of the body. It’s only when we begin to think about the context we’re using it in that we can find the “right” word for the situation.
The problem with ordered lists is that they must be contextless – or rather, it’s not possible to perfectly match the context of the list’s designer to the context of the list’s consumer. The result is that we need to consider synonyms when searching a list. Shirt, top, and blouse occur in radically different positions in a single alphabetical list. This invalidates the work that leads to Hick’s Law and puts us at the mercy of the anxiety created by the paradox of choice.
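A tiny sketch shows how far apart synonyms land in an alphabetical list. The catalog below is a made-up example, not data from any real system.

```python
# Hypothetical clothing catalog, sorted alphabetically like a typical menu.
catalog = sorted([
    "belt", "blouse", "coat", "dress", "hat", "jacket", "scarf",
    "shirt", "skirt", "socks", "sweater", "tie", "top", "trousers",
])

# Three words a shopper might use for the same concept.
synonyms = ["shirt", "top", "blouse"]

# Zero-based position of each synonym in the sorted list.
positions = {word: catalog.index(word) for word in synonyms}

for word, position in positions.items():
    print(f"{word!r} is at position {position} of {len(catalog)}")
```

Even in a list of only fourteen items, the three words sit near the start, the middle, and the end, so the reader has to scan all of it to be sure nothing was missed.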
The Paradox of Choice
More choices seem like a good thing. However, as Barry Schwartz elegantly explains in The Paradox of Choice, this isn’t always the case. In fact, the research seems to indicate that there is an anxiety created when there are too many choices and no clear directions to move in. Certainly, our goal in creating systems isn’t to create anxiety in our users, but that’s what we do when we create large lists.
So, while Hick’s Law implies that we should have larger lists, the paradox of choice pushes back on this assertion with anxiety. Together with the synonym problem, this effectively invalidates Hick’s argument in the context of unstructured data. As a result, we’re back to finding ways to shrink our lists towards the kinds of numbers that our brains can handle well.
Folders and Directories
In addition to the human factors, there are technical reasons to restrict any given file directory to a few thousand entries. Whether the technology is a traditional file system or a content management system based on a relational database engine, more than a few thousand files in a single directory (or query) can be problematic. For performance reasons, it’s best to keep each directory to a few thousand entries or fewer – even if the users never directly access the files by browsing.
As a result, in addition to the natural foldering that might be used to separate files with different metadata, there are often time-based foldering strategies that keep individual directories to only a few thousand files.
Collectively, this makes finding files by browsing harder – even if the human factors limiting the number of items are ignored.
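A time-based foldering strategy can be as simple as deriving the folder from a date on the file. The sketch below assumes a year/month layout; the function name, the root folder, and the file name are hypothetical, and a real system might bucket by day or quarter instead, depending on volume.

```python
from datetime import date
from pathlib import PurePosixPath


def time_based_folder(root: str, created: date, filename: str) -> PurePosixPath:
    """Build a root/YYYY/MM/filename path so no single folder grows unbounded."""
    return (
        PurePosixPath(root)
        / f"{created.year:04d}"
        / f"{created.month:02d}"
        / filename
    )


path = time_based_folder("invoices", date(2023, 3, 14), "acme-invoice.pdf")
print(path)  # invoices/2023/03/acme-invoice.pdf
```

Notice what this does to browsing: to find the file, a person now has to know when it was created, which is rarely the thing they remember about it.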
The Browsability Number
There is no single browsability number. It falls somewhere between the magic number seven that Miller proposed in 1956 and the technical limit of a few thousand files in a single directory. As the number of files increases, the anxiety and frustration increase. The subset scanning strategy that we as humans use tends to break down by about 100 files, so any situation where we can’t narrow the choice to fewer than 100 files is unlikely to be easily browsed, regardless of any file naming conventions or other organizational techniques that may be in use.
The Search Solution
The solution to the problem is to switch from a browsability strategy to a search-focused strategy. Browsability-based solutions are focused on single-dimensional naming strategies and large directories. A better strategy is to develop a rich search experience that uses metadata and search refinement, so that people can lean on technology rather than attempting to manage the identification process in their heads.
Search refinement allows for initial criteria to be specified and then further refined by selecting metadata in other columns or dimensions. With a well-built taxonomy, the process of searching for documents with refiners is quick, and because of the limited number of options, it is not anxiety-producing. One key to this is the fact that search refiners don’t show every possible value but instead show only those values that exist in the results that are already displayed – this focuses the searcher on only those options that are relevant to the current context.
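The refiner behavior described above can be sketched in a few lines. This is a toy in-memory illustration, not how any particular search engine implements refiners; the documents, field names, and the search and refiners functions are all hypothetical.

```python
from collections import Counter

# Hypothetical document metadata; a real system would pull this from an index.
documents = [
    {"title": "Q1 budget", "type": "spreadsheet", "department": "finance", "year": 2023},
    {"title": "Audit memo", "type": "document", "department": "finance", "year": 2023},
    {"title": "Q4 forecast", "type": "spreadsheet", "department": "finance", "year": 2022},
    {"title": "Launch deck", "type": "presentation", "department": "marketing", "year": 2023},
    {"title": "Brand guide", "type": "document", "department": "marketing", "year": 2022},
]


def search(docs, **criteria):
    """Return the documents whose metadata matches every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def refiners(results, fields):
    """Count, per field, only the values that occur in the current results."""
    return {field: Counter(d[field] for d in results) for field in fields}


# Initial criteria: everything from finance.
results = search(documents, department="finance")
options = refiners(results, ["type", "year"])
print(options["type"])  # spreadsheet and document only; no presentation

# Refine further by picking one of the offered values.
refined = search(results, type="spreadsheet")
print([d["title"] for d in refined])
```

Because the refiners are computed from the current results, "presentation" never appears as an option for the finance search. The searcher only ever sees a short list of choices that lead somewhere.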
Browsing works when there are only a few files, but at the scale of thousands, tens or hundreds of thousands, or more, search is the only way to go – and it’s focused on metadata and refiners.
When we work on file naming conventions, we’re necessarily working from the perspective of browsability, even when it may be broken. (See File Naming Conventions Miss the Point for more.)