File Naming Conventions Miss the Point

I use file naming conventions, and I recommend them to clients. However, they’re not the point. They’re a means to an end when it comes to managing files: a way to apply metadata to the file name in a repeatable way. When they become more important than developing an information architecture that works, we’ve missed the point.

Metadata-Less Systems

Much of the focus on file naming conventions is the result of the fact that most of the systems that people use only allow for one piece of user-entered metadata – and that is the file name. The advice in this scenario is to create a delimited approach to embedding multiple pieces of metadata into the file name so you, as the human, can parse it out.

While the strategies vary widely – and often hinder findability – they fundamentally consist of taking components of the name and embedding the metadata. For instance, my invoice file naming convention is YYMMDD-INV#-Client-Project. It’s designed to separate out four pieces of metadata in the name of the file: the date of the invoice, the invoice number, the client, and the project.
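As a minimal sketch of what “parsing it out” means for the YYMMDD-INV#-Client-Project convention, the Python below splits a file name stem into its four fields. The type and function names, the sample file name, and the assumption that the invoice number is stored as plain text without hyphens are all illustrative rather than part of the convention itself.

```python
from datetime import date, datetime
from typing import NamedTuple


class InvoiceName(NamedTuple):
    """The four pieces of metadata carried in the file name (hypothetical type)."""
    invoice_date: date
    invoice_number: str
    client: str
    project: str


def parse_invoice_name(stem: str) -> InvoiceName:
    """Split a YYMMDD-INV#-Client-Project stem (no extension) into fields."""
    # maxsplit=3 leaves any extra hyphens inside the last field (project).
    raw_date, number, client, project = stem.split("-", 3)
    return InvoiceName(
        invoice_date=datetime.strptime(raw_date, "%y%m%d").date(),
        invoice_number=number,
        client=client,
        project=project,
    )


def build_invoice_name(inv: InvoiceName) -> str:
    """Rebuild the stem from its fields, the inverse of parse_invoice_name."""
    return "-".join(
        [inv.invoice_date.strftime("%y%m%d"), inv.invoice_number, inv.client, inv.project]
    )


parsed = parse_invoice_name("240315-1042-Acme-Website")
print(parsed.client, parsed.project, parsed.invoice_date.isoformat())
```

Because the date is written as YYMMDD, sorting the file names alphabetically also sorts them chronologically, which suits processing invoices in batches.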

I could easily create separate folders for each client and project and file the invoices in those folders with only the date and invoice. When I do this, I’m moving the encoding of the metadata into the hierarchy and out of the file name. I choose not to do this because invoices for all customers are processed in batches. To file invoices in individual client and project folders would require more work to navigate to the client and project folders and would provide little (if any) value.

In a system that doesn’t inherently support additional metadata for files – like the file systems on our computers – the right answer is to find an information architecture that supports the processes that you need to use and involves some sort of structured response to either the file naming or the placement of the files – or both.

Sidebar: Dates in Names

Every system keeps intrinsic metadata about each file, including its creation date and its modification date. In most systems, particularly file systems, these dates are sortable. This raises the question of why we should put the date in the file name. The answer is that these intrinsic dates record when the file was created or changed and aren’t something users can readily set, so we can’t use them to establish the date of the document itself – and in rare cases, this can be problematic. So, while in most cases the document date isn’t important enough to warrant inclusion in the file name, for invoices it seemed like the right answer because of their accounting importance.

Enterprise Sync and Storage

It’s been years now since file-system-based approaches to content storage and more traditional systems with metadata duked it out in the market to see which would win. It rapidly became apparent that file-system-based file storage was going to win by sheer volume. As a result, the metadata-based content management systems retreated a bit, and the analyst companies began talking about enterprise content management as a part of a larger conversation about enterprise file synchronization and storage.

For most files, the effort to enter metadata wasn’t something the users were willing to do, and as a result, everyone gave up. Instead, they focused on the high importance and high value files and placed them into content management systems with metadata, ignoring the lack of metadata on most files.

Field Stacking

If we go back to the early days of computing, we see techniques that squished multiple types of data into fewer, smaller fields. Back then, the reasoning may have been to avoid changing a master record definition, concern about storage space, or any of a myriad of now-outdated reasons that led to smashing things together. What we learned as a result of this effort is that delimiting the data once it was smashed together was difficult.

Conceptually, parsing data out is simple. Look for the comma, semicolon, or other delimiter, and you’re done. In practice, these delimiter characters occur naturally in the data, so you must disambiguate between a character that is intentionally part of the value and one that is meant to end it, just as you must distinguish an intentional single quote from an attempt to terminate a string. We recognize that there are times when field stacking must still be done to work around limitations; however, it’s generally discouraged where possible because of the challenges that it creates. While we use file naming conventions to address limitations, we should do so only when we must.
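The delimiter problem is easy to reproduce with a hyphen-delimited file name. The sketch below is illustrative only: the function names and the client “Smith-Jones” are made up. It shows a naive split producing the wrong number of fields, and one possible guard that refuses to build a name from a value containing the delimiter.

```python
DELIMITER = "-"


def split_fields(stem: str) -> list[str]:
    """Naive parse: assume every hyphen separates two fields."""
    return stem.split(DELIMITER)


def join_fields(fields: list[str]) -> str:
    """Build a stem, rejecting values that would make the name ambiguous."""
    for value in fields:
        if DELIMITER in value:
            raise ValueError(f"field {value!r} contains the delimiter {DELIMITER!r}")
    return DELIMITER.join(fields)


clean = split_fields("240315-1042-Acme-Website")
ambiguous = split_fields("240315-1042-Smith-Jones-Website")

print(len(clean))      # four fields, as intended
print(len(ambiguous))  # five fields: is the client "Smith" or "Smith-Jones"?
```

Rejecting the value is only one option. Escaping or substituting the delimiter also works, but each choice adds another rule that people have to remember when naming files.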

Long Names and Cryptography

One of the other factors with stacking fields into a file name is that the names get long. The more information there is – and the longer that information is – the closer we approach file names that Windows can’t handle well. The original APIs for accessing files on Windows have a 260-character limit for the full path for files. While many applications have moved to newer APIs that have a much larger character limit, there are still numerous programs that are limited by 260 characters, thus long names and long folder names can become a problem.

The solution to this is to create standard abbreviations and codes to shorten the name. This creates the additional problem that the file naming convention becomes so complicated that it is difficult to train people to use and becomes fraught with errors related to incorrect use of identifiers, abbreviations, and shortcuts.
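Before resorting to abbreviations, it can help to know which files are actually at risk. The following is a minimal sketch of a check against the 260-character limit of the original Windows file APIs. The function name and sample paths are hypothetical, and the check counts characters in the full path string only.

```python
LEGACY_MAX_PATH = 260  # classic Windows limit; it includes the terminating null


def exceeds_legacy_limit(full_path: str) -> bool:
    """Return True if the full path is too long for legacy Windows file APIs."""
    # The limit counts the trailing null character, so at most
    # LEGACY_MAX_PATH - 1 visible characters fit.
    return len(full_path) > LEGACY_MAX_PATH - 1


short_path = r"C:\Invoices\240315-1042-Acme-Website.pdf"
long_path = "C:\\" + "\\".join(["Very Long Descriptive Folder Name"] * 8) + "\\invoice.pdf"

for path in (short_path, long_path):
    print(len(path), exceeds_legacy_limit(path))
```

Note that the limit applies to the whole path, so long folder names use up the same budget as long file names.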

Proxy

If we get to the fundamentals, the file naming convention is addressing the limitation of not having metadata support in the underlying system. It’s a proxy for real metadata, adopted with the expectation that, in the future, we’ll be able to extract the metadata and place it into appropriate fields.

The rub comes in when we’re working in systems that are inherently capable of maintaining and managing metadata. In those cases, do we continue to invest primary effort into developing, maintaining, and enforcing file naming conventions, or do we shift our efforts to ensuring that the metadata is set correctly?

Metadata Advantages

File naming conventions are plagued with the problem of readability to the user. Fundamentally, they’re designed to allow a user to parse out important information to determine what they want. Because of that, the question becomes whether you use the friendly information or the record identifier. Said differently, do you use the name of the company or the company identifier? Names are easier to read but are subject to conflicts (e.g., two “Acme, Inc.” companies) and name changes.

Metadata-enabled solutions can conveniently side-step this issue by recording the unique identifier but displaying the name. Instead of having to choose between two difficult alternatives, metadata allows for easy and consistent identification of the record.
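A minimal sketch of “record the identifier, display the name” might look like the following. The classes, the identifiers such as C-001, and the renamed company are invented for illustration and do not describe any particular product’s data model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Company:
    company_id: str  # stable, unique identifier
    name: str        # friendly name; may collide or change


@dataclass(frozen=True)
class Document:
    filename: str
    company_id: str  # the metadata field stores the identifier, not the name


def display_company(doc: Document, companies: dict[str, Company]) -> str:
    """Resolve the stored identifier to the current friendly name."""
    return companies[doc.company_id].name


# Two different companies that happen to share a name.
companies = {
    "C-001": Company("C-001", "Acme, Inc."),
    "C-002": Company("C-002", "Acme, Inc."),
}
invoice = Document("240315-1042.pdf", company_id="C-002")
print(display_company(invoice, companies))

# A name change touches one company record, not every document.
companies["C-002"] = replace(companies["C-002"], name="Acme Holdings, LLC")
print(display_company(invoice, companies))
```

The document never stores the name, so neither the duplicate “Acme, Inc.” nor the later rename requires touching the file or its metadata.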

Additionally, both file name encoding and folder-based encoding of metadata are subject to a single navigational path. If you start a name (or path) with a date, you fundamentally enforce this approach on others as they’re trying to find the file. Search technologies are sometimes helpful at finding a file based on part of the name. But because they make no distinction between parts of the name, collisions frequently occur: the Pear Tree Landscaping company’s records are almost unfindable when many of the projects that you’ve completed are named “Pear Tree.”

Metadata-based search approaches can be focused on a specific field, thus eliminating both the navigational path problem and the potential of matching based on another field. This is why nearly every content and record management system today offers storage for metadata in individual fields.
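The difference between the two kinds of search can be sketched with a few in-memory records. The records, field names, and helper functions below are made up for illustration. A real content management system would use its own query interface, but the distinction between matching anywhere in the name and matching one field is the same.

```python
records = [
    {"filename": "240105-1001-Pear Tree Landscaping-Patio.pdf",
     "client": "Pear Tree Landscaping", "project": "Patio"},
    {"filename": "240212-1002-Acme-Pear Tree.pdf",
     "client": "Acme", "project": "Pear Tree"},
    {"filename": "240301-1003-Globex-Pear Tree.pdf",
     "client": "Globex", "project": "Pear Tree"},
]


def search_names(items: list[dict], text: str) -> list[dict]:
    """File-name search: matches the text anywhere in the name."""
    return [r for r in items if text.lower() in r["filename"].lower()]


def search_field(items: list[dict], field: str, text: str) -> list[dict]:
    """Metadata search: matches the text only within one named field."""
    return [r for r in items if text.lower() in r[field].lower()]


by_name = search_names(records, "Pear Tree")
by_client = search_field(records, "client", "Pear Tree")

print(len(by_name))    # every record matches, whichever field holds the text
print(len(by_client))  # only the record whose client is Pear Tree Landscaping
```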

What’s the Point?

The reason for file naming conventions was always file findability. The key to findability was identification of the key metadata – which became part of the file naming convention. These key pieces of metadata were organized in the approach that was expected to be the most valuable, recognizing that one way of finding files would necessarily have its limitations. For those still using file system-based approaches, file naming conventions are all we have to improve findability. However, for systems where metadata is available and searchable, it offers a much better way of making files more findable.

That isn’t to say that file naming isn’t valuable; it’s just not as valuable as getting the metadata into the correct fields.