Searching Instead Of Browsing: Organizing Information Using Labels as Meta-Data

Assigning labels to content so that it can be found by searching is superior to placing content in folders for manual browsing. The folder concept may be suitable for physical documents on paper, but it does not lend itself well to digital information. Labels combined with an effective search capability are a faster way to organize content and find information.

Organizing content is a means to the end goal of finding information. Since organizing content is not a goal in itself, it should be as simple as possible and take no more work than is needed to find information.

The folder concept has many limitations:

- A particular item of content can belong to only one folder. Placing it in two folders requires either:
  - Making duplicates. These are hard to keep in sync.
  - Using links. This is problematic too: With ‘soft links’ the content resides in only one folder and if that folder is deleted, the content is deleted too. With ‘hard links’, it is hard to know how many ‘folders’ contain the content, and unlinking the last one may unintentionally erase it.
- Similarly, a folder can be contained in only one parent folder.
- Organizing content well in folders requires deep levels of sub-folders. These can be a challenge to browse.
- All content must be placed in a folder for it to be well organized in this scheme. Doing this manually is a burden. Rules that automatically place some of the content in folders relieve the burden to a certain extent. However, if a rule turns out to be flawed and has already mixed content into the wrong folder, finding that content and moving it to the right folder can be an even bigger burden.
- Folders are static. Search results are dynamic. As the computing power available to ordinary users grows, dynamic search makes more sense than static folders, which put work on the user that the computer could do.
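
By contrast, a label scheme can be sketched as a simple index from labels to items. The following is only an illustration under stated assumptions: the class name LabelIndex, its methods, and the item ids such as "msg-1" are hypothetical and do not describe any particular product. It shows one item carrying two labels without duplicates or links, and a search that is computed at query time.

```python
from collections import defaultdict


class LabelIndex:
    """Hypothetical index that maps each label to the ids of items carrying it."""

    def __init__(self):
        self._items_by_label = defaultdict(set)

    def add_label(self, item_id, label):
        # An item can carry any number of labels; nothing is copied or moved.
        self._items_by_label[label.lower()].add(item_id)

    def remove_label(self, item_id, label):
        # Removing a label never deletes the item itself.
        self._items_by_label[label.lower()].discard(item_id)

    def search(self, *labels):
        """Return the ids of items that carry every one of the given labels."""
        if not labels:
            return set()
        sets = [self._items_by_label.get(label.lower(), set()) for label in labels]
        return set.intersection(*sets)


index = LabelIndex()
index.add_label("msg-1", "travel")
index.add_label("msg-1", "receipts")
index.add_label("msg-2", "travel")

print(sorted(index.search("travel")))              # ['msg-1', 'msg-2']
print(sorted(index.search("travel", "receipts")))  # ['msg-1']
```

Because results are computed when the search runs, correcting a wrong label changes what is found immediately, and nothing has to be moved back out of a wrong folder.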

It should not be mandatory to apply all appropriate labels to all content by hand. If automated content categorization, using techniques like artificial intelligence and pattern recognition, can determine that this article is about personal information management or content management, then nobody should be required to apply that particular label manually.

As the number of labels grows, the labels should not be organized in a taxonomy tree with a folders/sub-folders structure. Such a tree structure brings the problems of folders with it. Instead, the labels should be associated with each other in richer relationships, as ‘concepts’ are in a language.

For example, content labeled “computing” should appear in search results for “technology”, and content labeled “personal information management” should appear in search results for the concept “email”. Note that in a traditional taxonomy tree, “computing” could be a child of “technology”, but “personal information management” could be a parent of “email”, so the associations cannot simply follow the parent-child direction of a tree.
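
One way to picture such associations is a small graph of related labels that the search follows. This is a minimal sketch, not a description of any existing system: the names ConceptIndex, relate and search are hypothetical, and following associations transitively is an assumption made here for illustration.

```python
from collections import defaultdict, deque


class ConceptIndex:
    """Hypothetical label index in which labels are associated as concepts."""

    def __init__(self):
        self._items_by_label = defaultdict(set)
        # Searched concept -> labels whose content should also be returned.
        self._surfaces = defaultdict(set)

    def add_label(self, item_id, label):
        self._items_by_label[label].add(item_id)

    def relate(self, label, concept):
        """Declare that content labeled `label` should be found under `concept`."""
        self._surfaces[concept].add(label)

    def search(self, concept):
        """Collect items for the concept and for every label associated with it."""
        seen = {concept}
        queue = deque([concept])
        results = set()
        while queue:
            current = queue.popleft()
            results |= self._items_by_label.get(current, set())
            # Assumption: associations are followed transitively; `seen`
            # guards against cycles between mutually related labels.
            for label in self._surfaces.get(current, ()):
                if label not in seen:
                    seen.add(label)
                    queue.append(label)
        return results


concepts = ConceptIndex()
concepts.relate("computing", "technology")                  # narrower -> broader
concepts.relate("personal information management", "email") # broader -> narrower
concepts.add_label("this-article", "computing")
concepts.add_label("this-article", "personal information management")

print(concepts.search("technology"))  # {'this-article'}
print(concepts.search("email"))       # {'this-article'}
```

The two relate calls point in opposite directions relative to a taxonomy tree, which a strict parent-child hierarchy could not express with a single rule.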

Web page URLs pose a challenge, because as they are commonly used, especially on static HTML sites, they are based on the concept of folders. However, URLs don’t have to be folder-like in their appearance. For example, all the news articles on a site could have URLs like “phillynews.com/ra23px4” instead of something like “phillynews.com/sports/ice_hockey/flyers/04-08-27-victory.htm” or “phillynews.com/inquirer/2004/08/27/sports/flyers-victory.htm”. In this fictitious example, “ra23px4” is an automatically generated, short and easy-to-type ID pointing to the article, like the shortcuts generated by services like tinyurl.com and metamark.net.
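
A flat ID scheme of this kind could be sketched as follows. This is an assumption-laden illustration only: the functions new_short_id and publish, the in-memory articles dictionary, and the choice of seven lowercase letters and digits are hypothetical, and this is not how tinyurl.com or metamark.net are known to work.

```python
import secrets
import string

# Assumption: IDs use lowercase letters and digits so they are easy to type.
ALPHABET = string.ascii_lowercase + string.digits


def new_short_id(existing_ids, length=7):
    """Generate a random ID that is not already in use."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in existing_ids:
            return candidate


# Hypothetical in-memory store: short ID -> article record.
articles = {}


def publish(title, labels):
    """Store an article under a flat ID; the labels carry the organization."""
    short_id = new_short_id(articles)
    articles[short_id] = {"title": title, "labels": set(labels)}
    return short_id


short_id = publish("Flyers victory", ["sports", "ice hockey", "flyers"])
print(f"phillynews.com/{short_id}")
```

The URL stays the same if the article is later relabeled, since the categories live in the labels rather than in the path.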

Consider the organization of email, which seems to be headed in this direction. Some examples in the email space are Google’s GMail, Microsoft’s LookOut Search Plugin for Outlook, and Nelson Email Organizer (NEO).

Some possible labels for this document: “personal information management”, “content management”, “computing”, “technology”.