Agile Buzz Forum - Tagging has simpler problems

Here's an article on tagging problems from September - I meant to comment on it then, but I find myself looking at my flagged posts now that I'm on a train. Oddly enough, that relates to the problem at hand. Here's the scenario that the post lays out as problematic:

Let's say Joe reads a new article about a battery technology breakthrough in the Scientific American. Joe has been thinking about buying a fuel-efficient car lately. When Joe goes to tag the article's web page, he uses the following tags: "battery," "fuel-savings," "car," "future-vehicle." Let's say the article comes with a .gif of a high-level schematic for how the battery works. Joe saves the .gif in his Flikkr account, tagging it with "battery," "schematic," and "fuel-savings."

Eighteen months and many tags later, due to Joe's profession as an engineer at Intel, he has an electric moment and realizes the battery tech breakthrough has more relevance to something he's directly working on, in nano-tech. Given the keywords he chose, will he be able to 1) recall how he tagged the original article, to find it later on or, 2) if he can find it at all, will he be able to easily re-tag the article and the schematic .gif to match the new context in which Joe finds these ideas relevant? I wouldn't bet on either outcome.

That is a problem, and it's one most of us run into a lot. I use del.icio.us to tag posts that I want to be able to find later - I use the tag "cst" for posts that I want to share with people about Cincom Smalltalk. Now, the problem I'm going to run into here isn't the same as the one above - I'm not going to forget the tag. However, over time, I'll tag a whole ton of things that way. Once I have tens (never mind hundreds or thousands) of posts tagged that way, how do I find the needle I actually want in that haystack?

The article suggests that refactoring tools (like Smalltalk's refactoring browser) for tag libraries are the answer. I don't think so. There's a wall of inertia that's going to prevent most people from doing that. Heck, the simpler problem that my title references is that most people won't tag their posts at all. Of those who do, an even smaller subset will be motivated to refactor.

Don't believe me? Well, let's look at two A-Listers as an example - Scoble and Winer. The former never categorizes a post, and the latter rarely bothers with a title or a category. These two are widely read, and deeply involved in "web 2.0" discussions - and even they can't be bothered to take the minimal action that tagging and categorization depend on. How likely do you think it is that the average web user will bother? For your answer, walk into anyone's old video cassette library and see how many of the ones recorded at home actually have a label. The answer will be enlightening.

Here's more evidence: I subscribe to 315 feeds as I write this, and I keep a fairly large cache of old items for each one. Let's trawl through those and see which ones have a category set:

RSSFeedManager default getAllItems size.

That tells me how many items I have sitting in memory. The response? 16,466. Now, let's see how many have no category set:

(RSSFeedManager default getAllItems select: [:each | each category isNil or: [each category = 'None']]) size.

The result there? 10,810. Nearly two thirds of the items I'm tracking have no category associated with them. Now, let's walk back to the web 2.0 discussions where the semantic web heads are trying to decide whether RDF, or OPML, or something else is the best way to make sense of all this. I'll make it simple for them - it just doesn't matter. The problem isn't the one posited in the article, i.e., "how did I categorize that item?" It's "holy smokes, I'm awash in a sea of completely uncategorized plain text!" Before someone chimes in that text search will auto-categorize, I'll point out that engines like Google already do a lot of that - and, as Scoble has been noticing, there are limits to that.
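
For anyone who wants to check the "nearly two thirds" figure, here's a quick workspace sketch. It hard-codes the two counts reported above; the variable names are just for illustration, and in a live image the counts would presumably come from the two RSSFeedManager expressions instead.

```smalltalk
| total uncategorized |
"Counts taken from the two results quoted in the post"
total := 16466.
uncategorized := 10810.
"Share of items with no category, as a percentage (roughly 65.65)"
(uncategorized / total * 100) asFloat
```

The division yields an exact Fraction first, so the conversion to a Float only happens once at the end.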

