Jon Udell notes that we have access to tons of data on the web - but interestingly enough, it's not easily accessible for automated reuse:
If you search the Web for “fortune500.xml,” you’ll find an ordered list of the Fortune 500 companies. It’s just what you’d want if you were writing a custom portfolio application. But it didn’t exist until last week when Doug Purdy, a Microsoft program manager, created it while writing his own personal portfolio application. Because he also blogged the list, you can use it, too.
Jon points out that data is mostly presented for passive viewing, not for further analysis. For instance - what if you looked at the typical Fortune 500 list (an HTML table), and wanted to slice and dice the data in a way that the authors didn't? Hello, massive data entry task. It doesn't have to be that way, and there are even tools around that show what should be possible:
For an example of what things could and should be like, check out episode 10 of The Screening Room. At the six-minute mark in that screencast about Dabble DB, a Web database, Smallthought Systems’ Avi Bryant -- who is analyzing a set of data about investments -- wants to look at investments by U.S. state as a function of population. The current data set includes states but not their populations. To add population data, Avi visits a Web site that lists states and populations, activates a JavaScript bookmarklet, and imports two columns from the HTML table on that Web page.
That's the kind of analysis that would be much easier if data were made available in machine-friendly formats as well as in people-friendly ones. The Semantic Web hasn't arrived yet...
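To make the "import two columns from an HTML table" step concrete, here is a rough sketch in Python using only the standard library. It is not how Dabble DB's bookmarklet works (that isn't described in the screencast summary above). The `TableExtractor` class, the `extract_columns` helper, and the sample table are all made up for illustration, and the state names and population figures are placeholders rather than real data.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> in an HTML document as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None outside <tr>
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_columns(html, wanted):
    """Return one dict per data row, keeping only the columns named in `wanted`.

    Assumes the first row of the table is a header row.
    """
    parser = TableExtractor()
    parser.feed(html)
    header, *body = parser.rows
    indexes = [header.index(name) for name in wanted]
    return [{header[i]: row[i] for i in indexes} for row in body]


# Hypothetical page content: placeholder names and numbers, not real figures.
SAMPLE_HTML = """
<table>
  <tr><th>State</th><th>Capital</th><th>Population</th></tr>
  <tr><td>State A</td><td>City A</td><td>1,000</td></tr>
  <tr><td>State B</td><td>City B</td><td>2,500</td></tr>
</table>
"""

# Pull out just the two columns we care about and convert the counts to integers.
records = extract_columns(SAMPLE_HTML, ["State", "Population"])
populations = {r["State"]: int(r["Population"].replace(",", "")) for r in records}
print(populations)
```

Even this toy version has to guess that the first row is a header and that the numbers use commas as thousands separators, which is exactly the kind of guesswork a machine-friendly format would make unnecessary.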