This post originated from an RSS feed registered with Agile Buzz
by James Robertson.
Original Post: Bad RSS and the pain it causes
Feed Title: Cincom Smalltalk Blog - Smalltalk with Rants
Feed URL: http://www.cincomsmalltalk.com/rssBlog/rssBlogView.xml
Feed Description: James Robertson comments on Cincom Smalltalk, the Smalltalk development community, and IT trends and issues in general.
Dare Obasanjo points out how some feeds follow the RSS specs (insofar as you can call them specs, but never mind) exactly - but end up creating a nightmare for aggregators and the people who use them. He points to a specific feed, and tells us exactly what's wrong with it - specifically, the issue of using the same link for multiple items (without a GUID being present):
Now how does the RSS aggregator tell whether the item with the title "I am item 1" is the same as the one named "I am item one" with a typo in the title fixed or a different one? The simple answer is that it can't. A naive hack is to look at the content of the element to see if it is the same but what happens when a typo was fixed or some update to the content of the ?
Every RSS aggregator has some sort of hack to deal with this problem. I describe them as hacks because there is no way that an aggregator can 100% accurately determine when items with the same link and no guid are the same item with content changed or different items. This means the behavior of different aggregators with feeds such as the Cafe con Leche RSS feed is extremely inconsistent.
This is the sort of thing that drives us aggregator authors nuts. Presumably, authors want their content to be read. Is there a reason that some of them have to make it so blasted hard?
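To make the problem concrete, here is a minimal Python sketch of the kind of fallback hack Dare describes. It is not taken from any particular aggregator: the function name item_key, the choice of hashing link plus title, and the sample feed are all assumptions made up for illustration. It only shows why an item without a guid can't be tracked reliably once its title changes.

```python
import hashlib
import xml.etree.ElementTree as ET


def item_key(item):
    """Return a best-effort identity key for an RSS <item> element.

    Hypothetical helper: real aggregators each use their own heuristic.
    """
    guid = item.findtext("guid")
    if guid and guid.strip():
        # A guid is the only identity that survives edits to the item.
        return ("guid", guid.strip())
    # Fallback hack: hash the link and title together. Any edit to the
    # title yields a new key, so a fixed typo looks like a brand new item.
    link = (item.findtext("link") or "").strip()
    title = (item.findtext("title") or "").strip()
    digest = hashlib.sha1(f"{link}\n{title}".encode("utf-8")).hexdigest()
    return ("hash", digest)


# Made-up feed: two items sharing a link with no guid, then two
# versions of one item that does carry a guid.
FEED = """<rss version="2.0"><channel>
<title>Example</title>
<item><title>I am item 1</title><link>http://example.com/</link></item>
<item><title>I am item one</title><link>http://example.com/</link></item>
<item><title>Old title</title><link>http://example.com/</link><guid>tag:example.com,2004:1</guid></item>
<item><title>New title</title><link>http://example.com/</link><guid>tag:example.com,2004:1</guid></item>
</channel></rss>"""

items = ET.fromstring(FEED).findall("./channel/item")
keys = [item_key(i) for i in items]

# Without a guid, the retitled item is indistinguishable from a new one.
print(keys[0] == keys[1])
# With a guid, the retitled item is still recognized as the same item.
print(keys[2] == keys[3])
```

Hashing the description instead of the title only moves the problem: an edit to the body would then produce a new key. Whichever fields the fallback looks at, the aggregator is guessing, which is the whole complaint.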