This post originated from an RSS feed registered with Agile Buzz
by James Robertson.
Original Post: Sjoerd Visscher hasn't met any users
Feed Title: Cincom Smalltalk Blog - Smalltalk with Rants
Feed URL: http://www.cincomsmalltalk.com/rssBlog/rssBlogView.xml
Feed Description: James Robertson comments on Cincom Smalltalk, the Smalltalk development community, and IT trends and issues in general.
RSS and Atom are both XML file formats, they do not accidentally look like XML. Thus according to the XML specification, aggregators are not allowed to try to show broken feeds. If you are doing liberal XML parsing, you are not playing by the rules.
A lot of people are parsing feeds, or are planning to do so. Most of them do so because they want to do something interesting with the data, it might be an aggregator, but it could also be some cool new application. What they certainly are not interested in is the technology of parsing itself. They simply want to use one of the abundantly available XML parsers. Now there are two ways to do feed parsing. One is to only allow proper XML and patiently educate feed producers who do not use the proper XML tools how to improve. (And almost all feed producers are willing to produce valid XML, but they are not helped enough to actually do that.) The other way is to liberally parse anything that vaguely resembles XML and spoil the fun of using feeds for everybody else. If you are doing liberal XML parsing, you are being inconsiderate.
Heh. He's even gone so far as to say he's stopped reading my blog over this. Here's the problem with his theory - it's wrong. And no, I don't really care what the XML spec says. The spec was written before XML went out into the wild. It's out there now, being used by actual people. You know what happens under those circumstances? Errors happen. You know what happens when an end user can't read a feed with an aggregator? Here's a hint - they don't contact the author of the feed, they contact the author of the aggregator. Why is that? Because - from their perspective - it's a bug. You can point to the spec all you want, and it just doesn't matter. Don't believe me? Go write an aggregator, get yourself some users, and then see what happens.
The issue is even simpler - XML is a textual spec. What that means is - in most circumstances - recovery from an error condition is easy, and getting the rest of the document is trivial. That's not the case with binary specs - if the people who fuss about XML being a pure spec are really serious, let them go create a binary spec that doesn't allow for simple error recovery. They can also tell me all about the zero end use they'll get.
The funny thing is, I actually use a parser that blows chunks on bad XML. I use Smalltalk though, so I've subclassed the system parser, and overridden the error handling as necessary. I have it flag an error and move along. In BottomFeeder, you'll see that as a black bar on the feed icon. In that manner, if the end user is so motivated, he or she can look at the collection of errors, find the contact info (assuming the feed had it) for the author and report it. Contrast that with Sjoerd's preferred world - the application would report an error and stop processing the feed. No more information, no contact info for the feed author, nada. So here's the punchline - he thinks liberal parsing is inconsiderate. Of course, in his world, we simply get inscrutable errors that can't be reported to anyone - but that's somehow considerate. And he says I rant too much :)
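For the curious, here's roughly what "flag an error and move along" looks like. This is a minimal sketch in Python, not BottomFeeder's actual Smalltalk code. It tries a strict parse first, records the error if that fails, and then falls back to a tolerant pass. The tolerant pass uses the standard library's `HTMLParser`, which is a swapped-in technique and not a real XML parser. The names `LenientTitleParser` and `read_titles` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class LenientTitleParser(HTMLParser):
    """Hypothetical fallback: collect <title> text without enforcing well-formedness."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current title.
        if self._in_title:
            self.titles[-1] += data


def read_titles(feed_text):
    """Return (titles, errors). errors stays empty when the feed is well-formed."""
    errors = []
    try:
        # Strict pass: raises ParseError on the first well-formedness error.
        root = ET.fromstring(feed_text)
        titles = [el.text or "" for el in root.iter("title")]
    except ET.ParseError as exc:
        # Flag the error so the user can inspect and report it, then move along.
        errors.append(str(exc))
        fallback = LenientTitleParser()
        fallback.feed(feed_text)
        fallback.close()
        titles = fallback.titles
    return titles, errors


if __name__ == "__main__":
    # A bare ampersand is a classic way to break a feed.
    broken = ("<rss><channel><title>News</title>"
              "<item><title>Tom & Jerry</title></item></channel></rss>")
    titles, errors = read_titles(broken)
    print(titles)
    print(errors)
```

The point of the design is that the error list travels with the content. The reader still gets the items, and anyone motivated enough can look at what went wrong and pass it on to the feed's author.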