After I recorded today's screencast, I gave the problem some more thought - not so much the parsing of problematic XML, but the bigger issue: using libraries.
Back when I was first writing BottomFeeder, there was a ton of bad XML floating around pretending to be good RSS. That problem has largely subsided, as most feeds are now created by a small set of relatively standard tools. However, at the time, I had a simple problem: users of BottomFeeder wanted to see news updates, and the presence of an illegal character in a feed wasn't really their problem. The more pedantic developers would say that any such feed should be rejected, as some sort of lesson to the feed provider... as if that were useful. Others assumed that any error handling I was doing must be the result of some awful, regex-driven, tag soup parser I was creating. The latter objection is the interesting one, I think.
In many libraries you get for languages like Java and C# (the "enterprise class" ones), lots of things are declared "final" - the library designer has decided to play god, and declared that thou shalt not subclass - his decisions being perfect. That fact is what drove many people to assume that I had created a tag soup parser; in their libraries, they couldn't subclass the parser, so it was simply unthinkable :)
This is one of the key advantages of Smalltalk, I think. Not only is it a dynamic language, which gives you tons of flexibility - it's also not a closed-off system. The utterly absurd notion of a "final" class never came into Smalltalk, so no library designer can trap you in a maze of pre-determined answers that don't really fit your problem. So - when I had a problem with XML back in 2005, I just subclassed the parser, added some cases to "skip over bad characters", and moved on. I didn't change the base parsing behavior; anyone who needed those errors would still get them. I just added an option.
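For anyone who doesn't have a Smalltalk image handy, here's a rough sketch of the same idea in Python. This is not the BottomFeeder code or the actual parser I subclassed; the class names (`StrictTextScanner`, `LenientTextScanner`), the `handle_illegal` hook, and the error class are all made up for illustration, and the "parser" only checks characters against the XML 1.0 legal character ranges. The point is just the shape: the base class keeps its strict behavior, and the subclass adds the option to skip.

```python
class IllegalCharacterError(ValueError):
    """Hypothetical error raised when input contains a character XML 1.0 forbids."""


class StrictTextScanner:
    """Hypothetical stand-in for a library parser that rejects illegal characters."""

    def is_legal(self, ch):
        # XML 1.0 legal characters: tab, LF, CR, and the ranges below.
        cp = ord(ch)
        return (
            cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF
        )

    def handle_illegal(self, ch, position):
        # Base behavior: anyone who needs the error still gets it.
        raise IllegalCharacterError(
            "illegal character %r at position %d" % (ch, position)
        )

    def scan(self, text):
        kept = []
        for position, ch in enumerate(text):
            if self.is_legal(ch):
                kept.append(ch)
            else:
                self.handle_illegal(ch, position)
        return "".join(kept)


class LenientTextScanner(StrictTextScanner):
    """Subclass that skips over bad characters instead of failing."""

    def handle_illegal(self, ch, position):
        # Added option: drop the character and keep going.
        pass


feed_text = "News\x0b update"  # \x0b (vertical tab) is not legal in XML 1.0
print(LenientTextScanner().scan(feed_text))
```

Code that wants the strict behavior keeps using `StrictTextScanner` and still sees the error; only the code that asks for the lenient subclass gets the skipping.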
That's really what dynamic languages in general, and Smalltalk in particular, are all about: adding options. Languages like Java and C# are about subtracting options, pushing developers into little boxes filled with lines labeled "thou shalt not".