This post originated from an RSS feed registered with Agile Buzz by James Robertson.
Original Post: Character encoding and RSS
Feed Title: Cincom Smalltalk Blog - Smalltalk with Rants
Feed URL: http://www.cincomsmalltalk.com/rssBlog/rssBlogView.xml
Feed Description: James Robertson comments on Cincom Smalltalk, the Smalltalk development community, and IT trends and issues in general.
BitWorking comments on how hard character encoding is to get right:
Character encoding is hard. Really. If I could point to
one thing that causes feeds to be invalid more than anything
else, it would be character encoding.
This is the primary reason that BottomFeeder does not reject bad feeds - it ignores bad characters as far as it can and moves on. The reality is that feeds move in and out of well-formedness on a regular basis. So many people post so much content from so many different tools that it's simply unrealistic to expect ongoing perfection.

I periodically get errors in some of the CST feeds. I'm not entirely certain how they get there, because the culprit is nearly always a comment that came from - somewhere. It's easy enough to fix when I notice, but I don't always notice. I'm pretty sure other content producers have the same problem - most of them aren't hosting the server themselves, and most of them have minimal control over whatever it is that the server does. It's not that anyone wants to create malformed content - it just happens.

In the meantime, content consumers still want to read the content, even if it has a few bad characters in it for a while. Stating that client-side applications should just reject that content out of hand is simply anti-social, IMHO. Sure, notify the user that there's a problem with the content, and let them contact the provider if they feel like it. But you shouldn't punish grandma because of an error she has no control over...
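To make the "ignore bad characters, but tell the user" approach concrete, here is a minimal sketch in Python. It is not how BottomFeeder (a Smalltalk application) actually does it. The function name `lenient_parse` is hypothetical, and the sketch assumes the feed's encoding is already known from somewhere else (an HTTP header, say) and is passed in separately. It replaces undecodable bytes with U+FFFD, strips characters that XML 1.0 does not allow, and returns a list of warnings alongside the parsed tree so the reader application can flag the problem without refusing to show the feed. Structural errors such as mismatched tags still raise `ET.ParseError`, because that is a different problem.

```python
import re
import xml.etree.ElementTree as ET

# Characters XML 1.0 forbids in a document: C0 control characters other
# than tab, newline, and carriage return, plus U+FFFE and U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def lenient_parse(raw: bytes, encoding: str = "utf-8"):
    """Hypothetical helper: parse feed bytes, dropping bad characters.

    Returns (root_element, warnings). An empty warnings list means the
    feed was clean; otherwise the caller can notify the user.
    """
    warnings = []
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # Report where the first bad byte was, then decode again and
        # substitute U+FFFD for every byte sequence that cannot be decoded.
        warnings.append(f"invalid {encoding} byte at offset {exc.start}; replaced")
        text = raw.decode(encoding, errors="replace")

    # Remove characters that would make an XML parser reject the document.
    cleaned, removed = _ILLEGAL_XML_CHARS.subn("", text)
    if removed:
        warnings.append(f"removed {removed} character(s) not allowed in XML")

    # Mismatched tags and similar structural errors still raise ET.ParseError.
    root = ET.fromstring(cleaned)
    return root, warnings


if __name__ == "__main__":
    # A stray Latin-1 byte (0xE9) in a UTF-8 feed, plus a vertical tab.
    feed = b"<rss><channel><title>Caf\xe9\x0b</title></channel></rss>"
    root, problems = lenient_parse(feed)
    print(root.find("channel/title").text)
    for p in problems:
        print("warning:", p)
```

The design choice is the one argued for above: the tree is always returned when the markup itself is intact, and the warnings are a side channel the UI can surface however it likes, instead of an exception that hides the content entirely.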