The Artima Developer Community

Agile Buzz Forum
Into the noise, some meaning

James Robertson

Into the noise, some meaning Posted: Jun 23, 2004 7:08 AM

This post originated from an RSS feed registered with Agile Buzz by James Robertson.
Original Post: Into the noise, some meaning
Feed Title: Cincom Smalltalk Blog - Smalltalk with Rants
Feed URL: http://www.cincomsmalltalk.com/rssBlog/rssBlogView.xml
Feed Description: James Robertson comments on Cincom Smalltalk, the Smalltalk development community, and IT trends and issues in general.

On the Atom mailing list, there's been a lot of talk recently about what should and should not be done with malformed feeds. The answer probably depends on context:

  • In a B2B context, you probably want to reject malformed XML outright. That isn't an appropriate place to make a "best guess" and move along.
  • In a consumer context (i.e., the one most news aggregators live in), it's reasonable to flag the bad data (so that a user who cares can report it) and try to present it anyway.

The difference is context: if it's a business-level communication, guessing isn't appropriate. If, on the other hand, I'm trying to find out the latest baseball scores, I don't really care about the stray Unicode character that wandered into a feed.
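The two policies can be sketched as one parse function with a strict switch. This is a minimal Python sketch, not anything the Atom list agreed on: it uses the standard library's `xml.etree.ElementTree`, and the `parse_feed` function, the `ParseResult` type and the recovery heuristic (re-decoding as UTF-8 with replacement characters and dropping control characters that XML 1.0 forbids) are hypothetical choices made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

# Control characters that XML 1.0 does not allow anywhere in a document.
_FORBIDDEN = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")
# A leading XML declaration, which may name an encoding we no longer trust.
_XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


@dataclass
class ParseResult:
    root: Optional[ET.Element]  # None if even the recovery attempt failed
    problems: List[str] = field(default_factory=list)  # flags to show the user


def parse_feed(body: bytes, strict: bool) -> ParseResult:
    """Parse a feed; reject bad XML if strict, else flag it and try to recover."""
    try:
        return ParseResult(ET.fromstring(body))
    except ET.ParseError as err:
        if strict:
            raise  # B2B: no best guesses, the sender has to fix the data
        problems = [f"malformed XML: {err}"]
    # Consumer: a hypothetical best-effort pass aimed at encoding debris.
    text = body.decode("utf-8-sig", errors="replace")
    text = _FORBIDDEN.sub("", _XML_DECL.sub("", text, count=1))
    try:
        root = ET.fromstring(text.encode("utf-8"))
        problems.append("recovered by re-decoding as UTF-8")
        return ParseResult(root, problems)
    except ET.ParseError as err:
        problems.append(f"recovery failed: {err}")
        return ParseResult(None, problems)
```

A B2B endpoint would call `parse_feed(body, strict=True)` and let the `ParseError` propagate back to the sender; an aggregator would call it with `strict=False` and display `problems` next to whatever it managed to render. Structural errors such as mismatched tags are not repaired here; they are only flagged.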

The truly interesting part is the statistics Mark Pilgrim dug up:

I analyzed 5096 RSS and Atom feeds chosen at random from Syndic8.com and parsed them with Universal Feed Parser 3.0.1 using the latest version of libxml2 as the underlying XML parser.

Actually, I analyzed more feeds than that, but I threw away feeds that

  • didn't either return an HTTP status code 200 or redirect to a URL that returned 200, or
  • didn't have a recognizable root-level element of some version of RSS or Atom.

Of the 5096 feeds that remained:

  • 3929 feeds (77.10%) were well-formed.
  • 961 feeds (18.86%) were not well-formed due to specifying "Content-Type: text/xml" but containing non-us-ascii characters.
  • 206 feeds (4.04%) were not well-formed for other reasons.
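The 961 `text/xml` failures are presumably an application of RFC 3023, which says that a `text/xml` document served without a `charset` parameter is `us-ascii`, whatever encoding its XML declaration names. The sketch below assumes that rule plus ordinary XML well-formedness is what separates the three buckets. The `classify` function, its bucket labels and the crude regex used to sniff the root element are hypothetical, and this is not how Universal Feed Parser itself is written. It also ignores the case where an HTTP `charset` and the XML declaration disagree.

```python
import re
import xml.etree.ElementTree as ET
from email.message import Message

# First element name in the document; skips <?xml ...?> and <!DOCTYPE ...>.
# Crude: a tag inside a leading comment could fool it.
_ROOT = re.compile(rb"<([A-Za-z_][\w.:-]*)")
# <rss>, <rdf:RDF> (RSS 0.90/1.0) and Atom's <feed>, compared by local name.
FEED_ROOTS = {"rss", "RDF", "feed"}


def classify(status: int, content_type: str, body: bytes) -> str:
    """Bucket one fetched feed; `status` is the final one after redirects."""
    match = _ROOT.search(body)
    root_name = match.group(1).decode("ascii").split(":")[-1] if match else ""
    if status != 200 or root_name not in FEED_ROOTS:
        return "discarded"
    header = Message()
    header["Content-Type"] = content_type
    # RFC 3023: text/xml with no charset parameter means us-ascii,
    # regardless of what the XML declaration claims.
    if header.get_content_type() == "text/xml" and header.get_content_charset() is None:
        try:
            body.decode("ascii")
        except UnicodeDecodeError:
            return "not well-formed: text/xml with non-us-ascii"
    try:
        ET.fromstring(body)  # for other types, the XML's own encoding rules apply
    except ET.ParseError:
        return "not well-formed: other"
    return "well-formed"
```

Serving the same bytes as `application/xml`, or adding `; charset=utf-8` to the header, moves a feed from the second bucket to the first. The fault sits in the HTTP layer, which suggests why a tighter feed format alone would not remove it.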

Nearly a quarter of the sampled feeds (and this likely holds across feeds in general) have problems, and they are problems that a tighter spec is not going to solve. We've crossed the Rubicon on this one, at least in the consumer space...
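The "nearly a quarter" figure follows directly from the quoted counts. A quick check in Python (the variable names and the `percent` helper are ours, not Pilgrim's):

```python
# Counts quoted from Mark Pilgrim's survey of 5096 feeds.
WELL_FORMED = 3929
TEXT_XML_NON_ASCII = 961
OTHER_ERRORS = 206

total = WELL_FORMED + TEXT_XML_NON_ASCII + OTHER_ERRORS
broken = TEXT_XML_NON_ASCII + OTHER_ERRORS  # every feed that failed to parse


def percent(n: int) -> str:
    """Share of the sample as a percentage string with two decimals."""
    return f"{100 * n / total:.2f}%"


print(total, percent(WELL_FORMED), percent(broken))
# prints: 5096 77.10% 22.90%
```

The three buckets add up to exactly 5096, and 1167 broken feeds out of 5096 is 22.90%.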


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use