Liberal XML parsing?

James Robertson

Liberal XML parsing? Posted: Jul 17, 2005 7:07 PM

This post originated from an RSS feed registered with Agile Buzz by James Robertson.
Original Post: Liberal XML parsing?
Feed Title: Michael Lucas-Smith
Feed URL: http://www.michaellucassmith.com/site.atom
Feed Description: Smalltalk and my misinterpretations of life

The discussion on whether you should be allowed to parse non-conformant XML has been raging for a few weeks now. I generally like to keep out of this war, but I think I should put my 2c in because of my involvement in XML technology.

First of all, I do not agree with the advocates of parsing non-conformant XML data simply because they can. They should respect the goals of XML. But if your business requirement is to interact with the web at large and you need fault tolerance for that, then handling non-conformant XML is something you should do.

In WithStyle, we prefer real, valid XML. In fact, we penalise the CPU for it: we first try to parse a document as XML. Failing that, we use LibTidy to try to convert it into valid XML and then parse it. Failing that, we use LibTidy to try to convert it into HTML, then into XHTML, and then parse that as a valid XML document. Failing that, we have one last resort that's out of date - but if we get here, you can bet your bottom dollar we're not going to parse it.

Now I guess I should point out that this guy has decided to stop reading James's blog because James is an advocate of parsing dodgy XML. In actual fact, James uses the same set of rules that WithStyle uses. He doesn't really pay much attention to this because he's too busy with his business requirement of handling dodgy data, but his code prefers valid XML.

Okay, now on to the XML specification. The spec is quite specific about saying that normal XML processing should cease if a fatal error is encountered. But it does not stop you from trying other kinds of processing. To that end, I submit that the approach WithStyle takes is valid:

  • Can't parse it as XML, try a non-normal parsing approach to turn it into valid XML
  • Can't parse it as XML, try a non-normal parsing approach to turn it into HTML, then XHTML
  • Still can't parse it as XML, give up

Note that we do not try to correct errors as we parse the XML. We give up, like the spec says we should. Instead, we then try to adjust the data until it becomes something that is valid XML.
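To make the shape of that approach concrete, here is a rough sketch in Python. It is not WithStyle's code (which isn't shown here), and it does not call LibTidy: the repair steps are passed in as plain functions, and `close_unclosed_br` is a made-up toy stand-in for a real tidy step. The point is only that the parser itself stays strict, and any fixing happens to the data before a fresh strict parse.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, Optional

# A repair step takes raw text and returns adjusted text (hypothetical type alias).
Repair = Callable[[str], str]


def parse_with_fallbacks(text: str, repairs: Iterable[Repair]) -> Optional[ET.Element]:
    """Strictly parse `text`; on failure, strictly parse each repaired copy in turn."""
    candidates = [lambda s: s, *repairs]  # identity first: try the input untouched
    for repair in candidates:
        try:
            # The parser is never lenient: any fatal error aborts this attempt.
            return ET.fromstring(repair(text))
        except ET.ParseError:
            continue  # discard this attempt and move on to the next repair step
    return None  # every attempt failed, so give up


def close_unclosed_br(text: str) -> str:
    # Toy stand-in for a real tidy step (an assumption, NOT what LibTidy does).
    return text.replace("<br>", "<br/>")


if __name__ == "__main__":
    dodgy = "<p>a<br>b</p>"
    print(parse_with_fallbacks(dodgy, []))                   # strict parse only
    print(parse_with_fallbacks(dodgy, [close_unclosed_br]))  # strict parse after repair
```

Each repair always starts again from the original text, so a failed attempt never leaves a half-corrected document behind.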

So at the end of the day, WithStyle will only parse valid XML - but it has a few tricks up its sleeve to convert dodgy data into XML.

So, while James may like to get on his high horse about dealing with invalid XML, I try to avoid that argument entirely. I know that my code underneath isn't breaking the rule, and neither is James's. So perhaps that guy, Sjoerd Visscher, might like to subscribe to James's blog again... who knows. Hopefully he's subscribed to mine.


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use