This post originated from an RSS feed registered with Web Buzz
by Simon Willison.
Original Post: Living on a knife edge
Feed Title: Simon Willison: [X]HTML and CSS
Feed URL: http://simon.incutio.com/syndicate/markup/rss1.0
Feed Description: Simon Willison's [X]HTML and CSS category
In The XHTML 100, Evan Goer describes an experiment in which he checked 119 sites with an XHTML doctype for full compliance with the W3C standards. His test consisted of three parts: a validation check on the front page, a validation check on another "inside" page, and a check to see whether the correct Content-Type header (application/xhtml+xml) was served to supporting user agents (in his case Mozilla 1.3).
The results are depressing, but not necessarily surprising. Only one site passed all three tests - beandizzy. Of the others, most fell at the first hurdle, with only 13 getting as far as the third test.
I don't know if my site was included in the experiment, but if it was, it failed at the third test as well. I have now implemented Mark Pilgrim's trivial PHP fix (which serves the correct Content-Type to user agents that include application/xhtml+xml in their HTTP Accept header). This is no small step to take - serving XHTML with the correct Content-Type causes Gecko-based browsers to parse it using a real XML parser, and should it turn out not to be well formed they will refuse to render the site and die with an error message. Since I use Phoenix myself and almost certainly visit this site more than anyone else, I'm hoping I'll spot and fix any errors before anyone else runs into them. Talk about living on a knife edge!
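The Accept-header check described above can be sketched in a few lines. This is not Mark Pilgrim's PHP code, only a rough Python illustration of the same idea; the function name preferred_content_type and the handling of q-values are my own assumptions.

```python
XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def preferred_content_type(accept_header):
    """Pick a Content-Type for an XHTML page from an HTTP Accept header.

    Hypothetical helper: returns application/xhtml+xml only when the
    client lists that type explicitly with a non-zero q-value, and
    falls back to text/html otherwise.
    """
    for item in (accept_header or "").split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != XHTML_TYPE:
            continue
        quality = 1.0  # a media range without a q parameter defaults to 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not accepted"
        if quality > 0:
            return XHTML_TYPE
    return HTML_TYPE


if __name__ == "__main__":
    # An Accept header of the general shape sent by Gecko-based browsers.
    print(preferred_content_type(
        "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9"))
    # A client that never mentions the XHTML type gets text/html.
    print(preferred_content_type("text/html, */*"))
```

A wildcard such as */* is deliberately not treated as support for application/xhtml+xml, since browsers that cannot handle the type may still send it.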
I've been cautious about recommending XHTML for several months now, and this turn of events has made me even more wary of it as a technology that is ready for mainstream use. Creating valid XHTML documents is extremely difficult - virtually impossible by hand without regular checks with the validator, and hard to achieve using home-grown tools as well. I plan to revise my Validator Web Service code shortly to help run automated validation checks whenever I update, but it's going to take quite a lot of effort to keep things working as they should.
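As a rough idea of what an automated pre-publish check might look like, here is a minimal Python sketch. It is not the Validator Web Service code mentioned above, and it tests only XML well-formedness (the property that makes Gecko refuse to render a page), not full validity against the XHTML DTD. The function name is hypothetical. One caveat: the standard-library parser does not load the DTD, so named entities such as &nbsp; are reported as errors here even in documents a validator would accept.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup):
    """Return None if markup parses as XML, else the parser's error message.

    Hypothetical helper: a well-formedness check only, not DTD validation.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    good = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title>'
            '</head><body><p>Hello<br /></p></body></html>')
    bad = "<html><body><p>Hello<br></p></body></html>"  # unclosed <br>
    for label, page in (("good", good), ("bad", bad)):
        error = well_formedness_error(page)
        print(label, "OK" if error is None else "NOT well formed: " + error)
```

Running a check like this over every entry before publishing would catch the unclosed tags and bare ampersands that trigger the browser's XML error page.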
So why bother when HTML 4.01 Strict gives all of the benefits of structural, valid markup with none of the additional hassles imposed by XHTML? Six months ago I would have said that XHTML is vital to support new lightweight devices that can only handle an XML parser, but with mobile phones now carrying full tag-soup-capable web browsers, that need is looking more and more unlikely. The greatest benefit provided by valid XHTML is the increased ability to automate the extraction and processing of content at a later date (see Mark Pilgrim's acclaimed acronym and citation support for a concrete demonstration of this idea). I've been storing my blog entries as XHTML since I started blogging, and I maintain a firm belief that XHTML is an excellent format for storing items of content. Sadly, it just doesn't seem practical or worthwhile to serve it to browsers.
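To make the "extraction at a later date" point concrete, here is a small Python sketch that pulls every cite element out of a stored entry using an ordinary XML parser. It is an illustration only, not Mark Pilgrim's implementation. The function name and the sample entry are invented for the example, and it assumes entries are stored as well-formed fragments in the XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def extract_cited_titles(xhtml):
    """Return the text of every <cite> element, in document order.

    Hypothetical helper: assumes well-formed markup in the XHTML namespace.
    """
    root = ET.fromstring(xhtml)
    return ["".join(cite.itertext()).strip()
            for cite in root.iter("{%s}cite" % XHTML_NS)]


if __name__ == "__main__":
    # Invented sample entry, used only to demonstrate the helper.
    entry = ('<div xmlns="http://www.w3.org/1999/xhtml"><p>In '
             '<cite>The XHTML 100</cite>, Evan Goer tests sites. See also '
             '<cite>Living on a <em>knife edge</em></cite>.</p></div>')
    print(extract_cited_titles(entry))
```

Because the stored entry is well-formed XML, no tag-soup heuristics are needed; the same approach would work for acronym elements or links.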
I'm going to keep serving this blog as XHTML as an open experiment in the practicalities and challenges involved in doing so, but from now on my other web projects will target HTML 4.01 Strict.