This post originated from an RSS feed registered with Ruby Buzz
Original Post: Resurrecting libxml-ruby
Feed Title: cfis
Feed URL: http://cfis.savagexi.com/articles.rss
Feed Description: Charlie's Blog
There is general discontent with the state of XML processing in Ruby - see for example here or here. An obvious solution is to use libxml. However, that has been a non-starter: the libxml Ruby bindings have historically caused numerous segmentation faults, don't run on Windows, and recently lost their maintainer, Dan Janowski. Making it even more frustrating, Dan had spent the last year rearchitecting the bindings, successfully fixing the segmentation faults.
Since MapBuzz heavily depends on libxml, it seemed time to step in and contribute. And so I have. Over the last two weeks I've added support for Windows, cleaned out the bug database and patch list, resolved the few remaining segmentation issues, greatly improved the RDocs and refactored large portions of the code base to conform with modern Ruby extension standards.
After iterating through a couple of releases over the last two weeks, the Ruby libxml community is happy to announce the availability of version 0.8.0, which we believe is ready for prime time. It offers a great combination of speed, functionality and conformance (libxml passes all 1800+ tests in the OASIS XML Test Suite).
So give it a try - it's as easy to install as:
gem install libxml-ruby
If you're on Windows there may be an extra step - you might have to copy the prebuilt libxml2.dll library into the libxml-ruby lib directory, your Ruby directory or Windows path (basically put it someplace where Windows can load it).
Undoubtedly there are still some bugs left, so please report any you find so we can fix them in future releases.
Blindingly Fast
The major reason people consider using libxml-ruby is performance. Here are the results of a few simple benchmarks that have recently been blogged about on the Web (you can find them in the benchmark directory of the libxml distribution).
I can't vouch for the appropriateness of the tests, but they show libxml clocking in at 10x hpricot and 30x to 60x REXML. I'd be happy to accept additional tests or more appropriate tests if you have any.
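If you want to try a comparison on your own documents, here is a minimal sketch of a timing harness. It is not one of the benchmarks from the distribution. It assumes a recent libxml-ruby release (loaded with require 'libxml' and parsed with LibXML::XML::Document.string; version 0.8.0 may spell these differently), and the helper names sample_xml and time_parsers are made up for illustration. If the gem is not installed, only REXML is timed.

```ruby
require 'benchmark'
require 'rexml/document'

# Assumption: a recent libxml-ruby gem, loaded via 'libxml'.
# If it is missing, the harness simply skips the libxml timing.
begin
  require 'libxml'
  HAVE_LIBXML = true
rescue LoadError
  HAVE_LIBXML = false
end

# Hypothetical test document: a flat list of <item> elements.
def sample_xml(count)
  items = (1..count).map { |i| "<item id=\"#{i}\">value #{i}</item>" }
  "<items>#{items.join}</items>"
end

# Parse the same string `runs` times with each available parser and
# return the wall-clock seconds per parser, keyed by parser name.
def time_parsers(xml, runs)
  timings = {}
  timings[:rexml] = Benchmark.realtime do
    runs.times { REXML::Document.new(xml) }
  end
  if HAVE_LIBXML
    timings[:libxml] = Benchmark.realtime do
      runs.times { LibXML::XML::Document.string(xml) }
    end
  end
  timings
end

timings = time_parsers(sample_xml(1_000), 5)
timings.each { |name, seconds| puts format('%-8s %.4fs', name, seconds) }
```

Parsing a small synthetic document a handful of times says little about real workloads, so treat any ratio it prints as a rough indication at best.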
An Embarrassment of Riches
In addition to performance, the libxml-ruby bindings provide impressive coverage of libxml's functionality. Goodies include:
Now, your first reaction might be that SAX, DOM and XPath are all you need, but validating parsers make it a whole lot easier to sanitize user contributed content on web sites. And the XMLReader offers a clever way of combining the DOM's ease of use (well, ok, compared to SAX at least) with SAX's memory and speed advantages.
Better yet, most of this functionality is exposed via an easy-to-use, Ruby-like API. There are of course still some warts lurking in the code, where libxml's C API leaks through to Ruby, but they are being removed one by one. And for those of you who aren't C hackers, much of this work can be done in good old Ruby.
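To give a feel for the XMLReader style mentioned above, here is a small sketch that walks a document node by node without building a full tree. It assumes the API of a recent libxml-ruby release (LibXML::XML::Reader.string, read, node_type, name); the 0.8.0 API may differ. The method name count_elements is hypothetical, and it returns nil when the gem is not installed.

```ruby
# Assumption: a recent libxml-ruby gem, loaded via 'libxml'.
begin
  require 'libxml'
  LIBXML_AVAILABLE = true
rescue LoadError
  LIBXML_AVAILABLE = false
end

# Count how often each element name occurs, pulling one node at a time
# from the reader instead of loading the whole document as a DOM.
# Returns nil if libxml-ruby is not available.
def count_elements(xml)
  return nil unless LIBXML_AVAILABLE

  reader = LibXML::XML::Reader.string(xml)
  counts = Hash.new(0)
  while reader.read
    # Only opening (or empty) element nodes are counted; text nodes,
    # closing tags and so on are skipped.
    if reader.node_type == LibXML::XML::Reader::TYPE_ELEMENT
      counts[reader.name] += 1
    end
  end
  counts
end

counts = count_elements('<feed><entry/><entry/></feed>')
puts counts.inspect unless counts.nil?
```

The loop reads much like iterating over a collection, yet only the current node is held in memory, which is where the claimed advantage over a full DOM comes from.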
A Long History
For such a useful and full-featured library, the libxml-ruby bindings have a star-crossed history. Out of curiosity, I went back and traced their lineage. Sean Chittenden originally wrote them back in 2002. At the start of 2005, Trans Onoma adopted the project after Sean had moved on, and at the end of 2005 the bindings found their current home on RubyForge. At that point Ross Bamford took over maintenance and worked on the bindings for roughly a year, until early 2007, when the bindings again became unmaintained. Dan Janowski picked up the ball in 2007 and completely overhauled the bindings' memory model. Sadly, Dan had to give up active support this spring.
But on the bright side, Trans, Dan and Sean are all once again active on the mailing list, providing valuable experience and insight. From my point of view, with the renewed push towards a production-quality release and the arrival of new users, the libxml-ruby community is as healthy as it has been in a long while.