The Artima Developer Community

Ruby Buzz Forum
Resurrecting libxml-ruby




Charlie Savage
Resurrecting libxml-ruby Posted: Jul 16, 2008 11:27 AM

This post originated from an RSS feed registered with Ruby Buzz.
Original Post: Resurrecting libxml-ruby
Feed Title: cfis
Feed URL: http://cfis.savagexi.com/articles.rss
Feed Description: Charlie's Blog

There is general discontent with the state of XML processing in Ruby (see the discussions linked from the original post). An obvious solution is to use libxml. However, that has been a non-starter because the libxml Ruby bindings have historically caused numerous segmentation faults, don't run on Windows, and recently lost their maintainer, Dan Janowski. Even more frustrating, Dan had spent the previous year rearchitecting the bindings and had successfully fixed the segmentation faults.

Since MapBuzz depends heavily on libxml, it seemed time to step in and contribute. And so I have. Over the last two weeks I've added support for Windows, cleaned out the bug database and patch list, resolved the few remaining segmentation faults, greatly improved the RDoc documentation, and refactored large portions of the code base to conform to modern Ruby extension standards.

After iterating through a couple of releases during that time, the Ruby libxml community is happy to announce the availability of version 0.8.0, which we believe is ready for prime time. It offers a great combination of speed, functionality, and conformance (libxml passes all 1800+ tests in the OASIS XML conformance test suite).

So give it a try. It's as easy to install as:

gem install libxml-ruby

If you're on Windows, there may be an extra step: you might have to copy the prebuilt libxml2.dll into the libxml-ruby lib directory, your Ruby directory, or a directory on the Windows PATH (basically, anywhere Windows can load it from).

Undoubtedly there are still some bugs left, so please report anything you find so we can fix it in future releases.

Blindingly Fast

The major reason people consider using libxml-ruby is performance. Here are the results of a few simple benchmarks that have recently been blogged about on the Web (you can find them in the benchmark directory of the libxml-ruby distribution).

From Zach Chandler (times in seconds):

              user     system      total        real
libxml    0.032000   0.000000   0.032000 (  0.031000)
Hpricot   0.640000   0.031000   0.671000 (  0.890000)
REXML     1.813000   0.047000   1.860000 (  2.031000)

From Stephen Bannasch (times in seconds):

              user     system      total        real
libxml    0.641000   0.031000   0.672000 (  0.672000)
hpricot   5.359000   0.062000   5.421000 (  5.516000)
rexml    22.859000   0.047000  22.906000 ( 23.203000)

From Andreas Meingast (parsing throughput per run; higher is better):

LIBXML THROUGHPUT:
	10.2570516817665 MB/s
	10.2570830340359 MB/s
	12.6992253283934 MB/s
	10.2570516817665 MB/s
	8.51116888387252 MB/s
	10.2570830340359 MB/s

HPRICOT THROUGHPUT:
	0.211597647822036 MB/s
	0.202390771964726 MB/s
	0.180272812529665 MB/s
	0.198474511420818 MB/s
	0.198474499681793 MB/s
	0.180925089981179 MB/s

REXML THROUGHPUT:
	0.130301425548982 MB/s
	0.131630590068325 MB/s
	0.128316078417727 MB/s
	0.125203555921636 MB/s
	0.120181872867636 MB/s
	0.115330940074107 MB/s
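To put the three sets of numbers side by side, here is a small Ruby sketch that computes libxml's speedup over hpricot and REXML directly from the figures quoted above. It uses the "real" (wall-clock) column for the first two benchmarks and the mean of the six runs for the throughput benchmark; those choices are mine, not the benchmark authors'.

```ruby
# Wall-clock ("real") seconds from Zach Chandler's and Stephen Bannasch's runs.
zach    = { libxml: 0.031, hpricot: 0.890, rexml: 2.031 }
stephen = { libxml: 0.672, hpricot: 5.516, rexml: 23.203 }

# Andreas Meingast's per-run throughput in MB/s (higher is better).
andreas = {
  libxml:  [10.2570516817665, 10.2570830340359, 12.6992253283934,
            10.2570516817665, 8.51116888387252, 10.2570830340359],
  hpricot: [0.211597647822036, 0.202390771964726, 0.180272812529665,
            0.198474511420818, 0.198474499681793, 0.180925089981179],
  rexml:   [0.130301425548982, 0.131630590068325, 0.128316078417727,
            0.125203555921636, 0.120181872867636, 0.115330940074107]
}

def mean(values)
  values.sum / values.size
end

# For elapsed times, speedup = other parser's time / libxml's time.
def time_speedup(times, other)
  times[other] / times[:libxml]
end

# For throughput, speedup = libxml's mean rate / other parser's mean rate.
def throughput_speedup(rates, other)
  mean(rates[:libxml]) / mean(rates[other])
end

speedups = {
  "Chandler" => [time_speedup(zach, :hpricot), time_speedup(zach, :rexml)],
  "Bannasch" => [time_speedup(stephen, :hpricot), time_speedup(stephen, :rexml)],
  "Meingast" => [throughput_speedup(andreas, :hpricot), throughput_speedup(andreas, :rexml)]
}

speedups.each do |source, (vs_hpricot, vs_rexml)|
  printf("%-9s libxml is %5.1fx hpricot and %5.1fx REXML\n", source, vs_hpricot, vs_rexml)
end
```

Note that the ratio is inverted for the throughput benchmark: a higher MB/s figure is better, whereas a lower elapsed time is better.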

I can't vouch for the appropriateness of the tests, but they show libxml running roughly 8x to 50x faster than hpricot and 35x to 80x faster than REXML, depending on the benchmark. I'd be happy to accept additional or more representative tests if you have any.
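For anyone who wants to contribute a benchmark, here is a minimal sketch of a harness that prints results in the same user/system/total/real format as the tables above (Ruby's standard Benchmark library). It is not one of the scripts from the benchmark directory. It assumes the current libxml-ruby API (`require "libxml"` and `LibXML::XML::Parser.string(...).parse`), which may differ from 0.8.0, and it only includes libxml when the gem is installed. The synthetic document is a placeholder; a real file gives more meaningful numbers.

```ruby
require "benchmark"
require "rexml/document"

# Load libxml-ruby only if the gem is installed, so the script still runs without it.
begin
  require "libxml"
rescue LoadError
  # libxml-ruby not available; it is simply left out of the comparison.
end

# Placeholder document: 500 small <item> elements under one root.
XML_SOURCE = "<catalog>" +
  (1..500).map { |i| %(<item id="#{i}"><name>Item #{i}</name></item>) }.join +
  "</catalog>"

# Each entry maps a label to a lambda that fully parses XML_SOURCE into a DOM.
PARSERS = { "rexml" => -> { REXML::Document.new(XML_SOURCE) } }
if defined?(LibXML)
  # Assumed current libxml-ruby API; adjust for older releases.
  PARSERS["libxml"] = -> { LibXML::XML::Parser.string(XML_SOURCE).parse }
end

ITERATIONS = 20

# Prints user/system/total/real columns, one row per parser.
Benchmark.bm(8) do |bm|
  PARSERS.each do |label, parse|
    bm.report(label) { ITERATIONS.times { parse.call } }
  end
end
```

Adding hpricot or any other parser is a matter of adding one more entry to `PARSERS`.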

An Embarrassment of Riches

In addition to performance, the libxml-ruby bindings provide impressive coverage of libxml's functionality. Goodies include:

  • SAX
  • DOM
  • XMLReader (streaming interface)
  • XPath
  • XPointer
  • XML Schema
  • DTDs
  • XSLT (split into the libxslt-ruby bindings)

Now, your first reaction might be that SAX, DOM, and XPath are all you need, but validating parsers make it a whole lot easier to sanitize user-contributed content on web sites. And the XMLReader offers a clever way of combining the DOM's ease of use (well, OK, compared to SAX at least) with SAX's memory and speed advantages.
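The XMLReader idea is that the caller pulls one node at a time from a cursor instead of receiving callbacks (SAX) or a whole tree (DOM). libxml-ruby's reader needs the native extension, so the sketch below swaps in the pull parser that ships with Ruby's REXML to show the same cursor-style pattern. The API shown is REXML's, not libxml-ruby's, and the feed document and `titles_from` helper are illustrative only.

```ruby
require "rexml/document"
require "rexml/parsers/pullparser"

xml = <<~XML
  <feed>
    <entry><title>First</title></entry>
    <entry><title>Second</title></entry>
  </feed>
XML

# Walk the document one event at a time without building a tree. Only the
# state needed for the current position (whether we are inside a <title>)
# is kept, which is what gives pull parsing its SAX-like memory profile.
def titles_from(xml)
  parser = REXML::Parsers::PullParser.new(xml)
  titles = []
  in_title = false
  while parser.has_next?
    event = parser.pull
    if event.start_element?
      in_title = (event[0] == "title") # event[0] is the element name
    elsif event.end_element?
      in_title = false
    elsif event.text? && in_title
      titles << event[0]               # event[0] is the raw text
    end
  end
  titles
end

p titles_from(xml)
```

Unlike SAX, the loop owns the control flow, so the calling code reads top to bottom rather than being split across callbacks.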

Better yet, most of this functionality is exposed via an easy-to-use, Ruby-like API. There are, of course, still some warts lurking in the code where libxml's C API leaks through to Ruby, but they are being removed one by one. And for those of you who aren't C hackers, much of this work can be done in good old Ruby.
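As an illustration of smoothing over a C-style API in plain Ruby, here is a sketch built on a hypothetical `CNode` class that stands in for a node exposed by a C extension and offers only `first_child`/`next_sibling` traversal. Neither `CNode` nor the `NodeChildren` module is part of libxml-ruby; they only show the general technique of layering `Enumerable` over a linked-list style interface.

```ruby
# Hypothetical stand-in for a C-backed node: C-style traversal only.
class CNode
  attr_reader :name
  attr_accessor :first_child, :next_sibling

  def initialize(name)
    @name = name
  end
end

# Pure-Ruby layer that hides the sibling chain behind Enumerable,
# so callers get each, map, select, count and friends for free.
module NodeChildren
  include Enumerable

  def each
    return enum_for(:each) unless block_given?
    child = first_child
    while child
      yield child
      child = child.next_sibling
    end
    self
  end
end

# Mix the Ruby layer into the existing class after the fact.
CNode.include(NodeChildren)

# Build <root><a/><b/><c/></root> by hand.
root = CNode.new("root")
a, b, c = CNode.new("a"), CNode.new("b"), CNode.new("c")
root.first_child = a
a.next_sibling = b
b.next_sibling = c

puts root.map(&:name).join(",") # prints a,b,c
```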

A Long History

For such a useful and full-featured library, the libxml-ruby bindings have had a star-crossed history. Out of curiosity, I went back and traced their lineage. Sean Chittenden originally wrote them back in 2002. At the start of 2005, Trans Onoma adopted the project after Sean had moved on, and at the end of 2005 the bindings found their current home on RubyForge. At that point Ross Bamford took over maintenance and worked on the bindings for roughly a year, until early 2007, when the bindings again became unmaintained. Dan Janowski picked up the ball in 2007 and completely overhauled the bindings' memory model. Sadly, Dan had to give up active support this spring.

But on the bright side, Trans, Dan, and Sean are all once again active on the mailing list, providing valuable experience and insight. From my point of view, with the renewed push toward a production-quality release, and the new users it is bringing in, the libxml-ruby community is as healthy as it has been in a long while.


Copyright © 1996-2019 Artima, Inc. All Rights Reserved.