The Artima Developer Community

Python Buzz Forum
ElementTree on the come-up

Ryan Tomayko


Ryan Tomayko is an instance of a human being...
ElementTree on the come-up Posted: Jan 12, 2005 10:19 AM

This post originated from an RSS feed registered with Python Buzz by Ryan Tomayko.
Original Post: ElementTree on the come-up
Feed Title: Ryan Tomayko (weblog/python)
Feed URL: http://tomayko.com/feed/
Feed Description: Entries classified under Python.

I had a very small number of complaints related to basing Kid on ElementTree. These came in two forms:

  1. SAX and DOM are “standard” and while ElementTree is a drastically improved system for processing XML in Python, it doesn't matter because everyone already knows SAX/DOM.

  2. “libxml2 is teh rawk!”

First, if Python's W3C DOM standards-based xml.dom package were a movie, it would be called Elf, starring xml.dom. It's the episode of Little House on the Prairie where Alien asks Michael Landon for permission to marry his daughter. It does not belong here!

Next, in terms of pythonicness, libxml2 is almost worse than xml.dom, but you at least get something for it: they don't even have a word to describe this level of “fast”, and it comes with XPath, RelaxNG, XSD, XML-Base, XInclude, and XSLT. My issue with libxml2 is just that it's a bad dependency for a project like Kid, which wants to be able to run on cheap web space with bare-bones Python support. A lot of hosting providers aren't going to have libxml2 or offer the option of compiling it from source.

I went with ElementTree because it's simple, pythonic, and fast enough. I also had a feeling we'd be seeing more development around ElementTree, which brings us nicely to why I'm posting.
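To make "simple" and "pythonic" a little more concrete, here is a minimal sketch of the same task, collecting the text of every title element, done once with xml.dom.minidom and once with ElementTree. The sample document and both function names are made up for illustration. The import uses `xml.etree.ElementTree`, the standard-library location ElementTree later moved to; at the time of this post it was a separate package imported as `elementtree.ElementTree`.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this comparison.
DOC = (
    "<feed>"
    "<entry id='1'><title>Hello</title></entry>"
    "<entry id='2'><title>World</title></entry>"
    "</feed>"
)


def titles_with_minidom(text):
    """Collect title text through the W3C DOM API."""
    dom = xml.dom.minidom.parseString(text)
    titles = []
    for node in dom.getElementsByTagName("title"):
        # Text lives in child nodes, so it has to be gathered by hand.
        parts = [
            child.data
            for child in node.childNodes
            if child.nodeType == child.TEXT_NODE
        ]
        titles.append("".join(parts))
    return titles


def titles_with_elementtree(text):
    """Collect title text through the ElementTree API."""
    root = ET.fromstring(text)
    # Elements expose their leading text directly as the .text attribute.
    return [title.text for title in root.iter("title")]


if __name__ == "__main__":
    print(titles_with_minidom(DOC))
    print(titles_with_elementtree(DOC))
```

Both functions return the same list; the difference is how much of the node model you have to know about to get there.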

Fredrik Lundh announced cElementTree, a C implementation of his ElementTree XML library for Python. The initial numbers coming out of effland look excellent:

library                           time       space
xml.dom.minidom (Python 2.1)      6.3 s      80000k
xml.dom.minidom (Python 2.4)      1.4 s      53000k
ElementTree 1.2                   1.6 s      14500k
ElementTree 1.2.1/1.3             1.1 s      14500k
PyRXPU (C extension)              0.22 s     11500k
cElementTree 0.8 (C extension)    0.058 s    5700k
readlines (read as text)          0.032 s    5050k
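If you want to get a rough feel for the time column on your own machine, something like the following sketch would do. It is not Fredrik's benchmark: the generated document, the helper names, and the choice of parsers are all assumptions for illustration, it only measures parse time (not memory), and it uses the modern standard-library module names.

```python
import time
import xml.dom.minidom
import xml.etree.ElementTree as ET


def make_document(n_items=2000):
    """Build a synthetic XML document (hypothetical test data)."""
    items = "".join(
        "<item id='%d'><name>item %d</name></item>" % (i, i)
        for i in range(n_items)
    )
    return "<root>%s</root>" % items


def time_parse(parse, text, repeats=3):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        parse(text)
        best = min(best, time.perf_counter() - start)
    return best


def compare(text):
    """Time each parser on the same text; returns {name: seconds}."""
    parsers = {
        "xml.dom.minidom": xml.dom.minidom.parseString,
        "ElementTree": ET.fromstring,
    }
    return {name: time_parse(parse, text) for name, parse in parsers.items()}


if __name__ == "__main__":
    for name, seconds in compare(make_document()).items():
        print("%-16s %.4f s" % (name, seconds))
```

The numbers you get will depend on your Python version and hardware, so treat them as a sanity check rather than something comparable to the table above.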

This comes on the heels of a well-hidden announcement by Martijn Faassen on the lxml mailing list Saturday:

The lxml.etree implementation of ElementTree, on top of libxml2, is getting there now. It features automatic memory management and quite a bit of ElementTree compatibility. Not all of the ElementTree API has been implemented yet, but enough for many use cases.

As everyone is already quite aware, libxml2 is fast. But as I mentioned, the Python bindings that ship with libxml2 are painful; many a hacker has been seduced by its performance only to be bitten later by monsters growing out of the large impedance mismatch it creates with the rest of your Python code.

This is all really great news, of course, but now there are questions to be asked and work to be done:

  • Will Fredrik and others collaborate to create a compatibility definition for these different ElementTree implementations? I'd like to see a definition of a mandatory ElementTree API. Ideally, whether to use cElementTree, lxml.etree, or ElementTree proper would be a decision based on what was available in a given environment, not a decision made when coding.

  • I'd like to see libxml2 added to Fredrik's comparison table. (Fredrik: ping)

  • At some point in the future (Python 3000?), I'd like to see ElementTree or its equivalent rolled into the core library. This seems unlikely though, as I don't think XML-SIG or the greater Python community wants the Python/XML waters any murkier. I partially agree, but the number of people looking outside of core Python's XML support for functionality it provides says that it isn't getting the job done.
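The first item, choosing an implementation based on what the environment has rather than at coding time, could look roughly like the sketch below. It assumes the implementations really do share a compatible core API, which is exactly the open question. The helper name `load_etree` and the preference order are hypothetical; the module names listed are the ones these packages have been importable under, and whichever are missing are simply skipped.

```python
import importlib

# Candidate module names, tried in order. The ordering (fastest first)
# is an assumption for this sketch, not an agreed convention.
CANDIDATES = (
    "lxml.etree",               # ElementTree API on top of libxml2
    "cElementTree",             # standalone C implementation
    "xml.etree.ElementTree",    # standard-library location in later Pythons
    "elementtree.ElementTree",  # original standalone pure-Python package
)


def load_etree(candidates=CANDIDATES):
    """Return the first importable ElementTree-style module."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed in this environment; try the next one.
            continue
    raise ImportError(
        "no ElementTree implementation found, tried: %s" % ", ".join(candidates)
    )


etree = load_etree()
root = etree.fromstring("<kid><greeting>hello</greeting></kid>")
print(etree.__name__, root.find("greeting").text)
```

Code written against `etree` this way only stays portable if it sticks to the subset of the API that every candidate implements, which is why a mandatory ElementTree API definition would matter.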



Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use