So this is pretty crazy. I'm messing around with ElementTree
(which has been nothing less than perfect) and trying to get it to act
like an xml.dom.pulldom/XmlTextReader-style pull parser. But I'd also
like to be able to assemble a chain of generator producing/consuming
functions (or other callables) so that the file can be read, parsed,
filtered/mutated, encoded, and written, all incrementally.
Check it out:
import sys
import pulltree # that's what I'm working on :)

def upper_filter(source):
    for (ev, item) in source:
        if ev == pulltree.CHARACTERS:
            item = item.upper()
        yield (ev, item)

reader = pulltree.reader(sys.stdin)
filter = upper_filter(reader)
writer = pulltree.writer(filter, sys.stdout)
for (ev, item) in writer:
    pass
C-z
$ echo "<hello>world</hello>" | python test_filter.py
<hello>WORLD</hello>
That felt good. More functional than a chain of SAX XMLFilters,
almost as efficient, and muuuuch perdier.
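For the curious, here's a rough sketch of how the reader and writer ends of a chain like that could be put together with nothing but the standard library. To be clear, this is not pulltree itself: the event constants, the `reader`/`writer` signatures, and the `run` helper are all assumptions made up for illustration, and it uses expat directly rather than ElementTree's own parser. Namespaces, comments, and processing instructions are ignored.

```python
import io
from xml.parsers import expat
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Event names mirror the post; the values pulltree really uses are unknown.
START_ELEMENT = "START_ELEMENT"
END_ELEMENT = "END_ELEMENT"
CHARACTERS = "CHARACTERS"

def reader(stream, chunk_size=4096):
    """Yield (event, item) pairs while reading the stream chunk by chunk."""
    pending = []
    parser = expat.ParserCreate()
    parser.buffer_text = True  # ask expat to coalesce adjacent text where it can
    parser.StartElementHandler = lambda tag, attrs: pending.append(
        (START_ELEMENT, ET.Element(tag, attrs)))
    # Simplification: the end event carries a fresh, empty element.
    parser.EndElementHandler = lambda tag: pending.append(
        (END_ELEMENT, ET.Element(tag)))
    parser.CharacterDataHandler = lambda text: pending.append(
        (CHARACTERS, text))
    while True:
        chunk = stream.read(chunk_size)
        parser.Parse(chunk, not chunk)  # an empty read marks the final call
        while pending:
            yield pending.pop(0)
        if not chunk:
            break

def writer(source, out):
    """Serialize events to `out`, passing each one along unchanged."""
    for ev, item in source:
        if ev == START_ELEMENT:
            attrs = "".join(" %s=%s" % (k, quoteattr(v))
                            for k, v in item.attrib.items())
            out.write("<%s%s>" % (item.tag, attrs))
        elif ev == END_ELEMENT:
            out.write("</%s>" % item.tag)
        elif ev == CHARACTERS:
            out.write(escape(item))
        yield ev, item

def upper_filter(source):
    for ev, item in source:
        if ev == CHARACTERS:
            item = item.upper()
        yield ev, item

def run(xml_text):
    """Hypothetical helper: push a string through the whole chain."""
    out = io.StringIO()
    for _ in writer(upper_filter(reader(io.StringIO(xml_text))), out):
        pass
    return out.getvalue()
```

Nothing happens until the last generator in the chain is iterated, so each chunk is read, parsed, filtered, and written before the next one is touched. That is the whole appeal.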
Something like this might work someday soon:
import urllib2
import pulltree

XINCLUDE = '{http://www.w3.org/2001/XInclude}include'

def xinclude_filter(source):
    events = iter(source)
    for (event, item) in events:
        if event == pulltree.START_ELEMENT and item.tag == XINCLUDE:
            href = item.attrib['href']
            # splice the included document's events into the stream
            for woot in pulltree.reader(urllib2.urlopen(href)):
                yield woot
            pulltree.eat(item, events) # eat events to the end of the element
        else:
            yield (event, item)
Granted, that's about as basic as an XInclude processor could be and
still be useful, but you get the point.
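The "eat events to the end of the element" step is the only non-obvious bit, so here's a minimal sketch of one way it could work: count nesting depth on the shared iterator until the matching end event shows up. This is an assumption about how such a helper might behave, not pulltree's actual code. The `eat(events)` signature (no element argument), the `drop_filter` name, and the use of plain tag strings as items are all simplifications for illustration.

```python
START_ELEMENT = "START_ELEMENT"
END_ELEMENT = "END_ELEMENT"
CHARACTERS = "CHARACTERS"

def eat(events):
    """Consume events up to and including the end of the current element.

    Assumes the element's START_ELEMENT was already taken from `events`,
    and that `events` is an iterator (not a list), so the caller's loop
    resumes right after the consumed END_ELEMENT.
    """
    depth = 1
    for ev, _item in events:
        if ev == START_ELEMENT:
            depth += 1
        elif ev == END_ELEMENT:
            depth -= 1
            if depth == 0:
                return

def drop_filter(source, tag):
    """Drop every element named `tag`, along with everything inside it."""
    events = iter(source)
    for ev, item in events:
        if ev == START_ELEMENT and item == tag:
            eat(events)  # skips the children and the matching end event
        else:
            yield ev, item
```

Counting depth instead of comparing tag names means a skipped element can contain children with the same name without ending the skip early.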