My example for xe was a straightforward port of the test code, but really in xe you would probably not be building your XML structures by hand very often. Usually you would make a class to do it for you. With a class, you can set up sensible default values, check values to make sure they are legal, and so on.
I coded up a couple of classes to demonstrate this. See below. Like my first example, this is tested code; if you download xe you can run this.
One other thing about xe: I'm proud of the way it handles reading in XML data. You create an XML data structure, and you call the .import_xml() method. This then reads in the XML data, and tries to match things up. Where there is a match, it puts the value in your data structure; if there is no match, it will add a member to your data structure and put the data in there. Basically, you just describe the XML data you expect, and if your description is good, it will magically Just Work. This is especially cool because you can describe things like an Atom feed that have a list of 0 or more identical elements, and that will work too!
The .import_xml() method is based on openAnything() by Mark Pilgrim. It accepts a file-like object, a filename, a URL, or a string.
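The matching behaviour described above can be sketched roughly as follows. This is not xe's implementation; it uses the standard library's ElementTree, and the function name `import_into` and the dict-based "description" are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

def import_into(expected, xml_text):
    """Fill `expected` (a dict: element name -> value) from the XML's children.

    A name whose described value is a list collects every matching element.
    A name missing from the description is added as a new entry.
    Returns the names that were not part of the description.
    """
    root = ET.fromstring(xml_text)
    unexpected = []
    for child in root:
        if child.tag not in expected:
            # No match: add a new member and remember that it was unplanned.
            unexpected.append(child.tag)
            expected[child.tag] = child.text
        elif isinstance(expected[child.tag], list):
            # Described as "0 or more identical elements": collect them all.
            expected[child.tag].append(child.text)
        else:
            expected[child.tag] = child.text
    return unexpected

# Describe the data we expect: one title and any number of entries.
feed = {"title": None, "entry": []}
xml_text = ("<feed><title>News</title><entry>one</entry>"
            "<entry>two</entry><subtitle>extra</subtitle></feed>")
unexpected = import_into(feed, xml_text)
```

After the call, `feed` holds the title, both entries in a list, and an added `subtitle` member for the element the description did not mention.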
However, namespaces throw a monkey wrench right now. If you are reading in, say, an Atom feed, and you have bound "a" to the Atom namespace, then "a:title" should match with the "title" member in the data structure; right now that doesn't work at all.
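One possible way to do that matching, sketched with the standard library's ElementTree rather than xe: ElementTree stores a namespaced tag as `{uri}localname`, so the prefix can be dropped before comparing. The helper name `local_name` is an assumption for this example.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def local_name(tag):
    """Return the tag without its '{namespace-uri}' part, if it has one."""
    return tag.rsplit("}", 1)[-1]

# "a" is bound to the Atom namespace, so the element is written as a:title.
doc = '<a:feed xmlns:a="%s"><a:title>Example</a:title></a:feed>' % ATOM_NS
root = ET.fromstring(doc)

# Match on the local name, the way a "title" member would want to match.
titles = [child.text for child in root if local_name(child.tag) == "title"]
```

Matching on the local name alone ignores which namespace the element came from; a stricter version would compare against the full `"{%s}title" % ATOM_NS` string instead.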
Here's the new sample code.
import xe

lst_valid_carriers = ["FDXE", "UPS", "USPS"]

class CarrierCode(xe.TextElement):
    def __init__(self, carr_code):
        # Fall back to the first valid carrier when no code is given.
        if carr_code is None:
            carr_code = lst_valid_carriers[0]
        elif carr_code not in lst_valid_carriers:
            s = ", ".join(lst_valid_carriers)
            raise ValueError("carrier code must be one of: " + s)
        xe.TextElement.__init__(self, "CarrierCode", carr_code)
> With a class, you can set up sensible default values, check values to make sure they are legal
It seems like a number of the contributors to this thread are at a level of "schema awareness" similar to mine before joining CSIRO a couple of years ago.
With schema-aware processing tools there is no reason for user code to be setting up defaults or checking values. OK, there might be a good reason to have code generating XML that writes defaults rather than relying on the schema's declared defaults, but certainly value type and range-checking should be in the schema.
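As a rough illustration of keeping the legal values in the schema, here is a sketch that reads an enumeration out of a W3C XML Schema fragment with the standard library. It is not a schema validator (a schema-aware tool would do this work for you), and the type name `CarrierCodeType` and helper `allowed_values` are made up for the example.

```python
import xml.etree.ElementTree as ET

# A small XSD fragment that declares the legal carrier codes.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="CarrierCodeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="FDXE"/>
      <xs:enumeration value="UPS"/>
      <xs:enumeration value="USPS"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

XS = "{http://www.w3.org/2001/XMLSchema}"

def allowed_values(xsd_text, type_name):
    """Return the enumeration values declared for the named simple type."""
    root = ET.fromstring(xsd_text)
    for simple_type in root.iter(XS + "simpleType"):
        if simple_type.get("name") == type_name:
            return [e.get("value") for e in simple_type.iter(XS + "enumeration")]
    return []

carriers = allowed_values(XSD, "CarrierCodeType")
```

With the list living in the schema, user code no longer needs its own copy of the valid carrier codes.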
> With schema-aware processing tools there is no reason for user code to be setting up defaults or checking values.
Is there a book or web page you recommend for learning more about this?
> OK, there might be a good reason to have code generating XML that writes defaults rather than relying on the schema's declared defaults
Example: a user's FedEx class that defaults to the user's account number and other user-specific shipping details.
> but certainly value type and range-checking should be in the schema.
It might also be faster to have the code "know" the values to check rather than having to parse the schema each time you run your program... especially for trivial programs. That's just a guess, though, and I could be wrong.
There's a lot of material about design patterns for XML Schemas on the XMML wiki, which is an international collaboration (the working name XMML has since been changed to GeoSciML, partly because of confusion and a squatter on xmml.com).
The O'Reilly "XML Schema" book is pretty good. We have a much-creased paper copy at work, and it is online at Safari: http://safari.oreilly.com/0596002521
I am not saying W3C XML Schema is particularly good, but it is sufficiently powerful and usable for rich data descriptions. One of the biggest headaches is that it allows more realistically flexible data descriptions than programming languages are easily able to deal with (that's one of the things I'm hoping to fix with CEDSimply).
> It seems like a number of the contributors to this thread are at a level of "schema awareness" similar to mine before joining CSIRO a couple of years ago.
Definitely true for me, regarding schemas and namespaces. I'm basically creating XML only when I have to, and so far I haven't run into namespace or schema issues. Thanks for pointing it out.
py.xml and the tool you created are both interesting solutions to the problem of creating XML in a structured and visually pleasing way.
I have created my own tool, that is, IMHO, very elegant and visually pleasing. I've called it xmlmodel, as I see it as a way of defining the structure of your xml document in terms of an object model.
An example:
#!/usr/bin/env python
from xmlmodel import *
from datetime import datetime

class rss( XMLModel ):
    class XMLAttrs:
        version = '2.0'

    class channel( XMLNode ):
        title = XMLValue('test')
        description = XMLValue('something')
        link = XMLValue('http://here')
        lastBuildDate = XMLDateTime( format = "%a, %d %b %Y %H:%M:%S EST" )
        generator = XMLValue()
        docs = XMLValue()

        class item( XMLNodeList ):
            title = XMLValue()
            link = XMLValue()
            description = XMLValue()
            category = XMLList()
            pubDate = XMLDateTime( format = "%a, %d %b %Y %H:%M:%S EST" )
(But please, no more "have you heard of XXX project, it's really wonderful". I'm sure it is, but unless it's something that is doing it the same way I am, it is not really relevant. If someone out there already has this idea and a more mature codebase, I'd be pleased to drop this and contribute to that.)
Interesting; a quirky use of classes (only composed of static fields), but I can see the syntax you're aiming at.
The only problem I can see is if there is more than one element with the same tag at the same level, which I think is legal XML (a list of identical items).
Look a little closer, that is happening in my example :-) class item( XMLNodeList ): defines a repeating node, which is a subclass of both XMLNode and the builtin list.
item = feed.channel.item.new() creates a new item in the RSS channel, and returns a reference for you to manipulate.
You can also add nodes to an XMLNodeList using the list's append, insert, and __setitem__ methods. These do not need to be instances of item, but simply instances of XMLNode.
In fact, what I'm doing is collecting the classes defined within the XMLModel subclass, and instantiating them when an instance of the XMLModel is created, so all those sub classes become composite objects.
So you can create more than one instance, and use them, without conflict.
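The general technique can be sketched in plain Python as below. This is not the xmlmodel code; `Value`, `Node`, `NodeList`, `Feed` and the helper `_populate` are stand-in names invented for the example, and only the "collect nested classes, instantiate them per instance" idea is taken from the description above.

```python
class Value(object):
    """Leaf holding one value (stand-in for XMLValue)."""
    def __init__(self, default=None):
        self.value = default

def _populate(target, declaring_class):
    """Give `target` its own copy of everything declared on the class."""
    for name, attr in vars(declaring_class).items():
        if isinstance(attr, type) and issubclass(attr, Node):
            setattr(target, name, attr())              # nested class -> child object
        elif isinstance(attr, Value):
            setattr(target, name, Value(attr.value))   # fresh value per instance

class Node(object):
    """Instantiates its nested Node classes when it is itself instantiated."""
    def __init__(self):
        _populate(self, type(self))

class NodeList(Node, list):
    """Repeating node: both a Node and a builtin list."""
    def __init__(self):
        list.__init__(self)   # fields declared here describe each item, not the list

    def new(self):
        child = Node()
        _populate(child, type(self))
        self.append(child)
        return child

class Feed(Node):
    class channel(Node):
        title = Value("test")

        class item(NodeList):
            title = Value()
            link = Value()

feed_a = Feed()
feed_b = Feed()
feed_a.channel.title.value = "first"
entry = feed_a.channel.item.new()
entry.title.value = "hello"
```

Because every instance gets its own child objects, changing `feed_a` leaves `feed_b` at its declared defaults.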
> It might also be faster to have the code "know" the values to check rather than having to parse the schema each time you run your program... especially for trivial programs.
That sounds a bit like "it might be faster to have the code 'know' the values to check rather than relying on the database schema rules to enforce them." :-)
Schemas are referred to by instance documents, not incorporated. There's no reason why a processing system can't have a schema cached. You could have an architecture where validation against a schema was performed separately before calling user code.
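A toy sketch of that architecture, under heavy assumptions: `load_rules` stands in for parsing a real schema (here it just returns a hard-coded rule set), the schema name "shipping" is hypothetical, and the validation step is a simple membership check rather than real schema validation. The point is only that the schema is loaded once and checked before user code runs.

```python
import functools

PARSE_COUNT = {"n": 0}  # counts how often the "schema" is actually loaded

@functools.lru_cache(maxsize=None)
def load_rules(schema_name):
    """Stand-in for parsing a schema; the cache makes it run once per name."""
    PARSE_COUNT["n"] += 1
    return {"CarrierCode": frozenset(["FDXE", "UPS", "USPS"])}

def validated(schema_name):
    """Decorator: validate the record against cached rules, then call user code."""
    def wrap(func):
        @functools.wraps(func)
        def inner(record):
            rules = load_rules(schema_name)
            for field, allowed in rules.items():
                if record.get(field) not in allowed:
                    raise ValueError("%s must be one of: %s"
                                     % (field, ", ".join(sorted(allowed))))
            return func(record)
        return inner
    return wrap

@validated("shipping")
def ship(record):
    # User code: it can assume the record has already been checked.
    return "shipping via " + record["CarrierCode"]

first = ship({"CarrierCode": "UPS"})
second = ship({"CarrierCode": "USPS"})
```

Both calls share one load of the rules, and `ship` itself contains no checking code.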
I read somewhere that XML is just reinvented and *terribly* overengineered Lisp. In Lisp code is data and data is code, so mixing them is a normal thing. Languages in XML? (Jelly, anyone?) Why do you think there are so many dialects of Lisp? :)
I'm guessing, but I think the reference is to the book "Pragmatic Project Automation" by Mike Clark (ISBN 0-9745140-3-9). It contains (on page 29) a page-long explanation by James Davidson (the author of Ant) about why he used XML in Ant. The article title is "The Creator of Ant Exorcizes One of His Demons". The full article ends with the following paragraph...
"If I knew then what I know now, I would have tried using a real scripting language, such as JavaScript via the Rhino component or Python via JPython, with bindings to Java objects that implemented the functionality expressed in today's tasks. Then, there would be a first-class way to express logic, and we wouldn't be stuck with XML as a format that is too bulky for the way that people really want to use the tool."
> ...
> Another example is Ant. The creator of this tool has since apologized for using XML
> ...
>
> Do you have a reference to that? I couldn't find such a public apology on the web.
The creator of Ant writes here about his regrets in (A) using XML and (B) not making Ant more powerful by incorporating enough language constructs. I agree wholeheartedly on both counts, and yet I'm not ready to undertake the project of creating a new build system, much as I would like to have a better one for my own use.