artima - Article Discussion


An Introduction to XML Data Binding in C++


Summary: XML processing has become a common task that many C++ application developers have to deal with. Using low-level XML access APIs such as DOM and SAX is tedious and error-prone, especially for large XML vocabularies. XML Data Binding is a new alternative which automates much of the task by presenting the information stored in XML as a statically-typed, vocabulary-specific object model. This article introduces XML Data Binding and shows how it can simplify XML processing in C++.

21 posts on 2 pages.



Most recent reply: February 11, 2010 9:43 PM by

Bill

Posts: 409 / Nickname: bv / Registered: January 17, 2002 4:28 PM

An Introduction to XML Data Binding in C++

May 4, 2007 4:08 PM

XML processing has become a common task that many C++ application developers have to deal with. This article introduces XML Data Binding and shows how it can simplify XML processing in C++.

http://www.artima.com/cppsource/xml_data_binding.html

What do you think of the techniques presented in this article? What other approaches have you taken in C++ to process XML?

Bjarne

Posts: 48 / Nickname: bjarne / Registered: October 17, 2003 3:32 AM

Re: An Introduction to XML Data Binding in C++

May 4, 2007 9:42 PM

Looks very elegant. How complicated is the generator? Are there any ways to control the style of C++ generated? (e.g. use of containers, int sizes, string types)

Martin

Posts: 1 / Nickname: kardigen / Registered: October 5, 2006 4:37 AM

Re: An Introduction to XML Data Binding in C++

May 5, 2007 1:29 AM

It's an interesting approach. I've worked with DOM-like (object-oriented, text-based) approaches and I think it's more universal than XML Data Binding, because the XSD is not needed. However, in some cases the presented techniques would be better.

Roland

Posts: 25 / Nickname: rp123 / Registered: January 7, 2006 9:42 PM

Re: An Introduction to XML Data Binding in C++

May 5, 2007 2:24 AM

XML Data Binding stems from the Java world. The most notable examples are JAXB (https://jaxb.dev.java.net/) and XMLBeans (http://xmlbeans.apache.org/). I used JAXB years ago and can recommend it. You often have a given schema or need to write one for XML validation. Code generation from the schema is simple then and allows for very convenient handling of small to medium XML documents.
BTW, a similar approach can be used to generate code from a database schema.
As for the generated C++ code from XSD, IMO, the auto_ptrs are unnecessary. I'd prefer classic RAII where a parent owns (a tree of) children. Unfortunately XSD is a commercial product (with open source GPL teaser) so the motivation to change the generator is limited.

Boris

Posts: 6 / Nickname: boris / Registered: May 10, 2006 8:23 PM

Re: An Introduction to XML Data Binding in C++

May 5, 2007 11:33 AM

> How complicated is the generator?

We tried to make the generator as simple as possible, but it is still somewhat complex, mainly due to various idiosyncrasies of the XML Schema language. We have a custom semantic graph for XML Schema with a convenient traversal mechanism, and the output streams automatically indent the C++ code being produced. This keeps the code in the generator quite transparent. The complexity comes from the difficulty of mapping some XML Schema constructs to C++; one notable example is anonymous types. At some point we realized that it is often easier to get rid of such constructs by transforming the graph before code generation than to handle them in the generator. As a result, we now have a number of transformations, such as naming anonymous types and resolving name conflicts, that significantly simplify the generator.

> Are there any ways to control the style of C++ generated? (e.g. use of containers, int sizes, string types)

There is support for selectively customizing the generated C++ classes, including the mapping of built-in XML Schema types to C++ types (so types such as integers and strings can be remapped to custom types). The mechanism is described in the following document:

http://wiki.codesynthesis.com/Tree/Customization_guide

At the moment there is no way to customize the underlying containers but it shouldn't be hard to support.

Boris

Posts: 6 / Nickname: boris / Registered: May 10, 2006 8:23 PM

Re: An Introduction to XML Data Binding in C++

May 5, 2007 11:56 AM

> I've worked with DOM-like (object-oriented, text-based) approaches and I think it's more universal than XML Data Binding, because the XSD is not needed.

Working on a large XML vocabulary and not having a formal definition for it is suicidal. Handling large and complex vocabularies is also exactly the situation where one experiences the most pain from raw APIs such as DOM and SAX.

Boris

Posts: 6 / Nickname: boris / Registered: May 10, 2006 8:23 PM

Re: An Introduction to XML Data Binding in C++

May 5, 2007 12:11 PM

> As for the generated C++ code from XSD, IMO, the auto_ptrs are unnecessary. I'd prefer classic RAII where a parent owns (a tree of) children.

auto_ptr helps you avoid writing exception-unsafe code, e.g.,

handle_person (person ("p.xml"), can_throw ());

You can also easily strip auto_ptr away with a call to release():

person_t* p = person ("p.xml").release ();

Hector

Posts: 2 / Nickname: hector / Registered: May 6, 2007 4:40 AM

Re: An Introduction to XML Data Binding in C++

May 6, 2007 2:28 PM

> What do you think of the techniques presented in this
> article? What other approaches have you taken in C++ to
> process XML?

A small nit: the title should perhaps have indicated "....using Product XYZ"

Otherwise, the technique is similar to what we do, except that we auto-generate a different p-code language used by an application server. The RTE is written in C++. The reasons were basically the same as those cited in the article, with speed being a big influence.

On a semi-related note, it might interest you that James Ward (Adobe) has produced an interesting benchmarking demo outlining the different ways available today for processing huge data sets (like 5000 records).

http://www.jamesward.org/census

--
HLS

Boris

Posts: 6 / Nickname: boris / Registered: May 10, 2006 8:23 PM

Re: An Introduction to XML Data Binding in C++

May 7, 2007 12:01 AM

> A small nit: the title should perhaps have indicated "....using Product XYZ"

The product neutrality issue was considered carefully. The choices were to provide an article without any code examples or to pick a tool and try to show only the basics that are the same or similar across different products. The former choice would have rendered the article pretty much useless so we went with the latter.

> http://www.jamesward.org/census

This page doesn't have any content.

Ray

Posts: 2 / Nickname: lisch / Registered: May 7, 2007 3:35 AM

Re: An Introduction to XML Data Binding in C++

May 7, 2007 9:11 AM

> As for the generated C++ code from XSD, IMO, the auto_ptrs
> are unnecessary. I'd prefer classic RAII where a parent
> owns (a tree of) children.

RAII is nice for simple cases, but pointers work better with optional items and substitution groups, just to name two examples.

> Unfortunately XSD is a
> commercial product (with open source GPL teaser) so the
> motivation to change the generator is limited.

Fortunately, XSD is open source (GPL), so we are able to change the generator at will. I've been able to fix bugs without waiting for a vendor, and I've been modifying the code generator to suit our specific needs.

James

Posts: 128 / Nickname: watson / Registered: September 7, 2005 3:37 AM

Re: An Introduction to XML Data Binding in C++

May 7, 2007 10:26 AM

> XML Data Binding stems from the Java world. The most
> notable examples are JAXB (https://jaxb.dev.java.net/) and
> XMLBeans (http://xmlbeans.apache.org/). I used JAXB years
> ago and can recommend it. You often have a given schema or
> need to write one for XML validation. Code generation from
> the schema is simple then and allows for very convenient
> handling of small to medium XML documents.

I do not recommend generating code in a static language from schemata. The fact of the matter is that it doesn't really solve anything and creates very brittle and non-reusable code.

Basically you take the XML structure and create code that is coupled to it. Now you will generally need to walk the tree and extract the data for use in other places. This means writing a bunch of code bound tightly to those structures. In a nutshell, you've just bound all your code to an XML structure. Any change to the schema will require regeneration of the code (if you use validation), even where the changes are irrelevant. If you have multiple versions of the schema, or different schemata mapping to the same canonical data structures, binding will be very difficult at best and infeasible in most cases, and more hardcoding results.

A more effective strategy, one that JAXB 2 allows but that seems to be rarely used, is to create schemata for your compiled types. This is a lot cleaner because XML allows for the declaration of rich structures that cannot be built in a language like C++ or Java without executable code ('choice' elements, for example) and can easily represent hierarchical field definitions from classes. Once this is done, you map the data from XML documents into these formats using a powerful XML tool such as XPath and convert them into objects. This effectively decouples the code from the XML structures: these schemata change only when the classes change. While it might seem like this just moves the same amount of work to XPath, this kind of thing is trivial using stylesheets.

Roland

Posts: 25 / Nickname: rp123 / Registered: January 7, 2006 9:42 PM

Re: An Introduction to XML Data Binding in C++

May 7, 2007 11:52 AM

> RAII is nice for simple cases, but pointers work better
> with optional items and substitution groups, just to name
> two examples.

Pointers and ownership are independent of each other. XML is a hierarchical format, i.e. child nodes only have meaning in reference to (in the context of) a parent node (except for the document node). Therefore it's quite 'natural' to let the parent nodes own their child nodes (the parent as 'container' for the children). Of course, parent nodes may give access to their child nodes via pointers or iterators. But the lifetime of the child nodes can, and IMO should, be bound to the lifetime of the parents.
It also amazes me that you (apparently the author of "C++ In a Nutshell") consider RAII only for 'simple cases'. Quite the contrary. Automatic, deterministic resource management (a.k.a. RAII) is the key idiom to reduce complexity in large systems (why else would you still want to use C++ today).

> XSD is open source (GPL), so we are able to
> change the generator at will. I've been able to fix bugs
> without waiting for a vendor, and I've been modifying the
> code generator to suit our specific needs.

Right, but the license terms (http://www.codesynthesis.com/products/xsd/license.xhtml) also make it clear that runtime and generated code may be used freely only for the mentioned FLOSS projects. The authors have of course the right to put their product under any license they deem appropriate.

Roland

Posts: 25 / Nickname: rp123 / Registered: January 7, 2006 9:42 PM

Re: An Introduction to XML Data Binding in C++

May 7, 2007 12:30 PM

> I do not recommend generating code in a static language
> from schemata. The fact of the matter is that it doesn't
> really solve anything and creates a very brittle and
> non-reusable code.

At least the generated code is type safe and therefore less brittle than e.g. DOM code.

> Basically you take the XML structure and create code that
> is coupled to it. Now you will generally need to walk the
> tree and extract the data for use in other places. This
> means writing a bunch of code bound tightly to those
> structures. In a nutshell you've just bound all your code
> to a xml structure. Any change to the schema will require
> regeneration of the code (if you use validation) even
> where the changes are irrelevant.

What you describe as disadvantages can also be seen as advantages. It depends on the application. When you need to create an XML message or store data in structured (XML) format, JAXB certainly is a very convenient option (compared to e.g. DOM). OTOH, if you want to recursively traverse the nodes or transform the XML file, then DOM or XSLT may be better suited.

> A more effective strategy, one that JAXB 2 allows but
> seems to be rarely used, is to create schemata for your
> compiled types.

This seems to put the cart before the horse. Moreover, the schema is often given because it's standardized.

James

Posts: 128 / Nickname: watson / Registered: September 7, 2005 3:37 AM

Re: An Introduction to XML Data Binding in C++

May 7, 2007 5:47 PM

> > I do not recommend generating code in a static language
> > from schemata. The fact of the matter is that it doesn't
> > really solve anything and creates a very brittle and
> > non-reusable code.
>
> At least the generated code is type safe and therefore
> less brittle than e.g. DOM code.

I guess you can define 'brittle code' any way you like, but what I mean by brittle is that insignificant changes cause the code to break. DOM-based code doesn't break when a new element is added to the schema or if an element's type is changed in a minor way, e.g. its length goes from 5 to 6.

In any event, this isn't really the worst thing about JAXB code. The worst thing is that all the work to get the required information into useful places is basically hardcoded. I worked with a fairly large JAXB code base that was basically unmaintainable. With DOM you can at least write code that can retrieve elements from similar structures. With JAXB, you can have two schemata with the exact same address element structure, but you must write the code to retrieve the elements repeatedly because it creates wholly separate types for them. But I'm not advocating DOM. It's basically a straw man.

> > Basically you take the XML structure and create code that
> > is coupled to it. Now you will generally need to walk the
> > tree and extract the data for use in other places. This
> > means writing a bunch of code bound tightly to those
> > structures. In a nutshell you've just bound all your code
> > to a xml structure. Any change to the schema will require
> > regeneration of the code (if you use validation) even
> > where the changes are irrelevant.
>
> What you describe as disadvantages can also be seen as
> advantages. It depends on the application. When you need
> to create a XML message or store data in structured (XML)
> format JAXB certainly is a very convenient option
> (compared to e.g. DOM).

Convenient in what way? What could be more convenient than just populating Objects with data and using them? Walking trees with Java is a nightmare. The JAXB code I worked with looked like this:

if (greatgrandparent != null) {
    GrandParent grandparent = greatgrandparent.getChild();

    if (grandparent != null) {
        Parent parent = grandparent.getChild();

        if (parent != null) {
            Child child = parent.getChild();
        }
    }
}

But many more levels deep and over and over again. The only thing that's convenient about it is it allows you to avoid learning to use a proper XML toolset and do everything with Java.

> OTOH, if you want to recursively
> traverse the nodes or transform the XML file then DOM or
> XSLT may be better suited.

XPath and XSLT are good for these things, but that has nothing to do with what I am talking about here.

> > A more effective strategy, one that JAXB 2 allows but
> > seems to be rarely used, is to create schemata for your
> > compiled types.
>
> This seems to put the cart before the horse. Moreover, the
> schema is often given because it's standardized.

You are definitely missing the point. The schema generated for the code is only used to map data into objects. The standardized schema doesn't go away.

I spent 2 years banging my head on JAXB generated classes. We got to the point where we'd go out of our way to avoid changing a schema because of all the work that was required to do it. Any slight modification would require generating new classes, writing a bunch of code and touching all kinds of tangential modules.

With the methodology I am advocating, you add any new elements to the classes that use them, regenerate the schemata and use any number of highly efficient XML tools to map the required data into the Object. You can map many different message formats to the same Objects making your code much more reusable. Generating classes from schemata seems like a good idea on the surface but is a fundamentally flawed approach. It would be workable in a dynamic language like Ruby or Python but in a static language it gets you nowhere. It creates a redundant mirror of the XML structures in code violating DRY among other principles of good design.

To make it clearer what I am talking about: we'd have, say, 6 different standardized schemata for a purchase order that we had to support, with new ones added over time. In order to avoid generating 6 sets of classes and writing thousands of lines of Java to place those orders, we created a canonical schema for a purchase order. Then we took this and generated classes from it. Then we had about 1000 or so lines of code to put that canonical data into stable business Objects, and another 1000 or so lines of code to write the data from the usable Java Object back into the JAXB Object. The JAXB classes did nothing for us. In fact, they made things harder because walking a tree in JAXB is extremely labor (read: code) intensive. Using the technique I am describing, the data goes from XML straight into the business objects and all the translation is done with the proper tools.

Boris

Posts: 6 / Nickname: boris / Registered: May 10, 2006 8:23 PM

Re: An Introduction to XML Data Binding in C++

May 7, 2007 11:48 PM

> I guess you can define 'brittle code' any way you like but what I mean by brittle is that insignificant changes cause the code to break.

On the other hand, with the data binding approach, client code that breaks as a result of a change will be flagged by the C++ compiler thanks to static typing. In the case of DOM or your XPath-based manual mapping approach, with every change to your XML vocabulary you are left wondering (or guessing) whether the change was insignificant or the code is now silently broken.

> To make it clearer what I am talking about: we'd have, say, 6 different standardized schemata for a purchase order that we had to support, with new ones added over time. In order to avoid generating 6 sets of classes and writing thousands of lines of Java to place those orders, we created a canonical schema for a purchase order. Then we took this and generated classes from it. Then we had about 1000 or so lines of code to put that canonical data into stable business Objects, and another 1000 or so lines of code to write the data from the usable Java Object back into the JAXB Object.

There is a much cleaner way to implement this in XSD (I don't know about JAXB). The idea is to define a base type for all purchase orders in XML Schema. This type can be empty or it can contain some common elements/attributes. Then you define your purchase orders as extensions of this base type. When compiling the schema to C++, you customize the base class by adding virtual functions that will constitute the interface to all the purchase orders. Then you customize the concrete purchase orders by implementing those virtual functions. The application code manipulates all purchase orders via the customized base class. This approach is also a lot more efficient than XPath-based remapping.
