Summary
XQuery 1.0 became an official W3C Proposed Recommendation, a step toward it emerging as the de facto standard XML query language. Artima spoke with XQuery co-creator Jonathan Robie about how XQuery can simplify application design, why XQuery may sometimes render an object model unnecessary, and how XQuery fits into a RESTful world view.
Today DataDirect launched a new Web site dedicated to XQuery, the nascent XML query standard. Almost five years in the making, the XQuery 1.0 specifications were recently elevated to W3C Proposed Recommendation status, paving the way for XQuery to become the de facto standard for querying XML data sources.
Jonathan Robie has been a guiding figure behind XML query standardization since the inception of the project. As XML product manager for DataDirect, he also has first-hand experience of how developers have already been using XQuery to simplify data integration and presentation. Artima spoke with Robie about how and when XQuery can simplify enterprise architecture, and where XQuery fits into current enterprise technologies.
Frank Sommers: When would a developer want to use XQuery?
Jonathan Robie: Here's when you'd want to use XQuery: Your input is either XML, or you have input from a variety of sources that your middleware can represent as XML. Your output is also XML. And suppose you just want to translate the XML input into some [XML] output, and do some processing on that data in-between.
A large class of applications falls into that category: Web services, creating data for Web sites [in XHTML], or publishing applications. In those applications, what you're doing is reporting on data and integrating data.
In those situations, whatever processing you're doing, XQuery is going to be better at XML processing than other languages would be. It's designed for that. If you have objects, use an object-oriented language. If you have relational data, use SQL. If you have XML, use XQuery.
Suppose you have an invoice that you can represent in XML. XQuery can just look at that [invoice] and compute averages or sums, and you can compare that [data] to other pieces of data in another document or in a database. In Java, you have to take that invoice, parse it, cast it into something that Java understands, and do the same thing for data in other data sources.
Every single data source is going to have its own API, its own data model, and might have its own query language as well. You can learn those different APIs and write a bunch of code that will be different for each data source. Or you can use XQuery, and that will present a uniform interface for all those data sources. You are going to have a lot less code, and that [code] is going to be the same for all the data sources your implementation supports. It's just a lot easier.
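As an illustration of the invoice example above (an editorial sketch, not part of the interview): XQuery is not bundled with the JDK, so the code below uses the JDK's built-in XPath 1.0 engine as a stand-in. The expression `sum(/invoice/item/amount)` is also a valid XQuery expression, and XQuery additionally offers an `avg()` function that XPath 1.0 lacks. The invoice layout, the class name `InvoiceTotals`, and its method names are assumptions made up for this sketch.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class InvoiceTotals {
    // Hypothetical invoice layout, invented for this illustration.
    static final String INVOICE =
        "<invoice>"
      + "<item><name>widget</name><amount>10.00</amount></item>"
      + "<item><name>gadget</name><amount>20.00</amount></item>"
      + "<item><name>gizmo</name><amount>30.00</amount></item>"
      + "</invoice>";

    // Evaluates a numeric path expression directly against XML text,
    // without first mapping the invoice onto Java objects.
    static double evaluateNumber(String expression, String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (Double) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    // Sum of all item amounts on the invoice.
    static double total(String xml) {
        return evaluateNumber("sum(/invoice/item/amount)", xml);
    }

    // XPath 1.0 has no avg(), so the average is spelled out as sum div count.
    static double average(String xml) {
        return evaluateNumber("sum(/invoice/item/amount) div count(/invoice/item)", xml);
    }

    public static void main(String[] args) {
        System.out.println("total   = " + total(INVOICE));
        System.out.println("average = " + average(INVOICE));
    }
}
```

The aggregate is stated declaratively in the expression string, and the Java code only supplies the document. A full XQuery processor would let the same kind of expression span several documents or a database.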
Frank Sommers: Are you saying that I wouldn't need to convert the XML data to a Java object model, and then the Java objects back into XML, such as XHTML for presentation on the Web?
Jonathan Robie: You may well need an object model if you're doing a bunch of business logic, or if you're presenting that data in some graphic environment that needs those objects. But if what you're doing is taking data from a variety of sources, performing some processing or transformation on that data, and then presenting that data in another XML document, an object model may not be doing all that much for you. You've just designed an extra layer that you don't need for what you're doing. If you try to use a hammer for a screwdriver, you might be able to get the screw in, but that's not the most straightforward way to do it.
Another thing to consider is performance. If you find yourself creating bits of SQL, plus Java, and DOM, especially if you're mixing in a bit of XSLT, that can't be optimized the way XQuery could optimize that data access and manipulation. You're more likely to use Java to establish your server environment, establish your authorized users, those sorts of things. And you're more likely to use XQuery for the actual XML part of things.
Frank Sommers: XQuery, like SQL, is a declarative language. How would I go about performing procedural steps on the data, such as business logic or input validation?
Jonathan Robie: There are three ways people go about integrating XQuery with procedural code. There is the XQuery API for Java (XQJ) [JSR 225], which is the JDBC for XQuery. That's out there in the mainstream at this point. Then there is something called the Scripting Extensions for XQuery, which is currently being explored. That's based on an earlier language called XQueryP. XQueryP adds statements to XQuery: You can do one thing, and then do another. It has the ability to do loops, things like that. The third way people are going is the Microsoft LINQ stuff. There are also some Java equivalents. What they're trying to do is simply extend every programming language with a subset of XQuery. In theory, this could work with all kinds of data sources, but right now it's limited to the SQL Server database, or C# and VB.
Of course, if your business logic is complex, that would be a good reason to introduce an object model. But even then, you might find that working with data in the form of XQueries is very effective, even in the context of procedural languages.
Regarding data validation, suppose you have some input that requires that you validate a customer ID. And the requirement is that the customer ID must be unique across some application domain. If you need to access data sources, such as a database, to verify that the ID is unique, then that validation will be a lot easier to do in XQuery.
Even if you have a sequence of validation steps that you need to perform, you may not always want to stick all that data into objects. Instead, you might use your Java program simply to get at your data, integrate it with XQuery, and validate the input by executing XQueries against it.
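A minimal sketch of the customer-ID check described above (an editorial illustration, not part of the interview). It again uses the JDK's XPath 1.0 engine as a stand-in for an XQuery processor, and it queries an in-memory XML string rather than a real database. The `customers` layout, the class name `CustomerIdCheck`, and the method `isUnused` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class CustomerIdCheck {
    // Hypothetical stand-in for customer data that would normally live in a database.
    static final String CUSTOMERS =
        "<customers>"
      + "<customer id=\"C-100\"/>"
      + "<customer id=\"C-200\"/>"
      + "</customers>";

    // Returns true when no existing customer already uses candidateId.
    static boolean isUnused(String candidateId, String customersXml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the input as the variable $id instead of pasting it into the
        // expression text, so the input cannot alter the query itself.
        xpath.setXPathVariableResolver(
            name -> "id".equals(name.getLocalPart()) ? candidateId : null);
        try {
            return (Boolean) xpath.evaluate(
                "count(/customers/customer[@id = $id]) = 0",
                new InputSource(new StringReader(customersXml)),
                XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Validation query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("C-100 unused? " + isUnused("C-100", CUSTOMERS));
        System.out.println("C-300 unused? " + isUnused("C-300", CUSTOMERS));
    }
}
```

The Java program only fetches the data and supplies the input. The uniqueness rule itself lives in the query expression, which is the division of labor the answer suggests.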
Frank Sommers: Developers are increasingly looking at REST as a simpler way to present enterprise data. What role do you see XQuery playing in a RESTful web application?
Jonathan Robie: Imagine a system with a bunch of clients: REST clients, Web service clients, dynamic HTML clients, as in a Web tier, and publishing applications. Then you have a bunch of data sources down at the bottom. If you start writing a program that uses one client and one data source, and then another client and another data source, and draw a Cartesian cross-path between your clients and your data sources, your architecture soon starts to look like a spider's web.
To simplify that, you can do two things. On the client end, you can say that every client gets at the data in the same way: every client simply uses HTTP and gets an XML document. Now, imagine that your REST interface speaks to a servlet, and that servlet uses XQuery to access all the data sources in the same way. You have basically created a hub, and that hub uses one interface to the clients, REST, and one interface to all the data sources, XQuery.
Frank Sommers: Why not expose the XQuery interface to the clients?
Jonathan Robie: Because clients are typically not that trusted. You don't want clients to be invoking and executing XQueries directly, because you don't trust them enough to give them database connections, for example.
The servlet would typically operate in the middle tier. And in this example you can write your data services in XQuery. Your data access layer is going to use these XQueries whenever you want to get at data, but the client knows just the name of the query, and the parameters for the query. Those are specified in the URL. That's all your client ever finds out about the query. The query lives in a secure place.
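The hub described above can be sketched as follows (an editorial illustration, not part of the interview). The client's URL carries only a query name and parameters, and the middle tier looks up the query text in a server-side registry. The class `QueryHub`, the URL layout `/queries/<name>?<params>`, and the sample query are assumptions made up for this sketch. Actually running the query would need an XQuery processor, for example through XQJ, which is left out here.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryHub {
    // Server-side registry: query name -> XQuery text. Clients never see the text.
    private final Map<String, String> queries = new LinkedHashMap<>();

    // What the middle tier hands to the XQuery processor for one request.
    static final class Resolved {
        final String queryText;
        final Map<String, String> parameters;

        Resolved(String queryText, Map<String, String> parameters) {
            this.queryText = queryText;
            this.parameters = parameters;
        }
    }

    void register(String name, String xqueryText) {
        queries.put(name, xqueryText);
    }

    // Maps a request URL to a stored query plus its parameters.
    // Unknown query names are rejected, so a client cannot submit its own query.
    Resolved resolve(URI requestUri) {
        String path = requestUri.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        String text = queries.get(name);
        if (text == null) {
            throw new IllegalArgumentException("Unknown query: " + name);
        }
        Map<String, String> parameters = new LinkedHashMap<>();
        // Simplified parsing: assumes values contain no encoded '&' or '='.
        String query = requestUri.getQuery();
        if (query != null && !query.isEmpty()) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                String key = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                parameters.put(key, value);
            }
        }
        return new Resolved(text, parameters);
    }

    public static void main(String[] args) {
        QueryHub hub = new QueryHub();
        // Hypothetical stored query; parameters would be bound as external variables.
        hub.register("customerOrders",
            "declare variable $customerId external; "
          + "doc('orders.xml')/orders/order[@customer = $customerId]");
        Resolved r = hub.resolve(
            URI.create("http://example.com/queries/customerOrders?customerId=42"));
        System.out.println("parameters: " + r.parameters);
    }
}
```

Because parameters reach the query only as bound values and the name must match a registered entry, the query text stays in the middle tier, which matches the trust boundary described in the answer.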
Frank Sommers: One reason developers transform data into an object model is to be able to use all the tools and APIs of a programming language to interact with that data. Most developers certainly don't like to work directly with XML. What sort of tools are available to work with XQuery that allow a developer to stay within his familiar coding environment and language?
Jonathan Robie: There are many development tools for working with XQuery. Stylus Studio is one that we [DataDirect] do. All the big-name, general-purpose IDEs for XML support XQuery: Stylus Studio, Altova, oXygen.
We are unique in that we have the ability to define pipelines, and deploy pipelines as Java: You can take one XML process and pipe it into another one, and design that all graphically, and then generate a Java program that implements that. We also have an XQuery debugger.
There are also plug-ins for XQuery into Eclipse. You can drag and drop data sources and create an XQuery that way. You can test that, and once you like the results, you can create an XQJ program, a skeleton Java program that executes that query.