Berkeley DB XML has been on my radar screen for a long time now. It was mentioned recently in one of the meetings I attended, so I thought I would take a deeper look. Here's my first impression report:
Wow! Cool! Yay! Way to go! Well done!
The Name Game
The official name of the software is Berkeley DB XML, sometimes shortened to BDB XML. The file I downloaded from the website is called dbxml-2.1.8.tar.gz (23M in size), which extracts to a directory dbxml-2.1.8 (190M). However, if you search the web for dbxml, something other than Berkeley DB XML will show up. It is a bit of a confusing situation.
Sleepycat Software, License
Sleepycat Software is the maker of Berkeley DB XML. They are better known as the maker of Berkeley DB, the widely deployed open source embedded database engine.
Berkeley DB XML is released under an open source license that permits its use in open source applications. Proprietary software vendors may purchase a proprietary license.
Berkeley DB XML uses Berkeley DB for data management, Xerces-C for XML processing, and Pathan for XPath parsing and evaluation. It also includes an XQuery processing engine that is written just for Berkeley DB XML.
Building It
Berkeley DB XML can be built using the configure; make; make install method for each of the piece parts of the system. There is also a buildall.sh script that will do the whole thing for you.
I used this command line to build Berkeley DB XML:
After adding the appropriate directories/jar files into the PATH, LD_LIBRARY_PATH, CLASSPATH and ld.so cache, I can run the interactive command line tool called dbxml:
[weiqi@gao] $ dbxml
dbxml> createContainer foo.dbxml
Creating document storage container
dbxml> putDocument foo1 '<foo>bar</foo>' s
Document added, name = foo1
dbxml> query '
collection("foo.dbxml")/foo'
1 objects returned for eager expression '
collection("foo.dbxml")/foo'
dbxml> print
<foo>bar</foo>
I have just created a file called foo.dbxml, inserted an XML document into it, run an XQuery query against the database, and printed the results.
The download bundle includes very professional-looking documentation, from an introduction to a programmer's guide to API references. The documentation is also available on the web.
The interactive tool is only meant to be the programmer's helper. The typical Berkeley DB XML application's users won't ever see it. The application can use the C/C++/Java APIs to manipulate XML documents in one or more .dbxml database files in a transactional and multi-threaded fashion.
The following snippet of C++ code does roughly the same thing as the interactive session above:
Of course, the second time this program is run, it complains that foo.dbxml already exists.
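One way to sidestep that complaint is to check whether the container file is already on disk before deciding what to do with it. The following is only a minimal sketch using the C++17 standard library: the enum and the function name are hypothetical and not part of the Berkeley DB XML API, and it assumes the container lives as a plain file (such as foo.dbxml) at the given path.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper types, not part of the Berkeley DB XML API.
enum class ContainerAction { Create, Open };

// Decide what to do with a container file such as "foo.dbxml":
// open it if it is already on disk, otherwise create it.
ContainerAction chooseContainerAction(const std::string& path)
{
    return std::filesystem::exists(path) ? ContainerAction::Open
                                         : ContainerAction::Create;
}
```

In a real program the Create branch would do what createContainer did in the interactive session above, and the Open branch would open the existing container instead.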
XQuery Support
By far the most exciting feature of Berkeley DB XML, for me at least, is its support of the full XQuery 1.0 and XPath 2.0 languages. Although the W3C standardization process for XQuery has been very slow (I wrote about it 585 days ago), the power of the XQuery language has been recognized by the big three relational database vendors and a whole lot of native XML database vendors.
In addition, Berkeley DB XML also supports several XML storage strategies, indexing, metadata, W3C XML Schema validation, and an update syntax extension for the query language.
Got XML?
If you have a lot of XML documents lying around in the file system, why not pour them into a Berkeley DB XML container, add a few indices and some metadata, and query away?