Java Buzz Forum - Berkeley DB XML: Native XML Database With XQuery Support

The Name Game

The official name of the software is Berkeley DB XML, sometimes shortened to BDB XML. The file I downloaded from the website is called dbxml-2.1.8.tar.gz (23M in size), which extracts to a directory dbxml-2.1.8 (190M). However, if you search the web for dbxml, something other than Berkeley DB XML will show up. It is a bit of a confusing situation.

Sleepycat Software, License

Sleepycat Software is the maker of Berkeley DB XML. They are better known as the maker of Berkeley DB, the widely deployed open source embedded database engine.

Berkeley DB XML is released under an open source license that permits its use in open source applications. Proprietary software vendors may purchase a proprietary license.

What's In The Bundle

The dbxml-2.1.8 directory contains the following:

[weiqi@gao dbxml-2.1.8] $ du -sh *
20K     buildall.sh
48M     db-4.3.28
49M     dbxml
16M     pathan
8.0K    README
74M     xerces-c-src_2_6_0
4.8M    xquery-1.1.0

Berkeley DB XML uses Berkeley DB for data management, Xerces-C for XML processing, and Pathan for XPath parsing and evaluation. It also includes an XQuery processing engine written specifically for Berkeley DB XML.

Building It

Berkeley DB XML can be built using the configure; make; make install method for each component of the system. There is also a buildall.sh script that will do the whole thing for you.

I used this command line to build Berkeley DB XML:

./buildall.sh --prefix=/opt/dbxml-2.1.8 --enable-java --enable-perl

It took 26 minutes to build on my Athlon XP 2700+ with 1GB RAM. The result is 154MB of stuff in /opt/dbxml-2.1.8, including goodies such as these:

[weiqi@gao dbxml-2.1.8] $ ls bin
db_archive     db_dump      db_recover  db_verify   dbxml_load
db_checkpoint  db_load      db_stat     dbxml       dbxml_load_container
db_deadlock    db_printlog  db_upgrade  dbxml_dump  query_runner

[weiqi@gao dbxml-2.1.8] $ ls lib
db.jar               libdb_java-4.3.so       libpathan.a
dbxml.jar            libdb_java-4.so         libpathan.la
libdb-4.3.a          libdb_java.so           libpathan.so
libdb-4.3.la         libdb.so                libpathan.so.3
libdb-4.3.so         libdbxml-2.1.a          libpathan.so.3.0.1
libdb-4.so           libdbxml-2.1.la         libxerces-c.so
libdb.a              libdbxml-2.1.so         libxerces-c.so.26
libdb_cxx-4.3.a      libdbxml-2.so           libxerces-c.so.26.0
libdb_cxx-4.3.la     libdbxml.a              libxquery-1.1.a
libdb_cxx-4.3.so     libdbxml_java-2.1.a     libxquery-1.1.la
libdb_cxx-4.so       libdbxml_java-2.1_g.so  libxquery-1.1.so
libdb_cxx.a          libdbxml_java-2.1.la    libxquery-1.so
libdb_cxx.so         libdbxml_java-2.1.so    libxquery.a
libdb_java-4.3.a     libdbxml_java-2.so      libxquery.so
libdb_java-4.3_g.so  libdbxml_java.so
libdb_java-4.3.la    libdbxml.so

The Interactive Command Line Tool

After adding the appropriate directories/jar files into the PATH, LD_LIBRARY_PATH, CLASSPATH and ld.so cache, I can run the interactive command line tool called dbxml:

[weiqi@gao] $ dbxml

dbxml> createContainer foo.dbxml
Creating document storage container

dbxml> putDocument foo1 '<foo>bar</foo>' s
Document added, name = foo1

dbxml> query '
collection("foo.dbxml")/foo'
1 objects returned for eager expression '
collection("foo.dbxml")/foo'


dbxml> print
<foo>bar</foo>

I have just created a file called foo.dbxml, inserted an XML document into it, run an XQuery query against the database, and printed the results.

The download bundle includes very professional-looking documentation, from an introduction to a programmer's guide to API references. The documentation is also available on the web.

The Guided Tour is especially illuminating.

The C++ And Java APIs

The interactive tool is only meant to be the programmer's helper. The typical Berkeley DB XML application's users won't ever see it. The application can use the C/C++/Java APIs to manipulate XML documents in one or more .dbxml database files in a transactional and multi-threaded fashion.

The following snippet of C++ code does roughly the same thing as the interactive session above:

[weiqi@gao] $ cat foo.cc
#include <iostream>
#include "dbxml/DbXml.hpp"

int main(int argc, char *argv[])
{
  try
  {
    DbXml::XmlManager mgr;
    DbXml::XmlContainer cont = mgr.createContainer("foo.dbxml");
    DbXml::XmlUpdateContext uc = mgr.createUpdateContext();
    cont.putDocument("foo1", "<foo>bar</foo>", uc);
    DbXml::XmlQueryContext qc = mgr.createQueryContext();
    DbXml::XmlResults res = mgr.query("collection('foo.dbxml')/foo", qc);
    DbXml::XmlValue value;
    while (res.next(value))
      std::cout << "Value: " << value.asString() << std::endl;
  }
  catch (DbXml::XmlException& e)
  {
    std::cout << "Exception: " << e.what() << std::endl;
  }
  return 0;
}

[weiqi@gao] $ g++ -I /opt/dbxml-2.1.8/include -L /opt/dbxml-2.1.8/lib -o foo foo.cc -lpathan -lxquery -lxerces-c -ldbxml-2.1 -ldb_cxx-4.3 -lpthread

[weiqi@gao] $ ./foo
Value: <foo>bar</foo>

Of course, the second time this program is run, it complains that foo.dbxml already exists.

XQuery Support

By far the most exciting feature of Berkeley DB XML, for me at least, is its support for the full XQuery 1.0 and XPath 2.0 languages. Although the W3C standardization process for XQuery has been very slow (I wrote about it 585 days ago), the power of the XQuery language has been recognized by the big three relational database vendors and a whole lot of native XML database vendors.

In addition, Berkeley DB XML also supports several XML storage strategies, indexing, metadata, W3C XML Schema validation, and an update syntax extension for the query language.

Got XML?

If you have a lot of XML documents lying around in the file system, why not pour them into a Berkeley DB XML container, add a few indices and some metadata, and query away?
