Agile Buzz Forum - BerkeleyDB for VisualWorks

BerkeleyDB is a neat little database built by Sleepycat Software, which was recently acquired by Oracle. BerkeleyDB is neither a relational database nor an object database; it is just a regular, old-fashioned 'data' database. It works, essentially, as a key-value file on disk, using access methods such as Hash and BTree, which you, as a developer, get to pick and tune.

Over the years this database has picked up some pretty amazing tricks, such as XA transactions, replication and secondary databases. In fact, BerkeleyDB is powerful enough that MySQL has shipped a storage engine (BDB) built on it.

I needed to store word indexes on disk because I had too many of them to keep in memory, so I needed somewhere to put them that I could both read from and update quickly. This called for some sort of disk-based database. BerkeleyDB, suggested by a coworker, was the perfect choice for this task.

I searched around and saw that once upon a time there was a BerkeleyDB implementation for Squeak, but that seems to have disappeared into the netherworld. So, moving on, it was time to implement it myself in VisualWorks. As usual, the DLLCC header file parser was completely useless.

I built the structures and procedure definitions myself, which took far too much effort, then wrapped up the instance-based procedures, etc. I've published my efforts to Public Store under the name BerkeleyDB. It acts like a Dictionary (it's subclassed off KeyedCollection), so you can literally pretend it's just a regular in-memory Dictionary that runs slightly slowly because it gets stuff off disk.

To make sure it scales, I implemented the DB_MULTIPLE APIs in a BulkCursor, which is subclassed off the regular BerkeleyDB Cursor. This fetches data in chunks of 5 MB. If you need any more than that, you can specialise it further by subclassing off my BulkCursor.

So, I've got Berkeley Cursors in there too, for iterating over all the records. I've also got Stream APIs in there for reading and writing to a record. These sit alongside the regular Dictionary API as streamAt: key ifAbsent: [] and at: key putStream: aStream. BerkeleyDB lets you have up to 4 GB of data in a single record and up to 256 terabytes of data in the database all up. Very impressive.
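
As a rough illustration of the two stream selectors named above, here is a minimal sketch. It assumes that at:putStream: accepts any ReadStream and that streamAt:ifAbsent: answers a readable stream over the record; the key 'greeting' and the file name are made up for the example.

```smalltalk
| btree stream |
btree := BerkeleyDB.BTree in: 'mybtree.db'.
"Write a record from a stream (assumption: any ReadStream is accepted)."
btree at: 'greeting' putStream: 'hello world' readStream.
"Read the record back as a stream; the ifAbsent: block runs when the key is missing."
stream := btree streamAt: 'greeting' ifAbsent: [nil].
stream isNil
	ifFalse: [Transcript show: stream upToEnd printString; cr]
```

Going through a stream rather than at:put: mostly pays off for large records, where you would rather not hold the whole value in memory at once.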

I've not done any of the Replication, Secondary Database, Sequences, Transactions or Environment code - just enough so that I can treat a Hash or a BTree as if it were a Smalltalk Dictionary. Feel free to contribute further if you have a need, though most of those functions are generally not required when making a simple disk db.

| hash btree |
hash := BerkeleyDB.Hash in: 'myhash.db'.
btree := BerkeleyDB.BTree in: 'mybtree.db'.
hash at: 'a' put: 'b'.
btree at: 'c' put: 'd'.
hash inspect.
btree inspect
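
Since the wrapper follows the Dictionary protocol, reading values back should look the same as with any keyed collection. A minimal sketch, assuming the usual at:, at:ifAbsent: and includesKey: selectors are supported by the published package (only at:put: is shown above):

```smalltalk
| hash |
hash := BerkeleyDB.Hash in: 'myhash.db'.
hash at: 'a' put: 'b'.
"Assumption: the standard Dictionary lookups are available on the wrapper."
Transcript show: (hash at: 'a') printString; cr.
Transcript show: (hash at: 'missing' ifAbsent: ['not found']) printString; cr.
Transcript show: (hash includesKey: 'a') printString; cr
```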

