Re: Sounds like Berkeley DB
Posted: Dec 18, 2007 5:29 PM
In a Berkeley DB-style hash you can only retrieve by key. Amazon's SimpleDB lets you store structured data and search by any attribute.
Simply put, BDB is a hashtable: you store and retrieve by key only. The value can be structured, of course, but the software accessing it has to know what that structure looks like, so in practice the application assumes every record shares the same structure. Keys usually must be unique, although some implementations allow duplicate keys if the application can handle them.
By contrast, SimpleDB stores a collection of 1..N attribute-value pairs in each record (SimpleDB calls it an item) and lets you query on the value of any attribute.
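To make the key-only model concrete, here is a minimal sketch using Python's standard `dbm.dumb` module as a stand-in for a Berkeley DB hash (an assumption for illustration; it is not BDB itself). The store treats keys and values as opaque bytes, so the application picks a serialization (JSON here) and must know that layout when reading the value back.

```python
import dbm.dumb  # pure-Python key/value file; a stand-in for a BDB hash here
import json
import os
import tempfile

# Throwaway file location, only for this demo.
path = os.path.join(tempfile.mkdtemp(), "items")

with dbm.dumb.open(path, "c") as db:
    # The store sees only bytes; the application decides the value layout.
    db[b"123"] = json.dumps({"description": "sweater", "color": ["blue", "red"]}).encode()
    db[b"456"] = json.dumps({"description": "dress shirt", "color": ["white", "blue"]}).encode()

    # Retrieval is by exact key only; the caller must know to decode JSON.
    record = json.loads(db[b"456"])
    print(record["description"])  # dress shirt
```

There is no call here that asks "which records have color blue?"; the store only answers "what is under key 456?".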
Here's an example from Amazon's documentation:
PUT (item, 123), (description, sweater), (color, blue), (color, red)
PUT (item, 456), (description, dress shirt), (color, white), (color, blue)
PUT (item, 789), (description, shoes), (color, black), (material, leather)
"Amazon SimpleDB automatically indexes all of your data, enabling you to easily query for an item based on attributes and their values. In the above example, you could submit a query for items where (color = blue AND description = dress shirt), and Amazon SimpleDB would quickly return item 456 as the result."
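A rough in-memory sketch of that attribute-indexed model follows. It is not SimpleDB's actual implementation or API; the `put` and `query` helpers are hypothetical, the `item` pair is treated as the item's name, and an inverted index maps each (attribute, value) pair to the items that contain it, so an AND query is a set intersection.

```python
from collections import defaultdict

# Hypothetical model: item name -> attribute -> set of values.
items = {}
# Inverted index: (attribute, value) -> set of item names.
index = defaultdict(set)

def put(item_name, *pairs):
    """Store attribute/value pairs for an item; an attribute may repeat."""
    attrs = items.setdefault(item_name, defaultdict(set))
    for attr, value in pairs:
        attrs[attr].add(value)
        index[(attr, value)].add(item_name)

def query(**conditions):
    """Return names of items matching every attribute = value condition (AND)."""
    matches = None
    for attr, value in conditions.items():
        hits = index.get((attr, value), set())
        matches = set(hits) if matches is None else matches & hits
    return matches or set()

put("123", ("description", "sweater"), ("color", "blue"), ("color", "red"))
put("456", ("description", "dress shirt"), ("color", "white"), ("color", "blue"))
put("789", ("description", "shoes"), ("color", "black"), ("material", "leather"))

print(query(color="blue", description="dress shirt"))  # {'456'}
```

Because every (attribute, value) pair is indexed, records do not need a common structure: item 789 has a `material` attribute the others lack, and it is still queryable.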
Notice that each record can have a different structure. The only way BDB could store that data would be to pick one value as the key, say the item number, and pack the rest of the pairs into the value. Even then you could never search on the data in the values except by iterating over the entire hash and checking each record.
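The BDB workaround described above can be sketched like this, with a plain Python dict standing in for the hash file (an assumption; values are kept already decoded for brevity). The hypothetical `scan` helper has to visit every record to answer a value-based question, so its cost grows with the size of the whole store.

```python
# Records keyed by item number; the value holds the remaining pairs.
store = {
    "123": [("description", "sweater"), ("color", "blue"), ("color", "red")],
    "456": [("description", "dress shirt"), ("color", "white"), ("color", "blue")],
    "789": [("description", "shoes"), ("color", "black"), ("material", "leather")],
}

def scan(store, **conditions):
    """Full scan: inspect every record's value and test each condition."""
    result = []
    for key, pairs in store.items():  # touches every record: O(N)
        if all((attr, value) in pairs for attr, value in conditions.items()):
            result.append(key)
    return result

print(scan(store, color="blue", description="dress shirt"))  # ['456']
```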
So, yes, this is VERY different from a BDB dictionary/hashtable, and much closer to a TupleSpace.
If you were sufficiently clever, you could use Amazon's compute cloud (EC2) to implement the eval() function and SimpleDB for in() and out(), storing, say, Java bytecode as one of the fields of the tuple. But the main difference is that Amazon's SimpleDB lets you do things like the classic Linda example:
out ('testdata', i, 3, 4+6)
in ('testdata', ?cnt, ?var, 10)
with something like
PUT (string, 'testdata'), (cnt, i), (var, 3), (sum, 4+6)
(Assume for the sake of argument that the actual code to implement this would know what i and 4+6 are.)
and a query for (string = 'testdata' AND sum = 10)
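Here is a minimal sketch of that mapping, using a dict as a hypothetical stand-in for a SimpleDB domain (not the real SimpleDB API). `out()` becomes a PUT of one item, and `in_()` (named so because `in` is a Python keyword) becomes a query on the actual, non-formal fields followed by a delete. The item names, the `FORMAL` marker for `?cnt`/`?var`, and the assumption that every tuple has the four fields from the PUT above are all illustrative choices. SimpleDB stores values as strings, so values are converted with `str()`, which means Linda's type matching on formals is lost here.

```python
import itertools

# Hypothetical stand-in for a SimpleDB domain: item name -> {attribute: value}.
domain = {}
_names = itertools.count(1)  # made-up item names: "t1", "t2", ...

# Attribute names from the PUT above; assumes every tuple has these four fields.
FIELDS = ("string", "cnt", "var", "sum")
FORMAL = None  # marks a Linda formal such as ?cnt (matches any value)

def out(*values):
    """out(): store the tuple as one item, i.e. a PUT with string values."""
    domain[f"t{next(_names)}"] = {f: str(v) for f, v in zip(FIELDS, values)}

def in_(*template):
    """in(): query on the actual fields, then delete and return the match.
    Simplifications: returns None instead of blocking when nothing matches,
    and the query-then-delete pair is not atomic across concurrent callers."""
    conditions = {f: str(v) for f, v in zip(FIELDS, template) if v is not FORMAL}
    for name, item in list(domain.items()):
        if all(item.get(f) == v for f, v in conditions.items()):
            return domain.pop(name)
    return None

i = 7  # hypothetical value for i
out("testdata", i, 3, 4 + 6)  # PUT (string, 'testdata'), (cnt, 7), (var, 3), (sum, 10)
# in('testdata', ?cnt, ?var, 10) -> query (string = 'testdata' AND sum = 10)
item = in_("testdata", FORMAL, FORMAL, 10)
print(item["cnt"], item["var"])  # 7 3
```

A real in() would also need to wait for a matching tuple to appear and make the take atomic; this sketch only shows how the matching step maps onto an attribute query.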