This post originated from an RSS feed registered with Agile Buzz by James Robertson.
Original Post: More Prevayler
Feed Title: Michael Lucas-Smith
Feed URL: http://www.michaellucassmith.com/site.atom
Feed Description: Smalltalk and my misinterpretations of life
Now that Prevayler works well enough for people to start bashing at it, I've turned my mind to further ideals it could strive for. Note: none of these things are implemented; they are just ideas at the moment.
Virtual Object Memory
This code base will never be a real database, but it could turn out to be a neat alternative. The current design of Prevayler states that all objects are in memory. What if you have too many objects? You need a way to keep only part of the object graph in memory at any one time. To that end, the recovery code in the current code base is lazy: as you access things, it reads them from disk. But this raises some bigger issues that need to be resolved if it is to work as a Virtual Object Memory.
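To make the lazy recovery idea concrete, here is a minimal sketch in Python rather than Smalltalk. It assumes a hypothetical store that maps object ids to saved attribute values (a plain dict stands in for the disk) and a hypothetical `LazyProxy` that reads its target only when a message first reaches it:

```python
from types import SimpleNamespace


class LazyProxy:
    """Stands in for a persisted object and loads it on first use."""

    def __init__(self, store, oid):
        self._store = store    # hypothetical store: id -> saved attributes
        self._oid = oid
        self._target = None    # nothing read from "disk" yet

    def _load(self):
        if self._target is None:
            # Hypothetical disk read: rebuild the object from its saved state
            self._target = SimpleNamespace(**self._store[self._oid])
        return self._target

    def __getattr__(self, name):
        # Only called for names the proxy itself lacks, so any "message"
        # meant for the real object triggers the lazy load.
        return getattr(self._load(), name)


disk = {1: {"fullname": "Ada Lovelace"}}
person = LazyProxy(disk, 1)
assert person._target is None   # not loaded yet
print(person.fullname)          # Ada Lovelace (read on first access)
```

In Smalltalk a proxy like this would typically forward messages from `doesNotUnderstand:`; `__getattr__` plays that role here.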
Transactions
When you change objects inside a transaction, each object the transaction touches for the first time should have its original state preserved, so that if the transaction is rolled back, the original values are reinstated.
"Proposed API (not implemented): rolling back reinstates aPerson's original fullname"
[aPerson fullname: 'New name for the person'.
    self rollback] inTransaction
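A rough Python equivalent of the proposed behaviour, assuming a hypothetical `Transaction` object through which changes are made (the block-based `inTransaction` API above has no direct Python counterpart). Each object's attributes are copied the first time the transaction touches it, and `rollback` puts them back:

```python
class Transaction:
    """Keeps each touched object's original attributes so they can be
    reinstated on rollback. Hypothetical API, for illustration only."""

    def __init__(self):
        self._originals = {}   # id(obj) -> (obj, shallow copy of its attributes)

    def _touch(self, obj):
        # Snapshot only on first touch, so rollback returns to the state
        # the object had when the transaction began using it.
        if id(obj) not in self._originals:
            self._originals[id(obj)] = (obj, dict(vars(obj)))

    def set(self, obj, name, value):
        self._touch(obj)
        setattr(obj, name, value)

    def rollback(self):
        for obj, original in self._originals.values():
            obj.__dict__.clear()
            obj.__dict__.update(original)
        self._originals.clear()

    def commit(self):
        # Changes were applied in place; just forget the snapshots.
        self._originals.clear()


class Person:
    def __init__(self, fullname):
        self.fullname = fullname


a_person = Person("Old name")
txn = Transaction()
txn.set(a_person, "fullname", "New name for the person")
txn.rollback()
print(a_person.fullname)   # Old name
```

The snapshot is shallow on purpose: references to other objects are restored as the same objects, and those objects get their own snapshot if the transaction changes them too.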
Isolation
Every object is accessed through a proxy, so a message sent to the proxy need not reach the real object. Instead, a copy of the real object can be made within your transaction and the changes applied to that copy. This means other users of the same object will not see your changes until your transaction is committed.
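A sketch of that isolation in Python, assuming a hypothetical `SharedObject` holding the committed state and one `IsolatedProxy` per transaction. This version copies the real object on the first write (copy-on-write) rather than on every access:

```python
class SharedObject:
    """Hypothetical holder for the committed ("real") state of an object."""

    def __init__(self, **state):
        self.state = dict(state)


class IsolatedProxy:
    """One per transaction: reads go to the real object until the first
    write, after which reads and writes use a private copy."""

    def __init__(self, real):
        self._real = real
        self._copy = None   # created on the first write

    def get(self, name):
        source = self._real.state if self._copy is None else self._copy
        return source[name]

    def set(self, name, value):
        if self._copy is None:
            self._copy = dict(self._real.state)   # copy-on-write
        self._copy[name] = value

    def commit(self):
        # Publish the private copy; other proxies see it from now on.
        if self._copy is not None:
            self._real.state = self._copy
            self._copy = None


real = SharedObject(fullname="Ada")
mine, theirs = IsolatedProxy(real), IsolatedProxy(real)
mine.set("fullname", "Ada Lovelace")
print(theirs.get("fullname"))   # Ada (change not visible yet)
mine.commit()
print(theirs.get("fullname"))   # Ada Lovelace
```

Two transactions committing changes to the same object simply overwrite each other in this sketch; the timestamp locking mentioned in the next paragraph is one way to catch that.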
One extra that could be added here is that any object you access within the transaction is also preserved, to keep a consistent view of your data for the duration of the transaction. A logical extension from there would be locking based on last-modified timestamps, to ensure nothing is out of date upon commit.
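Both extras can be sketched together, again in Python with hypothetical names. Each record keeps a modification counter standing in for the last-modified timestamp; a transaction copies a record the first time it reads it, and `commit` refuses to proceed if any record it read has since been changed by someone else (optimistic locking):

```python
class ConflictError(Exception):
    """Raised when a record changed after the transaction first read it."""


class Record:
    """Hypothetical committed object with a modification counter."""

    def __init__(self, **state):
        self.state = dict(state)
        self.modified = 0   # bumped on each commit; stands in for a timestamp


class SnapshotTransaction:
    def __init__(self):
        self._views = {}     # id(record) -> (record, counter seen, private copy)
        self._dirty = set()  # ids of records this transaction changed

    def _view(self, record):
        key = id(record)
        if key not in self._views:
            # Copy on first access, so the transaction sees a consistent view
            self._views[key] = (record, record.modified, dict(record.state))
        return self._views[key][2]

    def get(self, record, name):
        return self._view(record)[name]

    def set(self, record, name, value):
        self._view(record)[name] = value
        self._dirty.add(id(record))

    def commit(self):
        # Validate every record read before writing anything back
        for record, seen, _ in self._views.values():
            if record.modified != seen:
                raise ConflictError("record changed since it was read")
        for key in self._dirty:
            record, _, state = self._views[key]
            record.state = state
            record.modified += 1
        self._views.clear()
        self._dirty.clear()


account = Record(balance=10)
t1, t2 = SnapshotTransaction(), SnapshotTransaction()
print(t1.get(account, "balance"))   # 10
t2.set(account, "balance", 20)
t2.commit()
print(t1.get(account, "balance"))   # still 10: t1 keeps its snapshot
t1.set(account, "balance", 11)
try:
    t1.commit()
except ConflictError:
    print("t1 must retry")           # account was changed by t2
```

A counter is used instead of wall-clock time because two commits can share a timestamp; the check itself is the same.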
Memory Releasing
Since we have a proxy object that contains the real object, we should in theory be able to release the real object and have the proxy reinstantiate it when required. The problem is that any objects our object referenced may then get GC'd too, and under the current design that would wipe them from disk.
What we need to do is mark them as having just been freed, not dereferenced. Then when they get GC'd, they won't be wiped from disk. Any object being GC'd that has this flag will pass it on to its children.
On the flip side, if you dereference the object, the flag must be reset so that if it is GC'd, it will be wiped from disk. This leads into the garbage collection issue...
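The freed-versus-dereferenced flag can be sketched in Python with the standard `weakref.finalize`, which runs a callback when an object is garbage collected. `ObjectStore`, its `disk` dict of child ids, and `release`/`dereference` are hypothetical names for this illustration, not Prevayler's API:

```python
import gc
import weakref


class ObjectStore:
    """Hypothetical store: `disk` maps object id -> list of child ids."""

    def __init__(self):
        self.disk = {}
        self.freed = set()   # ids flagged "freed, not dereferenced"

    def track(self, oid, obj):
        # Get a callback when obj is garbage collected. The callback only
        # receives the id, so it does not keep obj alive.
        weakref.finalize(obj, self._collected, oid)

    def release(self, oid):
        # The proxy dropped the real object to save memory: keep it on disk.
        self.freed.add(oid)

    def dereference(self, oid):
        # A genuine dereference: reset the flag so collection wipes the object.
        self.freed.discard(oid)

    def _collected(self, oid):
        if oid in self.freed:
            # Keep it on disk and pass the flag on to its children.
            self.freed.update(self.disk.get(oid, []))
        else:
            self.disk.pop(oid, None)   # really garbage: wipe from disk


class Node:
    def __init__(self, *children):
        self.children = list(children)


store = ObjectStore()
child = Node()
parent = Node(child)
store.disk = {"parent": ["child"], "child": []}
store.track("parent", parent)
store.track("child", child)

store.release("parent")   # free the parent, do not dereference it
del parent
gc.collect()
del child                 # child has inherited the "freed" flag
gc.collect()
print(sorted(store.disk)) # ['child', 'parent']: nothing wiped
```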
Recovery Garbage Collection
At the moment, if an object is dereferenced and no other object references it, it will be garbage collected. The garbage collector notifies Prevayler, which then wipes the object from disk.
This turns out to be a big problem when more than one object references the same object. Suppose objects A and B both reference C, and during recovery you have only read in A. If A then stops referencing C, C will be wiped, and if you later read in B, it is missing its C value.
This can be solved by traversing the object graph starting from C at the moment A dereferences it, following both what C references and what references C. This causes B to be read in, so there is still a link to C and it won't get GC'd.
You can easily imagine a very large object graph being associated with such a 'junktor'. So instead of reading the objects into memory, they can be read temporarily to see what else they reference, then dropped. This continues until one of the objects turns out to be referenced by something already in memory; this may end up being the root. If nothing in memory ends up referencing these ids, all the collected ids are wiped from disk.
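Here is a Python sketch of the traversal, assuming a hypothetical reverse index `referrers` (id to the set of ids that reference it) and a set of ids currently in memory. The text describes following both what an object references and what references it; this sketch follows only referrers, since only a chain of referrers leading back to memory can keep C alive. Objects that C itself references would be considered in turn once C is wiped:

```python
from collections import deque


def find_garbage(start, referrers, in_memory):
    """Walk incoming references from `start` through on-disk objects.

    referrers: hypothetical reverse index, id -> set of ids referencing it
    in_memory: ids currently loaded (assumed live)
    Returns the ids to wipe, or an empty set if memory can still reach start.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        oid = queue.popleft()
        # Stand-in for reading the object temporarily: look at who refers
        # to it, queue those ids, and keep nothing else in memory.
        for ref in referrers.get(oid, ()):
            if ref in in_memory:
                return set()          # a path back to memory: still live
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen                       # nothing in memory reaches these ids


# A no longer references C; B (still on disk) does, and root references B.
print(find_garbage("C", {"C": {"B"}, "B": {"root"}}, {"root", "A"}))  # set()
# Same, but nothing references B any more: B and C are both garbage.
print(sorted(find_garbage("C", {"C": {"B"}, "B": set()}, {"root", "A"})))  # ['B', 'C']
```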
This could be either a short process or a very lengthy one. So to avoid faults from interruption, the entire process needs to be written to disk as it goes. That way, upon recovery, any such mark-and-sweep style collection can be resumed safely.
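One way to make that walk resumable, sketched in Python: the hypothetical `JournaledWalk` writes its progress to a JSON journal after every step and swaps the file in with `os.replace`, so a restart picks up from the last completed step. Redoing a step after a crash is harmless because the `seen` set prevents duplicates:

```python
import json
import os
import tempfile
from collections import deque


class JournaledWalk:
    """Referrer walk that journals its progress after every step, so an
    interrupted collection can resume. Hypothetical class for illustration."""

    def __init__(self, journal_path, start=None):
        self.path = journal_path
        if os.path.exists(journal_path):
            with open(journal_path) as f:
                state = json.load(f)          # resume an interrupted walk
        else:
            state = {"seen": [start], "queue": [start], "result": None}
        self.seen = set(state["seen"])
        self.queue = deque(state["queue"])
        self.result = state["result"]         # None, "garbage" or "reachable"

    def _save(self):
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"seen": list(self.seen), "queue": list(self.queue),
                       "result": self.result}, f)
        os.replace(tmp, self.path)   # atomic rename on POSIX: no half-written journal

    def step(self, referrers, in_memory):
        """Process one queued id; returns False once the walk is finished."""
        if self.result is not None:
            return False
        if not self.queue:
            self.result = "garbage"           # no path back to memory
        else:
            oid = self.queue.popleft()
            for ref in referrers.get(oid, ()):
                if ref in in_memory:
                    self.result = "reachable"
                    break
                if ref not in self.seen:
                    self.seen.add(ref)
                    self.queue.append(ref)
        self._save()
        return True

    def run(self, referrers, in_memory):
        while self.step(referrers, in_memory):
            pass
        return self.seen if self.result == "garbage" else set()


journal = os.path.join(tempfile.mkdtemp(), "collect.journal")
refs = {"C": {"B"}, "B": {"E"}, "E": set()}
walk = JournaledWalk(journal, "C")
walk.step(refs, {"root"})             # ...then the process is interrupted
resumed = JournaledWalk(journal)      # picks up from the journal on recovery
print(sorted(resumed.run(refs, {"root"})))   # ['B', 'C', 'E']
```

Once the collected ids have been wiped, the journal file would be deleted. A production version would also flush and `fsync` the journal so it survives a power failure, not just a crashed process.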
Indexes
And finally, we can start indexing the system. With the multiple-reference problem in virtual garbage collection solved, we can have collections that point to 'weak' proxies. These will not count as hard references, and thus allow objects to become garbage in a natural manner.
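Python's standard `weakref.WeakSet` shows the behaviour being described; here it stands in for a collection of weak proxies:

```python
import gc
import weakref


class Person:
    def __init__(self, name):
        self.name = name


people = weakref.WeakSet()    # membership is not a strong reference
ada, bob = Person("Ada"), Person("Bob")
people.add(ada)
people.add(bob)

del bob         # drop the only strong reference to Bob
gc.collect()    # CPython frees Bob at once; needed on runtimes without refcounting
print(sorted(p.name for p in people))   # ['Ada']
```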
But indexes in Smalltalk can be quite smart. Any block of code can be used to decide whether an object belongs in your index; objects that qualify get marked in the IndexCollection. To retrieve the objects in your index, simply iterate over the IndexCollection like any normal collection.
This allows the developer to be quite specific about the kinds of indexes they are interested in.
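A minimal Python sketch of such an index, with a function playing the part of the Smalltalk block. `IndexCollection` follows the name in the text, but its `consider` method is a hypothetical hook that would be called whenever an object is stored or changed:

```python
import weakref


class IndexCollection:
    """A function (the Smalltalk block) decides membership; members are held
    weakly, so being indexed never keeps an object alive."""

    def __init__(self, predicate):
        self._predicate = predicate
        self._members = weakref.WeakSet()

    def consider(self, obj):
        # Hypothetical hook: call whenever an object is stored or changed.
        if self._predicate(obj):
            self._members.add(obj)
        else:
            self._members.discard(obj)

    def __iter__(self):
        # Iterate over a snapshot, like any normal collection
        return iter(list(self._members))

    def __len__(self):
        return len(self._members)


class Person:
    def __init__(self, name, age):
        self.name, self.age = name, age


adults = IndexCollection(lambda p: p.age >= 18)
ada, tim = Person("Ada", 36), Person("Tim", 9)
for person in (ada, tim):
    adults.consider(person)
print([p.name for p in adults])   # ['Ada']
```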
The trick here is to ensure that if an object is GC'd and wiped from disk, it is also removed from the index.
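Weak references only cover objects that are in memory, so an index that also lists on-disk objects needs the store to tell it when an id is wiped. A Python sketch with hypothetical `IndexedStore` and `IdIndex` classes:

```python
class IdIndex:
    """Index over object ids, so it can cover objects that are only on disk."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.ids = set()

    def consider(self, oid, state):
        if self.predicate(state):
            self.ids.add(oid)
        else:
            self.ids.discard(oid)

    def remove(self, oid):
        self.ids.discard(oid)


class IndexedStore:
    """Hypothetical store that tells its indexes when an id is wiped."""

    def __init__(self):
        self.disk = {}      # id -> saved state
        self.indexes = []

    def save(self, oid, state):
        self.disk[oid] = state
        for index in self.indexes:
            index.consider(oid, state)

    def wipe(self, oid):
        # Called once the collector decides the object is garbage; every
        # index drops the id too, so none points at a missing object.
        self.disk.pop(oid, None)
        for index in self.indexes:
            index.remove(oid)


store = IndexedStore()
adults = IdIndex(lambda state: state["age"] >= 18)
store.indexes.append(adults)
store.save(1, {"age": 36})
store.save(2, {"age": 40})
store.save(3, {"age": 9})
store.wipe(2)
print(sorted(adults.ids))   # [1]
```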