Agile Buzz Forum - Unoriginal ideas about objects, closures and hashes.

The following is not original, not in the slightest, but I thought it's worth re-iterating because sometimes these ideas come back around and talking about them can help them be realized in a context suitable for someones project.

I was having a discussion over dinner about the nature of objects and closures and how they are essentially the same thing. Patrick Logan brought up a cute old joke about a student and his master. The student excitedly exclaimed to his master, "Master! Closure are more powerful than objects!".. to which the master replied, "You fool, objects are more powerful than closures, go away and study." Many moons later, the student returns, "Master, you were right, objects are more powerful than closures," to which the master replied, "You fool, closures are more powerful than objects."

They're one in the same, but let's keep the analogy going for a moment. When I write the following code: concept I'm binding a name in to a local execution space. When I write the following code: x[concept] = "Foo" I'm binding a ma,e om to a space called x. It can be said therefore that a Dictionary or Hash object is the same as an execution space (or stack frame, if you so care). One is usually immutable or completely untouchable to the programmer (in smalltalk, you can modify the stack however you like) and the other is a common staple of programming (the key+value pairs).

Objects, too, bind names to a space, which is the object itself. In that way, a Dictionary or Hash is the same as an Object. In fact, some languages such as Javascript go so far as to make them the same thing. In Smalltalk, they are not the same thing - this is a pragmatic choice based on the technology of its era where it was possible to make Smalltalk run very fast by knowing ahead of time the shape of an Object rather than compiling shape information in to objects at runtime. Self did not even attempt to unbind this (you still had to declare your slots before you could use them), but Javascript did.

So that means that a Hash is an Object. We know that a Hash is a Closure and in this case we can also say that an Object is a Closure. In fact, if we decide performance isn't necessary then a Stack Frame is a Closure + Code Position, a Closure is an Object + Executable Code and an Object is key+value pairs. Calling a method means duplicating your closures hash, adding in the new parameters, pushing that hash on to the stack and jumping to the method. Hey now, don't jump out of your chair just yet: I did say this is if you're okay with dropping performance.

What this gains us is a uniquely simple system that contains only two data structures: Hash and Closure. The stack frame is inherit in the execution machinery plus the execution stack, however if you wish to do continuation passing style, you'd add the third structure which you'd call Continuation which includes the execution position within the executable code. The beauty of the continuation passing style is that you can do away with the stack completely, because the Continuation gives you a pointer to the closure to continue from.

The only thing we'd need to decide on is how we dispatch. In Javascript, any entry in your closure is something you can call, by referring to it symbolically followed by (), eg: myMethod(). This is a direct hashed name look up, in which case you're going to look in the current execution space's hash for something to look up and then execute; failure to find the named thing results in an exception.

In languages like clojure you can specify the matching rules for a function beyond just its name, so you can delve in to the structure of the parameters to the function you're calling and decide if the function matches. This is incredibly powerful because the predicates for a function can be bound to an entirely different execution space (the one at compile time) than the runtime, which means you never end up falling down the rabbit hole.

So given how simple this system is and the number of times I've referred to javascript, just how is it that javascript falls apart here? Well, for one, the stack frame doesn't exist to the programmer, continuations don't exist to the programmer and while functions are indeed closures, you cannot influence them that way - they have a prototype object you can access but this will apply to objects created from the function, which is sort of a cludge to allow fast dispatch by having light-weight classes called prototypes.

I think perhaps the better way is for the compiler to recognize constructs where frames are created as a result of calling a function and make it a template that can be memcpy'd easily. Of course, you'd also want the compiler to recognize function call arguments that, while technically are part of the closure, may never 'escape' that short execution path - these can be put on the stack as indexed accesses instead of named lookups. You'd have to be careful never to be caught out by the programmer of course.

The runtime environment, if smart, could also identify functions that access structures regularly and compile specialized versions of the functions that have indexed access to named things in the hash and make a light-weight variant of that hash that places some values in those same indexes. The consequence of this is that the programmer gets to think in terms of objects and closures but often those objects will never exist.

Concurrent starts to get interesting at this point too, because a closure is local and its data is local, always-copy is easy to implement especially if the closure is irregularly shaped, it is easy to copy the data to ship it off to another context guilt free of any kind of collision, erlang style. In fact, if the act of calling a function is to take a copy of the closure and add the new parameters to it and a copy of the closure means a copy of its data, then you've achieved another object-oriented tenant, which is encapsulation.I'm not sure how practical this would be as one of the advantages of a closure is to modify the state of things you can see within your scope that may not be directly local to you.

Smalltalk does not share the environment from the calling method because its environment is always bound to the object the method is installed on. You never compile functions inside functions like that in Smalltalk. You do compile blocks inside methods and in this case the behave exactly like functions/closures in javascript. That same behavior of "bound to an object" is achieved in javascript by using the prototype object of a function. When you send 'new' to a function in Javascript, you create a new object which is effectively like saying you're creating a new execution space which will have absolutely no conflicts with your current execution space.

Of course, I'm twisting everything around, back to front, trying to dispel the mystique of object oriented programming. I'm deliberately voiding the OO concepts just to see what is and is not real. When we program in the way described above, we lose polymorphism, but if we have flexible dispatch mechanics then we can achieve polymorphism by reusing common names but discriminating by types - however, we really have no types, so our discrimination has to be much more evolved and it would be easy to make a function that was too liberal in what it accepted. This is definitely an advantage of a typed system - it's easy to determine just what it is you'll be calling at compile time rather than runtime.

What about other primitive types? arrays? strings? those sorts of things.. well arrays and strings are efficient hashes, where the indexing system directly corresponds to the memory layout (actually, that's a lie, if you ever want to implement a UTF8 string you'll understand what I mean). So clearly the mechanism for looking up something must be programmable, otherwise how would you ever make structures that looked (in memory) different to a basic key+value hash.

The answer is in the closure. Since a closure has executable code already, it has to have some known structure: the hash and the executable code are two slots that must be recognizable. Two more pieces of information it must have for this entire scheme to work: getter and setter. In Javascript terms these would be [] and []= and in Smalltalk terms these would be instVarAt: and instVarAt:put:, except that these two methods will be called at runtime to do lookup within the hash. So it is clear that the hash must have the ability to modify itself, our data structures now look like this:

Hash {
void *data,
Closure *getter /*void *(*getter)(void *symbol)*/,
Closure *setter /*void *(*setter)(void *symbol, void*data)*/
}

Closure {
Hash *hash,
void*(*function)(...)
}

These are the generics of course, the compiler (or runtime optimizer) might create other more interesting structures on the fly to aid in making fast running code. Whenever you have a generic you can make something concrete. But we now have the ability for the programmer to create new kinds of Hash such that we could implement a basic ascii String as:

function AsciiString {
string = {};
data = string.data;
string.getter= function(offset) { return memory[data + offset]; }
string.setter= function(offset, value) { return memory[data + offset] = value; }
return string;
}

It's key to be able to write primitive data structures using generic data structures in such a way that they can eventually become self optimizing. In this case, we can create an AsciiString if we have functions available to use for using system memory at the data pointer. The default getter and setter functions would do a key lookup, so they would be more involved, but could also be implemented in terms of the same language (however, they are redundant). The only primitive here is the getter and setter for the memory object, which would either actually be a primitive in the virtual machine or be assembly code that has been compiled (preferably).

There are still a few other loose ends here, such as how you deal with atoms themselves - eg: the Closure, or even just a simple int32. Solving all those bits and pieces to create a full language wasn't my goal here, my goal was to unify some of the core concepts that we use in many programming languages. I hope that I have done so:

Hash, Object, Closure, Stack Frame, Continuation: they are all connected.


	Web Artima.com