This post originated from an RSS feed registered with Agile Buzz
by James Robertson.
Original Post: Allocation on the stack and Continuations
Feed Title: Michael Lucas-Smith
Feed URL: http://www.michaellucassmith.com/site.atom
Feed Description: Smalltalk and my misinterpretations of life
I've talked a bit about Continuations and how they can make some really nicely optimised code. But one thing that Continuations doesn't solve is memory allocation. The continuation passing style compiler that was made in the "Compiling with Continuations" is still all heap based allocation.
Heap based allocation is as fast as stack based allocation - but - deallocation is a different story. Modern GC's are very fast, but even so, they cannot match for sheer speed simply subtracting a number off the stack register.
So, why don't we use the stack to allocate objects in Smalltalk? Perhaps some VM's already do. But I'll explain how I'd do it now any way.
First of all, all stuff allocated during your process will either be long lived or shorrt lived. The only way it can be long lived is to escape your process. It can do that by (a) being assigned to an instance or class variable, (b) or being placed in to an indexed memory area.
When that happens, we cannot leave the object on the stack any more. We have to copy it to the heap so any body who wants to access it can. The problem then is that objects still point to the stack's area instead of the heaps area for that object.
Do we dip in and make an object pointers collection that all code uses? No, I think we can void that for now (although, it has its own advantages). Instead of allocating just the object, we allocate a pointer first that points directly after it to the object on the stack. Things interested in the object use this proxy pointer to access the object on the stack. When we copy the stack object to the heap, we update the stack's proxy pointer to point to the heap instead of the stack.
This will greatly reduce the GC's work as most of the objects get GC'd naturally now.
What if we're using Continuations and we want to save a continuation to call it later. In this scenario, the objects on the stack do not need to be moved to the heap (all of them), but there is a consideration. What if a continuation is replayed from the start, after it got to the end and filled the stack with objects? Well, (a) if this continuation ran in its own process, the objects on the stack below the stack pointer must be move to the heap. (b) if the continuation is run from there, objects allocated will accidently overwrite what's already on the stack.
This effectively means we need two stack pointers. The allocation point on the stack and the fetch point on the stack. The heap already has this concept. But we've just told ourselves that we're using two registers for the stack, always, one register for the next continuation. That leaves us with six general purpose registers on an Intel x86. We'll have to keep track of that because it will become important as we go along.
Calling closures and methods that have >6 arguments requires creating a closure object as the arguments for the call. This can classically be looked at, in the scenario described in this story, as putting your arguments for the call on the stack - ala C style. But in actual fact we're making an object - the cheap way. It'll behave just like all the other objects in the system.
So the 'slow' way of calling, eg: the C way of calling, is still very fast (not as fast as using the registers).