Okay, so I've been putting a few brain cycles into the problems of VisualWorks' C interface, DLLCC.
This continues my critique of the system from the previous post. Now for the good news: I think there's definitely light at the end of the tunnel. I've got part of the solution sloshing around in my brain.
The key to designing is to let the bits slosh around and then settle in such a way that you have to write the least amount of code to get the maximum impact from your design. I've not quite reached that stage. I'm hoping to get down to almost no work at all :)
Okay, so this is how it goes so far. DLLCC is built on the idea that a library is described by its header files. This is a reasonable assumption. Sometimes two or more header files make up the functionality of a library; sometimes each header covers only part of a library's functionality and you want to keep those parts segregated, e.g. big chunks of the various Windows platform DLLs.
But what we cannot ignore are the pre-processor instructions. We want to be able to assemble a "view" on a C library based on variables configured at runtime, and to have more than one of these views side by side in the running system. This is what sets Smalltalk apart from other systems, which will happily accept only one view of how a C library works.
To that end, and given that in DLLCC you now make instances of interfaces, it's clear that what we need to do is model the processing of header files. We want to take all those pre-processor if statements and turn them into Smalltalk code that can then correctly assemble a "view" on the definitions in the header files into an instance of an interface linked to an actual .dll or .so file (or the Mac OS X equivalent, a .dylib).
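To make the idea concrete, here is a minimal sketch in Python (not Smalltalk, purely for illustration) of what a modelled header might look like: each #if becomes a closure over a dictionary of defines, and the plain declarations between directives are kept as-is. The names Island, Conditional and assemble_view are hypothetical, and the conditions are written by hand here rather than parsed. The point is that one tree can produce two different views side by side.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

Defines = Dict[str, str]  # macro name -> value, e.g. {"_WIN32": "1"}

@dataclass
class Island:
    """A run of plain C declarations sitting between pre-processor directives."""
    text: str

@dataclass
class Conditional:
    """An #if/#else pair whose condition has been turned into a closure over the defines."""
    test: Callable[[Defines], bool]
    then_branch: List["Node"] = field(default_factory=list)
    else_branch: List["Node"] = field(default_factory=list)

Node = Union[Island, Conditional]

def assemble_view(nodes: List[Node], defines: Defines) -> List[str]:
    """Collect the declarations that survive for one particular set of defines."""
    view: List[str] = []
    for node in nodes:
        if isinstance(node, Island):
            view.append(node.text)
        else:
            branch = node.then_branch if node.test(defines) else node.else_branch
            view.extend(assemble_view(branch, defines))
    return view

# A tiny hypothetical header: one shared declaration plus a platform split.
header: List[Node] = [
    Island("int open_widget(const char *name);"),
    Conditional(
        test=lambda d: "_WIN32" in d,  # stands in for "#ifdef _WIN32"
        then_branch=[Island("typedef void *HANDLE;")],
        else_branch=[Island("typedef int fd_t;")],
    ),
]

# Two views of the same header, side by side in one running process.
windows_view = assemble_view(header, {"_WIN32": "1"})
linux_view = assemble_view(header, {"__linux__": "1"})
```

Nothing about the tree changes between the two calls; only the defines do, which is what lets several views coexist.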
That seems pretty obvious, and the actual pre-processor instructions are quick to evaluate (so long as you're not trying to solve the Towers of Hanoi with the C pre-processor!). The definitions inside the ifs can be parsed using DLLCC's current parsing code, which seems to get that stuff right enough for now. So let's not try to make the problem too big.
Unfortunately this approach has a drawback. We must evaluate the system of libraries and their header files against a set of variables, which sounds good, but what do we need to get out at the other end? An instance that responds to message sends. That means we need to make a class and then instantiate it. And since it's not a real class you want to keep around (every time you change your variables, the class definition changes too), you suddenly can't look at it easily in the Refactoring Browser.
One thought I had here was to make a real class, stick it in the (none) package so that people don't publish it, and give it a reasonably obvious and unique name. That way developers can browse the generated interface, and I can also arrange for the class to be deleted once the last instance of the interface goes away.
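A rough Python analogue of the throwaway-class idea, assuming a plain dictionary stands in for the (none) package and a per-class instance counter decides when to forget the class. Everything here (generate_interface_class, instantiate, the naming scheme) is hypothetical; in VisualWorks the image's own weak references and finalization would presumably do this job, and it is CPython's reference counting that makes the finalizers below fire immediately.

```python
import itertools
import weakref
from typing import Callable, Dict

# Hypothetical stand-in for the "(none)" package: generated class name -> class.
generated_classes: Dict[str, type] = {}
_serial = itertools.count(1)

def generate_interface_class(library: str, functions: Dict[str, Callable]) -> type:
    """Build a uniquely named class whose methods are the given functions."""
    name = f"Generated{library.capitalize()}Interface{next(_serial)}"
    cls = type(name, (object,), dict(functions))
    cls._live_instances = 0
    generated_classes[name] = cls
    return cls

def _instance_died(cls: type) -> None:
    """Finalizer: forget the generated class once its last instance is collected."""
    cls._live_instances -= 1
    if cls._live_instances == 0:
        generated_classes.pop(cls.__name__, None)

def instantiate(cls: type):
    """Create an instance and count it, so the class can be dropped later."""
    obj = cls()
    cls._live_instances += 1
    weakref.finalize(obj, _instance_died, cls)  # holds the class, not the instance
    return obj

iface = generate_interface_class("libc", {"strlen": lambda self, s: len(s)})
first, second = instantiate(iface), instantiate(iface)
del first    # one instance is still alive, so the class stays registered
registered_after_first = iface.__name__ in generated_classes
del second   # last instance gone: the class is forgotten
registered_after_second = iface.__name__ in generated_classes
```

The unique, readable name (GeneratedLibcInterface1 here) is what would let a developer find the class in a browser while it is alive.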
Another useful tool would be to let developers name the class they want to produce from evaluating the pre-processor instructions. That way they can keep the class and do what they want with it. That would be an alternate way to use the system: the equivalent of what you have now with DLLCC, except that you don't have to write it all yourself.
If this approach works correctly, you should never have to write C interface code yourself. You should also always get the same, predictable output from parsing C headers. That solves the maintenance problems associated with most C interface paradigms right now.
The next step would then be to try it out with the core Linux libraries and Windows platform libraries. If it can handle them correctly, we're on the right track.
After that, there are two things to do:
- Make the approach scalable. We want to split the Windows platform libraries into different logical packages; otherwise you'll have 54MB of C interfaces just to use anything in Windows. This is clearly not a good thing. No one uses every bit of Windows all the time anyway.
- Generate structure classes and wrapper methods. This'll be interesting because it lets us make nice-looking method selectors for C functions and nice object-oriented classes. Many C libraries take a struct as a parameter; such functions can be wrapped as methods on the structure classes too, giving the whole library a very OO feel. Once again, this would be generated consistently to avoid heavy maintenance costs.
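One possible naming rule, sketched in Python under assumptions of my own: a function whose first parameter is a pointer to a known struct becomes a method on that struct's class, its struct-name prefix is dropped, the remaining stem becomes the first keyword, and the later parameter names become the remaining keywords. The rule, the example library (rect_move, rect_area, make_rect) and the helper names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class CFunction:
    name: str                      # C function name, e.g. "rect_move"
    params: List[Tuple[str, str]]  # (C type, parameter name) pairs

def pointee(ctype: str) -> Optional[str]:
    """'const Rect *' -> 'Rect'; anything that isn't a single pointer -> None."""
    tokens = [t for t in ctype.replace("*", " * ").split() if t != "const"]
    return tokens[0] if len(tokens) == 2 and tokens[1] == "*" else None

def camel(snake: str) -> str:
    words = snake.split("_")
    return words[0] + "".join(w.capitalize() for w in words[1:])

def selector_for(stem: str, arg_names: List[str]) -> str:
    """Hypothetical rule: the stem names the first keyword, later args name the rest."""
    if not arg_names:
        return camel(stem)  # unary selector
    keywords = [camel(stem)] + [camel(a) for a in arg_names[1:]]
    return "".join(k + ":" for k in keywords)

def group_by_struct(functions: List[CFunction], structs: List[str]) -> Dict[str, Dict[str, str]]:
    """Map each owning class to {selector: C function name}; '<library>' holds the rest."""
    methods: Dict[str, Dict[str, str]] = {s: {} for s in structs}
    methods["<library>"] = {}
    for fn in functions:
        owner = pointee(fn.params[0][0]) if fn.params else None
        if owner in structs:
            prefix = owner.lower() + "_"
            stem = fn.name[len(prefix):] if fn.name.startswith(prefix) else fn.name
            args = [name for _, name in fn.params[1:]]  # the struct becomes the receiver
            methods[owner][selector_for(stem, args)] = fn.name
        else:
            args = [name for _, name in fn.params]
            methods["<library>"][selector_for(fn.name, args)] = fn.name
    return methods

functions = [
    CFunction("rect_move", [("Rect *", "r"), ("int", "dx"), ("int", "dy")]),
    CFunction("rect_area", [("const Rect *", "r")]),
    CFunction("make_rect", [("int", "width"), ("int", "height")]),
]
api = group_by_struct(functions, ["Rect"])
```

Under this rule, rect_move(r, 3, 4) would be sent as aRect move: 3 dy: 4. Because the rule is purely mechanical, regenerating from the same headers always yields the same selectors.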
After that, I'd like to try replacing some of the VM primitives with the equivalent code, copied from the VM source into Smalltalk code that uses these shiny new libraries. I imagine the migration from VM code to Smalltalk code can be quite gradual, and over time we'll be able to lop large chunks off the VM. This would be ideal for everyone (except for the guy who wants to open up a million sockets in a tight loop, but hey, that'll run slow in C anyway ;>)
So what would be the first step in all this? Actually, it's not a hard step: it's a matter of being a C pre-processor, i.e. being able to parse an input stream the way the C pre-processor does. This could be very useful in general, since it's not specifically a C thing anyway. People use the CPP for all kinds of non-C jobs.
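As a sketch of that first step, the Python below handles only the outermost layer of a pre-processor: it splices backslash-continued lines and splits the input into directives and the 'islands' of plain text between them. It deliberately ignores comments, string literals and macro expansion, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# A directive is '#' (optionally indented) followed by a name and its arguments.
DIRECTIVE = re.compile(r"^\s*#\s*(\w+)\s*(.*)$")

Chunk = Tuple[str, str, str]  # (kind, directive name, body); kind is "directive" or "island"

def logical_lines(text: str) -> List[str]:
    """Splice backslash-continued physical lines into logical lines."""
    lines: List[str] = []
    pending = ""
    for physical in text.splitlines():
        if physical.endswith("\\"):
            pending += physical[:-1]
        else:
            lines.append(pending + physical)
            pending = ""
    if pending:
        lines.append(pending)
    return lines

def split_stream(text: str) -> List[Chunk]:
    """Separate directives from the 'islands' of ordinary text between them."""
    chunks: List[Chunk] = []
    island: List[str] = []
    for line in logical_lines(text):
        match = DIRECTIVE.match(line)
        if match:
            if island:
                chunks.append(("island", "", "\n".join(island)))
                island = []
            chunks.append(("directive", match.group(1), match.group(2).strip()))
        elif line.strip():
            island.append(line)
    if island:
        chunks.append(("island", "", "\n".join(island)))
    return chunks

source = """#ifdef _WIN32
typedef void *HANDLE;
#else
typedef int fd_t;
#endif
#define MAX(a, b) \\
    ((a) > (b) ? (a) : (b))
int open_widget(const char *name);
"""
chunks = split_stream(source)
```

Nothing in this layer knows about C declarations, which is why the same splitter would serve non-C uses of the CPP too.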
The next step would then be to poke the 'islands' of definitions between CPP statements with the DLLCC parser.
The step after that would be to generate Smalltalk code which, upon executing the CPP 'rules' tree with a set of environment variables, glues the correct definitions together into a C interface class.
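Here is a minimal sketch of building and executing the 'rules' tree in Python, assuming the input has already been split into (kind, name, body) chunks. It understands only #ifdef, #ifndef, #else, #endif and #define; #if and #elif expressions are rejected rather than guessed at, and other directives are skipped. Generating the Smalltalk glue code is left out: the result is just the list of declarations that would feed it. All names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

Chunk = Tuple[str, str, str]  # (kind, directive name, body); kind is "directive" or "island"

@dataclass
class Island:
    text: str

@dataclass
class Define:
    name: str
    value: str

@dataclass
class IfDef:
    name: str
    negate: bool  # True for #ifndef
    then_nodes: List["Node"] = field(default_factory=list)
    else_nodes: List["Node"] = field(default_factory=list)

Node = Union[Island, Define, IfDef]

def build_tree(chunks: List[Chunk]) -> List[Node]:
    """Nest a flat chunk list into a rules tree (#ifdef/#ifndef/#else/#endif/#define)."""
    root: List[Node] = []
    current = root
    stack: List[Tuple[IfDef, List[Node]]] = []  # (open conditional, list it was added to)
    for kind, name, body in chunks:
        if kind == "island":
            current.append(Island(body))
        elif name in ("ifdef", "ifndef"):
            node = IfDef(body, negate=(name == "ifndef"))
            current.append(node)
            stack.append((node, current))
            current = node.then_nodes
        elif name == "else":
            current = stack[-1][0].else_nodes
        elif name == "endif":
            current = stack.pop()[1]
        elif name == "define":
            macro = re.match(r"\w+", body).group(0)
            current.append(Define(macro, body[len(macro):].strip()))
        elif name in ("if", "elif"):
            raise NotImplementedError("#if expressions need an expression evaluator")
        # other directives (#include, #undef, #pragma, ...) are ignored in this sketch
    if stack:
        raise ValueError("unterminated conditional")
    return root

def execute(nodes: List[Node], defines: Dict[str, str]) -> List[str]:
    """Walk the tree; #define nodes extend the environment as they are reached."""
    out: List[str] = []
    for node in nodes:
        if isinstance(node, Island):
            out.append(node.text)
        elif isinstance(node, Define):
            defines[node.name] = node.value
        else:
            taken = (node.name in defines) != node.negate
            out.extend(execute(node.then_nodes if taken else node.else_nodes, defines))
    return out

def view_for(tree: List[Node], environment: Dict[str, str]) -> List[str]:
    """Run the rules on a copy of the environment so each view stays independent."""
    return execute(tree, dict(environment))

chunks: List[Chunk] = [
    ("directive", "ifndef", "WIDGET_H"),
    ("directive", "define", "WIDGET_H"),
    ("directive", "ifdef", "_WIN32"),
    ("island", "", "typedef void *HANDLE;"),
    ("directive", "else", ""),
    ("island", "", "typedef int fd_t;"),
    ("directive", "endif", ""),
    ("island", "", "int open_widget(const char *name);"),
    ("directive", "endif", ""),
]
tree = build_tree(chunks)
windows_view = view_for(tree, {"_WIN32": "1"})
linux_view = view_for(tree, {"__linux__": "1"})
already_included = view_for(tree, {"WIDGET_H": ""})
```

Because #define nodes update the environment as the tree runs, the usual include guard (WIDGET_H here) behaves as it would under a real pre-processor, and copying the environment per call keeps several views independent.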
Another step after that will be to derive those variables from the operating system we're running on, e.g. setting the correct architecture define, operating system define, etc. Other variables, such as whether we're pretending to be GCC or MSVC, will depend on the user's preference.
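A sketch of deriving those variables in Python, using sys.platform and platform.machine() to pick operating system and architecture macros, while the compiler flavour stays an explicit choice. The macro names are ones GCC and MSVC predefine, but the mapping table is my own assumption and incomplete, and compiler_version is a placeholder the caller should set to the version being impersonated.

```python
import platform
import sys
from typing import Dict

# Architecture macros each compiler predefines, keyed by platform.machine().lower().
ARCH_MACROS = {
    "gcc": {"x86_64": "__x86_64__", "amd64": "__x86_64__", "i386": "__i386__",
            "i686": "__i386__", "x86": "__i386__", "aarch64": "__aarch64__",
            "arm64": "__aarch64__"},
    "msvc": {"x86_64": "_M_X64", "amd64": "_M_X64", "i386": "_M_IX86",
             "i686": "_M_IX86", "x86": "_M_IX86", "aarch64": "_M_ARM64",
             "arm64": "_M_ARM64"},
}

def platform_defines(compiler: str = "gcc", compiler_version: str = "1") -> Dict[str, str]:
    """Approximate the macros a native compiler would predefine on this machine.

    compiler_version is a placeholder: real headers compare __GNUC__ / _MSC_VER
    numerically, so callers should pass the version they want to impersonate.
    """
    if compiler not in ARCH_MACROS:
        raise ValueError(f"unknown compiler flavour: {compiler}")
    defines: Dict[str, str] = {}
    if sys.platform.startswith("win"):
        defines["_WIN32"] = "1"
        if sys.maxsize > 2**32:  # 64-bit process
            defines["_WIN64"] = "1"
    elif sys.platform.startswith("linux"):
        defines["__linux__"] = "1"
        defines["__unix__"] = "1"
    elif sys.platform == "darwin":
        defines["__APPLE__"] = "1"
        defines["__MACH__"] = "1"
    arch = ARCH_MACROS[compiler].get(platform.machine().lower())
    if arch:
        defines[arch] = "1"
    # The compiler flavour is the user's choice, not something we detect.
    defines["__GNUC__" if compiler == "gcc" else "_MSC_VER"] = compiler_version
    return defines
```

Keeping the compiler flavour as a parameter means a developer on Linux could still ask for an MSVC-flavoured view of a header, which fits the idea of several views side by side.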
Those three steps will be enough to actually try the thing out. They're not too bad, really. I wonder if I'll get time to try them.