I've done a lot of interfacing to C from Smalltalk over the last decade. In that time I've done it from VisualAge, Squeak, Smalltalk/X, VisualWorks and sundry other Smalltalks too.
And it hasn't gotten easier over time. My C interfacing has gotten more ambitious, that's for sure, so that certainly doesn't help. But then again - neither does Smalltalk help.
For a language that's so good at metamodelling it sure is shyte at linking to an old language like C. I'm going to focus the rest of this blog post on VisualWorks because that's my current development focus.
As some of you know, I've wrapped up libraries like BerkeleyDB into VisualWorks. I'll use this library as an example in this post as well, because it covers a lot of the problems with VisualWorks C interfacing.
You can split this problem up into a few different areas:
a) Parsing external interfaces automatically and producing stub code
b) Representing C constructs so that any kind of C call can be made from Smalltalk
c) Simplicity to then use the built library from Smalltalk
When it comes to (a), I've not seen other Smalltalks attempt to do this.. so kudos to VisualWorks - however, it would be better if VisualWorks' header file parser worked. I'll explain this in more detail below.
I've found too that many Smalltalks can do -most- of the C constructs, but not all of them. And then when it comes to point (c), it seems that most Smalltalks go out of their way to make it as difficult to interface as possible, even after building a nice interface library for you.
Let's go back to the header parsing problem. This is not an easy problem. The basic idea is that you parse a C .h file and rip out all the function declarations, typedefs, structs, defines, etc.. everything you can get your hands on that means anything to you.
But there's a problem - it also has to be a pre-processor, because VisualWorks doesn't assume you have a pre-processor installed on your operating system (or that one is even available). The biggest problem here is that VisualWorks' pre-processing is done badly. A number of defines come from the C compiler, platform, architecture, operating system, libraries etc - modern C headers rely on these defines existing to make sane choices about function signatures.
The interesting point here is that the .h's use of those defines is correct. If we don't parse the header file using those directives correctly, we'll get the wrong function declarations for the platform.
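To make that concrete, here's a minimal sketch of the kind of header fragment I mean. Everything in it (example_handle_t, example_invalid_handle, handle_fits) is made up for illustration and is not taken from any real library. The point is only that one declaration in the source turns into a different signature depending on which defines the pre-processor sees.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical header fragment; none of these names come from a real library. */
#if defined(_WIN32)
typedef void *example_handle_t;   /* pointer-sized opaque handle */
#define EXAMPLE_PLATFORM "win32"
#else
typedef int example_handle_t;     /* small integer descriptor */
#define EXAMPLE_PLATFORM "posix"
#endif

/* The same source line has a different return type on each platform. */
example_handle_t example_invalid_handle(void)
{
#if defined(_WIN32)
    return NULL;
#else
    return -1;
#endif
}

/* Reports which branch the pre-processor selected. */
const char *example_platform(void)
{
    return EXAMPLE_PLATFORM;
}

/* True when a caller's argument slot of slot_size bytes can hold the handle. */
int handle_fits(size_t slot_size)
{
    assert(slot_size > 0);
    return slot_size >= sizeof(example_handle_t);
}
```

A parser that ignores the platform defines has no way of knowing which of the two typedefs applies, so whatever it generates is a guess.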
Aha, but we have a bigger problem. VisualWorks is a cross-platform product. It can have different platform source definitions running side by side in the same image. You don't "recompile" a program for different platforms in VisualWorks. That means that if we parse a header file that has different definitions for different platforms (very common), then we need to record that information.
This is not done. In fact, none of those platform specific defines are used at all in the VisualWorks C preprocessor. This isn't so much of a criticism, more to point out how difficult this problem is.
C preprocessor define's can be very tricky. They can say "On x86 Win32 with Windows < 5 -- you can do xyz". Basically, Smalltalk needs to record this fact and in some convenient and simple way convey this in the C Interface class it builds.
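One way to picture "recording this fact" is a table with one entry per pre-processor branch. This is only a sketch of the idea written in C, not how VisualWorks or any other Smalltalk stores it, and the platform labels and signatures are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one pre-processor branch of a declaration. */
struct signature_variant {
    const char *platform;   /* made-up label for the branch condition */
    const char *signature;  /* declaration text seen on that branch */
};

/* Both variants are kept side by side instead of picking one at parse time. */
static const struct signature_variant example_open_variants[] = {
    { "win32-x86", "void *example_open(const char *path)" },
    { "posix",     "int example_open(const char *path)"   },
};

/* Returns the recorded signature for a platform, or NULL if none is known. */
const char *lookup_signature(const char *platform)
{
    size_t count = sizeof example_open_variants / sizeof example_open_variants[0];
    size_t i;

    assert(platform != NULL);
    for (i = 0; i < count; i++) {
        if (strcmp(example_open_variants[i].platform, platform) == 0)
            return example_open_variants[i].signature;
    }
    return NULL;
}
```

The lookup happens when the call is made, which is what a cross-platform image would need: the same image can answer for either platform.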
This is partially done by Smalltalk developers right now - we split out different platforms into different C Interface classes. But we shouldn't have had to do that. This leads us to the next problem.
Because VisualWorks is unable to satisfactorily parse header files, you have to build the interface definitions yourself. That means if the header file changes - you have to figure out what changed and rewrite all that code. Nasty.. very nasty. It also means it's an unscalable approach to developing interfaces to other programs. A developer quickly tires of trying to keep up with C header file changes. This is a very bad outcome for VisualWorks.
Moving on - I've yet to come across a C construct that VisualWorks cannot represent and interface with. That's good, very good - as I got stuck with this sort of thing in VisualAge many times in the past. Perhaps it was a lack of understanding on my part, but it felt like some C constructs were either impossible or very difficult to interface with in VisualAge. Kudos to VisualWorks again for making "things possible". Next step will be to make "things convenient".
So what is the outcome of this - well pick a common C library or semi-common C library of your choice. You'll find that often there are interfaces to it from many different languages that come with the library. Usually you'll find you can use the library in C, C++, Java, C#, Tcl of all things and sometimes some other languages. You'll basically -never- find a library that comes with Smalltalk bindings too.
Why is this? Well, two reasons:
a) Every Smalltalk does C interfacing differently
b) If you change the header file between versions, you have a big maintenance job ahead of you.
So it's really not worth people's while to go and interface to Smalltalk. Even ignoring (a) and assuming people will interface to the biggest Smalltalk - VW - they still have to contend with (b). (b) may not be as big an issue in other Smalltalks (though I'd bet it is!)
Moving on to the third difficult part of Interfacing to C. This is where the Smalltalk world and the C world meet. We have to consider things like speed here too - does the Smalltalk VM jit the C execution or does it have primitives? - if it just uses primitives, calling C may be impractical for some jobs. Tight loops calling C routines means it'll run slowly, so it's not going to be the best choice. If the VM jits the C calls though, tight loops calling C routines will be relatively fast and thus a good choice.
Using structures as Objects - ie: treating structure fields as instance variables - is really nice. Unfortunately you won't find it in VisualWorks. There can be some 'at odds' moments when doing this with C structures though. If a Union is involved, what do the instance variables mean? -- well, *most* C structures fit nicely into an Object paradigm so you could just deal with unions as an edge case.
Anyway, VW doesn't do this, but you can easily make a class to wrap up a Structure object. However, you have to -do- this. It's not generated for you. Again, this comes back to the core issue of automatically sucking in definitions. It would be nice if once it had automatically sucked in definitions, it automatically made classes for various structures and various helper methods to wrap up the C calls.
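Here's a small sketch of why unions are the awkward case. The struct and its names are hypothetical. The plain field (kind) maps straight onto an instance variable, but the union members share the same bytes, so an accessor only makes sense if something like the tag says which member is live.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum value_kind { VALUE_INT, VALUE_REAL };

/* Hypothetical struct: kind is an ordinary field, payload is a union. */
struct tagged_value {
    enum value_kind kind;
    union {
        int32_t as_int;
        double  as_real;
    } payload;
};

struct tagged_value make_int(int32_t v)
{
    struct tagged_value t;
    t.kind = VALUE_INT;
    t.payload.as_int = v;
    return t;
}

struct tagged_value make_real(double v)
{
    struct tagged_value t;
    t.kind = VALUE_REAL;
    t.payload.as_real = v;
    return t;
}

/* Accessor in the style of a generated getter: reads the live member only. */
double value_as_real(const struct tagged_value *v)
{
    assert(v != NULL);
    return v->kind == VALUE_INT ? (double)v->payload.as_int
                                : v->payload.as_real;
}
```

A generated wrapper class could expose kind directly and treat the payload as the edge case that needs a hand-written accessor.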
One common C paradigm is a pointer to a pointer, eg: SomeStruct **thing. This is really quite common and makes a lot of sense in the C world. However, in the Smalltalk world, you start to come up against the "ease of use" issue. Especially in VisualWorks.
VisualWorks is quite good at giving you a SomeStruct*, but if you then want a pointer to it - you need to go through several contortions. I'm not going to discuss them here today, but I may bring it up again in the future. Generally speaking, you end up wrapping these kinds of calls up in an inefficient method which "casts" and "allocates" a pointer before doing the call. Inefficient and now you have code you have to maintain that shouldn't even exist.
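For anyone who hasn't met the pattern, this is roughly what it looks like on the C side. The names (struct thing, thing_create, thing_destroy) are stand-ins I've invented for the example, not a real library's API. The function returns a status code and hands the new object back through the pointer-to-pointer argument.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical opaque object handed out by a library. */
struct thing {
    int id;
};

/* Returns 0 on success and stores the new object through out;
   returns -1 and stores NULL if allocation fails. */
int thing_create(struct thing **out, int id)
{
    struct thing *t;

    assert(out != NULL);
    t = malloc(sizeof *t);
    if (t == NULL) {
        *out = NULL;
        return -1;
    }
    t->id = id;
    *out = t;
    return 0;
}

void thing_destroy(struct thing *t)
{
    free(t);
}
```

In C the caller just writes thing_create(&t, 42). From Smalltalk there is no & operator to reach for, so you have to allocate a pointer-sized cell yourself, pass its address, and read the result back out of it afterwards.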
Then we get into more interesting stuff. Some things VisualWorks has done very well - such as errors. Most platforms have an error code standard, eg: returning 0 means an error occurred, therefore call getLastError(). VW knows all these standards and you can switch which standard you want to use in the method definition. Cool. It gets better. If the call you're making is threaded (ie: it spawns off another thread so VW doesn't lock up during the C call) then it'll make sure it makes that error check call in the launched thread.
Very useful. I love it - except that you have to define it in each method you're going to call. Grr.
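The reason the thread matters is that these error values are normally kept per thread, so they have to be read on the thread that made the call, and before anything else overwrites them. Here's a sketch using the standard C errno convention with strtol. The wrapper name parse_long is my own.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Parses a decimal long. Returns 0 on success, or an errno value on failure. */
int parse_long(const char *text, long *result)
{
    char *end = NULL;
    long value;
    int saved;

    assert(text != NULL && result != NULL);
    errno = 0;                      /* clear any stale error first */
    value = strtol(text, &end, 10);
    saved = errno;                  /* capture immediately, on this thread */
    if (saved != 0)
        return saved;               /* e.g. ERANGE on overflow */
    if (end == text)
        return EINVAL;              /* no digits were consumed */
    *result = value;
    return 0;
}
```

If the errno read happened on a different thread from the strtol call, it would see that other thread's value and the overflow would go unnoticed.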
Now let's get into my latest favourite gotcha - memory spaces. Smalltalk moves its objects around as it garbage collects things. This can happen at any time. This even includes strings, bytearrays, big numbers, floats, etc. Nice gotcha there if you don't know what's going on. However, there is a way out of this problem in VisualWorks. You can allocate space on an external heap - a classic C style heap. You can also allocate objects on another heap called FixedSpace. FixedSpace is like a regular Smalltalk memory space in that it garbage collects - however, it won't move the objects around in fixed space. This essentially means you can get fragmentation - again, like a classic C heap in a way.
So if you're passing a float or a bytearray or a string to a C call, usually that call is not threaded, so the Smalltalk VM won't be executing while C is executing. There will be no garbage collection and thus nothing will move while the C call is running. However, if the call you're making is threaded - stuff may move while the C code is running. So VisualWorks smartly automatically copies objects into FixedSpace when calling a threaded function.
Nice - however!.. if you're calling a C function and the parameter is a C structure that you've allocated in the external heap - but the values inside the C structure were allocated in regular Smalltalk space (eg: strings, floats, bytearrays), then you'll be shit out of luck and your program will mysteriously do "bad things". Why? Because there will be a delay between setting up your C structure and calling your C function. In that short period of time, Smalltalk may have moved your objects you pointed to in your C structure.
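You can mimic the failure in plain C. This is a simulation under my own assumptions, not how the VisualWorks collector works: simulate_move stands in for the garbage collector by copying an object to a new address and scrubbing the old one, and struct request is a made-up C structure holding a pointer to that object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C struct that embeds a pointer to movable data. */
struct request {
    const char *name;
};

/* Stand-in for a moving collector: copies the bytes to a new address,
   scrubs the old storage, and returns the new address (NULL on failure). */
char *simulate_move(char *old_location, size_t size)
{
    char *new_location;

    assert(old_location != NULL);
    new_location = malloc(size);
    if (new_location == NULL)
        return NULL;
    memcpy(new_location, old_location, size);
    memset(old_location, 0, size);
    return new_location;
}

/* Returns 1 if the struct's pointer no longer reaches the object's data,
   or -1 if memory could not be allocated. */
int struct_pointer_goes_stale(void)
{
    struct request req;
    char *object;
    char *moved;
    int stale;

    object = malloc(6);
    if (object == NULL)
        return -1;
    memcpy(object, "hello", 6);
    req.name = object;                  /* struct captures the old address */

    moved = simulate_move(object, 6);   /* the "collector" runs in the gap */
    if (moved == NULL) {
        free(object);
        return -1;
    }
    stale = (req.name != moved) && (strcmp(req.name, "hello") != 0);

    free(moved);
    free(object);
    return stale;
}
```

The simulation keeps the old storage allocated so that reading it is safe. In the real case the old address can be reused for something else entirely, which is where the mysterious "bad things" come from.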
So what do you need to do? You need to make sure those objects in the C structure are in FixedSpace too. BUT! FixedSpace garbage collects, so you need to make sure you have a strong reference to those objects too. This can be tricky - so I've published StrongCompositionPointers to public store which holds on to a strong reference to fixed space copies of objects you set in C structures for you automatically. Problem solved there - once you know the problem exists.
Of course, there is a cost to all of this.. copying stuff to FixedSpace (you cannot copy it back btw, you have to let it be GC'd) for all of these C calls is potentially expensive. So it's not something you'd want to do in a tight loop. Generally speaking, we call C libraries for jobs that are expensive - such as doing some sort of IO, complex math calculation, rendering a 3d scene, etc. So spending a little bit more time setting up the call is not going to matter too much.
My main gripe here is that we have to write all this code just to call a C program.. why? Like I said, Smalltalk is the most powerful metamodelling language out there, yet we're still in the stone age when it comes to talking to other computer systems.
The fundamentals are missing in VisualWorks to do a good job here. The intent was clearly right, but the execution was lacking. I've talked previously on my blog about the lack of "Base C library interfaces". I'm sure it's now clear why said base C library interfaces don't already exist in VW, but to reiterate:
a) Too difficult to get the definitions into Smalltalk to begin with
b) Too difficult to maintain once they're in Smalltalk and they change
c) No mechanism for handling the different C Preprocessing branches in definition files
Well.. that's enough of a rant for now. Perhaps it's food for thought - personally, I'd like to put some brain power into it and solve it properly, then kick off those base C libraries, then remove all the junk from the VisualWorks VM. But that kind of project takes time and resources. I'm not going to find time to do that sort of thing in my current jobs.