A great deal of important code, including many highly optimized libraries, has been written in C and C++ over the last several decades. Developers wanting to use some of that code in client-side applications have had to either step outside the browser, for example by relying on browser plugins, or port the desired C and C++ code to a browser-friendly language such as JavaScript.
In keeping with the trend toward polyglot programming on the client, Adobe's Scott Petersen has been working on a project that would let a great deal of C and C++ code run securely inside a browser via the open-source Mozilla Tamarin virtual machine. Adobe donated Tamarin to Mozilla, and it will be the JavaScript execution environment for the Mozilla browser following Firefox 3. Tamarin is also the ActionScript VM inside Adobe's Flash Player.
The key concept behind Petersen's experiment is to compile the C and C++ code into an intermediate binary format that can then be compiled into Tamarin bytecode. Because Mozilla has not yet fully integrated the Tamarin code base with its browser, Petersen used the Flash Player to demonstrate his work, running well-known C and C++ programs such as Quake inside it. The toolchain works roughly as follows:
A special version of the GNU C Compiler—possibly llvm-gcc—compiles C code into instructions for the Low Level Virtual Machine.
The LLVM instructions are converted into opcodes for a custom Virtual Machine that runs in ActionScript, a variant of ECMAScript and sibling of JavaScript.
Adobe Flash automatically compiles the ActionScript into Tamarin bytecode, which Tamarin’s Just-in-Time (JIT) compiler may further compile into native machine code.
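To make the idea of "opcodes for a custom virtual machine" concrete, here is a minimal sketch in TypeScript, used as a stand-in for ActionScript. The opcode names, the `Instr` type, and the `run` function are all hypothetical; the article does not describe the real opcode set, so this only illustrates the general shape of interpreting compiled C as a flat instruction list.

```typescript
// Hypothetical opcode set, invented for illustration only.
type Instr =
  | { op: "push"; value: number } // push a constant
  | { op: "load"; slot: number }  // push an argument by index
  | { op: "add" }
  | { op: "mul" }
  | { op: "ret" };

// Interpret a flat instruction list against the given arguments.
function run(code: Instr[], args: number[]): number {
  const stack: number[] = [];
  for (const ins of code) {
    switch (ins.op) {
      case "push":
        stack.push(ins.value);
        break;
      case "load":
        stack.push(args[ins.slot] ?? 0);
        break;
      case "add": {
        const b = stack.pop()!;
        const a = stack.pop()!;
        stack.push((a + b) | 0); // truncate to a 32-bit signed integer
        break;
      }
      case "mul": {
        const b = stack.pop()!;
        const a = stack.pop()!;
        stack.push(Math.imul(a, b)); // 32-bit integer multiply
        break;
      }
      case "ret":
        return stack.pop()!;
    }
  }
  throw new Error("program ended without ret");
}

// Hand-translated from: int f(int a, int b) { return a * b + 7; }
const program: Instr[] = [
  { op: "load", slot: 0 },
  { op: "load", slot: 1 },
  { op: "mul" },
  { op: "push", value: 7 },
  { op: "add" },
  { op: "ret" },
];

const result = run(program, [6, 5]);
```

In a real toolchain the instruction list would be produced by the compiler rather than written by hand, and the host VM's JIT would be responsible for making the interpreter loop fast.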
The toolchain includes lots of other details, such as a custom POSIX system call API and a C multimedia library that provides access to Flash. There are also some things that Petersen had to add to Tamarin, such as a native byte array that maps directly to RAM, thereby allowing the VM’s “emulation” of memory to have only a minor overhead over the real thing...
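The byte-array approach to memory can be sketched as follows. This is an assumption-laden illustration in TypeScript, not Petersen's code: the `LinearMemory` class, its method names, and the bump allocator are hypothetical, and a standard `DataView` over an `ArrayBuffer` stands in for the native byte array added to Tamarin. The point is that a C pointer becomes a plain integer offset into one flat buffer.

```typescript
// Hypothetical linear memory: one byte buffer stands in for the C program's RAM.
class LinearMemory {
  private readonly view: DataView;
  private next = 8; // bump pointer; address 0 stays unused so it can act as NULL

  constructor(sizeInBytes: number) {
    this.view = new DataView(new ArrayBuffer(sizeInBytes));
  }

  // Minimal bump allocator standing in for malloc (there is no free).
  alloc(bytes: number): number {
    const addr = this.next;
    const aligned = (bytes + 7) & ~7; // round the size up to a multiple of 8
    if (addr + aligned > this.view.byteLength) {
      throw new Error("out of memory");
    }
    this.next = addr + aligned;
    return addr;
  }

  // Little-endian 32-bit loads and stores, like *(int *)addr in C.
  loadI32(addr: number): number {
    return this.view.getInt32(addr, true);
  }

  storeI32(addr: number, value: number): void {
    this.view.setInt32(addr, value, true);
  }
}

// Roughly: int *a = malloc(4 * sizeof(int)); a[i] = i * i; then sum the array.
const mem = new LinearMemory(1024);
const a = mem.alloc(4 * 4);
for (let i = 0; i < 4; i++) {
  mem.storeI32(a + 4 * i, i * i);
}
let total = 0;
for (let i = 0; i < 4; i++) {
  total += mem.loadI32(a + 4 * i);
}
```

Because every load and store goes through one buffer, pointer arithmetic and casts in the C source reduce to integer arithmetic on offsets, which is what keeps the emulation overhead small when the buffer is backed by real memory.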
The end result is the ability to run a wide variety of existing C code in Flash at acceptable speeds. Petersen demonstrated a version of Quake running in a Flash app, as well as a C-based Nintendo emulator running Zelda; both were eminently playable, and included sound effects and music.
In an earlier interview, Petersen gave more details of his approach:
The LLVM compiler infrastructure is an amazing set of programs that provides all of the common command line C and C++ development tools like a compiler, assembler, linker, archiver, etc. However, instead of generating and manipulating platform specific assembly code and object code, these tools operate on a platform neutral “assembly language” and byte code...
I’ve created an LLVM backend that uses the same underlying mechanism as the platform specific assembly language generators for x86, ARM, PowerPC, etc. But instead of generating “real” assembly language, it generates low level ActionScript...
LLVM uses the GCC frontend for C and C++ so pretty much any C or C++ GCC can deal with can be dealt with by the tool. One might imagine that other languages supported by GCC such as Java, Objective-C, etc. could make their way into LLVM as well. The way LLVM is architected, it should require very little work (or none at all) to have ActionScript support for those languages after LLVM brings support.
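The interview describes a backend that emits low-level ActionScript where other backends emit assembly. As a rough idea of what such output might look like, here is a hand-written TypeScript sketch of a C loop lowered to basic blocks. The block numbering and the switch-in-a-loop dispatch are assumptions for illustration, a common way to express jumps in a language without `goto`, and not the actual output of Petersen's backend.

```typescript
// C source being mimicked:
//   int sumTo(int n) { int s = 0; for (int i = 1; i <= n; i++) s += i; return s; }
function sumTo(n: number): number {
  let s = 0;
  let i = 1;
  let block = 0; // label of the basic block to execute next (hypothetical numbering)
  for (;;) {
    switch (block) {
      case 0: // loop header: evaluate the condition and pick the next block
        block = i <= n ? 1 : 2;
        break;
      case 1: // loop body, then jump back to the header
        s = (s + i) | 0; // keep values in the 32-bit signed range
        i = (i + 1) | 0;
        block = 0;
        break;
      case 2: // exit block
        return s;
      default:
        throw new Error("unknown block " + block);
    }
  }
}

const sumOfTen = sumTo(10);
```

Code in this style is tedious for a person to write but straightforward for a compiler backend to generate, since each case corresponds directly to one basic block of the intermediate representation.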
What do you think of the possibility to reuse existing C and C++ code in a browser environment?
I'm always excited to see something other than JavaScript in the browser. Not because JavaScript is bad, but because each language makes different trade-offs, and sometimes I'd like a different set of trade-offs than the ones JavaScript made.
Additionally, good C and C++ scripting environments could be used in other tools, like the "evaluate this expression" feature of debuggers.