Summary
Cliff Click recently claimed on his Azul blog that just-in-time (JIT) compiling Java bytecodes for a RISC chip almost always yields faster code and lower power consumption than executing the bytecodes directly in hardware.
In a recent blog post, Cliff Click compared CPUs that execute Java bytecodes directly in hardware with RISC CPUs that run Java bytecodes through a just-in-time (JIT) compiler.
His first point is that a RISC CPU is much simpler to design than a CPU that executes Java bytecodes directly, and that this simplicity pays off in both speed and power consumption:
The hardware guys like stuff simple - after all they deal with really hard problems like real physics (which is really analog except where it's quantum) and electron-migration and power-vs-heat curves and etc... so the simpler the better. Their plates are full already. And if it's simple, they can make it low power or fast (or gradually both) by adding complexity and ingenuity over time (at the hardware level). If you compare the *spec* for a JVM, including all the bytecode behaviors, threading behaviors, GC, etc vs the *spec* for a classic RISC - you'll see that the RISC is hugely simpler. The bytecode spec is *complex*; hundreds of pages long. So complex that we know that the hardware guys are going to have to bail out in lots of corner cases (what happens on a 'new' when the heap is exhausted? does the hardware do a GC?). The RISC chip *spec* has been made simple in a way which is known to allow it to be implemented fast (although that requires complexity), and we know we can JIT good code for it fairly easily.
When you compare the speed & power of a CPU executing bytecodes, you'll see lots of hardware complexity around the basic execution issues (I'm skipping on lots of obvious examples, but here's one: the stack layout sucks for wide-issue because of direct stack dependencies). When you try to get the same job done using classic JIT'd RISC instructions the CPU is so much simpler - that it can be made better in lots of ways (faster, deep pipes, wide issue, lower power, etc). Of course, you have to JIT first - but that's obviously do-able with a compiler that itself runs on a RISC.
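The stack-dependency remark can be illustrated with a small sketch. This is not code from Click's post; the class and method names are hypothetical. It evaluates (a + b) * (c + d) twice: once the way a stack machine would, following the bytecode sequence iload, iload, iadd, iload, iload, iadd, imul, and once in the register style a JIT might emit for a RISC target. In the stack version every step reads and writes the top of the same operand stack, so the steps are chained through it. In the register version the two additions name distinct temporaries and do not depend on each other, which is the property a wide-issue CPU can exploit.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: names are hypothetical, not taken from the blog post.
class StackVsRegister {

    // Stack-machine style, mirroring the bytecode sequence:
    //   iload a; iload b; iadd; iload c; iload d; iadd; imul
    // Every operation goes through the top of one shared operand stack.
    static int stackStyle(int a, int b, int c, int d) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(a);                          // iload a
        stack.push(b);                          // iload b
        stack.push(stack.pop() + stack.pop());  // iadd
        stack.push(c);                          // iload c
        stack.push(d);                          // iload d
        stack.push(stack.pop() + stack.pop());  // iadd
        stack.push(stack.pop() * stack.pop());  // imul
        return stack.pop();
    }

    // Register style, roughly what a JIT could emit for a RISC target:
    //   add r1, a, b
    //   add r2, c, d   (independent of the first add)
    //   mul r3, r1, r2
    static int registerStyle(int a, int b, int c, int d) {
        int r1 = a + b;
        int r2 = c + d;
        return r1 * r2;
    }

    public static void main(String[] args) {
        System.out.println(stackStyle(1, 2, 3, 4));
        System.out.println(registerStyle(1, 2, 3, 4));
    }
}
```

Both methods compute the same value; the difference is only in how the intermediate results are named, and therefore in how much independent work is visible to the hardware.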
Click then concludes that, except for very short-lived programs running on very small devices, a JIT compiler on a RISC chip will yield faster execution and lower power consumption:
Now which is better (for the same silicon budget): JIT'ing+classic-RISC-executing or just plain execute-the-bytecodes? Well... it all depends on the numbers. For really short & small things, the JIT'ing loses so much that you're better off just doing the bytecodes in hardware (but you can probably change source languages to something even more suited to lower power or smaller form). But for anything cell-phone sized and up, JIT'ing bytecodes is both a power and speed win. Yes you pay in time & power to JIT - but the resulting code runs so much faster that you get the job done sooner and can throttle the CPU down sooner - burning less overall power AND time.
Hence the best Java-optimized hardware is something that makes an easy JIT target. After that Big Decision is made, you can further tweak the hardware to be closer to the language spec (which is what Azul did) or your intended target audience (large heap large thread Java apps, hence lots of 64-bit cores). We also targeted another Java feature - GC - with read & write barrier hardware. But we always start with an easy JIT target...
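Click's "it all depends on the numbers" argument can be made concrete with a simple break-even model. The sketch below is an illustration under stated assumptions, not a measurement: the class name, the method names, and every figure in it are hypothetical. It assumes direct bytecode execution runs at a fixed rate, that JIT'd code runs at a higher fixed rate after a one-time compilation cost, and that both designs draw similar active power, so energy roughly tracks run time.

```java
// Illustrative model only: all names and numbers are hypothetical.
class JitBreakEven {

    // Seconds to run `ops` operations by executing bytecodes directly.
    static double directTime(double ops, double directOpsPerSec) {
        return ops / directOpsPerSec;
    }

    // Seconds to run `ops` operations after paying a one-time JIT cost.
    static double jitTime(double ops, double jitOverheadSec, double jitOpsPerSec) {
        return jitOverheadSec + ops / jitOpsPerSec;
    }

    // Workload size at which both approaches take the same time.
    // Only meaningful when jitOpsPerSec > directOpsPerSec.
    static double breakEvenOps(double jitOverheadSec,
                               double directOpsPerSec,
                               double jitOpsPerSec) {
        return jitOverheadSec / (1.0 / directOpsPerSec - 1.0 / jitOpsPerSec);
    }

    public static void main(String[] args) {
        double direct = 1e8;    // assumed ops/sec for direct execution
        double jit = 4e8;       // assumed ops/sec for JIT'd code
        double overhead = 0.5;  // assumed one-time JIT cost in seconds

        System.out.println("break-even ops: " + breakEvenOps(overhead, direct, jit));
        System.out.println("small job, direct: " + directTime(1e7, direct));
        System.out.println("small job, JIT:    " + jitTime(1e7, overhead, jit));
        System.out.println("large job, direct: " + directTime(1e9, direct));
        System.out.println("large job, JIT:    " + jitTime(1e9, overhead, jit));
    }
}
```

Under these assumed figures, workloads below the break-even size finish sooner with direct execution, while larger ones finish sooner with the JIT, which matches the shape of Click's conclusion. Where the break-even point actually falls depends on real hardware and real JIT costs, which this model does not attempt to estimate.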