Summary
Efforts to make non-Java languages perform well on the JVM have accelerated in recent years. The benefits of turning the JVM into a highly optimized, general-purpose execution environment are many, but so are the challenges.
This week's most talked-about Java news was the decision by Sun to hire the two key figures behind the JRuby project, with the ostensible goal of creating a first-class Ruby implementation on the JVM (see the Artima interview with the JRuby project leads, Sun's JRuby Move). In an email to Artima, Tim Bray, Sun's director of Web technologies, noted that:
Looking back ten years, it might have been really smart, at the birth of Java, to brand the platform and the language separately. But during Java's early years, the technology was hitting such a big sweet spot that it was easy to see the whole thing—VM, libraries, and language—as a single engineering triumph. Microsoft was smart to get out there and evangelize that a virtual machine and API repertoire aren't necessarily tied to a language. On the engineering front, we've been pretty serious about going multi-language for some time now, [for example,] with the work on Rhino and the proposed new dynamic-method-dispatch bytecode.
Robert Tolksdorf's Languages for the Java VM page currently lists close to 200 languages that can be executed on the JVM, in addition to Java itself. Non-Java languages on the JVM range from scripting languages, such as JavaScript or Ruby, to a whole array of Lisp-like languages, as well as Visual Basic, COBOL, and assemblers that directly target JVM bytecode.
Performance, Libraries, and Politics
As Tim Bray alludes to, not only has the number of languages targeting the JVM been increasing, but so has the quality of many of those implementations. Indeed, implementers of some languages may find that the JVM executes their code better, in some cases, than the language's native implementation does. Here, again, is Tim Bray:
Currently, native Ruby runs mostly in interpreted mode. If we arrange for JRuby to be compiled into Java bytecodes, it'll be running on the JVM, which is one of the world's most heavily-optimized pieces of software. So JRuby might end up having a general performance advantage.
More specifically, Sun is leading the charge toward highly-parallel multicore computing with the T1000 and T2000 "Coolthreads" chips, which are really well-suited to server-side Web apps. The native Ruby implementation of threads is fairly limited and may not take good advantage of this kind of CPU. JRuby uses native Java threads, which are very highly tuned; so in the particular case of highly-threaded parallel code, there's a pretty good [chance] that JRuby will be a performance winner on modern silicon.
In addition to improved performance, another advantage of executing non-Java code on the JVM is that the non-Java code will, in most cases, be able to take advantage of the huge array of Java class libraries. Thomas Enebo, one of the JRuby project leads, noted that,
Java has a huge corpus of libraries... In some semblance most libraries you can think of have already been implemented in Java, usually as an open source package. JRuby allows Ruby to access any Java class and interact with it as if it was written in Ruby. This means a Ruby programmer has a much larger toolbox at their disposal.
Interaction between Java and non-Java languages inside the JVM works in both ways. When Tor Norbye demonstrated Project Semplice, allowing Visual Basic code to run on the JVM, he also pointed out the benefit of having a JSF page invoke a VB component running inside the JVM:
This compiles the BASIC file down to a Java bytecode class, which is located and instantiated by the JSF managed beans machinery at runtime. As a result, the application works and the JSF framework has no idea it's talking to BASIC code.
Having the ability to execute many types of non-Java application on the JVM can bring a certain level of freedom to developers, since a primarily Java-centric enterprise IT shop may let you code up your app in, say, Ruby on Rails, and then simply run that app inside a highly available, clustered, and possibly even virtualized JVM environment.
With the recent activity around supporting non-Java languages on the JVM, talk about the productivity benefits of dynamically typed languages, and with a naturally occurring fatigue that typically sets in with almost any technology or language with time, it may be an opportune moment for the Java powers-that-be to tweak the JVM and position it as a high-performance, general-purpose execution engine. In other words, it may be time, as Tim Bray said, to "brand the language and the platform separately."
Such a move would pit the JVM as a competitor against Microsoft's Common Language Runtime (CLR). Both platforms could converge to being about execution, and not primarily about language (Microsoft has already positioned the CLR that way). These platforms would then compete in providing a sophisticated array of execution facilities to code written in various languages.
How Well Can the JVM Handle Non-Java Code?
Given that both the JVM and the .NET CLR are Turing-complete, it should be possible to execute any program that runs on the CLR on the JVM as well. However, making many non-Java languages perform well on the JVM, or on the CLR for that matter, is clearly not a trivial pursuit. That's partly the result of a mismatch between key constructs in the JVM and those of non-Java languages. About implementing Ruby on the JVM, Charles Nutter noted that,
For much of Ruby we've had to implement a "VM on top of a VM" that bridges that gap. We do not have control over the Java stack...so we maintain our own. We do not have a dynamic invocation bytecode in the JVM...so we use our own method. We don't have support for closures...so we simulate them with movable scopes and command implementations. However our recent efforts have aimed toward componentizing these pieces; as the JVM evolves to support them, we'll be able to toss them out one by one.
Thomas Enebo, the other JRuby project lead, added that,
We have to emulate a set of language semantics which do not map well with the JVM's underlying design. Those language semantics are sometimes quirky and reflected some evolutionary set of changes which need to be properly reflected in our implementation. We are getting pretty close to matching parity with the C implementation, but some of the last cases will be a challenge.
In a blog post earlier this year, Non Java Languages on the JVM, Debasish Ghosh summarized some of the technical challenges of implementing a language on either the CLR or the JVM, noting that,
There are still confusions regarding what should be parts of the language and what should be supported at the VM level... However, ... it is just a question of the ease of implementation and use and the speed of execution on the VM platforms.
Ghosh specifically highlighted four challenging areas in implementing dynamic languages on the JVM. The following are excerpts from his blog post:
invokedynamic in JVM: Dynamically typed languages like Python, Ruby, etc. perform method dispatch by name, and not by type. An invokedynamic-enabled JVM will ensure "that the verifier won’t insist that the type of the target of the method invocation (the receiver, in Smalltalk speak) be known to support the method being invoked, or that the types of the arguments be known to match the signature of that method. Instead, these checks will be done dynamically."
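To make "dispatch by name" concrete, here is a minimal sketch (not taken from Ghosh's post) of how a language implementation on the JVM might approximate it today with plain reflection. The class name `NameDispatch` and the helper `send` are hypothetical, and the sketch handles only public zero-argument methods. The receiver is typed as `Object`, so the check that the method exists is deferred until the call is actually made.

```java
import java.lang.reflect.Method;

public class NameDispatch {
    // Hypothetical helper: invoke a public zero-argument method chosen by
    // name at run time. Nothing is known statically about the receiver;
    // a missing method is only detected when the call is attempted.
    static Object send(Object receiver, String name) {
        try {
            Method m = receiver.getClass().getMethod(name);
            return m.invoke(receiver);
        } catch (ReflectiveOperationException e) {
            // Roughly what a dynamic language would report as
            // "method missing" or "no such method".
            throw new RuntimeException("cannot dispatch '" + name + "'", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(send("hello", "length"));      // prints 5
        System.out.println(send("hello", "toUpperCase")); // prints HELLO
    }
}
```

A dedicated bytecode would let the JVM perform this kind of late-bound call itself, rather than leaving each language to route every call through reflection.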
Hotswapping: The main idea is to allow code changes on the fly, while they are running. The full capability of hotswapping implies any kind of change to be supported, addition/modification/removal of methods and attributes including changes in inheritance hierarchy.
Tail Calls: Functional language programmers use recursion to implement loops - however elegant it may look, recursive calls consume lots of stack space. Hence these languages employ all sorts of optimizations to make efficient loop implementation possible with constant stack space. Tail call optimization is one such technique, which replaces calls in tail position with jump statements. A call is said to be in a tail position if it is the last statement of a function...
Language implementers want tail call support in the JVM. This is not as simple as it may seem... Various techniques have been proposed and used in the last decade or so for generic tail call optimization. But none of them have been found to be suitable for an efficient implementation on the JVM.
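As an illustration of what the excerpt describes (again, not code from the blog post), the sketch below shows a tail-recursive function next to the loop that a tail-call-optimizing compiler would effectively turn it into. The names `TailCalls`, `sumRec`, and `sumLoop` are made up for this example. On a JVM that does not eliminate tail calls, `sumRec` uses one stack frame per call, so a large enough `n` would be expected to overflow the stack, while `sumLoop` runs in constant stack space.

```java
public class TailCalls {
    // Tail-recursive sum of 1..n: the recursive call is the last action,
    // so nothing in the current frame is needed after it returns.
    static long sumRec(long n, long acc) {
        if (n == 0) {
            return acc;
        }
        return sumRec(n - 1, acc + n); // call in tail position
    }

    // The same computation with the tail call replaced by a jump
    // back to the top of the function, i.e. an ordinary loop.
    static long sumLoop(long n) {
        long acc = 0;
        while (n != 0) {
            acc += n;
            n--;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sumRec(1000, 0));     // small depth, safe to recurse
        System.out.println(sumLoop(1_000_000));  // constant stack space
    }
}
```

A compiler for a functional language targeting the JVM can do this rewrite itself for self-recursive calls; the hard case the excerpt refers to is the general one, where the tail call goes to a different method.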
Continuations: Just think of [it] as yet another semantics of function call implementation, where instead of returning value to the calling function, the parent function tells the child function which function to pass the result to. No function call ever returns. The child function does this with an object named Continuation, which it gets from the parent. A continuation is an object which takes a snapshot of the current function's lexicals and control stack - when invoked, the complete state is restored from the calling chain...
I think the biggest challenge of implementing continuations support in the JVM is to follow the principle of "pay for it only if you need it", since not many languages actually need them... Once again, the real challenge is stack management.
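The "tell the child which function to pass the result to" idea can be sketched in Java as continuation-passing style. This is only an illustration under stated assumptions: it uses lambdas from a modern Java release, the names `CpsDemo`, `add`, `square`, and `squareOfSum` are hypothetical, and it does not capture or restore the control stack the way the first-class continuations described above would.

```java
import java.util.function.IntConsumer;

public class CpsDemo {
    // Instead of returning a value, each function hands its result
    // to the continuation 'k' supplied by its caller.
    static void add(int a, int b, IntConsumer k) {
        k.accept(a + b);
    }

    static void square(int x, IntConsumer k) {
        k.accept(x * x);
    }

    // Computes (a + b)^2 and passes it on to k. The lambda given to
    // 'add' is the continuation: "what to do next with the sum".
    static void squareOfSum(int a, int b, IntConsumer k) {
        add(a, b, sum -> square(sum, k));
    }

    public static void main(String[] args) {
        squareOfSum(2, 3, result -> System.out.println(result)); // prints 25
    }
}
```

Note that every step here still pushes a Java stack frame, which is one reason the stack management mentioned in the excerpt becomes the central problem for a real implementation.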
Overcoming these technical challenges may be possible with sufficient resources. Indeed, under the hood, some JVM implementations are better prepared to handle some of these features than others are. For instance, IBM's JVM seems to be able to handle recursive tail calls fairly well (see the IBM DeveloperWorks article, Improve the performance of your Java code).
To what extent do you believe the effort to make the JVM a general-purpose runtime is worth the investment? If it is, what will that mean for non-JVM implementations of languages, such as Ruby? If both the JVM and Microsoft's CLR evolve in the coming years into truly high-performance, secure execution environments for commonly used languages, what will that dichotomy mean for developers?
I find it amazing that we come up with all these high level languages to achieve better portability and now, all of a sudden, we think that low level software like a VM can better achieve portability. Am I the only one who sees the irony here? It's like we're trying to go back to assembler or something. I don't remember assembler being portable. If you want portability, you have to go higher, not lower. That's why they're having problems. It's expected. And Turing completeness has nothing to do with it. A VM is an emulator and should be used as such.
> I find it amazing that we come up with all these high level languages to achieve better portability and now, all of a sudden, we think that low level software like a VM can better achieve portability.
I wouldn't say it happened suddenly. Virtual machine techniques for programming languages have been around for quite some time, haven't they?
> Am I the only one who sees the irony here?
Probably yes.
> It's like we're trying to go back to assembler or something. I don't remember assembler being portable. If you want portability, you have to go higher, not lower. That's why they're having problems. It's expected. And Turing completeness has nothing to do with it. A VM is an emulator and should be used as such.
The VMs we are talking about are abstract machine models, not emulators. Whether they are dedicated to emulate anything or not is due to their particular specification.
> I find it amazing that we come up with all these high level languages to achieve better portability
Still, the portability part was encoded in the back-ends of compilers, where the internal representation of code may be something like this CLR stuff. It's only natural that this level of representation finally got out as various standards.
> I find it amazing that we come up with all these high level languages to achieve better portability and now, all of a sudden, we think that low level software like a VM can better achieve portability.
These VMs are not here to be programmed directly in their byte-code language (although it's possible), they are here to provide a standard machine that can be ported to various platforms, so that programmers don't have to write various versions of their programs for Windows, Macs, Linux, Unix, etc. They also provide better security.
> Am I the only one who sees the irony here?
Maybe :)
> It's like we're trying to go back to assembler or something. I don't remember assembler being portable. If you want portability, you have to go higher, not lower.
That's why CLR (and JVM) are here: You write a program in a language (C#, C++, Java, etc.), compile it to the VM instructions, and you can run it on any platform for which a particular VM is implemented.
> That's why they're having problems. It's expected.
If you're talking about the problem Sun has with extending JVM to support multiple languages it's because JVM was originally designed just for Java. In general, you would have similar problems with any kind of software that was originally designed with different requirements.
> And Turing completeness has nothing to do with it.
In some fundamental way it does because you want the VM to be able to run any algorithm. But the issue here is not Turing completeness, it's more the support for different programming styles (paradigms) for which JVM was not designed originally.
> A VM is an emulator and should be used as such.
Depends on your definition of "emulator". As far as I know, emulators are programs that simulate a real machine and/or platform (for example, a Windows emulator).
if memory serves, when the CLR and .NET came along, the hordes of VB coders rebelled. they did so because the "version" of VB that would run on the CLR was nothing like the VB5 they had come to know and love. and if memory serves, M$ stated flatly that the semantic and syntax changes were deliberate, irrevocable, and had to be done in order to get some kind of VB that would run on the CLR.
having any VM (and interpreter is a synonym) does not mean any source language can run efficiently on it. just ask any mainframe COBOL programmer whose greased lightning code is dog slow on any microprocessor. extra points for why.
> if memory serves, when the CLR and .NET came along, the hordes of VB coders rebelled. they did so because the "version" of VB that would run on the CLR was nothing like the VB5 they had come to know and love. and if memory serves, M$ stated flatly that the semantic and syntax changes were deliberate, irrevocable, and had to be done in order to get some kind of VB that would run on the CLR.
If you have looked at VB.NET, it appears (to me at least) to just be C# with some VB-like syntactic sugar.
You could fully support VB5 on the CLR (Microsoft's counterpart to the JVM), but it would have to emulate a few things that the CLR doesn't support, and that would dramatically cut into the efficiency of VB.NET, in theory at least.
"CleanJ is a project for running Concurrent Clean programs on Java VM. Concurrent Clean is a pure and non-strict functional language whose syntax is similar to that of Haskell."
The JVM is quite tightly bound to the Java language. Class identifiers, type identifiers, method invocation primitives, class notions, these are all hardwired to Java's notions (and Java's notions about what constitutes a 'class' are mighty peculiar in the OO universe).
So it's no surprise that other languages don't "live happily" on the JVM. They take performance hits, or have to leave out key features (continuations in JRuby).
Dynamic invocation, which is how some languages do all method dispatch, is insanely poor on the JVM: looking up a Method object and then invoking it takes about 40 times as long as the usual Java method invocation (which is more or less firmwired at compile time). I have been able to get it to a factor of about 1.5 to 2 by using homegrown inline caching, but this is extra work on my part.
If Bracha gets his invokedynamic, this will help some, but there remain a lot of Java-isms baked into the VM that make it a drag for hosting other languages.
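For readers unfamiliar with the caching idea mentioned in this post, the sketch below shows one simple way to avoid repeating the reflective lookup. It is not the poster's code, and it is a global method cache keyed by class name and method name rather than a true per-call-site inline cache. The names `CachedDispatch`, `send`, and `lookups` are hypothetical; only public zero-argument methods are handled, and class loaders are ignored for brevity.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CachedDispatch {
    // Cache of already-resolved methods, keyed by "className#methodName".
    private static final Map<String, Method> CACHE = new ConcurrentHashMap<>();

    // Counts slow-path reflective lookups, for demonstration only.
    static int lookups = 0;

    static Object send(Object receiver, String name) {
        Class<?> type = receiver.getClass();
        String key = type.getName() + "#" + name;
        try {
            Method m = CACHE.get(key);
            if (m == null) {
                // Slow path: resolve the method once, then remember it.
                m = type.getMethod(name);
                lookups++;
                CACHE.put(key, m);
            }
            // Fast path on later calls: only the invocation itself remains.
            return m.invoke(receiver);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("cannot dispatch '" + name + "'", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(send("abc", "length"));
        System.out.println(send("xyz", "length"));
        System.out.println("reflective lookups: " + lookups);
    }
}
```

In this sketch the second call on a `String` receiver reuses the cached `Method`, so only one lookup is performed for the two calls in `main`. How much time that saves in practice depends on the JVM and is not measured here.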
About the original question, what it's worth to have a JVM with good support for multiple languages: I really think it's worth a lot. Of course, I want to be able to use any good language without any performance penalties... But I think that the performance problems are something you could overcome with an investment that will repay itself (maybe not to Sun, but to the software industry). To be able to use the right tool (right language) for the job would make some things much more efficient. It would also mean that knowledge about the class libraries would be worth something in more projects and with more languages. That in itself should mean that it will get easier to get developers up to speed with languages other than Java. And knowing that there are a lot of large projects that take thousands and thousands of hours to complete, a few hours of language training in the beginning should not be a showstopper.
> If memory serves, when the CLR and .NET came along, the hordes of VB coders rebelled. they did so because the "version" of VB that would run on the CLR was nothing like the VB5 they had come to know and love.
with classic Visual Basic the even numbered releases were the major ones:
VB2 - introduced visual editing of GUI controls to the masses
VB4 - entered the 32-bit field, got a new language engine called VBA to support any office macro need (enabled to script any EOM, WOM and the like [ 'E'~Excel, 'W'~Word, etc. ]) and became kind of adult in connecting with COM that had become strategic for MS
VB6 - did target as a stepping stone the 'unification' with C++ in terms of IDE (Visual Studio) ... now later this path was declared quite abruptly a dead end [ Anders Hejlsberg had entered the building (and he was aware of things like Python too ...) ] ... and the former mentioned masses were not amused by this next offering with curly braces all over and this habit to semicolon almost each line of code
VB5 was just a rush release in the name of the marketing term ActiveX countering what was labelled an applet on the other side of the divide
September 2006: what is MS planning to do with IronPython ?
trying to bring back home those lost masses of classic Visual Basic, lost perhaps to PHP or other easy going vehicles ... ??
I'm glad to see this article--the discussion about languages on the JVM is more interesting when it focuses on particulars such as those you list, as opposed to the interests of market share and language leadership (e.g. Ruby vs. Java, Python vs. Java). I'll add that we probably have such good JVMs precisely because the JVM was well-specified and the Java language well-constrained enough that well-known and well-proven optimizations were possible. Making a JVM optimized for all cases may make it worse for all cases...
> I'll add that we probably have such good JVMs precisely because the JVM was well-specified and the Java language well-constrained enough that well-known and well-proven optimizations were possible.
It's easy to be fast when you don't have much to do. The JVM is severely prematurely optimized, and the other word I have for "well-constrained" is feature-poor.
I'd also point out that the JVM is fast because Sun bought a Smalltalk company with advanced VM technology - a language requiring a VM considerably more feature rich than the JVM is now (in terms of support for language constructs) and then applied this advanced technology to their little tin lizzie. The optimizations applied to the JVM are at least 20 years old and well understood. Polymorphic inline caching with JIT compilation have been around for some time.
> It's easy to be fast when you don't have much to do. The JVM is severely prematurely optimized, and the other word I have for "well-constrained" is feature-poor.
>
> I'd also point out that the JVM is fast because Sun bought a Smalltalk company with advanced VM technology - a language requiring a VM considerably more feature rich than the JVM is now (in terms of support for language constructs) and then applied this advanced technology to their little tin lizzie. The optimizations applied to the JVM are at least 20 years old and well understood. Polymorphic inline caching with JIT compilation have been around for some time.
And yet...ongoing optimization of the JVM is keeping programmers at Sun, IBM, BEA, and Excelsior (just to name four) busy, and busy for years now. The performance of those VMs increases with each release (if less now than before). That leads me to believe (assuming that there are some good programmers scattered about the respective efforts) that not all is known about JVM optimization, even if purchases were made and some optimizations were already well-known.
In the end, the quality of software rests largely with the programmers who write it, not the companies who hire them. I couldn't care less if Sun already had great programmers in-house 10 years ago or if they hired them from somewhere else. I assume both are true.
Patrick