Summary
Current closure proposals for Java still think in terms of serial execution of code, according to Elliotte Rusty Harold. With multicore CPUs, closures invoked inside a loop are increasingly likely to execute on different cores. Harold suggests that a closure syntax should make the possibility of parallel execution explicit.
In a recent blog post, Why Hate the for Loop?, Elliotte Rusty Harold draws attention to the dangers of basing Java closures too closely on closures in other languages. Harold points out that closures and other types of code blocks, such as blocks in a for loop, should not be confused, because a for loop implies serial execution, whereas a closure does not.
Harold's comparison starts with an oft-quoted use of closures in Ruby:
3.times {puts "Inside the times method."}
Harold compares this syntax to the traditional for loop with an index, noting that:
Indexless loops don’t have to be serial. That is, there’s nothing in a statement like 3.times {doSomething()} that promises any particular order of execution. In fact, just maybe we can do all three actions at the same time. This enables parallel processing, and is going to be very important as multicore processors and multi-CPU systems become even more common...
The for syntax implies serialization where you may not need it. The closure syntax doesn’t necessarily guarantee the order of execution of the various statements. However, sometimes you actually do need a particular order of execution, or you need to refer to the loop index from within or outside of the code.
Harold notes that closures, if added to the Java language, should have a syntax that makes explicit whether a block of code runs serially or may run in parallel:
The current proposals for closures in Java all seem to still have sequential execution of code. For instance, the BGGA proposal makes a big point out of allowing break and continue inside closures, but what does that mean if the different iterations of the loop are in fact running on different processors at the same time?
If the code is going to be sequential anyway, I prefer the style that makes that more obvious. The traditional indexed loop does that. A closure doesn’t.
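The question about break can be made concrete with a small sketch. It is only an illustration under stated assumptions: it uses Java 8 lambdas and parallel streams, which did not exist when these proposals were written, as a stand-in for whatever closure syntax is adopted. The class name BreakDemo and both method names are hypothetical. A shared flag plays the role of break in the parallel version, because a lambda body has no break of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.IntStream;

public class BreakDemo {

    // Serial loop: break has an exact meaning, nothing after index 3 runs.
    static List<Integer> serialWithBreak() {
        List<Integer> processed = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            processed.add(i);
            if (i == 3) {
                break;
            }
        }
        return processed;
    }

    // Parallel version (hypothetical): a shared flag stands in for break.
    // Iterations that started before the flag was set still run to completion,
    // and iterations with a lower index may be skipped.
    static Set<Integer> parallelWithStopFlag() {
        Set<Integer> processed = new ConcurrentSkipListSet<>();
        AtomicBoolean stop = new AtomicBoolean(false);
        IntStream.range(0, 10).parallel().forEach(i -> {
            if (stop.get()) {
                return; // skips only iterations that begin after the flag is set
            }
            processed.add(i);
            if (i == 3) {
                stop.set(true);
            }
        });
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(serialWithBreak());      // prints [0, 1, 2, 3]
        System.out.println(parallelWithStopFlag()); // varies between runs, always contains 3
    }
}
```

The serial result is always the same. The parallel result depends on scheduling: it may include indexes above 3 and may omit indexes below it. That is one way of reading Harold's concern about what break means across processors.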
To what extent do you think new language constructs in Java, or in other languages, should take into account the possibility of parallel execution of code?
Good point. In the Java Collections, iterator() doesn't guarantee ordering, unless the underlying Collection guarantees order. For example, List has an ordering, Set (usually) doesn't.
One approach would be to leave things just like they are. Whatever the closure syntax ends up being, it ultimately calls Iterable.iterator(), and ordering is determined by the underlying implementation.
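As a rough illustration of how the underlying implementation decides iteration order, the sketch below walks the same three strings through a List and three Set implementations. The class name IterationOrderDemo and the helper drain are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

public class IterationOrderDemo {

    // Hypothetical helper: records elements in the order iterator() yields them.
    static List<String> drain(Iterable<String> source) {
        List<String> seen = new ArrayList<>();
        for (String s : source) {
            seen.add(s);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("c", "a", "b");

        // List: iteration follows index order.
        System.out.println(drain(input));                      // prints [c, a, b]

        // HashSet: the iteration order is unspecified.
        System.out.println(drain(new HashSet<>(input)));

        // LinkedHashSet: iteration follows insertion order.
        System.out.println(drain(new LinkedHashSet<>(input))); // prints [c, a, b]

        // TreeSet: iteration follows the natural (sorted) order.
        System.out.println(drain(new TreeSet<>(input)));       // prints [a, b, c]
    }
}
```

The caller's loop is identical in all four cases. Only the collection behind iterator() changes the order, which is why Set "usually" has no ordering rather than never.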
An alternative would be more explicit. One possibility would be to add a method to Iterable. Something like
The closure syntax would have to have something in it to tell it which Iterator to get.
Option 2 is more complex and, IMO, not worth the effort, since it doesn't answer the real question - whatever the order happens to be, can I do this in parallel or must it be serial?
Since the main benefit of closures (as I understand it) is the option to do this in parallel, the simplest thing would be to say that if you use the new closure syntax, parallel execution is explicitly allowed.
Therefore, as Strawman #1, I propose these simple rules:
1) If you use the new Closure Syntax TBD, it will call the underlying implementation's iterator() to get the order, and may do processing in parallel.
2) If you really care about ordering, or you must process serially, use non-Closure syntax, such as a good old fashioned for loop.
The closure syntax has nothing to do with parallel execution. It's primarily the method to which a closure is passed which is responsible for choosing whether the closure is called in one thread or in parallel.
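A minimal sketch of this point, assuming Java 8 lambdas as a stand-in for the eventual closure syntax: the same block is handed to two hypothetical helpers, timesSerial and timesParallel, and only the receiving method decides how it is run. Neither helper comes from any of the closure proposals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class TimesDemo {

    // Hypothetical helper: runs the body n times, one after another,
    // on the calling thread.
    static void timesSerial(int n, Runnable body) {
        for (int i = 0; i < n; i++) {
            body.run();
        }
    }

    // Hypothetical helper: submits the body n times to a thread pool, so the
    // invocations may overlap and finish in any order. Requires n > 0.
    static void timesParallel(int n, Runnable body) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                pending.add(pool.submit(body));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for completion and rethrows failures from the body
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        Runnable body = calls::incrementAndGet; // thread-safe, so either helper is fine

        timesSerial(3, body);
        timesParallel(3, body);

        System.out.println(calls.get()); // prints 6
    }
}
```

The call sites look the same. Whether the body must be thread-safe follows from the contract of the method it is passed to, not from the syntax used to write it.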
As for parallelism that the compiler introduces automatically while preserving serial semantics, the issue is the same for for loops and for higher-order methods that take closures. Such optimization is hard in either case.
Parallel execution in Java has to be thought of in terms of threads. Making code thread-safe is difficult. Java doesn't need language constructs for using more threads. If possible, it needs constructs that simplify thread safety.
"A much more serious concern is that indexless loops don’t have to be serial. That is, there’s nothing in a statement like 3.times {doSomething()} that promises any particular order of execution. In fact, just maybe we can do all three actions at the same time."
Nothing except the language specification, that is. Until the spec explicitly allows loop bodies to be run in parallel at the whim of the compiler we don't have a problem. Closures won't add anything to the problem we don't have.