Summary
Fork-join parallelism offers an elegant programming model to distribute a compute-intensive problem across available CPU cores. In a recent IBM developerWorks article, Brian Goetz describes the fork-join framework slated to be included in Java 7.
It is easy to imagine how you can keep a dozen processors busy using a coarse-grained task boundary such as a user request, but this technique will not scale to thousands of processors—traffic may scale exponentially for short periods of time, but eventually the hardware trend wins out.
As we enter the many-core era, we will need to find finer-grained parallelism or risk keeping processors idle even though there is plenty of work to do. As the dominant hardware platform shifts, so too must the software platform if we wish to keep up. To this end, Java 7 will include a framework for representing a certain class of finer-grained parallel algorithms: the fork-join framework.
A popular technique to achieve finer-grained concurrency involves breaking down a larger, compute-intensive task into many sub-tasks, and then merging the results once each subtask completes computation. Goetz shows how the fork-join programming model makes this type of problem decomposition easier.
If the problem is large enough to merit parallel decomposition, [a divide-and-conquer method] divides the problem into two or more sub-problems and recursively invokes itself on the sub-problems in parallel, waits for the results of the sub-problems, and then combines the results...
[A fork-join operation's] behavior is that the current task is suspended, the ... subtasks are executed in parallel, and the current task waits until they complete. Then the results of the ... subtasks can be combined. This kind of parallel decomposition is often called fork-join because executing a task forks (starts) multiple subtasks and then joins (waits for completion) with them.
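In the API as it ultimately shipped in Java 7's java.util.concurrent package (class names changed between Lea's prototype and the final release), this divide-and-conquer pattern is expressed by extending RecursiveTask and overriding compute(). A minimal sketch, summing an array by recursive decomposition:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Divide-and-conquer sum: split the range until it is small enough to
// compute sequentially, fork the left half, compute the right half in
// the current thread, then join and combine the results.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;
    private final long[] data;
    private final int lo, hi;

    SumTask(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {          // base case: compute directly
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                          // start the left half asynchronously
        long rightSum = right.compute();      // work on the right half ourselves
        return left.join() + rightSum;        // wait for the left half, combine
    }
}

public class ForkJoinSum {
    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        long sum = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum); // 1 + 2 + ... + 10000 = 50005000
    }
}
```

The threshold is a tuning knob: too low and task-management overhead dominates; too high and there are not enough subtasks for the work-stealing scheduler to keep all cores busy.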
In the article, Goetz discusses the upcoming fork-join API slated for inclusion in Java 7:
The operation invoke-in-parallel is implemented by the coInvoke() method, which invokes multiple actions simultaneously and waits for them all to complete. A ForkJoinExecutor is like an Executor in that it is designed for running tasks, except that it is specifically designed for computationally intensive tasks that do not ever block except to wait for another task being processed by the same ForkJoinExecutor.
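In the finalized java.util.concurrent API, the role of coInvoke() is played by invokeAll(), which forks the given subtasks and waits for all of them, and ForkJoinExecutor became ForkJoinPool. A sketch of invoke-in-parallel using a result-less RecursiveAction that increments every element of an array:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Invoke-in-parallel: invokeAll() runs both halves as subtasks and
// waits for them to complete before compute() returns.
class IncrementAction extends RecursiveAction {
    private static final int THRESHOLD = 1_000;
    private final int[] data;
    private final int lo, hi;

    IncrementAction(int[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {           // base case: do the work directly
            for (int i = lo; i < hi; i++) data[i]++;
            return;
        }
        int mid = (lo + hi) >>> 1;
        invokeAll(new IncrementAction(data, lo, mid),
                  new IncrementAction(data, mid, hi));
    }
}

public class InvokeAllDemo {
    public static void main(String[] args) {
        int[] data = new int[5_000];
        new ForkJoinPool().invoke(new IncrementAction(data, 0, data.length));
        long total = 0;
        for (int v : data) total += v;
        System.out.println(total); // every element incremented once -> 5000
    }
}
```

Note that the subtasks never block on I/O or locks, only on each other, which is exactly the usage the quote above says the executor is designed for.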
What do you think of the fork-join API proposed for Java 7?
Having recently written some similar code, my experience is that the actual forking/joining isn't the biggest issue. I do admit, though, that their "work-stealing" concept is cool, and I wasn't creating more than tens of tasks. IMO, the big issues are:
1) Event firing / reporting progress / logging / exceptions, etc.
2) Classic command-liney tasks (and most of the ones I happened to be dealing with) expect files as inputs, not in-memory data, and they write their results to files. Splitting and gathering these files is hard.
I only briefly skimmed Lea's paper, but it didn't seem to deal much with these issues. Not that this is bad; it shows that they tackled the low-hanging fruit first, and probably did a great job. I'm wondering whether I missed something, or whether there are plans to handle these in later Java libraries.
Techniques like fork-join, work-stealing, and divide-and-conquer predate map-reduce by many years.
Map-reduce addresses a different space, though -- distributed computation. But the techniques used for both distributed decomposition via map-reduce and parallel decomposition for multicore via fork-join do have some elements in common.