Summary
Multicore CPUs are available on both server and desktop-class systems. Yet, relatively little attention has been devoted to the benefits of concurrency in desktop applications. In a recent JavaWorld article, Kirill Grouchnikov explores concurrent programming for the desktop.
The consistent increase in single-core CPU speed that programmers could rely on for years is no longer available. This has been true for the server side for at least a decade, and now it's also the reality for client-side programming. Quite a few tasks easily lend themselves to being parallelized using new features in JDK 6.0, letting client-side applications take advantage of newer multicore hardware...
Multicore machines are the reality for client-side development. Failure to adapt will result in software that does not scale well on modern hardware. By identifying routines that lend themselves to parallelizing, your code can continue enjoying the advances of multicore machines.
However, this means switching your mindset from the sequential model to the concurrent model. The concurrent model doesn't only yield performance improvements; it also comes with its own set of best practices. If you want to achieve competitive performance and write software that scales with the hardware, you need to dip your toes in the concurrency pool.
To illustrate his point about scaling client-side performance with parallelism, Grouchnikov demonstrates how to take advantage of the concurrency utilities in JDK 6 to speed up array sorting by more than 30% on multicore hardware.
Existing Java collection-sorting routines ... perform no faster on newer multicore machines than on single-core machines. This might seem acceptable on smaller inputs, but the input domains for most real-world problems don't stand still. Moreover, developers and users rightly expect their programs to run faster on newer hardware.
When you call Arrays.sort or Collections.sort, the running time doesn't grow linearly with the collection size. On my machine, calling Arrays.sort on an array of 200,000 strings takes 490ms on average. An array of 400,000 strings is sorted in 1290ms, an increase in running time by a factor of about 2.6...
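Readers who want to reproduce the measurement can use a quick-and-dirty sketch along the following lines; the class name and random seed here are ours, and absolute numbers will of course vary by machine and JVM.

    import java.util.Arrays;
    import java.util.Random;

    // Rough timing sketch, not a rigorous benchmark: sorts random string
    // arrays of the two sizes discussed above and prints the elapsed time.
    public class SortTiming {
        public static void main(String[] args) {
            Random random = new Random(42);
            for (int size : new int[] { 200000, 400000 }) {
                String[] data = new String[size];
                for (int i = 0; i < size; i++) {
                    data[i] = Long.toString(random.nextLong());
                }
                long start = System.currentTimeMillis();
                Arrays.sort(data);
                long elapsed = System.currentTimeMillis() - start;
                System.out.println(size + " strings sorted in " + elapsed + "ms");
            }
        }
    }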
The mergesort in the Arrays class [is a] classic mergesort with a few optimization tweaks for corner cases. If the array size is small (seven or fewer with the current implementation), it reverts to insertion sort. Otherwise, the array is split in half, and the same method is called (recursively) on both halves. After both halves are sorted, the code "merges" them (hence the name).
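For reference, the recursive scheme reads roughly as follows. This is a simplified sketch, not the actual JDK source; the real implementation adds the small-array cutoff and range checks described above.

    // Simplified sketch of the classic top-down mergesort described above.
    public class SimpleMergeSort {

        public static void sort(Comparable[] a) {
            Comparable[] aux = a.clone();
            sort(aux, a, 0, a.length);
        }

        // Sorts dest[low..high), using src (an identical copy) as scratch.
        private static void sort(Comparable[] src, Comparable[] dest,
                                 int low, int high) {
            if (high - low < 2) {
                return; // a single element is already sorted
            }
            int mid = (low + high) >>> 1;
            sort(dest, src, low, mid);  // sort the first half...
            sort(dest, src, mid, high); // ...and only then the second half
            merge(src, dest, low, mid, high);
        }

        // Merges the two sorted halves of src into dest (hence the name).
        private static void merge(Comparable[] src, Comparable[] dest,
                                  int low, int mid, int high) {
            for (int i = low, p = low, q = mid; i < high; i++) {
                if (q >= high || (p < mid && src[p].compareTo(src[q]) <= 0)) {
                    dest[i] = src[p++];
                } else {
                    dest[i] = src[q++];
                }
            }
        }
    }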
Grouchnikov points out that mergesort lends itself to parallel execution, and that modifying the mergesort in the current Arrays implementation is straightforward with the JDK 6 concurrency utilities:
You can easily see that increasing the CPU speed (number of operations per second) by a factor of two results in a matching improvement in the algorithm running time. This is the case because the mergesort is a sequential recursive algorithm that does exactly the same sequence of steps, provided the same input.... It doesn't explore the inherent concurrency of the recursive implementation. When the array is split in half, the sorting of the second half begins only after the sorting of the first half is done.
New concurrency utilities in JDK 6.0 come to the rescue, letting you perform these subtasks in parallel without any communication between them. At the end, when both tasks are done, the sorted halves still need to be merged together.
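The shape of the change is roughly as follows. This is a sketch of the approach rather than Grouchnikov's actual code; for brevity it delegates the two half-sorts to Arrays.sort instead of recursing further.

    import java.util.Arrays;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Sorts the two halves of the array as independent tasks on a thread
    // pool, waits on both Futures, then merges the sorted halves.
    public class TwoWayParallelSort {

        public static void sort(final String[] a) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(2);
            final int mid = a.length >>> 1;

            // The tasks touch disjoint halves, so they need no
            // synchronization (and no communication) while running.
            Future<?> left = pool.submit(new Runnable() {
                public void run() { Arrays.sort(a, 0, mid); }
            });
            Future<?> right = pool.submit(new Runnable() {
                public void run() { Arrays.sort(a, mid, a.length); }
            });
            left.get();  // block until the first half is sorted...
            right.get(); // ...and the second
            pool.shutdown();

            // The final, sequential step: merge the two sorted halves.
            String[] merged = new String[a.length];
            for (int i = 0, p = 0, q = mid; i < a.length; i++) {
                if (q >= a.length || (p < mid && a[p].compareTo(a[q]) <= 0)) {
                    merged[i] = a[p++];
                } else {
                    merged[i] = a[q++];
                }
            }
            System.arraycopy(merged, 0, a, 0, a.length);
        }
    }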
Grouchnikov presents his implementation in the article, and shows that the simple changes enabling parallel sorting yield a performance increase of more than 30%. He also notes, however, that the concurrent code adds a layer of complexity to sorting, and that the decision of just how much concurrency to use requires careful consideration.
One nit to pick: "New concurrency utilities in JDK 6.0 come to the rescue, letting you perform these subtasks in parallel without any communication between them."
This isn't something that the new concurrency utilities enable; they just make it easier. Performing subtasks in parallel with no communication between them is the easiest kind of concurrency. It's the communication that is tricky, but even that is doable without the concurrent package.
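The two-halves sort, for example, needs nothing beyond plain threads and a join (a sketch only; the sequential merge at the end is elided):

    import java.util.Arrays;

    // The same fork/join-of-independent-halves trick with nothing but
    // plain pre-java.util.concurrent threads.
    public class PlainThreadSort {

        public static void sort(final String[] a) throws InterruptedException {
            final int mid = a.length >>> 1;
            Thread left = new Thread(new Runnable() {
                public void run() { Arrays.sort(a, 0, mid); }
            });
            Thread right = new Thread(new Runnable() {
                public void run() { Arrays.sort(a, mid, a.length); }
            });
            left.start();
            right.start();
            left.join();  // waiting for both workers is the only
            right.join(); // "communication" this task needs
            // ...then merge the two sorted halves sequentially, as before.
        }
    }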
And, on a side note, I'm not convinced that the concurrent package is the slam-dunk it's cracked up to be. I've had the experience of blocking on a Future that I can see has completed its work, yet the call never returns. Surely resolvable, but clearly not pain-free.
I'm kind of leaning towards the belief that a more high-level abstraction like Actors is required to bring concurrency to the masses.
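By Actors I mean something like the following toy sketch (nothing like a production actor library): each actor owns a mailbox drained by a single thread, so user code never touches a lock directly.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Toy actor: one mailbox, one worker thread. Since only the worker
    // thread ever runs receive(), the actor's state needs no synchronization.
    public abstract class TinyActor<M> {
        private final BlockingQueue<M> mailbox = new LinkedBlockingQueue<M>();
        private final Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        receive(mailbox.take()); // one message at a time
                    }
                } catch (InterruptedException e) {
                    // interrupted: let the actor die
                }
            }
        });

        public void start() { worker.start(); }
        public void stop()  { worker.interrupt(); }

        // Asynchronous send: the caller never blocks or locks.
        public void send(M message) {
            mailbox.offer(message);
        }

        protected abstract void receive(M message);
    }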
> I'm kind of leaning towards the belief that a more
> high-level abstraction like Actors is required to bring
> concurrency to the masses.
[Speaking as a gadfly, not as someone with real experience in concurrency] I think that while abstractions which get away from the lock twiddling are probably good, they don't seem to give us ways to e.g. prove anything about the code to avoid classic concurrency problems (e.g. deadlock). I think CSP did/has tools to do that kind of analysis, which is the kind of thing I'd really like to be able to use in the real world.
> > I'm kind of leaning towards the belief that a more
> > high-level abstraction like Actors is required to bring
> > concurrency to the masses.
>
> [Speaking as a gadfly, not as someone with real experience
> in concurrency] I think that while abstractions which get
> away from the lock twiddling are probably good, they don't
> seem to give us ways to e.g. prove anything about the code
> to avoid classic concurrency problems (e.g. deadlock). I
> think CSP did/has tools to do that kind of analysis, which
> is the kind of thing I'd really like to be able to use in
> the real world.
That's an interesting thought. It would be nice if a given concurrency abstraction limited the ways that threads can interact in order to prevent such things, but maybe that's not feasible, or would be too limiting for general use.
This beast is not meant to be just some sort of toy or experimental programming language for high-speed computers. It looks like IBM is creating the successor to Java (TM). Who will still care about Java 7 if such a beast exists in one or two years?
> Multicore machines are the reality for client-side
> development. Failure to adapt will result in software that
> does not scale well on modern hardware. By identifying
> routines that lend themselves to parallelizing, your code
> can continue enjoying the advances of multicore machines.
As I've stated before, you could argue this in the other direction. Failure to adapt may result in a market failure for multicore machines.
Historically, the ability for legacy software to run faster on new hardware was a benefit to chip vendors and system vendors, not software vendors.
> That's an interesting thought. It would be nice if a given
> concurrency abstraction limited the ways that threads can
> interact in order to prevent such things, but maybe that's
> not feasible, or would be too limiting for general use.
I do not think that true parallelism will arrive on the desktop in the near future. Check http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf for more on that. It basically says - and I agree with it - that the currently widely adopted programming paradigms, languages, and the knowledge of the average programmer cannot be applied to concurrent programming. Most of us are capable of writing code based on some well-known problem like a simple consumer-producer; however, a real-world application needs more than parallelizing array sorting. The inherent concurrencies should be revealed algorithmically, not by intuition. I would bet on dataflow-based languages, but the momentum of the developer world is very high now and its direction cannot be changed easily.
> I do not think that true parallelism will arrive on the
> desktop in the near future. Check
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf
> for more on that. It basically says - and I agree with it -
> that the currently widely adopted programming paradigms,
> languages, and the knowledge of the average programmer
> cannot be applied to concurrent programming. Most of us are
> capable of writing code based on some well-known problem
> like a simple consumer-producer; however, a real-world
> application needs more than parallelizing array sorting.
> The inherent concurrencies should be revealed
> algorithmically, not by intuition. I would bet on
> dataflow-based languages, but the momentum of the developer
> world is very high now and its direction cannot be changed
> easily.
That was a great paper. It debunks the notion that threading "isn't that hard if you know what you're doing". As described in the paper, it may be years before you trip over the bugs you've created.
It also makes one of my favorite points: bugs in legacy code that are revealed by running them on multicore machines will probably be blamed on the new computer, not the software vendor (and to the extent that chip vendors are not delivering full backward compatibility, the blame is somewhat justified).