Re: Threads Considered Harmful
Posted: Jan 29, 2007 4:19 AM
> > James Watson wrote: I'm not sure why you would represent this as being some sort of crazy idea.
>
> I'm not sure why you would represent what I wrote as though I was saying Frank Silbermann's comments were a crazy idea.
>
> > James Watson wrote: It's clearly one of the commonly held tenets ...
>
> James, I provided an example of "a multiuser Distributed DBMS" written in what is generally regarded as a functional programming language.
>
> Please provide some example programming language that corresponds to your notion of a functional programming language.
>
> I think it's possible that your notion of functional programming and Frank Silbermann's notion of functional programming don't correspond to any language used by the functional programming community.
I have to agree with James Watson on this: while FP certainly has its benefits, it's oversold. Most non-trivial, non-academic programs make heavy use of the IO monad, so they are not amenable to automatic parallelization.
The real advantage lies in FP languages, not in FP itself: their type systems can prevent lots of accidental errors. For example, mainstream imperative programming languages allow the free use of 'null' as a pointer value, whereas FP languages prohibit an implicit 'null' and only allow an explicit "nothing" case in an algebraic data type, forcing programmers to write code that handles that case. But algebraic data types could be applied to imperative programming languages as well.
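To make the point concrete, here is a small Haskell sketch (the names lookupAge and describeAge are mine, purely for illustration): a possibly-absent value is expressed with the algebraic type Maybe, and the compiler forces every caller to handle the Nothing case.

```haskell
-- Hypothetical lookup: the Maybe in the result type says "may be absent",
-- and no plain Int can silently be null.
lookupAge :: String -> [(String, Int)] -> Maybe Int
lookupAge name table = lookup name table

-- Pattern matching makes the "null" branch explicit; omitting the
-- Nothing case would draw a compiler warning rather than a runtime crash.
describeAge :: String -> [(String, Int)] -> String
describeAge name table =
  case lookupAge name table of
    Nothing  -> name ++ ": unknown"
    Just age -> name ++ ": " ++ show age
```

An imperative language with the same construct would get the same protection, which is exactly the point: the benefit comes from the type system, not from purity.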
The automatic parallelization of pure programs (i.e. programs without assignment) faces a very big obstacle: how do you allocate computational units at run time so that the parallel computation is optimal? To parallelize a pure algorithm, a scheduler has to know which computational unit is free, assign the next available computation to that unit, and so on. This cannot be done cheaply, and the overhead may be so great that all the program actually does is schedule computations.
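A toy sketch of the overhead problem, using Haskell threads (parMapNaive is my own illustrative name, not a library function): forking one thread per element means the bookkeeping of thread creation and synchronization can easily cost more than the work itself.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Naive scheduling: one thread per element. For fine-grained tasks the
-- fork/sync overhead dominates the actual computation, which is the
-- obstacle described above. ($!) forces the work inside the new thread.
parMapNaive :: (a -> b) -> [a] -> IO [b]
parMapNaive f xs = do
  vars <- mapM (\x -> do v <- newEmptyMVar
                         _ <- forkIO (putMVar v $! f x)
                         pure v)
               xs
  mapM takeMVar vars   -- collect results in the original order
```

A practical scheduler has to batch work into coarser units, but choosing the right granularity for an arbitrary pure program is precisely the unsolved part.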
Another problem arises when data must be created. In a parallel program, creating a list cell means allocating a memory block from the garbage-collected heap. That allocation must be protected by a critical section, because the memory manager's data structures are global. Not only that, but the scheduler must actually wait for the allocator to finish before it can take the address of the allocated block and pass it to another computation. Take, for example, the standard functional quicksort:
sort [] = []
sort (h:t) = sort (less h t) ++ [h] ++ sort (greater h t)
  where less    p = filter (< p)
        greater p = filter (>= p)
The above algorithm, although it takes only a couple of lines to express, cannot be parallelized naively: the scheduler must wait for sort (greater h t) to finish before its result can be appended after h, which in turn must be joined with sort (less h t). Every cell built along the way is an allocation, and during each allocation the garbage collector must be locked.
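The serialization caused by a shared allocator can be modelled with a few lines of Haskell (this is a toy model, not how a real runtime allocates: the "heap" here is just a bump counter guarded by a lock):

```haskell
import Control.Concurrent.MVar (MVar, newMVar, takeMVar, putMVar)

-- Toy shared heap: the next free address, guarded by an MVar acting
-- as the memory manager's global lock. (Heap and allocCell are my own
-- illustrative names.)
newtype Heap = Heap (MVar Int)

newHeap :: IO Heap
newHeap = Heap <$> newMVar 0

-- Allocate one cell. The takeMVar/putMVar pair is the critical section
-- that every allocation, from every thread, must enter one at a time.
allocCell :: Heap -> IO Int
allocCell (Heap next) = do
  addr <- takeMVar next      -- lock the memory manager
  putMVar next (addr + 1)    -- unlock it
  pure addr
```

However many workers the scheduler spawns, every cons they build funnels through this one lock, so allocation-heavy pure code parallelizes far worse than the absence of assignment suggests.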
The quicksort algorithm could be parallelized if the size of the list were known in advance: with N elements in the list and M threads, each thread could take roughly N/M elements. But the length of a linked list is not known in functional programming languages without an O(N) traversal, and the number of threads available to the algorithm can only be determined at run time.