those that rely on lightweight threads, namely Haskell GHC, Erlang HiPE and Mozart/Oz. I don't know whether Smalltalk VisualWorks and Scala belong here too; their performance is far behind that of the top three entries.
those that use POSIX threads, with or without actual parallelism, including most other entries
Ruby, a category of its own (?): very expensive user-mode threads with no parallelism
The OCaml entry is amongst the fastest pthread-based ones, but still markedly
slower than the top entry, by around an order of magnitude. The version I wrote some time ago, based on the Lwt cooperative lightweight thread library, is close in performance to GHC.
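To illustrate what "cooperative lightweight threads" means here, a toy sketch in plain OCaml: threads are closures in a run queue, and blocking on an empty mailbox merely stores a continuation, so no OS thread or stack is created per thread. This is not Lwt's API or implementation; the scheduler, `mailbox`, `take`, `put` and `ring` are hypothetical names, and the ring size and hop count are arbitrary parameters.

```ocaml
(* Toy cooperative scheduler (hypothetical, not Lwt): runnable computations
   wait in a FIFO queue and each one runs until it blocks or finishes. *)
let run_queue : (unit -> unit) Queue.t = Queue.create ()

let spawn (f : unit -> unit) = Queue.push f run_queue

let rec run () =
  match Queue.take_opt run_queue with
  | None -> ()
  | Some f -> f (); run ()

(* One-slot mailbox. Blocking on an empty mailbox only records the
   continuation [k]; it is rescheduled when a value arrives. *)
type mailbox = {
  mutable value : int option;
  mutable waiter : (int -> unit) option;
}

let new_mailbox () = { value = None; waiter = None }

let take mb k =
  match mb.value with
  | Some v -> mb.value <- None; k v
  | None -> mb.waiter <- Some k

let put mb v =
  match mb.waiter with
  | Some k -> mb.waiter <- None; spawn (fun () -> k v)
  | None -> mb.value <- Some v

(* Ring of [threads] lightweight threads. The token starts at [hops] and is
   decremented on every hop; returns the 1-based id of the thread that
   receives 0. *)
let ring ~threads ~hops =
  let boxes = Array.init threads (fun _ -> new_mailbox ()) in
  let winner = ref 0 in
  for i = 0 to threads - 1 do
    let next = boxes.((i + 1) mod threads) in
    let rec loop () =
      take boxes.(i) (fun token ->
          if token = 0 then winner := i + 1
          else begin
            put next (token - 1);
            loop ()
          end)
    in
    spawn loop
  done;
  put boxes.(0) hops;
  run ();
  !winner
```

For example, `ring ~threads:503 ~hops:1000` returns 498: the token reaches 0 at thread index 1000 mod 503 = 497, i.e. thread 498. Every "thread" here is just a closure plus a mailbox record, which is why this style of concurrency can be so much cheaper than one POSIX thread per ring node.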
Some analysis reveals interesting facts about GHC's concurrency support and Lwt.
Performance
Here are the figures I get on a dual-core AMD Athlon 64 X2 6000+ in 32-bit mode.
(Why 32 and not 64 bits? Because it's faster in this benchmark: 64-bit pointers take twice as much memory,
and we get nothing in return in this case.)
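A back-of-the-envelope sketch of why pointer width matters, assuming OCaml's uniform value representation (one word per field, pointer or immediate int, plus a one-word header per block); the helper names are illustrative only. Since the minor heap sizes used below are given in words, their footprint in bytes also doubles on a 64-bit build.

```ocaml
(* Bytes occupied by [words] heap words on a [word_bits]-bit build
   (32 or 64). Sys.word_size gives the value for the running program. *)
let bytes_of_words ~word_bits words = words * (word_bits / 8)

(* A list cons cell: one header word plus two fields (head and tail). *)
let cons_cell_words = 3

(* Footprint of a minor heap of [kwords] * 1024 words. *)
let minor_heap_bytes ~word_bits kwords =
  bytes_of_words ~word_bits (kwords * 1024)

let () =
  Printf.printf "this program runs with %d-bit words\n" Sys.word_size;
  List.iter
    (fun bits ->
      Printf.printf "%d-bit: cons cell = %d bytes, 256Kword minor heap = %d KB\n"
        bits
        (bytes_of_words ~word_bits:bits cons_cell_words)
        (minor_heap_bytes ~word_bits:bits 256 / 1024))
    [ 32; 64 ]
```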
| implementation | memory usage | time (s) |
|---|---|---|
| Haskell GHC 6.8.2 | 2680 KB | 1.22 |
| Haskell GHC 6.8.2 -threaded, -N2 | 3300 KB | 15.27 |
| Haskell GHC 6.8.2 -threaded, -N1 | 2760 KB | 1.9 |
| Erlang HiPE | 5996 KB | 3.96 |
| OCaml ocamlopt, 1024Kword minor heap | 5178 KB | 1.85 |
| OCaml ocamlopt, 256Kword minor heap | 2016 KB | 2.05 |
| OCaml ocamlopt, 64Kword minor heap | 1228 KB | 3.06 |
| OCaml ocamlopt, 32Kword minor heap | 970 KB | 4.24 |
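The relative speeds discussed in this post can be recomputed from the table. A small sketch, with the figures copied by hand from the rows above; `results`, `time_of` and `slowdown` are illustrative names, not part of the benchmark code.

```ocaml
(* (implementation, memory in KB, time in s), copied from the table. *)
let results = [
  ("Haskell GHC 6.8.2", 2680, 1.22);
  ("Haskell GHC 6.8.2 -threaded, -N2", 3300, 15.27);
  ("Haskell GHC 6.8.2 -threaded, -N1", 2760, 1.9);
  ("Erlang HiPE", 5996, 3.96);
  ("OCaml ocamlopt 1024Kword minor heap", 5178, 1.85);
  ("OCaml ocamlopt 256Kword minor heap", 2016, 2.05);
  ("OCaml ocamlopt 64Kword minor heap", 1228, 3.06);
  ("OCaml ocamlopt 32Kword minor heap", 970, 4.24);
]

let time_of name =
  let (_, _, t) = List.find (fun (n, _, _) -> n = name) results in
  t

(* How many times slower [name] is than the fastest entry. *)
let slowdown name =
  let fastest =
    List.fold_left (fun acc (_, _, t) -> min acc t) infinity results
  in
  time_of name /. fastest

let () =
  List.iter
    (fun (name, mem, t) ->
      Printf.printf "%-38s %5d KB %6.2f s  x%.2f\n" name mem t (slowdown name))
    results
```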
The Haskell code was compiled with -O2 (GHC 6.8.2); for Erlang, I used erlc +native +"{hipe, [o3]}" (Erlang R12B-1).
GC overhead
The OCaml version is clearly GC-bound: performance improves as the minor heap is
enlarged, since this reduces the amount of GC work. With the default 32Kword minor heap, the Erlang
program is slightly faster (3.96 s vs. 4.24 s), but when OCaml is allowed to use a comparable amount of memory
(1024Kword minor heap, 5178 KB vs. 5996 KB), it is over twice as fast (1.85 s vs. 3.96 s).
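The post doesn't say how the minor heap size was changed for these runs. One way to do it from inside an OCaml program is the standard `Gc` module; a minimal sketch, where `set_minor_heap_kwords` is a hypothetical helper:

```ocaml
(* Set the minor heap size in units of 1024 words, leaving every other GC
   parameter unchanged. Gc.minor_heap_size is measured in words. *)
let set_minor_heap_kwords kwords =
  Gc.set { (Gc.get ()) with Gc.minor_heap_size = kwords * 1024 }

let () =
  set_minor_heap_kwords 1024;
  Printf.printf "minor heap: %d words\n" (Gc.get ()).Gc.minor_heap_size
```

The same setting can be given without recompiling through the `s` parameter of the `OCAMLRUNPARAM` environment variable, which is also expressed in words, e.g. `OCAMLRUNPARAM='s=1M'` for a 1024Kword minor heap.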