Last week, I used the
Lwt cooperative lightweight thread library to
implement a benchmark that measures context switch performance, determined
that it was GC bound and timed it against comparable programs (i.e., the
fastest implementations in the computer language benchmark games, which are all based on lightweight threads) and a C version that
uses POSIX threads, obtaining these results:
implementation
memory usage
time (s)
Haskell GHC 6.8.2
2680KB
1.22
OCaml ocamlopt 1024Kword minor heap
5178KB
1.85
Haskell GHC 6.8.2 -threaded, -N1
2760KB
1.9
Erlang R12B-1 HiPE
5996KB
3.96
Mozart/Oz
3788KB
4.10
OCaml ocamlopt 32Kword minor heap
970KB
4.24
Haskell GHC 6.8.2 -threaded, -N2
3300KB
15.27
GCC C (POSIX threads)
4520KB
28.7
Here are the figures I get for the C version I made with
Protothreads:
GCC C (Protothreads, optimum scheduling)
220KB
0.076s
GCC C (Protothreads, pessimum scheduling)
220KB
18.6s
(Read below for an explanation of the difference between optimum and pessimum
scheduling policies.)
It is nearly 400 times faster than the C version with
POSIX threads, and represents a one order of magnitude improvement over the
other lightweight thread implementations. It also needs less memory. The
performance is almost unbelievable. Where's the catch, ugly code maybe? Judge
for yourself; here are the Protothreads and the POSIX threads C
implementations head to
head*1:
/* Protothreads, 0.076s*/ /* POSIX threads, 28.7s */
#include <stdio.h> #include <stdio.h>
#include <stdlib.h> #include <stdlib.h>
#include "pt.h" #include <pthread.h>
#include "pt-sem.h" #include <limits.h>
#define NUM_THREADS 503 #define NUM_THREADS 503
static struct pt thread_context[NUM_THREADS]; struct stack {
static int data[NUM_THREADS]; char x[PTHREAD_STACK_MIN];
static struct pt_sem mutex[NUM_THREADS]; };
static static pthread_mutex_t mutex[THREADS];
PT_THREAD(thread(struct pt *pt, int id)) static int data[THREADS];
{ static struct stack stacks[THREADS];
static int token;
static int r; static void* thread(void *num)
PT_BEGIN(pt); {
int l = (int)num;
while(1) { int r = (l+1) % THREADS;
PT_SEM_WAIT(pt, &mutex[id]); int token;
token = data[id];
r = (id + 1) % NUM_THREADS; while(1) {
if(token) { pthread_mutex_lock(mutex + l);
data[r] = token - 1; token = data[l];
PT_SEM_SIGNAL(pt, &mutex[r]); if (token) {
} else { data[r] = token - 1;
printf("%d\n", id + 1); pthread_mutex_unlock(mutex + r);
exit(0); }
} else {
} printf("%i\n", l+1);
PT_END(pt); exit(0);
} }
}
int }
main(int argc, char *argv[])
{ int main(int argc, char **argv)
int i; {
int i;
if(argc != 2) exit(255); pthread_t cthread;
data[0] = atoi(argv[1]); pthread_attr_t stack_attr;
for(i = 0; i < NUM_THREADS; i++) { if (argc != 2) exit(255);
PT_SEM_INIT(&mutex[i], 0); data[0] = atoi(argv[1]);
PT_INIT(&thread_context[i]);
} pthread_attr_init(&stack_attr);
PT_SEM_INIT(&mutex[0], 1); for (i = 0; i < THREADS; i++) {
pthread_mutex_init(mutex + i, NULL);
while(1) { pthread_mutex_lock(mutex + i);
for(i = 0; i < NUM_THREADS; i++)
thread(&thread_context[i], i); pthread_attr_setstack(&stack_attr, &stacks[i],
} sizeof(struct stack));
} pthread_create(&cthread, &stack_attr, thread,
(void*)i);
}
pthread_mutex_unlock(mutex + 0);
pthread_join(cthread, NULL);
}
Even though the Protothreads code is similar to the one with pthreads, there are
two important differences:
there is no thread-local data in the Protothreads version. r is recomputed on each iteration
the proto threads are scheduled manually by running all the corresponding functions in sequence
These dissimilarities give some insight into the nature of Protothreads. They
are little more than small state machines whose state is stored in an
opaque pt structure. Protothreads' implementation consists of a few
preprocessor macros in headers, so the best way to see how they work is to
examine the output of the preprocessor (gcc -E, slightly abridged and
reformatted here):