There's some discussion going on in the Rails world about FastCGI and
scaling: pro FastCGI and anti FastCGI. Python's position here
is kind of vague, because it supports both models reasonably well
(though the Horrible Global Interpreter Lock makes the single
process model slightly less scalable than it might otherwise be).
"FastCGI" is a misnomer here, and both sides seem rather misinformed.
This is a discussion about the differences between threaded
concurrency, with a pool of worker threads, and multi-process
concurrency, with a pool of worker processes. FastCGI can do either,
though it has extra (confusing) features for worker processes. AFAIK
the only reason that most Java systems don't use FastCGI is that
it's a pain in the ass to set up, as the protocol is overdesigned
and conflates the simplest aspect (sending a request to another
process) with all sorts of other features (sharing an error log,
intercepting the request without responding to it, starting,
restarting, and pooling worker processes, etc.). Because of FastCGI's
flaws, people constantly create and recreate the basic functionality
-- SCGI, PCGI, mod_webkit, mod_skunk, AJP, and no doubt a whole
slew of other similar things I don't know about.
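
And that simplest aspect really is tiny. Here's a rough sketch of an
SCGI-style request -- netstring-encoded CGI headers sent over a socket
to a worker process -- where the function name and parameters are my
own illustrative choices, not any particular library's API:

    import socket

    def scgi_request(host, port, environ, body=b""):
        # SCGI: headers are NUL-separated key/value pairs wrapped in a
        # netstring; CONTENT_LENGTH must come first, and SCGI: 1 is required.
        headers = [("CONTENT_LENGTH", str(len(body))), ("SCGI", "1")]
        headers.extend(sorted(environ.items()))
        payload = b"".join(k.encode("ascii") + b"\0" + v.encode("ascii") + b"\0"
                           for k, v in headers)
        request = str(len(payload)).encode("ascii") + b":" + payload + b"," + body
        sock = socket.create_connection((host, port))
        try:
            sock.sendall(request)
            chunks = []
            while True:
                chunk = sock.recv(8192)
                if not chunk:
                    break
                chunks.append(chunk)
        finally:
            sock.close()
        return b"".join(chunks)

Everything else FastCGI layers on top -- multiplexing, process
management, stderr streams -- is exactly what these simpler protocols
keep stripping back out.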
OK, so ignore the FastCGI stuff. The issue is about concurrency.
Apparently some Java-heads are having a hard time believing that a
worker process model can be scalable. This is frankly bizarre, and I
fear a sign of myopia in that community. Or just a sign of a few
crackpots who are good typists -- probably best not to condemn the
community for one guy.
The worker model is nothing new -- it's what Apache 1.3 does
exclusively, and one of the concurrency options for Apache 2. The
concurrency model for mod_ruby is the same as Ruby's FastCGI option,
and the same goes for mod_python if you happen to be running Apache 2
with a multi-process MPM. In that model, you have multiple processes,
and there's a queue of
incoming requests -- a free process picks a request off the queue,
processes only that request, and when finished grabs another request.
In a threaded model almost exactly the same thing happens, except
there are worker threads. Not Very Different.
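
As a sketch of that model (with hypothetical handle/worker names, not
any framework's API), the threaded version looks like this; the
process version is nearly identical with multiprocessing.Process and
multiprocessing.Queue swapped in:

    import threading
    import queue

    def handle(request):
        # Stand-in for real work: parse the request, dispatch, respond.
        print("handled %r" % (request,))

    def worker(requests):
        while True:
            request = requests.get()    # block until a request is available
            if request is None:         # sentinel: shut down this worker
                break
            handle(request)

    requests = queue.Queue()
    pool = [threading.Thread(target=worker, args=(requests,))
            for _ in range(4)]
    for t in pool:
        t.start()
    for i in range(10):                 # pretend these came off a socket
        requests.put("request %d" % i)
    for _ in pool:                      # one sentinel per worker
        requests.put(None)
    for t in pool:
        t.join()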
This multiprocess model has some advantages and disadvantages, none of
which are nearly as extreme as anyone seems to think -- you still have
to handle concurrency and contention for shared resources. Some
in-process resources that are not threadsafe, like database
connections, are automatically isolated. But locking across processes
can be harder, and shared resources in general are more difficult to
manage.
One advantage is that you are encouraged from the beginning to share
data in a way that scales across multiple machines. The threaded model
isn't really an option for Ruby systems anyway, because Ruby doesn't
support OS-level threads, which I believe means that a blocking I/O
call will block all threads in the process (among other issues).
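
A tiny demonstration of the isolation point, assuming a Unix-style
fork for the child process: threads see the same module global, while
a child process gets its own copy, so anything genuinely shared has to
live outside the process (a database, a cache server, and so on):

    import multiprocessing
    import threading

    counter = 0

    def bump():
        global counter
        counter += 1
        print("my view of counter:", counter)

    if __name__ == "__main__":
        # A thread shares the module global with the main thread.
        t = threading.Thread(target=bump)
        t.start(); t.join()                  # counter is now 1
        # A child process works on its own copy of the interpreter state.
        p = multiprocessing.Process(target=bump)
        p.start(); p.join()
        print("parent's counter:", counter)  # still 1, not 2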
In Python people go both ways. Threading seems to be more common,
probably because it's more obvious and is easy to scale until you hit
the one-process/one-machine barrier. Zope, Webware, and CherryPy
are all threaded. SkunkWeb, CGI, and mod_python all use multiple
processes (well, mod_python can use a single process with multiple
interpreters, but I believe requests are always processed in separate
interpreters). Quixote and now WSGIKit are fairly neutral on the
matter. I'm not sure where other frameworks lie. And of course there
is asynchronous (event-driven) programming, which is non-threaded and
single-process. This is popular at the network level -- Twisted,
Medusa, asyncore -- but relatively uncommon in higher level systems,
with Nevow being the exception.
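
For flavor, here's roughly what that single-threaded, event-driven
style looks like with asyncore, the standard-library module Medusa is
built around (it has since been removed from modern Python, so take
this as a period sketch):

    import asyncore
    import socket

    class EchoHandler(asyncore.dispatcher_with_send):
        def handle_read(self):
            data = self.recv(8192)
            if data:
                self.send(data)     # queue the echo; never block the loop

    class EchoServer(asyncore.dispatcher):
        def __init__(self, host, port):
            asyncore.dispatcher.__init__(self)
            self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
            self.set_reuse_addr()
            self.bind((host, port))
            self.listen(5)

        def handle_accept(self):
            pair = self.accept()
            if pair is not None:
                sock, addr = pair
                EchoHandler(sock)   # one handler object per connection

    EchoServer("localhost", 8080)
    asyncore.loop()                 # one process, one thread, many sockets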