Rusko (I don't have a better name for him) got me thinking the other night by doing the dastardly deed of pointing out nevow and arguing that Apache (the web server) sucks for throughput-oriented, highly static sites (his words, not mine).
So, here is a stab at a design to maximize network throughput. It makes use of some things that don't exist in Java (yet, hopefully), so please excuse the Ruby-ish pseudocode.
We start with a simple looping process:
require 'continuation'   # callcc lives here on modern Rubies

class Looper
  def initialize
    @loop = Loop.new
    @loop << HttpAcceptor.new(@loop, 80)
  end

  def start
    @finished = false
    callcc do |escape|
      @loop.cycle(:http) do |bucket|
        # Capture where we are and hand it to the bucket; the bucket
        # resumes us by invoking the continuation.
        callcc { |go_back| bucket.continue(go_back) } if bucket.ready?
        @loop.remove(bucket) if bucket.finished?
        escape.call if @finished
      end
    end
  end

  def stop
    @finished = true
  end
end
A couple pieces of magic there. Loop is the first one. It is a circular linked list which supports internal iteration via Loop#cycle, which is passed a cycle key identifying which things in the loop we care about (there may be things in the loop we don't know how to handle; this one handles http stuff, another might handle database stuff, but they can share a loop). Importantly, iteration is threadsafe (let's pretend that Ruby supports native threads; why this matters comes up later), and while an element is "held" by an iterator, other iterators skip over it as if it weren't there.
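Nothing like this exists yet, so here is a minimal sketch of what a Loop might look like. The Array-backed ring, the mutex, and the hash used for the "held" bookkeeping are my assumptions, not a spec (pending? is a little helper I'll lean on near the end):

class Loop
  def initialize
    @items = []    # stand-in for the circular linked list
    @held  = {}    # elements currently held by some iterator
    @mutex = Mutex.new
  end

  def <<(bucket)
    @mutex.synchronize { @items << bucket }
    self
  end

  def remove(bucket)
    @mutex.synchronize { @items.delete(bucket) }
  end

  # True if some un-held bucket with this key is waiting.
  def pending?(key)
    @mutex.synchronize { @items.any? { |b| b.key == key && !@held[b] } }
  end

  # Internal iteration: walk the ring forever, yielding matching buckets
  # that no other iterator currently holds.
  def cycle(key)
    index = 0
    loop do
      bucket = @mutex.synchronize do
        next nil if @items.empty?
        index %= @items.size
        candidate = @items[index]
        index += 1
        if candidate.key == key && !@held[candidate]
          @held[candidate] = true
          candidate
        end
      end
      next unless bucket   # nothing for us on this step; keep cycling
      begin
        yield bucket
      ensure
        @mutex.synchronize { @held.delete(bucket) }
      end
    end
  end
end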
The second piece of magic is the HttpAcceptor. This just handles accepting TCP connections and implements the (server side) http protocol. It is a protocol abstraction layer which uses asynchronous IO (NIO, for the Java folks). It also implements the Bucket protocol, which we haven't met yet. Note that it is passed the @loop variable: when it receives a new connection it instantiates a Request and adds it to the loop. Where in the loop doesn't actually matter, which is nice.
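To make the shape concrete, here is my guess at what the acceptor might look like; the real async IO, http parsing, and url routing are all waved away behind the hypothetical parse_and_route helper:

require 'socket'

class HttpAcceptor
  def initialize(loop, port)
    @loop   = loop
    @server = TCPServer.new(port)
  end

  def key
    :http
  end

  # Ready when a connection is waiting; a zero-timeout select is a cheap poll.
  def ready?
    !!IO.select([@server], nil, nil, 0)
  end

  def continue(go_back)
    socket = @server.accept
    # parse_and_route is a hypothetical helper: it reads the request off the
    # socket and picks a content generator instance for the url.
    generator = parse_and_route(socket)
    @loop << Request.new(@loop, generator)
    go_back.call
  end

  def finished?
    false   # the acceptor stays in the loop forever
  end
end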
Now, the Request is interesting:
class Request
  def initialize(loop, content_generator)
    @loop = loop
    @content = content_generator
  end

  # Requests live in the loop alongside the other :http buckets.
  def key
    :http
  end

  def continue(go_back)
    @go_back = go_back
    if @pickup
      # We parked in return_when earlier; resume from there.
      @pickup.call
    else
      @content.start(self)
    end
  end

  # Schedule a slow operation as its own bucket, then yield control
  # back to the looper until we are continued again.
  def return_when(key)
    @loop << Bucket.new(key) { yield }
    callcc do |c|
      @pickup = c
      @go_back.call
    end
  end

  def finished?
    @content.finished?
  end

  def ready?
    @content.ready?
  end
end
The Request is created and given a content_generator, which is much like a Java HttpServletRequest. The Request in this case just handles passing control around, and the content_generator does the interesting stuff (i.e., it is what people writing apps use). We'll see an example of one shortly =)
The Bucket is a simple work unit: it is finished once the block passed to its initializer has run, and it responds to cycles asking for its key.
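A minimal Bucket along those lines might look like this (the details are mine, not gospel):

class Bucket
  attr_reader :key

  def initialize(key, &block)
    @key   = key
    @block = block
    @done  = false
  end

  def ready?
    !@done
  end

  def continue(go_back)
    @block.call    # e.g. the database lookup scheduled by return_when
    @done = true
    go_back.call
  end

  def finished?
    @done
  end
end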
Now, the content generator may be really simple:
class SimpleContent
  def initialize(out)
    @out = out
  end

  def start(request)
    @out.puts "Hello World"
  end

  # Never needs to wait on anything.
  def ready?
    true
  end

  def finished?
    true
  end
end
But that is boring; let's look at something that needs to yield control to a slow process...
class InterestingContent
  def initialize(out)
    @out = out
  end

  def start(request)
    @out.puts "Bouncer likes "
    # Park here until the :database bucket completes.
    request.return_when(:database) { @puppy = Puppies.find_by_name 'Bouncer' }
    @out.puts(@puppy.favorite_toy)
    @finished = true
  end

  # Pseudocode hand-wave: a real impl would report false while the
  # database bucket is still outstanding.
  def ready?
    true
  end

  def finished?
    @finished   # nil == false
  end
end
Now, this is much more interesting because it uses Request#return_when, which queues up the (IO-bound) database call as its own bucket. It specifies that it is a :database operation, and does its stuff.
Now, let's follow a single thread of execution from socket connection through to response completion.
The looper asks the http acceptor if it is ready, which it is. It then captures a continuation where it is, and passes that continuation into the acceptor.
The acceptor checks whether it has enough information in its IO buffer to build a Request; it does! It matches the url to the InterestingContent generator via some means, creates an IO buffer, instantiates the generator (handing it the buffer), uses that to create a Request, drops the Request in the loop, and invokes the continuation it was passed.
We're back where the continuation was created, where the next thing is to ask the current bucket if it is finished. The acceptor is never finished, so it says no, and the cycle moves on to the next bucket, which, lo and behold, is the Request that was just created! Hey, cool.
The looper asks the request if it is ready (it is) and so captures a continuation, and passes it into the request (with an interesting generator). Now the fun begins.
The request stores the go_back continuation and looks to see if it has an @pickup instance var. It doesn't, so it starts the content generator, passing in itself.
The content generator spews some output into the buffer and realizes it needs to ask for something slow, a database lookup! Oh no! Doing this inline would block THE WHOLE SERVER on IO to the database, so instead it asks the request to schedule a database operation via the return_when(:database) ... call. Now this is interesting...
The request kindly plunks the requested operation into the loop, then captures a continuation right before returning back to the content generator. Instead of returning, though, it invokes the @go_back continuation, which goes back to the looper...
The looper picks up and asks the request if it is done. It isn't, so the cycle moves to the next bucket, which is the acceptor; the acceptor has nothing interesting so isn't ready, and we come back around to the request, which is waiting on the database stuff... Now, smoke and mirrors for a second: pretend the database stuff somehow completed. We'll look at how that happened in a moment; don't worry, just accept that it did, please.
The request is ready, so we continue it. Now the request sees it has an @pickup, so it invokes it.
And we're back in the call to return_when, which promptly returns to the interesting content generator. It now has the puppy it was looking for, so it spews some more stuff to the output buffer and says it's finished.
The request comes back up, the looper sees the request is finished, so it removes the bucket (okay, I forgot to flush the IO buffer to the output socket; I'll leave that pseudo-code as an exercise for the reader ;-)
The looper continues in a busy-wait until another socket comes in. This could probably be handled by registering an interrupt instead, but that too is an exercise for the reader and is irrelevant to the main point =)
Now, this is all a single-process operation. There is a coroutine set up between the looper and the request, trading control back and forth by invoking continuations whenever an operation will not be able to execute immediately (needs network IO, database IO, etc). This maximizes processing time by keeping the CPU churning whenever there would otherwise be (an expected) IO wait. The real impl will be significantly trickier, as it needs to do the selecting on IO streams, but hey, this is pseudocode for the concepts.
No threads, one process, way higher throughput under heavy load. Sort of scary.
Now, to go back to the loop being threadsafe despite there being no threads, and to pretending that Ruby supports native threads: on a multi-CPU machine you start one looper per CPU on the same loop instance =)
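Assuming Looper were tweaked to take a shared loop instead of building its own, that setup might look like:

shared = Loop.new
shared << HttpAcceptor.new(shared, 80)

# One looper thread per CPU, all cycling the same (threadsafe) Loop.
workers = 4.times.map do    # pretend this box has four CPUs
  Thread.new { Looper.new(shared).start }
end
workers.each(&:join)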
Now, what about the database? Well, two things: you could have a dedicated database process, but this seems rather inefficient unless the database is going all the time. Instead, have a similar setup for the database, and add it to the loop as a Bucket! You can cycle through the loop from a bucket in the loop; it is the same type of coroutine as the looper/request pair, but now it is HttpLooper/DatabaseLooper, where the http looper and the database looper are both in the loop and trade control back and forth. A sketch of the database side follows.
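Everything below is my guess at the shape: the database looper reports an :http key so the http looper's cycle will hand it control, leans on the hypothetical Loop#pending? helper from the earlier sketch so it doesn't spin when there is no database work, and ignores the races a real impl would have to care about:

class DatabaseLooper
  def initialize(loop)
    @loop = loop
  end

  def key
    :http   # visible to the http looper's cycle, so it gets handed control
  end

  def ready?
    @loop.pending?(:database)   # only take a turn if database work is queued
  end

  def continue(go_back)
    @loop.cycle(:database) do |bucket|
      callcc { |back| bucket.continue(back) } if bucket.ready?
      @loop.remove(bucket) if bucket.finished?
      break   # one bucket per turn, then hand control back
    end
    go_back.call
  end

  def finished?
    false   # like the acceptor, it lives in the loop forever
  end
end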
Now, we can get even more fun by making the loop not an actual loop, but a priority queue instead!