This post originated from an RSS feed registered with Agile Buzz
by Jared Richardson.
Original Post: Clustering Applications for Performance (Mainly Mongrel)
Feed Title: Jared's Weblog
Feed URL: http://www.jaredrichardson.net/blog/index.rss
Feed Description: Jared's weblog.
The web site was created after the launch of the book "Ship It!" and discusses issues from Continuous Integration to web hosting providers.
Now that I've typed this up, it's more than a Rails guide. If you're not familiar with clustering servers in general, these links would be a good starting point. If you understand the way it works in Rails, you'll be pretty close to using other clustering solutions as well.
One of my current projects needs to scale to large volume so I'm starting to look at how Mongrel can be clustered effectively. As is my habit, I'm blogging on my research so that you can all send me email and tell me where I missed the boat. :) And please do. If I get enough good feedback on what works (or what doesn't), I'll compile it into another entry.
First, we're using Mongrel. Mongrel is a Ruby on Rails container, similar to Tomcat or WebSphere in Javaland. Mongrel's fast, light, and easy to configure. It's the best desktop Rails container I've used... if you haven't tried it, go ahead and type gem install mongrel -y and see how easy it is for yourself. A lot of very smart people seem to be using Mongrel and I tend to trust the very smart people to do my research for me when I can. :)
Second, each Rails request locks down the entire Rails instance while it's serving the request. And, yes, I already hear you screaming that Rails doesn't scale... and I reply that this configuration can serve a lot more users than you might realize. And, with the clustering we're about to look at, you can completely side step that limitation. I've done similar clustering with Tomcat and it's the same concept.
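To put rough numbers on that, here is a back-of-the-envelope sketch in Ruby. It assumes every instance serves exactly one request at a time and that requests are spread evenly. The request count, instance count, and the 50 ms service time are made up for illustration, and `time_to_drain_ms` is a hypothetical helper, not part of Rails or Mongrel.

```ruby
# Toy model only: each instance handles one request at a time,
# and requests are spread evenly across instances.
# All numbers below are illustrative assumptions, not measurements.
def time_to_drain_ms(requests, instances, ms_per_request)
  # Each instance works through its share of the queue serially.
  rounds = (requests.to_f / instances).ceil
  rounds * ms_per_request
end

single_ms  = time_to_drain_ms(100, 1, 50)   # one Rails instance
cluster_ms = time_to_drain_ms(100, 10, 50)  # ten clustered instances

puts "1 instance:   #{single_ms} ms to serve 100 requests"
puts "10 instances: #{cluster_ms} ms to serve 100 requests"
```

The model ignores CPU, memory, and database contention, so real gains will be smaller, but it shows why the per-instance lock stops being the bottleneck once requests fan out across a cluster.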
It would be nice if someone made a first-class container that did the clustering to green threads, though... On the other hand, making us configure the clustering ourselves forces us to make conscious decisions about the deployment. Resources aren't infinite, and spinning up new instances in a servlet engine until the server crashes might not be the best model to follow anyway...
You've also got to make sure your session information is persisted somewhere other than just the memory of your app server. Fortunately for us, Rails makes that easy. See here.
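Here is a small Ruby sketch of why that matters once more than one instance is serving requests. `AppInstance` is a hypothetical stand-in for an app server process, and a plain Hash stands in for the session store; none of this is the Rails or Mongrel API.

```ruby
# Hypothetical stand-in for one app server instance.
# The session store is just a Hash here; in a real deployment it would be
# something every instance can reach, such as a database table.
class AppInstance
  def initialize(session_store)
    @sessions = session_store
  end

  def login(session_id, user)
    @sessions[session_id] = user
  end

  def current_user(session_id)
    @sessions[session_id]
  end
end

# Case 1: each instance keeps sessions in its own memory.
a = AppInstance.new({})
b = AppInstance.new({})
a.login("abc123", "jared")
lost = b.current_user("abc123")   # the second instance has never seen this session

# Case 2: both instances share one store.
shared = {}
c = AppInstance.new(shared)
d = AppInstance.new(shared)
c.login("abc123", "jared")
found = d.current_user("abc123")  # the session survives the hop between instances

puts "in-memory store: #{lost.inspect}"
puts "shared store:    #{found.inspect}"
```

Since the balancer can send a user's next request to any instance, the first case would randomly log people out; the second is the behavior you want.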
Anyway, I've been reading on this page (Mongrel Deployment Options). It's got a good discussion of the entire topic with lots of good links. It will point you to Mongrel Cluster, a very cool application that does exactly what the name implies. It clusters Mongrels for you. Creates a pack of Mongrels...everyone groan for the bad pun now. :)
Once you have a pack of Mongrels eagerly waiting for clients, you need something in front of Mongrel to hand off your incoming client requests to each instance. Here's a great diagram of what it looks like. The parts below would replace lighttpd in the diagram. You can use:
Hardware load balancer (like an F5 product) - not cheap and not easy, but fast and reliable
Apache and mod_proxy - complicated to set up, but free and rock solid
Balance - a native load balancing product that also has a commercial version
Pound - a step up from Pen without going all the way to Apache
Pen - very simple, but limited to 15 backend servers. Since the Mongrel guide suggests 12 Mongrel instances per CPU, Pen can't handle my dual-core laptop if I try to run 24 Mongrels... seems a little lightweight.
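Whichever product you pick, the core job is the same: take each incoming request and hand it to the next backend. Here is a minimal sketch of that in Ruby, assuming plain round-robin (the real products may use other strategies). The `RoundRobin` class, the addresses, and the ports are hypothetical; the last few lines just redo the Pen arithmetic from the list above.

```ruby
# Hypothetical round-robin dispatcher; not the API of any product listed above.
class RoundRobin
  def initialize(backends)
    raise ArgumentError, "need at least one backend" if backends.empty?
    @backends = backends
    @next = 0
  end

  # Return the backend for the next request and advance the pointer,
  # wrapping around at the end of the list.
  def pick
    backend = @backends[@next]
    @next = (@next + 1) % @backends.size
    backend
  end
end

# Three Mongrel instances on made-up local ports.
backends = (8000..8002).map { |port| "127.0.0.1:#{port}" }
balancer = RoundRobin.new(backends)
picks = Array.new(5) { balancer.pick }
puts picks.inspect

# The Pen arithmetic: 12 instances per CPU on a dual-core machine.
instances_per_cpu = 12
cpus = 2
pen_limit = 15
needed = instances_per_cpu * cpus
fits_in_pen = needed <= pen_limit
puts "need #{needed} backends, Pen allows #{pen_limit}: fits? #{fits_in_pen}"
```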
The Apache solution is probably the fastest solution for most applications, but I wouldn't go there until you need it. The extra complexity it would add to your development cycle isn't worth the cost. However, in production, you can offload your CSS, graphics, and any other static content to Apache and let Rails just do Rails. I'm not sure how much this will buy you in practice, but when you're tweaking for real-world performance, I'd make sure to do the simple things that affect every single client request.
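The static/dynamic split boils down to a routing decision made before a request ever reaches Rails. Here is a simplified Ruby sketch of that decision, assuming it is made purely on file extension. The extension list and the `handled_by` helper are hypothetical, and a real web server would more likely check whether the file exists on disk.

```ruby
# Illustrative list only; adjust to whatever static assets your app ships.
STATIC_EXTENSIONS = %w[.css .js .png .gif .jpg .ico].freeze

# Decide who should answer a request path: the front-end web server
# for static files, or a Rails instance for everything else.
def handled_by(path)
  if STATIC_EXTENSIONS.include?(File.extname(path).downcase)
    :web_server
  else
    :rails
  end
end

puts handled_by("/stylesheets/site.css")  # static asset
puts handled_by("/images/logo.PNG")       # static asset, case-insensitive
puts handled_by("/posts/42")              # dynamic page
```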
37signals (and DHH) use Balance, so that's a vote in the right direction.
I was put off by the requirement to compile the various products. We have developers on Windows as well as OS X, so compilation introduces a new wrinkle. However, it seems that every single product will require a compile to work. Linux might have a few binaries available, but not OS X.
Balance 3.3.4 failed to compile for me.
Here are some other links...
This page has a great step-by-step set of directions on setting up Apache with mod_proxy_balancer in front of mongrel_cluster.