Wrestling with the Bots
Since I just submitted my chapter on Session Management, today I spent some time investigating sessions on our production deployment. The site isn't even launched yet (although it will be soon). I was surprised to find over 165,000 session records in the database! The vast majority of them belong to the bots that have been crawling us religiously since we put the site out there in alpha form.
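If you want to see the damage on your own app, something like this works from script/console (a sketch assuming the default ActiveRecord session store and its sessions table; adjust for your store):

  # Count the accumulated session records
  CGI::Session::ActiveRecordStore::Session.count

  # Purge sessions untouched for a day; the bot-created records go with them
  CGI::Session::ActiveRecordStore::Session.delete_all(["updated_at < ?", 1.day.ago])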
Bots don't hold on to session cookies, so every request they make spawns a brand-new session record; it's not worth letting your Rails application create sessions for them at all. Disabling sessions for bots also lightens the load on the server, since the session handling code is known to be really slow.
My particular implementation ended up looking like this:
class ApplicationController < ActionController::Base
  # turn off sessions if this is a request from a robot
  session :off, :if => proc { |request| Utility.robot?(request.user_agent) }

  class Utility
    def self.robot?(user_agent)
      user_agent =~ /\b(Baidu|Gigabot|Google|libwww-perl|lwp-trivial|msnbot|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)\b/i
    end
  end
end
We are getting hit by a bot from Google that goes by the name "Mediapartners-Google/2.1", so I put "Google" in that regexp instead of "Googlebot".
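A quick console sanity check of the pattern (illustrative; truthiness is all that matters here, since =~ returns a match position or nil):

  >> !!ApplicationController::Utility.robot?("Mediapartners-Google/2.1")
  => true
  >> !!ApplicationController::Utility.robot?("Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US) Firefox/2.0.0.4")
  => false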
Even though we don't rely on the session very much, I did want some sort of test assurance that I wasn't breaking the site horribly. The following RSpec spec is an example of what I came up with as a test of the site when sessions should be turned off. It really ought to have describe blocks for each controller with actions that can run session-less.
require File.dirname(__FILE__) + '/../spec_helper'

GOOGLEBOT_USER_AGENT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

describe HomepageController do
  integrate_views

  it "with sessions turned off should not break for robots" do
    request.stub!(:user_agent).and_return(GOOGLEBOT_USER_AGENT)
    get :index
  end
end
After some poking around and manual testing, I felt comfortable deploying the change, but I never really felt confident that my specs were doing what I wanted. As far as I know, the entire session infrastructure is mocked out for testing, so it wouldn't necessarily exercise the code path I intend.
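If the test stack does honor session :off, one extra assertion would at least make the intent explicit (a sketch I haven't verified against the mocked session machinery; it assumes TestResponse's cookies hash reflects what would actually be sent):

  it "should not set a session cookie for robots" do
    request.stub!(:user_agent).and_return(GOOGLEBOT_USER_AGENT)
    get :index
    response.cookies.should be_empty
  end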
By the way, if you want to play with RSpec in a similar way, you'll need to monkeypatch Rails' TestRequest class in your spec_helper.rb file. For some reason, it lacks the user_agent method.
class ActionController::TestRequest
  def user_agent
    "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"
  end
end
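If hardcoding a browser string bothers you, a slightly more flexible variant (an untested sketch, assuming TestRequest exposes its CGI-style env hash the way the real request object does) reads the header and only falls back to a default:

  class ActionController::TestRequest
    def user_agent
      # lets individual specs set env['HTTP_USER_AGENT'] instead of stubbing
      env['HTTP_USER_AGENT'] || "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"
    end
  end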