Ruby Buzz Forum
Wrestling with the Bots

Obie Fernandez

Posts: 608
Nickname: obie
Registered: Aug, 2005

Obie Fernandez is a Technologist for ThoughtWorks
Posted: Jun 22, 2007 3:18 PM

This post originated from an RSS feed registered with Ruby Buzz by Obie Fernandez.
Original Post: Wrestling with the Bots
Feed Title: Obie On Rails (Has It Been 9 Years Already?)
Feed URL: http://jroller.com/obie/feed/entries/rss
Feed Description: Obie Fernandez talks about life as a technologist, mostly as ramblings about software development and consulting. Nowadays it's pretty much all about Ruby and Ruby on Rails.


Since I just submitted my chapter on Session Management, I spent some time today investigating sessions on our production deployment. The site isn't even launched yet (although it will be soon). I was surprised to find over 165 thousand session records in the database! The vast majority of them belong to the bots that have been crawling us religiously since we put the site out there in alpha form.
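If you're sitting on a similar pile of stale rows, a one-off purge is easy enough. Here's a minimal sketch, assuming the standard ActiveRecord session store schema that rake db:sessions:create generates (a sessions table with an updated_at column); the one-week cutoff is an arbitrary choice:

# Purge old session rows; run via script/runner or a cron job.
# Assumes the ActiveRecord session store and its default session class.
CGI::Session::ActiveRecordStore::Session.delete_all(["updated_at < ?", 1.week.ago])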

Bots can't hold on to session cookies, so it's not worth letting your Rails application create sessions for them. Disabling sessions for bots also lightens the load on the server, since the session handling code is known to be really slow.

My particular implementation ended up looking like this:

class ApplicationController < ActionController::Base

  # turn off sessions if this is a request from a robot
  session :off, :if => proc { |request| Utility.robot?(request.user_agent) }

  class Utility
    # known crawler user agents; the \b anchors keep partial tokens from matching
    def self.robot?(user_agent)
      user_agent =~ /\b(Baidu|Gigabot|Google|libwww-perl|lwp-trivial|msnbot|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)\b/i
    end
  end

end

We are getting hit by a bot from Google that goes by the name "Mediapartners-Google/2.1", so I put "Google" in that regexp instead of "Googlebot".
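You can sanity-check the predicate from script/console. One subtlety: \bGoogle\b doesn't match inside the token "Googlebot" itself (the trailing "b" is a word character, so there's no boundary), but the standard Googlebot user agent still matches because of the google.com URL it carries:

ApplicationController::Utility.robot?("Mediapartners-Google/2.1")
# => truthy; the hyphen and slash around "Google" are word boundaries

ApplicationController::Utility.robot?("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
# => truthy, via the "google" in the bot-info URL

ApplicationController::Utility.robot?("Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US) Firefox/2.0.0.4")
# => nil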

Even though we don't rely on the session very much, I did want some sort of test assurance that I wasn't breaking the site horribly. The following RSpec spec is an example of what I came up with to test the site when sessions should be turned off. Ideally there would be describe blocks for each controller that has actions that can run without a session.

require File.dirname(__FILE__) + '/../spec_helper'

GOOGLEBOT_USER_AGENT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

describe HomepageController do
  integrate_views
  
  it "with sessions turned off should not break for robots" do
    request.stub!(:user_agent).and_return(GOOGLEBOT_USER_AGENT)
    get :index
  end
  
end

After some poking around and manual testing, I felt comfortable deploying the change, but I never really felt confident that my specs were doing what I wanted. As far as I know, the entire session infrastructure is mocked out for testing, so the spec wouldn't necessarily exercise things the way I intend.
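One way to get a bit more assurance that the :off condition actually fires is to assert against the controller itself. A rough sketch, assuming your version of Rails exposes ActionController::Base#session_enabled?:

it "should turn the session off for robot user agents" do
  request.stub!(:user_agent).and_return(GOOGLEBOT_USER_AGENT)
  get :index
  # session_enabled? reflects the per-request session options,
  # so it should be false once the :if proc matches the user agent
  controller.session_enabled?.should be_false
end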

By the way, if you want to play with RSpec in a similar way, you'll need to monkeypatch Rails' TestRequest class in your spec_helper.rb file. For some reason, it lacks the user_agent method.

class ActionController::TestRequest
  # TestRequest doesn't define user_agent, so give it a sensible
  # browser-like default
  def user_agent
    "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"
  end
end
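With that default in place, ordinary specs see a plausible browser string for free, and only the robot cases need to override it with stub! the way the spec above does.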

