The Artima Developer Community
Authenticating a Googlebot in PHP and Perl

Douglas Clifton


Douglas Clifton is a freelance Web programmer and writer
Authenticating a Googlebot in PHP and Perl. Posted: Sep 24, 2006 5:22 PM

This post originated from an RSS feed registered with Web Buzz by Douglas Clifton.
Original Post: Authenticating a Googlebot in PHP and Perl
Feed Title: blogZero
Feed URL: http://loadaveragezero.com/app/s9y/index.php?/feeds/index.rss1
Feed Description: Web Development News, Culture and Opinion

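Following a tip from Russ, I was pleased to find an interesting post on the Official Google Webmaster Central Blog titled "How to verify Googlebot". In a nutshell, it explains how to use the Unix shell program host to check that an IP address copied from your Web server log file really belongs to a Googlebot and not to some email harvester (or whatever): look up the host name for the IP address, confirm that the name is in the googlebot.com domain, then look up the address for that name and confirm that it matches the original IP.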

I decided to take this a step further and demonstrate how you can automate this procedure using a scripting language. For these examples I chose PHP and Perl, although you could certainly use Python, Ruby, or whatever your preferred language is, as long as it has an interface to the gethostbyname and gethostbyaddr resolver library calls.

Using these calls under PHP is the simpler of the two approaches, because its interface to these routines is written at a more abstract level than Perl's Socket module. Below is an example googlebot() function in PHP that returns true if the IP address resolves to a googlebot.com host name and that name resolves back to the same IP address. This is no 100% guarantee against spoofing, but it will catch the vast majority of attempts. A bit of test code is included.

<?php

function googlebot($ip)  {

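    // check to see if this IP really is a Googlebot

    // gethostbyaddr() returns false on malformed input and
    // the unmodified IP address when the lookup fails
    $name = gethostbyaddr($ip);
    if ($name === false || $name === $ip) return false;

    // anchor the match at the end of the name, so that a host
    // such as googlebot.com.example.net is rejected
    if (!preg_match('/\.googlebot\.com$/i', $name)) return false;

    // forward-confirm: the name must resolve back to the same IP
    $addrs = gethostbynamel($name);
    return $addrs !== false && in_array($ip, $addrs, true);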
}

// test it

$ip = '66.249.66.1';

echo $ip . ' is ';
if (!googlebot($ip)) echo 'not ';
echo 'a Google bot' . "\n";
?>

The Perl version is at a lower level, much closer to the corresponding C library calls. In fact, the Socket module is derived directly from the sys/socket.h header file, and its functions are just thin wrappers around the C library calls. See Berkeley Sockets for more information.

#!/usr/bin/perl

use Socket;

sub googlebot($)  {

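    # check to see if this IP really is a Googlebot

    my $ip = shift;
    my $packed = inet_aton($ip) or return 0;
    my $name = gethostbyaddr($packed, AF_INET) or return 0;

    # anchor the match at the end of the name, so that a host
    # such as googlebot.com.example.net is rejected
    return 0 unless $name =~ m/\.googlebot\.com$/i;

    # forward-confirm: one of the name's addresses must match the IP
    # (elements 4 and up of the returned list are packed addresses)
    my @addr = gethostbyname($name) or return 0;
    return (grep { inet_ntoa($_) eq $ip } @addr[4 .. $#addr]) ? 1 : 0;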
}

# test it

$ip = '66.249.66.1';

print $ip . ' is ';
unless (googlebot($ip)) { print 'not '; }
print 'a Google bot' . "\n";
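
For readers who prefer Python, which was mentioned earlier as an alternative, the same reverse-then-forward check might be sketched as follows. This is only a rough equivalent under a few assumptions, not a tested port: the function name is_googlebot and the reverse/forward parameters are made up for this illustration (they exist so the lookups can be swapped out when testing without a network), while socket.gethostbyaddr and socket.gethostbyname_ex are the standard library's interfaces to the same resolver calls.

```python
import socket


def is_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return True if ip maps to a googlebot.com host that maps back to ip.

    reverse and forward are hypothetical hooks for this sketch; by default
    they are the standard library resolver functions.
    """
    try:
        # both calls return a tuple: (hostname, alias list, address list)
        name = reverse(ip)[0]
        # compare the end of the name, not just any substring of it
        if not name.lower().endswith(".googlebot.com"):
            return False
        # forward-confirm: the name must resolve back to the original IP
        return ip in forward(name)[2]
    except OSError:
        # failed lookups (socket.herror, socket.gaierror) are OSError subclasses
        return False


if __name__ == "__main__":
    ip = "66.249.66.1"
    print(ip, "is" if is_googlebot(ip) else "is not", "a Googlebot")
```

Run as a script, it performs live DNS lookups for the test address, so the result depends on the network and on the current DNS records.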

Finally, in case anyone is interested why it's been so long since I posted anything, much of the summer I was sick as a dog and since recovering, busy as a bee. It's nice to be feeling better and back to work!

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use