This post originated from an RSS feed registered with Web Buzz
by Douglas Clifton.
Original Post: Authenticating a Googlebot in PHP and Perl
Feed Title: blogZero
Feed URL: http://loadaveragezero.com/app/s9y/index.php?/feeds/index.rss1
Feed Description: Web Development News, Culture and Opinion
Following a tip from Russ I was pleased to find an interesting post on the Official Google Webmaster Central Blog titled How to verify Googlebot. In a nutshell, it explains how to use the Unix shell program host to authenticate that an IP address copied from your Web server log file really is a Googlebot and not some email harvester (or whatever).
I decided to take this a step further and demonstrate how you can automate this procedure using a scripting language. For these examples I chose PHP and Perl, although you could certainly use Python or Ruby or whatever your preferred language is, as long as it has an interface to the gethostbyname and gethostbyaddr library calls.
Using these calls from PHP is the simpler of the two approaches, as its interface to these routines is written at a more abstract level than the Perl Socket module. Below is an example googlebot() function in PHP that returns true if the IP address parameter authenticates. There is no 100% guarantee that a spoof cannot get through, but it will catch the vast majority of them. A bit of test code is included.
<?php
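function googlebot($ip) {
    // check to see if this IP really is a Googlebot
    // reverse lookup: returns the IP unchanged if there is no PTR record,
    // or false if the input is malformed
    $name = gethostbyaddr($ip);
    if ($name === false || $name === $ip) return false;
    // the host name must end in googlebot.com; a plain substring test
    // would also accept names like googlebot.com.example.net
    if (!preg_match('/(^|\.)googlebot\.com$/i', $name)) return false;
    // forward lookup must map the name back to the same IP
    return gethostbyname($name) === $ip;
}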
// test it
$ip = '66.249.66.1';
echo $ip . ' is ';
if (!googlebot($ip)) echo 'not ';
echo 'a Google bot' . "\n";
?>
The Perl version works at a much lower level, very similar to the corresponding C library calls. In fact, the Socket module is derived directly from the sys/socket.h header file and the functions are just wrappers around these Standard C library calls. See Berkeley Sockets for more information. If you have a copy of Programming Perl, the socket programming section of chapter 16, Interprocess Communications, will help, and if you are lucky enough to have a copy of the Perl Cookbook, chapter 18, Internet Services, has some great recipes for DNS lookups. For really gory details, refer to chapter 14, DNS: The Domain Name System, of TCP/IP Illustrated, Volume 1: The Protocols.
#!/usr/bin/perl
use Socket;
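sub googlebot($) {
    # check to see if this IP really is a Googlebot
    my $ip = shift;
    # anchor the pattern so names like googlebot.com.example.net fail
    my $bot = qr/(?:^|\.)googlebot\.com$/i;
    my $packed = inet_aton($ip) or return 0;
    # reverse lookup: IP address to host name
    my $name = gethostbyaddr($packed, AF_INET) or return 0;
    return 0 unless $name =~ $bot;
    # forward lookup; elements 4 and up are the packed addresses
    my @addr = gethostbyname($name) or return 0;
    my $hits = grep { inet_ntoa($_) eq $ip } @addr[4 .. $#addr];
    return $hits ? 1 : 0;
}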
# test it
$ip = '66.249.66.1';
print $ip . ' is ';
unless (googlebot($ip)) { print 'not '; }
print 'a Google bot' . "\n";
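Since the same check ports to any language that exposes gethostbyaddr and gethostbyname, here is a rough Python sketch of it, offered as an illustration and not as a definitive implementation. The helper name is_google_hostname and the reverse/forward parameters are my own inventions for this example (they let you swap in fake resolvers for testing) and are not part of the PHP or Perl versions. It compares the host name as a domain suffix, so a name such as googlebot.com.example.net is rejected, and it accepts any address returned by the forward lookup.

```python
import socket


def is_google_hostname(name, domain="googlebot.com"):
    """Return True if name is the domain itself or a subdomain of it."""
    # Normalise: DNS names are case-insensitive and may end with a dot.
    name = name.rstrip(".").lower()
    return name == domain or name.endswith("." + domain)


def googlebot(ip, reverse=socket.gethostbyaddr,
              forward=socket.gethostbyname_ex):
    """Check an IPv4 address with a reverse lookup, then a forward lookup.

    The reverse and forward parameters are hypothetical hooks added for
    this sketch; by default they are the standard library resolvers.
    """
    try:
        name = reverse(ip)[0]         # IP address -> primary host name
    except OSError:
        return False                  # no reverse record, or bad input
    if not is_google_hostname(name):
        return False
    try:
        addresses = forward(name)[2]  # host name -> list of IPv4 addresses
    except OSError:
        return False
    # The name must resolve back to the address we started with.
    return ip in addresses
```

Calling googlebot('66.249.66.1') with the default arguments performs live DNS lookups, so the result depends on your resolver and on what Google publishes at the time.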
Finally, in case anyone is interested in why it's been so long since I posted anything, much of the summer I was sick as a dog and, since recovering, busy as a bee. It's nice to be feeling better and back to work!