The Artima Developer Community

Web Buzz Forum
Authenticating a Googlebot in PHP and Perl


Douglas Clifton


Douglas Clifton is a freelance Web programmer and writer
Authenticating a Googlebot in PHP and Perl Posted: Sep 24, 2006 9:23 PM

This post originated from an RSS feed registered with Web Buzz by Douglas Clifton.
Original Post: Authenticating a Googlebot in PHP and Perl
Feed Title: blogZero
Feed URL: http://loadaveragezero.com/app/s9y/index.php?/feeds/index.rss1
Feed Description: Web Development News, Culture and Opinion

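Following a tip from Russ, I was pleased to find an interesting post on the Official Google Webmaster Central Blog titled How to verify Googlebot. In a nutshell, it explains how to use the Unix host command to confirm that an IP address copied from your Web server log file really belongs to Googlebot and not to some email harvester (or whatever). The check has three steps: a reverse DNS lookup turns the IP address into a host name, that name must be in the googlebot.com domain, and a forward DNS lookup of the name must return the original IP address.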
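The second step, checking the domain, is the easiest one to get subtly wrong. Below is a minimal PHP sketch of just that check. The helper name is_googlebot_host() is hypothetical, and the sample host names are made up for illustration. The sketch compares the end of the name instead of searching for googlebot.com anywhere inside it. Whoever controls the reverse DNS for an address can give that address a name such as crawl.googlebot.com.example.net in a domain they also control, and the forward lookup would then still match.

```php
<?php

// Hypothetical helper: does a reverse-DNS host name belong to the
// googlebot.com domain? Only the domain check is shown; no DNS lookups here.
function is_googlebot_host($name)
{
    // normalize: DNS names are case-insensitive and may carry a trailing dot
    $name = strtolower(rtrim($name, '.'));
    $domain = 'googlebot.com';

    // accept the domain itself or any name ending in ".googlebot.com";
    // a substring test would also accept "googlebot.com.example.net"
    return $name === $domain
        || substr($name, -strlen($domain) - 1) === '.' . $domain;
}

// made-up sample names
foreach (['crawl-66-249-66-1.googlebot.com',
          'CRAWL-66-249-66-1.GOOGLEBOT.COM.',
          'googlebot.com.example.net',
          'notgooglebot.com'] as $host) {
    printf("%-35s %s\n", $host, is_googlebot_host($host) ? 'yes' : 'no');
}
```

Only the first two names pass. The last one shows why the leading dot matters: without it, any domain that merely ends in the letters googlebot.com would be accepted.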

I decided to take this a step further and demonstrate how you can automate this procedure using a scripting language. For these examples I chose PHP and Perl, although you could certainly use Python, Ruby or whatever your preferred language is, as long as it has an interface to the gethostbyname and gethostbyaddr resolver functions of the C library.

Using these calls from PHP is the simpler of the two approaches, because PHP's versions of these routines work with ordinary strings (dotted-quad addresses and host names), whereas Perl's work with packed binary addresses that you convert with helpers from the Socket module. Below is an example googlebot() function in PHP that returns true if the IP address passes the check. There is no 100% guarantee that a spoof won't get through, but it will catch the vast majority of them. A bit of test code is included.

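<?php

function googlebot($ip)  {

    // check to see if this IP really is a Googlebot

    // reverse lookup: gethostbyaddr() returns the host name, the
    // unchanged IP if the lookup fails, or false on malformed input
    $name = gethostbyaddr($ip);
    if ($name === false || $name === $ip) return false;

    // the name must end in .googlebot.com; a plain substring match
    // would also accept something like googlebot.com.example.net
    $name = strtolower(rtrim($name, '.'));
    $bot = '.googlebot.com';
    if (substr($name, -strlen($bot)) !== $bot) return false;

    // forward lookup: the name must resolve back to the same IP
    // (gethostbynamel() returns every IPv4 address, or false on failure)
    $addrs = gethostbynamel($name);
    return $addrs !== false && in_array($ip, $addrs, true);
}

// test it

$ip = '66.249.66.1';

echo $ip . ' is ';
if (!googlebot($ip)) echo 'not ';
echo 'a Google bot' . "\n";
?>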
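To automate the whole job, you can pull the candidate addresses straight from the access log. Below is a rough sketch that assumes the Apache "combined" log format, where the client address is the first field. The function name googlebot_candidates() and the sample log lines are hypothetical. The sketch keeps only lines that mention Googlebot anywhere, which is a deliberately loose filter aimed at the User-Agent field. It returns each IPv4 address once, so every address costs only one round of DNS lookups.

```php
<?php

// Hypothetical helper: collect the distinct client IPs from access-log
// lines that claim to be Googlebot. Assumes the Apache "combined" format,
// where the client address is the first whitespace-separated field.
function googlebot_candidates(array $lines)
{
    $ips = [];
    foreach ($lines as $line) {
        // loose filter: only lines mentioning Googlebot are worth checking
        if (stripos($line, 'Googlebot') === false) continue;

        // first field is the client address
        $fields = preg_split('/\s+/', trim($line), 2);
        $ip = $fields[0];

        // keep well-formed IPv4 addresses only, each one once
        if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
            $ips[$ip] = true;
        }
    }
    return array_keys($ips);
}

// made-up sample log lines
$log = [
    '66.249.66.1 - - [24/Sep/2006:21:23:00 -0500] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.7 - - [24/Sep/2006:21:24:00 -0500] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '198.51.100.3 - - [24/Sep/2006:21:25:00 -0500] "GET / HTTP/1.1" 200 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '66.249.66.1 - - [24/Sep/2006:21:26:00 -0500] "GET /about HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
];

print_r(googlebot_candidates($log));
```

You can then pass each returned address to the googlebot() function above. If Apache's HostnameLookups is enabled, the first field holds a host name instead of an address, and this sketch simply skips such lines.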

The Perl version works at a much lower level, very close to the corresponding C library calls. Perl's built-in gethostbyaddr and gethostbyname functions are thin wrappers around the C routines of the same name, and they take and return addresses in packed binary form. The Socket module, which mirrors the C sys/socket.h header, supplies constants such as AF_INET and the inet_aton and inet_ntoa helpers that convert between dotted-quad strings and packed addresses. See Berkeley Sockets for more information. If you have a copy of Programming Perl, the section on sockets in chapter 16, Interprocess Communication, will help. If you are lucky enough to have a copy of the Perl Cookbook, chapter 18, Internet Services, has some great recipes for DNS lookups. For the really gory details, refer to chapter 14, DNS: The Domain Name System, of TCP/IP Illustrated, Volume 1: The Protocols.

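#!/usr/bin/perl

use strict;
use warnings;
use Socket;

sub googlebot($)  {

    # check to see if this IP really is a Googlebot

    my $ip = shift;
    my $packed_ip = inet_aton($ip) or return 0;

    # reverse lookup: packed IP address -> host name
    my $name = gethostbyaddr($packed_ip, AF_INET) or return 0;

    # the name must end in .googlebot.com (anchored, so that
    # googlebot.com.example.net does not match)
    return 0 unless $name =~ m/\.googlebot\.com\.?\z/i;

    # forward lookup: host name -> ($name, $aliases, $type, $len, @addrs)
    my @host = gethostbyname($name) or return 0;
    my @addrs = map { inet_ntoa($_) } @host[4 .. $#host];

    # accept the IP if any of the forward addresses matches it
    return (grep { $_ eq $ip } @addrs) ? 1 : 0;
}

# test it

my $ip = '66.249.66.1';

print $ip . ' is ';
unless (googlebot($ip)) { print 'not '; }
print 'a Google bot' . "\n";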
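Here is a small PHP sketch that shows what the packed addresses in the Perl version look like and what a reverse lookup actually asks DNS for. For dotted-quad IPv4 strings, PHP's inet_pton() and inet_ntop() do the same conversions as Perl's inet_aton() and inet_ntoa(). The helper ptr_name() is hypothetical. A reverse lookup of an IPv4 address queries the PTR record of a name built by writing the address's octets in reverse order under in-addr.arpa.

```php
<?php

// Hypothetical helper: build the in-addr.arpa name whose PTR record a
// reverse lookup of an IPv4 address queries.
function ptr_name($ip)
{
    // inet_pton() yields the 4-byte packed address (the same form Perl's
    // inet_aton() returns); it gives false for input it cannot parse
    $packed = inet_pton($ip);
    if ($packed === false || strlen($packed) !== 4) return false;

    // unpack the four bytes (keys 1..4), then list them in reverse order
    $octets = array_values(unpack('C4', $packed));
    return implode('.', array_reverse($octets)) . '.in-addr.arpa';
}

$ip = '66.249.66.1';
echo bin2hex(inet_pton($ip)), "\n";      // 42f94201
echo inet_ntop(inet_pton($ip)), "\n";    // 66.249.66.1
echo ptr_name($ip), "\n";                // 1.66.249.66.in-addr.arpa
```

If you want to query such a name yourself rather than going through gethostbyaddr(), PHP's dns_get_record() with the DNS_PTR type can do it directly.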

Finally, in case anyone is wondering why it's been so long since I posted anything: I was sick as a dog for much of the summer and, since recovering, have been busy as a bee. It's nice to be feeling better and back to work!


Copyright © 1996-2019 Artima, Inc. All Rights Reserved.