The Artima Developer Community
Java Buzz Forum
Crawlers Detection in Java

Erik C. Thauvin


Erik C. Thauvin maintains one of the web's first and most popular linkblogs.
Crawlers Detection in Java Posted: Feb 22, 2006 3:35 AM

This post originated from an RSS feed registered with Java Buzz by Erik C. Thauvin.
Original Post: Crawlers Detection in Java
Feed Title: Erik's Weblog
Feed URL: http://erik.thauvin.net/blog/feed.jsp?cat=Java
Feed Description: The Truth is Out There!

Crawlers Detection in Java

As I was testing my link redirector Servlet for the linkblog, Rick asked what I was doing about search engine crawlers. I told him I was inspecting the user-agent on all requests and excluding anything containing the words bot, crawler or spider, which I knew was hardly enough.
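For illustration, the keyword check described above might look something like the sketch below. The class and method names (NaiveCrawlerCheck, looksLikeCrawler) are hypothetical and are not taken from the original Servlet; the only thing drawn from the post is the idea of looking for bot, crawler or spider in the user-agent.

```java
import java.util.Locale;

// Hypothetical helper, not the original Servlet code: flags a request as a
// crawler when its user-agent contains one of three keywords.
public final class NaiveCrawlerCheck {

    // The three words mentioned in the post.
    private static final String[] KEYWORDS = {"bot", "crawler", "spider"};

    private NaiveCrawlerCheck() {
    }

    // Returns true if the user-agent contains any keyword, ignoring case.
    // A missing user-agent is treated as "not a crawler" here; that choice
    // is an assumption of this sketch.
    public static boolean looksLikeCrawler(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String keyword : KEYWORDS) {
            if (ua.contains(keyword)) {
                return true;
            }
        }
        return false;
    }
}
```

In a Servlet, the argument would typically come from request.getHeader("User-Agent"). A check like this misses any crawler whose user-agent avoids those three words, which is why it falls short on its own.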

I was ready to live with it, when I suddenly remembered that AWStats, my favorite logfile analyzer, does a pretty good job of keeping track of robots/spiders. It actually includes a Perl module with around 400 regexp user-agent matches for all sorts of known robots, spiders and crawlers.

I converted the AWStats lookup data into a Java class, Robots, which I used in my Servlet.
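The general shape of such a lookup can be sketched as follows. This is not the actual Robots class or its API: the class name RobotPatterns, the method isRobot, and the handful of patterns are assumptions chosen for illustration, whereas the real class carries the full AWStats list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a regexp-based robot lookup. The patterns below are
// a tiny illustrative sample, not the AWStats data.
public final class RobotPatterns {

    private static final String[] SOURCES = {
        "googlebot", "slurp", "msnbot", "ia_archiver", "crawl", "spider"
    };

    // Compile each pattern once, case-insensitively, when the class loads.
    private static final List<Pattern> PATTERNS = new ArrayList<>();

    static {
        for (String source : SOURCES) {
            PATTERNS.add(Pattern.compile(source, Pattern.CASE_INSENSITIVE));
        }
    }

    private RobotPatterns() {
    }

    // Returns true if any pattern is found anywhere in the user-agent.
    public static boolean isRobot(String userAgent) {
        if (userAgent == null || userAgent.isEmpty()) {
            return false;
        }
        for (Pattern pattern : PATTERNS) {
            if (pattern.matcher(userAgent).find()) {
                return true;
            }
        }
        return false;
    }
}
```

Compiling the patterns up front keeps the per-request cost to a linear scan over the list, which seems reasonable for a few hundred entries.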

Thanks to Laurent Destailleur, the author of AWStats, for allowing me to release it in the public domain.

