The Artima Developer Community
Java Buzz Forum
Spam Blog Crisis

Nick Lothian

Posts: 397
Nickname: nicklothia
Registered: Jun, 2003

Nick Lothian is a Java Developer & Team Leader
Spam Blog Crisis Posted: Oct 16, 2005 5:41 PM

This post originated from an RSS feed registered with Java Buzz by Nick Lothian.
Original Post: Spam Blog Crisis
Feed Title: BadMagicNumber
Feed URL: http://feeds.feedburner.com/Badmagicnumber
Feed Description: Java, Development and Me

Tim Bray says there is a spam blog emergency occurring right now. I tend to agree. I'd like to see the search terms he is using to get that many splogs, though.

Removing spam blog results from time-sorted results is difficult because you can't rely on PageRank-like algorithms: a brand-new post has had no time to accumulate inbound links. Email spam filters are probably a better model, although the auto-generated splogs that I suspect Tim is suffering from are hard to detect using Bayesian-type algorithms. OTOH, my de-spammed version of Google's blog search just uses heuristics based on the URL of each item, and it does okay for many searches. Compare my version of a search for "cancer" with the raw version. At the time of writing, my version removes 26 spammy results to get the first 10 non-spammy ones.
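
To make the URL-heuristic idea concrete, here is a minimal sketch of that kind of filter. The post does not list the actual rules, so the two rules below (too many hyphens in the host name, a long run of digits in the subdomain) are invented purely for illustration, as are the names `looks_spammy` and `first_clean_results` and the `.example` URLs.

```python
from urllib.parse import urlparse
import re

# Hypothetical rules, invented for illustration only; the post does not
# say which URL heuristics its filter actually applies.
MAX_HYPHENS_IN_HOST = 2
DIGIT_RUN = re.compile(r"\d{4,}")


def looks_spammy(url):
    """Return True if the URL trips any of the illustrative rules."""
    host = urlparse(url).hostname or ""
    subdomain = host.split(".")[0]
    # Rule 1 (assumed): keyword-stuffed hosts tend to contain many hyphens.
    if host.count("-") > MAX_HYPHENS_IN_HOST:
        return True
    # Rule 2 (assumed): machine-generated blog names often end in long numbers.
    if DIGIT_RUN.search(subdomain):
        return True
    return False


def first_clean_results(urls, wanted=10):
    """Walk time-sorted URLs and keep the first `wanted` non-spammy ones.

    Returns (kept, removed), where removed counts the spammy URLs
    skipped before the kept list filled up.
    """
    kept = []
    removed = 0
    for url in urls:
        if len(kept) == wanted:
            break
        if looks_spammy(url):
            removed += 1
        else:
            kept.append(url)
    return kept, removed


if __name__ == "__main__":
    sample = [
        "http://cheap-cancer-cure-info.blogs.example/post",
        "http://health83921.blogs.example/1",
        "http://nick.blogs.example/2005/10/spam",
    ]
    kept, removed = first_clean_results(sample, wanted=10)
    print(kept, removed)
```

Because the rules look only at the URL, the filter needs no link graph and no training corpus, which is why it still works on results sorted by time. The trade-off is that any real rule set would need tuning against false positives on legitimate blogs.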



Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use