The Artima Developer Community
Java Buzz Forum
Un-Bayesing SpamAssassin

Russell Beattie


Russell Beattie is a Mobile Internet Developer
Un-Bayesing SpamAssassin Posted: Mar 12, 2004 1:31 PM

This post originated from an RSS feed registered with Java Buzz by Russell Beattie.
Original Post: Un-Bayesing SpamAssassin
Feed Title: Russell Beattie Notebook
Feed URL: http://www.russellbeattie.com/notebook/rss.jsp?q=java,code,mobile
Feed Description: My online notebook with thoughts, comments, links and more.
It was getting crazy last week. I was getting more and more and more spam. I would go to bed, wake up 8 hours later and have 50+ messages waiting for me. My SpamAssassin was just completely falling down on the job.

At first I lowered the hit level, and that didn't seem to help. Then I went through the .spamassassin/user_prefs and added points for the following:

  score DISGUISE_PORN 3.0
  score PORN_16 4.0
  score PORN_MEMBERSHIP 4.0
  score PORN_6 4.0
  score PORN_PASSWORD 4.0
  score MUST_BE_18 4.0
  score ADULT_SITE 4.0
  score BEST_PORN 4.0
  score ITS_LEGAL 4.0
  score MICROSOFT_EXECUTABLE 4.0
  score X_OSIRU_DUL 0.0
  score X_OSIRU_DUL_FH 0.0
  score X_OSIRU_OPEN_RELAY 0.0
  score X_OSIRU_SPAMWARE_SITE 0.0
  score X_OSIRU_SPAM_SRC 0.0
  score RCVD_IN_OSIRUSOFT_COM 0
  score HTML_WEB_BUGS 4.0
  score HTML_IMAGE_ONLY_02 4.0
  score HTML_IMAGE_ONLY_04 4.0
  score HTML_IMAGE_ONLY_06 4.0
  score FORGED_YAHOO_RCVD 4.0
  score HTML_MIME_NO_HTML_TAG 4.0
Nothing seemed to be helping. I spent all day while I was working with a tail -f of .procmail.log in a window, trying to monitor what was happening and comparing the spam headers I got to what was supposed to happen, but I couldn't figure it out. *Then* I noticed that many of the headers had a BAYES_0 in them. What that means is that the Bayesian filter had determined there was a 0% chance that the email was spam. Unlike what I first thought, instead of just leaving the hit count alone, the BAYES_0 result actually *subtracted points* from it, thus putting the message under my spam limit.
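To make the arithmetic concrete, here is a rough sketch of how one negative Bayes score can cancel out the points added by other rules. The score values and the threshold are assumptions chosen for illustration, not the real defaults of any SpamAssassin version, and the function names are hypothetical.

```python
# Illustrative scores only. The two HTML rules use the 4.0 set in
# user_prefs above; the BAYES_0 value is an assumed negative score.
ASSUMED_SCORES = {
    "HTML_WEB_BUGS": 4.0,
    "HTML_IMAGE_ONLY_02": 4.0,
    "BAYES_0": -5.0,  # assumption: "0% chance of spam" subtracts points
}


def total_hits(rules_hit, scores):
    """Add up the scores of every rule that fired on a message."""
    return sum(scores.get(rule, 0.0) for rule in rules_hit)


def is_spam(rules_hit, scores, required_hits=5.0):
    """A message counts as spam once its total reaches the threshold.

    required_hits=5.0 is an assumed threshold for this sketch.
    """
    return total_hits(rules_hit, scores) >= required_hits


html_only = ["HTML_WEB_BUGS", "HTML_IMAGE_ONLY_02"]
with_bayes = html_only + ["BAYES_0"]

print(total_hits(html_only, ASSUMED_SCORES), is_spam(html_only, ASSUMED_SCORES))
print(total_hits(with_bayes, ASSUMED_SCORES), is_spam(with_bayes, ASSUMED_SCORES))
```

With these made-up numbers the same message drops from 8.0 hits to 3.0 once BAYES_0 fires, which would explain why raising the other scores made no visible difference.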

Ahh. So first I modified the scores for the BAYES_xx rules, but I started getting false positives, which is bad. I was stumped for a bit until my coworker Vineet told me that what had probably happened was that the Bayes filtering had "learned badly". Ahhhh. *That* made sense.

So instead of trying to untrain it, or whatever, I whacked the bayes_seen and bayes_toks files. I'm *sure* there are more elegant ways of doing it, but I figured I'd start from scratch and see if that helped. It definitely helped. I'm still getting *a ton* of spam, but Thunderbird is also helping.
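For anyone who wants a slightly less destructive version of the same reset, here is a minimal sketch that moves the Bayes files aside instead of deleting them, so they can be put back if starting over makes things worse. It assumes the database files sit in the same directory as user_prefs (for example ~/.spamassassin) and have names starting with bayes_. The function name and the backup folder naming are made up for this example.

```python
import shutil
import time
from pathlib import Path


def set_aside_bayes_files(sa_dir, backup_root=None):
    """Move bayes_* files out of sa_dir into a timestamped backup folder.

    Returns the list of paths the files were moved to. Nothing is deleted.
    """
    sa_dir = Path(sa_dir).expanduser()
    backup_root = Path(backup_root).expanduser() if backup_root else sa_dir
    # The folder name starts with "bayes-", so the "bayes_*" glob skips it.
    backup_dir = backup_root / time.strftime("bayes-backup-%Y%m%d-%H%M%S")

    moved = []
    # sorted() collects the matches before anything is moved.
    for path in sorted(sa_dir.glob("bayes_*")):
        if path.is_file():
            backup_dir.mkdir(parents=True, exist_ok=True)
            dest = backup_dir / path.name
            shutil.move(str(path), str(dest))
            moved.append(dest)
    return moved


if __name__ == "__main__":
    # Assumed location, based on the .spamassassin/user_prefs path above.
    for dest in set_aside_bayes_files("~/.spamassassin"):
        print("moved to", dest)
```

Other files in the directory, such as user_prefs, are left alone. The filter then has to build up a new database from scratch, which is the same starting point as deleting the files by hand.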

Does anyone know the right score modifier for attachments in general? I'm *sooo* fucking sick of that virus or whatever it is with insanely stupid text and a .zip file attachment. "I hate cleartext. Password is 21341254". AAAAhhh.

Anyways, that's my suggestion. I cannot wait until there are some solid solutions for Spam. It's just gotten to a crazy level lately.

-Russ
