Bayesian Dark Side
by Ian Bicking, freelance programmer
Posted: Nov 25, 2003 10:24 PM

This post originated from an RSS feed registered with Python Buzz by Ian Bicking.
Original Post: Bayesian Dark Side
Feed Title: Ian Bicking
Feed URL: http://www.ianbicking.org/feeds/atom.xml
Feed Description: Thoughts on Python and Programming.


Bayesian filters are neat. You can do more than just filter spam with them, of course: they are a general, trainable filtering tool, simple but effective. They are old-school AI (i.e., the AI we actually use) -- they don't seem intelligent, but if they get the job done...
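For concreteness, here is a minimal sketch of the kind of naive Bayes text classifier such filters are built on. The class name, tokenization, and training data are all hypothetical illustrations, not any particular filter's implementation:

    import math
    from collections import defaultdict

    class NaiveBayesFilter:
        """Minimal naive Bayes text classifier: count word frequencies
        per label during training, then score new documents by
        log prior plus Laplace-smoothed log likelihoods."""

        def __init__(self):
            self.word_counts = defaultdict(lambda: defaultdict(int))
            self.label_counts = defaultdict(int)
            self.vocab = set()

        def train(self, words, label):
            # One labeled training document (already tokenized).
            self.label_counts[label] += 1
            for w in words:
                self.word_counts[label][w] += 1
                self.vocab.add(w)

        def classify(self, words):
            # Return the label with the highest posterior log-probability.
            total_docs = sum(self.label_counts.values())
            best_label, best_score = None, float("-inf")
            for label, doc_count in self.label_counts.items():
                score = math.log(doc_count / total_docs)
                label_total = sum(self.word_counts[label].values())
                for w in words:
                    seen = self.word_counts[label][w]
                    # Laplace smoothing so unseen words don't zero out the score.
                    score += math.log((seen + 1) / (label_total + len(self.vocab)))
                if score > best_score:
                    best_label, best_score = label, score
            return best_label

Trained on even a couple of labeled examples, it starts generalizing:

    spam_or_ham = NaiveBayesFilter()
    spam_or_ham.train("cheap pills buy now".split(), "spam")
    spam_or_ham.train("meeting notes are attached".split(), "ham")
    print(spam_or_ham.classify("buy cheap pills today".split()))  # -> spam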

But as we all have learned from the movies, AI has its dark side. It occurred to me that Bayesian filters can have their dark side as well -- along with whatever other techniques we create to deal with spam. What if we don't control the filtering for ourselves -- what if someone controls the filtering for us, be it the Great Firewall of China or some other censorware? Censorware has always been bad in spirit, but the consolation is that it has also been pretty bad in terms of effectiveness. Blacklists and keyword blocking don't work well, especially when people are actively working to foil you with proxies and red herrings. These are all issues that spam filters have had to deal with, and as the spam filters become better at them, it's only a matter of time before the political filters catch on too.

I would imagine Bayesian censorware working by having trusted people classify randomly sampled pages (collected from actual visiting patterns). But unlike a blacklist, a well-trained classifier can use its knowledge to extrapolate and classify new pages. Unlike a keyword classifier, it is harder to foil with slang. (Though a JavaScript compressor would probably confuse the heck out of it.)
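To make the scenario concrete, here is a hedged sketch of that training loop, reusing the hypothetical NaiveBayesFilter above; the sampled pages and the blocked/allowed labels are invented for illustration:

    # Hypothetical sample: trusted reviewers label pages drawn
    # from actual visiting patterns.
    reviewed_sample = [
        ("forum discussing a banned political topic", "blocked"),
        ("recipe blog with dumpling instructions", "allowed"),
    ]

    censor = NaiveBayesFilter()
    for page_text, verdict in reviewed_sample:
        censor.train(page_text.lower().split(), verdict)

    # Unlike a blacklist, the trained model extrapolates to pages
    # no reviewer has ever seen.
    def should_block(page_text):
        return censor.classify(page_text.lower().split()) == "blocked"

    print(should_block("a political forum on banned subjects"))  # -> True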

I find this troubling, because I know there are people out there actively working on just this sort of thing, and I'd hate to see them succeed. (Thankfully, spammers are also hard at work getting around our attempts at classification -- spammers for free speech!)

