This post originated from an RSS feed registered with Java Buzz
by Nick Lothian.
Original Post: Re: RSS Aggregators are the killer app
Feed Title: BadMagicNumber
Feed URL: http://feeds.feedburner.com/Badmagicnumber
Feed Description: Java, Development and Me
I've done some experiments in this area, and Bayesian classification on 4000 items a day would currently be an interesting performance tuning exercise. In my experience it isn't CPU bound, though - it's I/O bound.
Every time I think about trying to do LSI (or even Vector Space Search) on a couple of million items, I start looking at the vector processing units on modern video cards and start drooling. Forget the CPU - offload that processing to the GPU. There will still be problems with disk and memory I/O, but the processing power is there.
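To give a feel for why this kind of work looks so tempting for a GPU, here is a minimal sketch of the core of a vector space search: term-frequency vectors compared by cosine similarity. This is plain CPU-side Java, not code from Classifier4J or Lucene, and the class and method names (`VectorSpaceSketch`, `termFrequencies`, `cosine`) are made up for illustration. It assumes naive whitespace tokenising and raw term counts, with no stemming, stop words or TF-IDF weighting.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not part of Classifier4J or Lucene.
class VectorSpaceSketch {

    // Build a raw term-frequency vector from lower-cased, whitespace-separated text.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a sparse term vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: dot product divided by the product of the lengths.
    // Returns 0.0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    public static void main(String[] args) {
        Map<String, Integer> query = termFrequencies("rss aggregator");
        Map<String, Integer> item = termFrequencies("a java rss aggregator");
        System.out.println(cosine(query, item));
    }
}
```

Each query-to-item comparison is an independent dot product, so scoring a couple of million items is the same small arithmetic kernel repeated over and over. That is the part that maps naturally onto parallel hardware; getting the vectors off disk and into memory is a separate problem.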
(A couple of times I've actually begun investigating this. Adding GPU co-processing to Classifier4J and/or Lucene would be an excellent project. JOGL may be the best way to do it.)
GPGPU.org is a decent site for more information about this.