Ruby Buzz Forum
Related document discovery, without algebra

Eigen Class

Eigenclass is a hardcore Ruby blog.
Related document discovery, without algebra Posted: Jan 20, 2007 6:00 AM

This post originated from an RSS feed registered with Ruby Buzz by Eigen Class.
Original Post: Related document discovery, without algebra
Feed Title: Eigenclass
Feed URL: http://feeds.feedburner.com/eigenclass
Feed Description: Ruby stuff --- trying to stay away from triviality.

You have probably heard about latent semantic analysis (if you haven't, take a look at this nice article on an SVD recommendation system written in 50 lines of Ruby). Maybe you told yourself "hey, this is cool" and filed it away in your head right away. Or maybe you actually tried to use it, but were scared off by the algebra 101 part, or got lazy when you realized you needed to compile LAPACK, GSL or some other numerical library*1.

But you can get pretty far even without dimensionality reduction. If the feature space (e.g. the terms or concepts associated with your documents) is small enough, and you make sure synonymy is not a problem, you can do without the algebra. One such case is your blog postings and their tags.

LSI (latent semantic indexing, the name LSA goes by in information retrieval) is about reducing the dimensionality of a sparse term-document matrix, which mitigates synonymy (different terms referring to the same idea) and polysemy (a word having multiple meanings). A program would do it using singular value decomposition, but you are also performing dimensionality reduction each time you tag your articles, mapping the text to a small number of keywords.

This means that you can use your tags to compute the cosine similarity between your posts, and find related pages in a dozen lines of code. The code that creates the "see also" box in eigenclass.org's pages looks essentially like this:


Read more...
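
The excerpt stops before the listing, so what follows is only a minimal sketch of the idea and not the code that runs on eigenclass.org. It assumes each post is represented by nothing more than its list of tags, treated as a binary vector (a tag is either present or absent), in which case the cosine similarity reduces to the number of shared tags divided by the square root of the product of the two tag counts. The post titles, the tags, and the names `cosine_similarity` and `related_posts` are all made up for illustration.

```ruby
require 'set'

# Cosine similarity of two tag lists treated as binary vectors:
#   |A & B| / sqrt(|A| * |B|)
# Returns 0.0 when either post has no tags.
def cosine_similarity(tags_a, tags_b)
  a = tags_a.to_set
  b = tags_b.to_set
  return 0.0 if a.empty? || b.empty?
  (a & b).size / Math.sqrt(a.size * b.size)
end

# Rank the other posts by similarity to +title+, best match first.
# Posts that share no tag at all are dropped; ties are broken by title.
def related_posts(title, posts, limit = 3)
  tags = posts.fetch(title)
  scored = posts.reject { |other, _| other == title }
                .map { |other, other_tags| [other, cosine_similarity(tags, other_tags)] }
                .select { |_, score| score > 0 }
  scored.sort_by { |other, score| [-score, other] }.first(limit)
end

# Hypothetical data: post title => tags.
posts = {
  "Related document discovery" => %w[ruby lsi tags],
  "SVD in 50 lines"            => %w[ruby lsi algebra],
  "Packaging with RubyGems"    => %w[ruby gems],
  "Gardening notes"            => %w[plants],
}

related_posts("Related document discovery", posts).each do |other, score|
  puts format("%.2f  %s", score, other)
end
```

With this toy data, the post sharing two of three tags ranks first, the one sharing only `ruby` comes second, and the gardening post is left out because it has no tag in common. Weighting rare tags more heavily than common ones would be a natural refinement, but that goes beyond what the text describes.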
