The Artima Developer Community
Sponsored Link

Ruby Buzz Forum
Trust Metrics Mixin #1: PageRank

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
Red Handed

Posts: 1158
Nickname: redhanded
Registered: Dec, 2004

Red Handed is a Ruby-focused group blog.
Trust Metrics Mixin #1: PageRank Posted: Jan 11, 2006 2:12 PM
Reply to this message Reply

This post originated from an RSS feed registered with Ruby Buzz by Red Handed.
Original Post: Trust Metrics Mixin #1: PageRank
Feed Title: RedHanded
Feed URL: http://redhanded.hobix.com/index.xml
Feed Description: sneaking Ruby through the system
Latest Ruby Buzz Posts
Latest Ruby Buzz Posts by Red Handed
Latest Posts From RedHanded

Advertisement

We get on these little kicks. As we’re all snooping around. Little protocols or obscure version control systems. Or domino games or something.

Trust metric, man. It just strikes me all the time: I imagine everyone in the world with their hands cupped, a little ball of clay in the middle. Everyone gets a moment to divide up their ball into pieces for selected friends and relatives. Trick is they have to divide up the whole ball. They can’t hang on to any. Clay is redistributed. One guy has a big monolith! People, don’t give him any more! (Unless, of course, he really can be trusted with clay.)

Here’s today’s challenge. You are given objects which respond to:

  • trust(*objs): Assigns one unit of trust in an arc from self to each of the objs.
  • trusts?(obj): Tests self for any trust of obj.
  • trust_spread: The total number of trusted objects.
  • trust_sphere: All objects ranked in the system.

Given these objects, provide a mixin which calculates trust. In today’s case, we’ll be mixing in a pagerank method, which approximates rank based on a primitive form of Google’s algorithm.

 module GoogleTrust
   DAMPEN = 0.85
   def pagerank(ranks = {})
     return ranks[self] if ranks.has_key? self
     ranks[self] = @rank || 0.0
     @rank =
       (1 - DAMPEN) +
       (DAMPEN * trust_sphere.inject(0) do |n, (o, c)|
         n + (o.trusts?(self) ? o.pagerank(ranks) / o.trust_spread: 0)
       end)
   end
 end

This algorithm is an approximation of Google’s PageRank, as described in PageRank Explained Correctly with Examples by some third party. Regardless, the basic equation can be seen throughout the web and I will refer you to the paper above in sorting out exactly how PageRank actually works.

You probably get the jist, though: by linking to sites, you are extending a unit of trust to them. But also: the value of your link is also based on the links you are receiving. So, it’s sort of like a tent, ranks evolve from a gelatinous blob into a more rigid and beautifully useful form.

Let’s get some sites running our PageRank. I’ll use some blogs from the Ruby world:

 class Site
   include GoogleTrust

   @@sites = []
   attr_accessor :name

   def initialize(name)
     @name = name
     @arcs = {}
     @@sites.push self
   end

   def trust *sites
     sites.each { |site| @arcs[site] = (@arcs[site] || 0) + 1 }
   end

   def trusts? site;     @arcs.has_key? site end
   def trust_worth site; @arcs[site]         end
   def trust_spread;     @arcs.length        end
   def trust_sphere;     @@sites             end
 end

 red = Site.new('RedHanded')
 eig = Site.new('Eigenclass.org')
 rails = Site.new('Ruby on Rails')
 ana = Site.new('Anarchaia')
 prj = Site.new('Project.ioni.st')

So, sites are represented by a name and an arc of other objects, representing sites which have been linked to. Here are five blogs, let’s make a web.

 red.trust   rails, eig, ana
 eig.trust   ana
 rails.trust prj, eig
 ana.trust   prj, red, rails, eig
 prj.trust   red, rails, eig, ana

 40.times do |i|
   puts "--- PASS #{i} ---" 
   red.trusted_sphere.each do |site|
     puts "#{ site.name }: #{ site.pagerank }" 
   end
 end

The pagerank is recursive, it computes the rank of a site based on the ranks of other sites, which are in turn based on its own rank. (Unless the links happen to manifest a heirarchy or a tree shape.) So, the ranking will change during the first many trials of running the metric. After you get to about the fifth pass, things settle down.

Anyway, note the rankings and see if you can work it out. After the first set, try adding some other sites into the mix:

 pj = Site.new('PJ Hyett')
 tf = Site.new('Thomas Fuchs')
 matz = Site.new('Journal of Matz')

 red.trust matz, tf, pj
 eig.trust matz
 rails.trust tf
 ana.trust pj, matz
 matz.trust rails, eig
 tf.trust rails
 pj.trust rails, ana, prj

 10.times do
   puts "---" 
   red.trust_sphere.each do |site|
     puts "#{ site.name }: #{ site.pagerank }" 
   end
 end

Other challenges applicable to this metric:

  1. I’ve ignored the trust_worth method thus far. What if a site is linked many times? Alter the mixin to allow the units to positively influence the metric.
  2. All sites in the above equation start out equal. Try seeding the rankings. Maybe you want trust from Matz to be inherently more valuable, to add some kind of hierarchy to the mix.
  3. This trust metric computes a wholistic figure for ranking. In other words: How does this site rank in terms of the entire web? Try altering the fundamental question to: How does this site rank in terms of just Project.ioni.st’s view? This way, we can determine how sites not directly trusted by Project.ioni.st relate to it.

For extra points, if you are employed at Google, debunk my algorithm and leak us something sweet! (Complete code is in metric-pagerank.rb.)

Read: Trust Metrics Mixin #1: PageRank

Topic: Generating the feed list Previous Topic   Next Topic Topic: Template driven page generation

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use