This post originated from an RSS feed registered with Ruby Buzz
by Red Handed.
Original Post: Trust Metrics Mixin #1: PageRank
Feed Title: RedHanded
Feed URL: http://redhanded.hobix.com/index.xml
Feed Description: sneaking Ruby through the system
We get on these little kicks. As we’re all snooping around. Little protocols or obscure version control systems. Or domino games or something.
Trust metric, man. It just strikes me all the time: I imagine everyone in the world with their hands cupped, a little ball of clay in the middle. Everyone gets a moment to divide up their ball into pieces for selected friends and relatives. Trick is they have to divide up the whole ball. They can’t hang on to any. Clay is redistributed. One guy has a big monolith! People, don’t give him any more! (Unless, of course, he really can be trusted with clay.)
Here’s today’s challenge. You are given objects which respond to:
trust(*objs): Assigns one unit of trust in an arc from self to each of the objs.
trusts?(obj): Tests self for any trust of obj.
trust_spread: The total number of trusted objects.
trust_sphere: All objects ranked in the system.
Given these objects, provide a mixin which calculates trust. In today’s case, we’ll be mixing in a pagerank method, which approximates rank based on a primitive form of Google’s algorithm.
module GoogleTrust
DAMPEN = 0.85
def pagerank(ranks = {})
return ranks[self] if ranks.has_key? self
ranks[self] = @rank || 0.0
@rank =
(1 - DAMPEN) +
(DAMPEN * trust_sphere.inject(0) do |n, (o, c)|
n + (o.trusts?(self) ? o.pagerank(ranks) / o.trust_spread: 0)
end)
end
end
This algorithm is an approximation of Google’s PageRank, as described in PageRank Explained Correctly with Examples by some third party. Regardless, the basic equation can be seen throughout the web and I will refer you to the paper above in sorting out exactly how PageRank actually works.
You probably get the jist, though: by linking to sites, you are extending a unit of trust to them. But also: the value of your link is also based on the links you are receiving. So, it’s sort of like a tent, ranks evolve from a gelatinous blob into a more rigid and beautifully useful form.
Let’s get some sites running our PageRank. I’ll use some blogs from the Ruby world:
class Site
include GoogleTrust
@@sites = []
attr_accessor :name
def initialize(name)
@name = name
@arcs = {}
@@sites.push self
end
def trust *sites
sites.each { |site| @arcs[site] = (@arcs[site] || 0) + 1 }
end
def trusts? site; @arcs.has_key? site end
def trust_worth site; @arcs[site] end
def trust_spread; @arcs.length end
def trust_sphere; @@sites end
end
red = Site.new('RedHanded')
eig = Site.new('Eigenclass.org')
rails = Site.new('Ruby on Rails')
ana = Site.new('Anarchaia')
prj = Site.new('Project.ioni.st')
So, sites are represented by a name and an arc of other objects, representing sites which have been linked to. Here are five blogs, let’s make a web.
red.trust rails, eig, ana
eig.trust ana
rails.trust prj, eig
ana.trust prj, red, rails, eig
prj.trust red, rails, eig, ana
40.times do |i|
puts "--- PASS #{i} ---"
red.trusted_sphere.each do |site|
puts "#{ site.name }: #{ site.pagerank }"
end
end
The pagerank is recursive, it computes the rank of a site based on the ranks of other sites, which are in turn based on its own rank. (Unless the links happen to manifest a heirarchy or a tree shape.) So, the ranking will change during the first many trials of running the metric. After you get to about the fifth pass, things settle down.
Anyway, note the rankings and see if you can work it out. After the first set, try adding some other sites into the mix:
pj = Site.new('PJ Hyett')
tf = Site.new('Thomas Fuchs')
matz = Site.new('Journal of Matz')
red.trust matz, tf, pj
eig.trust matz
rails.trust tf
ana.trust pj, matz
matz.trust rails, eig
tf.trust rails
pj.trust rails, ana, prj
10.times do
puts "---"
red.trust_sphere.each do |site|
puts "#{ site.name }: #{ site.pagerank }"
end
end
Other challenges applicable to this metric:
I’ve ignored the trust_worth method thus far. What if a site is linked many times? Alter the mixin to allow the units to positively influence the metric.
All sites in the above equation start out equal. Try seeding the rankings. Maybe you want trust from Matz to be inherently more valuable, to add some kind of hierarchy to the mix.
This trust metric computes a wholistic figure for ranking. In other words: How does this site rank in terms of the entire web? Try altering the fundamental question to: How does this site rank in terms of just Project.ioni.st’s view? This way, we can determine how sites not directly trusted by Project.ioni.st relate to it.
For extra points, if you are employed at Google, debunk my algorithm and leak us something sweet! (Complete code is in metric-pagerank.rb.)