The Artima Developer Community

Ruby Buzz Forum
Persistent URLs: really easy (thank you open-uri, SOAP4R, Ruby)

Eigen Class (eigenclass)

Eigenclass is a hardcore Ruby blog.

Posted: Mar 29, 2006 8:39 AM

This post originated from an RSS feed registered with Ruby Buzz by Eigen Class.
Original Post: Persistent URLs: really easy (thank you open-uri, SOAP4R, Ruby)
Feed Title: Eigenclass
Feed URL: http://feeds.feedburner.com/eigenclass
Feed Description: Ruby stuff --- trying to stay away from triviality.

Using google (or any other search engine) to generate persistent URLs is one of those obvious ideas that make you wonder whether you actually came up with them on your own or had already been exposed to them. At any rate, I had never seen an implementation*1, so here's mine.

But first of all, some examples of the persistent URLs created by the script shown below:

It doesn't always work that well; for instance, the persistent URL of my Ruby 1.9 change summary (the first hit for http://google.com/search?q=ruby+1.9) becomes http://google.com/search?q=ruby+foo+file+method+nil+array+index+proc+def+methods .

Implementation

This is pretty easy; all one needs to do is:

  • extract candidate search terms from the desired destination URL:
    • only consider text
    • try to find significant terms
  • check against google, verifying whether the chosen query is good enough
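The query-building step above (grow the query term by term until the target URL comes back as the first hit) might be sketched like this. `persistent_query` and the stubbed lookup are illustrative names, not the original code; the real check would hit google via open-uri or SOAP4R, so the lookup is injected as a block to keep the sketch self-contained:

```ruby
# Hypothetical sketch of the verification loop. The block receives a
# candidate query string and returns the first hit's URL (in the real
# script, via a google search over open-uri or SOAP4R).
def persistent_query(url, terms, max_terms = 10)
  query = []
  terms.each do |term|
    query << term
    # good enough: the candidate query's first hit is the target URL
    return query if yield(query.join("+")) == url
    break if query.size >= max_terms
  end
  nil  # no query of up to max_terms terms pinned the URL down
end

# Stubbed "search engine" for demonstration:
fake_hits = { "ruby"            => "http://example.com/other",
              "ruby+eigenclass" => "http://eigenclass.org/" }
persistent_query("http://eigenclass.org/", %w[ruby eigenclass blog]) { |q| fake_hits[q] }
# => ["ruby", "eigenclass"]
```

Injecting the lookup also makes the loop easy to test without touching the network.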

Extracting text from arbitrary HTML pages

There's no need for a full parse tree of the HTML: the list of words google would consider will do.

I took some old code of mine from one of my very first (useful) Ruby scripts: a filtering proxy that added hints to German pages, inspired by jisyo.org, which does the same for Japanese text. It just uses a number of regexps to reject unwanted parts of the text until we're left with simple words. It's not too inefficient thanks to strscan, and as naïve as the regexps might seem, they work well in practice:
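The original code is behind the "Read more" link; a rough reconstruction of the regexps-plus-strscan approach described above might look like this (`extract_words` and the particular regexps are my guesses, not the author's):

```ruby
require 'strscan'

# Strip scripts, styles, remaining tags and character entities with a few
# regexps, then let StringScanner pick out plain words.
def extract_words(html)
  text = html.gsub(%r{<script.*?</script>}mi, " ")
             .gsub(%r{<style.*?</style>}mi, " ")
             .gsub(/<[^>]*>/m, " ")          # any remaining tag
             .gsub(/&\w+;|&#\d+;/, " ")      # character entities
  words = []
  ss = StringScanner.new(text)
  until ss.eos?
    if (w = ss.scan(/[A-Za-z][A-Za-z']+/))   # words of two or more letters
      words << w.downcase
    else
      ss.getch                               # skip punctuation, digits, etc.
    end
  end
  words
end

extract_words("<p>Hello, <b>Ruby</b> world &amp; more</p>")
# => ["hello", "ruby", "world", "more"]
```

Scanning with StringScanner instead of splitting the whole string keeps the pass over large pages cheap, which matches the "not too inefficient thanks to strscan" remark.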


Read more...


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use