Using google (or any other search engine) to generate persistent URLs is one of those obvious ideas
that make you wonder whether you came up with them on your own before being exposed to them elsewhere.
At any rate, I had never seen an implementation*1, so here's mine.
But first of all, some examples of the persistent URLs created by the script
shown below:
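Such links lean on Google's "I'm Feeling Lucky" redirection, triggered by the btnI query
parameter; a made-up one (the search terms here are purely illustrative) would look
roughly like this:

    http://www.google.com/search?q=eigenclass+persistent+urls&btnI=1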
The script proceeds roughly as follows:
- extract candidate search terms from the desired destination URL:
  - only consider text
  - try to find significant terms
- check against google, verifying that the chosen query is good enough (a sketch of this check follows the list)
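To make that last step concrete, here is a minimal sketch of such a check in Ruby.
good_enough? is a hypothetical helper, and relying on the btnI ("I'm Feeling Lucky")
redirect is an assumption rather than the script's actual method; Google may also
interpose consent or anti-bot pages nowadays, so treat it as an illustration:

    require 'net/http'
    require 'uri'

    # Hypothetical helper, not the original script's code: a query is
    # "good enough" when Google's I'm-Feeling-Lucky redirect for it
    # points at the desired destination URL.
    def good_enough?(terms, target)
      url = URI("http://www.google.com/search?" +
                URI.encode_www_form(q: terms.join(" "), btnI: "1"))
      res = Net::HTTP.get_response(url)
      # A 3xx response carries the destination in the Location header.
      res.is_a?(Net::HTTPRedirection) && res["location"].to_s.include?(target)
    end

    # e.g. good_enough?(%w[eigenclass ruby blog], "eigenclass.org")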
Extracting text from arbitrary HTML pages
There's no need for a full parse tree of the HTML: just the list of words that
would be considered by google will do.
I took some old code of mine from one of my very first (useful) Ruby scripts:
a filtering proxy that added hints
to German pages, inspired by jisyo.org, which does the same for Japanese
text. It just uses a number of regexps to reject unwanted
parts of the text until we're left with simple words.
It's not too inefficient, thanks to
strscan, and naïve as the regexps might seem, they work well in practice.
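A simplified sketch of that kind of filter, with illustrative stand-in regexps
rather than the original ones:

    require 'strscan'

    # Illustrative stand-in for the original filter: strip scripts, tags
    # and entities from raw HTML, keeping only plain words.
    def extract_words(html)
      s = StringScanner.new(html)
      words = []
      until s.eos?
        case
        when s.scan(%r{<script.*?</script>}mi) then next  # drop scripts wholesale
        when s.scan(%r{<style.*?</style>}mi)   then next  # drop stylesheets too
        when s.scan(/<[^>]*>/m)                then next  # drop any other tag
        when s.scan(/&\w+;|&#\d+;/)            then next  # drop HTML entities
        when s.scan(/[[:alpha:]]+/)            then words << s.matched
        else s.getch                                      # skip punctuation etc.
        end
      end
      words
    end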