Hpricot is a nice, loose HTML parser for Ruby, written in C. I stole a bunch of code and ideas from HTree, Prototype and jQuery. Installing the gem requires a C compiler. It’s 0.1, so it’s kinda wobbly, but hey.
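require 'hpricot'
# Parse the file's contents; a bare "index.html" string would itself be parsed as HTML
doc = Hpricot.parse(File.read("index.html"))
# Print the attributes of every link found inside a paragraph
(doc/:p/:a).each do |link|
  p link.attributes
end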
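If the `doc/:p/:a` bit looks odd: `/` is just a method, so each slash can narrow the previous result set. Here is a toy sketch of that idea in plain Ruby. It is not Hpricot's real code; `ToyNode` and `ToyNodeSet` are made-up names, and it assumes each `/` collects matching descendants by tag name.

```ruby
# Toy stand-in for a parsed document tree. NOT Hpricot's actual classes.
class ToyNode
  attr_reader :name, :attributes, :children

  def initialize(name, attributes = {}, children = [])
    @name = name
    @attributes = attributes
    @children = children
  end

  # Collect every descendant whose tag name matches, in document order.
  def search(tag)
    children.flat_map do |child|
      hits = child.name == tag.to_s ? [child] : []
      hits + child.search(tag)
    end
  end

  # doc/:p returns a set that can itself be searched with another slash.
  def /(tag)
    ToyNodeSet.new(search(tag))
  end
end

# An array of nodes that understands `/`, which is what makes chaining work.
class ToyNodeSet < Array
  def /(tag)
    ToyNodeSet.new(flat_map { |node| node.search(tag) }.uniq)
  end
end

doc = ToyNode.new("html", {}, [
  ToyNode.new("p",   {}, [ToyNode.new("a", { "href" => "/one" })]),
  ToyNode.new("div", {}, [ToyNode.new("a", { "href" => "/skipped" })]),
  ToyNode.new("p",   {}, [ToyNode.new("a", { "href" => "/two" })])
])

# Only links that live inside a <p> are kept; the one in the <div> is not.
hrefs = (doc/:p/:a).map { |link| link.attributes["href"] }
p hrefs
```

The parentheses around `doc/:p/:a` matter: without them, `.each` or `.map` would bind to the `:a` symbol rather than to the whole search.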