The Artima Developer Community

Ruby Buzz Forum
No, XPath on Messy HTML is Just as Easy in Ruby

Red Handed

Posts: 1158
Nickname: redhanded
Registered: Dec, 2004

Red Handed is a Ruby-focused group blog.
No, XPath on Messy HTML is Just as Easy in Ruby Posted: Aug 26, 2005 1:37 PM

This post originated from an RSS feed registered with Ruby Buzz by Red Handed.
Original Post: No, XPath on Messy HTML is Just as Easy in Ruby
Feed Title: RedHanded
Feed URL: http://redhanded.hobix.com/index.xml
Feed Description: sneaking Ruby through the system

You think XPath is easier in JavaScript than in Ruby when it comes to invalid HTML? I’ve heard this in a lot of correspondence over the past week. Because JavaScript has the DOM, right?

Use HTree+REXML. HTree cleans the broken HTML up into well-formed XML, and REXML parses it and runs your XPath. Here’s a hairy little method that will save some pain:

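 require 'htree'
 require 'rexml/document'
 require 'open-uri'

 # Fetch a page, let HTree repair the broken HTML, then re-serialize the
 # <html> element as XML and hand it to REXML. Returns nil if no <html>
 # element turns up at the top level of the parsed tree.
 def read_xhtml_from( uri )
   # URI.open: Kernel#open no longer fetches URLs as of Ruby 3.0
   tree = URI.open( uri ) { |f| HTree.parse f }
   tree.each_child do |child|
     # Only elements answer qualified_name (skips doctype, comments, text)
     next unless child.respond_to? :qualified_name
     next unless child.qualified_name == 'html'
     doc = +""                      # mutable buffer for the serialized XML
     child.display_xml( doc )       # write HTree's cleaned element into doc
     return REXML::Document.new( doc )
   end
   nil
 end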
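
Why the detour through HTree? REXML expects well-formed XML and raises `REXML::ParseException` at the first mismatched tag, so feeding it raw tag soup fails outright. The sketch below uses two made-up markup strings (not taken from any real page) to show REXML rejecting an unclosed `<br>` and accepting the same content once it is well-formed, which is the state HTree is being relied on to produce in the method above.

```ruby
require 'rexml/document'

# Hypothetical sample markup, for illustration only.
MESSY = "<html><body><p>one<br></body></html>"       # <br> and <p> never closed
CLEAN = "<html><body><p>one<br/></p></body></html>"  # same content, well-formed

# Returns the parsed REXML document, or nil when REXML rejects the markup.
def try_rexml( markup )
  REXML::Document.new( markup )
rescue REXML::ParseException
  nil
end

puts try_rexml( MESSY ).nil?        # prints true: REXML gives up on the soup
puts try_rexml( CLEAN ).root.name   # prints html
```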

Okay, so how do you use it? The same nice REXML way you’re already used to.

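 html = read_xhtml_from "http://redhanded.hobix.com/"
 # Visit every entry footer on the page...
 html.each_element( "//div[@class='entryFooter']" ) do |e|
   # ...and print the text of its first link pointing back into the blog
   puts e.text( "./a[starts-with(@href, 'http://redhanded.hobix.com/')]" )
 end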
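
To try the XPath half without touching the network (or installing HTree), here is a small sketch that runs the same two queries against a hand-written, already well-formed XHTML string. The markup and the `example.com` URLs are invented for illustration; only REXML from the standard distribution is assumed.

```ruby
require 'rexml/document'

# Hand-written, already well-formed XHTML standing in for a fetched page.
# The markup and example.com URLs are made up for illustration.
PAGE = <<~XHTML
  <html>
    <body>
      <div class="entryFooter">
        <a href="http://example.com/elsewhere">offsite</a>
        <a href="http://example.com/blog/comments/1">3 comments</a>
      </div>
      <div class="entryFooter">
        <a href="http://example.com/blog/comments/2">no comments</a>
      </div>
      <div class="sidebar">
        <a href="http://example.com/blog/about">about</a>
      </div>
    </body>
  </html>
XHTML

doc = REXML::Document.new( PAGE )

# Same two XPath steps as above: find the footers, then the first link
# in each footer whose href starts with the blog's own prefix.
footer_links = []
doc.each_element( "//div[@class='entryFooter']" ) do |e|
  footer_links << e.text( "./a[starts-with(@href, 'http://example.com/blog/')]" )
end

puts footer_links   # prints "3 comments", then "no comments"
```

Only one link per footer comes back because `REXML::Element#text` with a path returns the text of the first element that path matches; the off-site link fails the `starts-with` test, and the sidebar link is never visited since its `div` has a different class.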


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use