Okay, Give Hpricot 0.2 a Go

Red Handed
Red Handed is a Ruby-focused group blog.
Posted: Jul 5, 2006 12:38 PM

This post originated from an RSS feed registered with Ruby Buzz by Red Handed.
Original Post: Okay, Give Hpricot 0.2 a Go
Feed Title: RedHanded
Feed URL: http://redhanded.hobix.com/index.xml
Feed Description: sneaking Ruby through the system

This time I’m giving out a balloon, which can be used for quick testing.

http://balloon.hobix.com/hpricot

Or, if you want to install Hpricot 0.2:

gem install hpricot --source code.whytheluckystiff.net

So the Hpricot parser is basically complete. There’s still lots of fiddling ahead: it doesn’t handle JavaScript whatsoever and it’s not yet as flexible as HTree. However, it does fix a lot of HTML that RubyfulSoup and the htmltools won’t.

Here’s a benchmark parsing the Boing Boing home page fifty times. It’s a good page to test with because it’s big and it has some bogus end tags, old-style tables and break tags.

                    user     system       total         real
 hpricot:      10.515625   0.000000   10.515625 ( 10.610571)
 htree:        56.609375   0.023438   56.632812 ( 57.096530)
 rubyfulsoup:  29.289062   0.046875   29.335938 ( 29.586510)
 mechanize:   148.132812   1.101562  149.234375 (150.621922)

The mechanize benchmark parses the page and then converts it to a REXML document, since mechanize by itself only gives you links and form elements, nothing more complex. So the comparison may be unfair to mechanize.
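
For anyone wanting to run a comparison of this shape, here is a minimal sketch using Ruby's standard Benchmark module, which prints user, system, total and real columns like the table above. It is not the script behind those numbers. The two "parsers" are hypothetical regex stand-ins, `bench_parsers` and `SAMPLE_HTML` are made-up names, and the sample page is a small invented string, not the Boing Boing home page. Swap in real parser calls to get meaningful timings.

```ruby
require 'benchmark'

# Hypothetical stand-ins for real parsers: each takes an HTML string and
# returns something. Replace these lambdas with actual parser calls.
PARSERS = {
  'tag scan:'  => ->(html) { html.scan(/<[^>]+>/) },
  'tag split:' => ->(html) { html.split(/(?=<)/) }
}

# Run every parser `runs` times over the same HTML, print a report in the
# usual Benchmark layout, and return the timings keyed by label.
def bench_parsers(html, parsers, runs = 50)
  width = parsers.keys.map(&:length).max
  results = {}
  Benchmark.bm(width) do |bm|
    parsers.each do |label, parser|
      results[label] = bm.report(label) { runs.times { parser.call(html) } }
    end
  end
  results
end

# An invented page with a stray end tag and an old-style break tag.
SAMPLE_HTML = "<html><body>" + "<p class='posted'>hi<br></b></p>" * 200 + "</body></html>"

timings = bench_parsers(SAMPLE_HTML, PARSERS)
```

Each entry in `timings` is a `Benchmark::Tms`, so the raw figures can be compared afterwards via `timings['tag scan:'].real` and friends.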

I didn’t include scrapi because, although it parses the page, it fails some of my other tests. For example, when using a selector to find all p.posted elements, I get back only one element with scrapi, whereas the others all report sixty. So I’ll post a scrapi benchmark once I understand what I’m doing wrong.
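
The cross-check described above (one count of p.posted elements disagreeing with the rest) can be sketched roughly as follows. This is an illustration only: the counting methods are hypothetical regex stand-ins, not calls into scrapi, Hpricot or any other library, and the sixty-element page is invented to mirror the numbers in the text.

```ruby
# Hypothetical stand-in: counts every <p class="posted"> opening tag.
def count_posted_loose(html)
  html.scan(/<p\b[^>]*class=["']posted["'][^>]*>/i).length
end

# Deliberately naive stand-in: stops at the first match, so it under-reports.
def count_posted_first_only(html)
  html =~ /<p\b[^>]*class=["']posted["']/i ? 1 : 0
end

# Find the most common count and return the labels that disagree with it.
def outliers(counts)
  majority = counts.values.group_by { |n| n }.max_by { |_, v| v.length }.first
  counts.reject { |_, n| n == majority }
end

# Invented page with sixty p.posted elements.
html = "<p class='posted'>posted by someone</p>\n" * 60

counts = {
  'parser_a' => count_posted_loose(html),
  'parser_b' => count_posted_loose(html),
  'parser_c' => count_posted_first_only(html)
}

outliers(counts).each do |label, n|
  puts "#{label} reported #{n}; check how the selector is being used"
end
```

A check like this only says that one result is out of step with the others. It does not say whether the parser or the way it is being called is at fault.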
