Ruby Buzz Forum - Okay, Give Hpricot 0.2 a Go

Red Handed

Red Handed is a Ruby-focused group blog.

Posted: Jul 5, 2006 12:38 PM

This post originated from an RSS feed registered with Ruby Buzz by Red Handed.
Original Post: Okay, Give Hpricot 0.2 a Go
Feed Title: RedHanded
Feed URL: http://redhanded.hobix.com/index.xml
Feed Description: sneaking Ruby through the system

This time I’m giving a balloon out which can be used for quick testing.

http://balloon.hobix.com/hpricot

Or, if you want to install Hpricot 0.2:

gem install hpricot --source code.whytheluckystiff.net

So the Hpricot parser is basically complete. There’s still lots of fiddling ahead: it doesn’t handle JavaScript whatsoever and it’s not yet as flexible as HTree. However, it does fix a lot of HTML that RubyfulSoup and the htmltools won’t.

Here’s a benchmark parsing the Boing Boing home page fifty times. It’s a good page to test because it’s big and there are some bogus end tags, old-style tables and break tags.

                    user     system      total        real
hpricot:       10.515625   0.000000  10.515625 ( 10.610571)
htree:         56.609375   0.023438  56.632812 ( 57.096530)
rubyfulsoup:   29.289062   0.046875  29.335938 ( 29.586510)
mechanize:    148.132812   1.101562 149.234375 (150.621922)

The mechanize benchmark parses and converts to a REXML document, since mechanize itself only gives you links, form elements, nothing complex. So this may be unfair.
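For anyone wanting to repeat this kind of comparison, here is a rough sketch of a harness built on Ruby's standard Benchmark library. It is not the script behind the numbers above. The two "parsers" are trivial stand-ins (a regex tag scan and a string split) so the sketch runs without any gems, and the names PARSERS, SAMPLE_HTML and run_benchmark are made up for illustration. To test real libraries, swap each lambda for a call into the parser you care about.

```ruby
require 'benchmark'

# Placeholder "parsers": NOT Hpricot, HTree, or any real library.
# Each entry maps a label to a callable that takes an HTML string.
PARSERS = {
  'tag_scan' => ->(html) { html.scan(/<[^>]+>/) },
  'split'    => ->(html) { html.split('<') }
}

# Made-up sample markup with a stray end tag and an old-style table,
# repeated to give the stand-ins something to chew on.
SAMPLE_HTML = '<table><tr><td>cell<br></td></tr></table></p>' * 200

# Runs every parser `runs` times over the same input and prints a
# user/system/total/real report. Returns one Benchmark::Tms per parser.
def run_benchmark(html, parsers, runs = 50)
  Benchmark.bm(12) do |x|
    parsers.each do |label, parser|
      x.report("#{label}:") { runs.times { parser.call(html) } }
    end
  end
end

run_benchmark(SAMPLE_HTML, PARSERS)
```

Feeding every parser the same in-memory string keeps network and disk time out of the measurement, so the figures reflect parsing alone.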

I didn’t include scrapi because, although it parses the page, it fails some of my other tests. For example, when using a selector to find all p.posted elements, I get back only one element with scrapi, whereas the others all report sixty. So, I’ll post a benchmark when I understand what I’m doing wrong.
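One way to cross-check what a selector like p.posted ought to return, independent of any of the parsers above, is to count matches by hand on a small well-formed sample. The sketch below uses REXML for that and assumes the rexml library is available. The sample markup and the helper name count_with_class are invented for illustration, and it only works on well-formed input, so it is a sanity check and not a replacement for a real HTML parser.

```ruby
require 'rexml/document'

# Made-up, well-formed sample: two <p> elements carry the "posted" class,
# one <p> does not, and one non-<p> element does.
SAMPLE = <<~XML
  <div>
    <p class="posted">one</p>
    <p class="posted by-author">two</p>
    <p class="intro">three</p>
    <span class="posted">four</span>
  </div>
XML

# Counts elements named `tag` whose class attribute contains `css_class`
# as one of its whitespace-separated tokens (the meaning of "tag.class").
def count_with_class(xml, tag, css_class)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//#{tag}").count do |el|
    (el.attributes['class'] || '').split.include?(css_class)
  end
end

puts count_with_class(SAMPLE, 'p', 'posted')
```

If a library's selector returns a different count than this on the same small input, the discrepancy is easier to chase there than on a page the size of Boing Boing.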
