Ruby Buzz Forum - Using Mechanize for HTML Scraping


James Britt

Posts: 1319
Nickname: jamesbritt
Registered: Apr, 2003

James Britt is a principal in 30 Second Rule, and runs ruby-doc.org and rubyxml.com


Posted: Dec 4, 2005 1:41 PM

This post originated from an RSS feed registered with Ruby Buzz by James Britt.
Original Post: Using Mechanize for HTML Scraping
Feed Title: James Britt: Ruby Development
Feed URL: http://feeds.feedburner.com/JamesBritt-Home
Feed Description: James Britt: Playing with better toys

There was some discussion on ruby-talk recently about HTML screen scraping, and some questions about using Michael Neumann's WWW::Mechanize library.

I mentioned that I was using that library to grab multiple CafePress pages and extract product data to assemble the rubystuff.com Web site. Someone asked if I could post my code as an example, which seemed a reasonable idea.

The code was never meant to be more than a way to save me from doing more work than absolutely necessary, but it turned out to be a pretty good, small-but-instructive example of what one can do with Mechanize. I cleaned up and refactored a few things, added a narrative, and have put the results up here.
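For readers who have not seen the library, the grab-pages-and-extract pattern described above might be sketched roughly as follows. This is not the rubystuff.com code. It assumes the current `mechanize` gem (`Mechanize.new`, rather than the `WWW::Mechanize` class name used in 2005), and the `Product` struct, the helper names, and every CSS selector are hypothetical placeholders rather than CafePress's real markup.

```ruby
# Hypothetical product record; the field names are illustrative only.
Product = Struct.new(:name, :price, :url)

# Return the stripped text of the first match for `css` under `node`,
# or an empty string when nothing matches.
def text_of(node, css)
  found = node.at(css)
  found ? found.text.strip : ''
end

# Pull product data out of one parsed page. `page` only needs to respond
# to `search(css)`, which Mechanize::Page does by delegating to its parser.
# The selectors below are assumptions about the markup, not real CafePress ones.
def extract_products(page)
  page.search('div.product').map do |node|
    link = node.at('a')
    Product.new(
      text_of(node, '.name'),
      text_of(node, '.price'),
      link ? link['href'] : nil
    )
  end
end

# Fetch each URL with one shared agent and collect the products from all pages.
def scrape(urls)
  require 'mechanize' # third-party gem: gem install mechanize
  agent = Mechanize.new
  urls.flat_map { |url| extract_products(agent.get(url)) }
end
```

Calling `scrape` with a list of shop page URLs would return an array of `Product` structs. Keeping the extraction step separate from the fetching step means the selectors can be adjusted and exercised against saved pages without touching the network.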

I'm not so sure that this will remain the final home for the article and example, though. I'm thinking of using the Neurogami site as a repository for all my writing and code libraries. I have things on jamesbritt.com, rubyxml.com, the Linux Journal site, and maybe elsewhere, plus Ruby code hosted in almost as many different places, so some one-stop shopping might make it easier to keep track of things.

Read: Using Mechanize for HTML Scraping
