You might not have noticed it, but every page on eigenclass.org lists the most
popular referrers. I often find interesting things in the Referer field, but
unfortunately the interesting entries are hard to spot (especially for an
occasional visitor) amid inaccessible pages (Bloglines, Google Reader, other
online RSS aggregators...) and, as of late, referrer spam.
I'm now filtering referrer URLs as I get them, but I also wanted to purge the
historical data contained in the "referrer database". Unsurprisingly, I
wrote a script for that.
Filtering referrers entails a fair bit of network traffic, to fetch the
referring URLs and verify that they can be accessed and seem legitimate.
Performing these checks serially would take forever, since each one spends
most of its time waiting on the network (establishing the connection, issuing
the HTTP request, waiting for the data, timeouts, ...), and it wouldn't use my
bandwidth efficiently.
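To make a single check concrete, here's roughly what it could look like (a
minimal sketch: the name legitimate_referrer? and the 10-second timeout are
made up for illustration, and "seems legitimate" is reduced here to a
successful HTTP status):

  require 'net/http'
  require 'uri'
  require 'timeout'

  # Hypothetical single check: true if the URL can be fetched and answers
  # with a 2xx status within 10 seconds.
  def legitimate_referrer?(url)
    uri = URI.parse(url)
    Timeout.timeout(10) do
      response = Net::HTTP.get_response(uri)
      response.is_a?(Net::HTTPSuccess)
    end
  rescue Timeout::Error, StandardError
    false
  end
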
The obvious solution is to perform several operations in parallel to maximize
bandwidth usage.
Pooling handlers
The idea is to create a PoolingExecutor object that assigns tasks to a
bounded number of handlers and runs each one in a separate thread. Since
we're not CPU-bound, this lets us make the most of some limited resource
(bandwidth in this case, but it could also be DB connections, etc.) while
avoiding overloading it.
The API is:
  executor = PoolingExecutor.new do |handlers|
    NUM_HANDLERS.times do
      handlers << SomeHandler.new(stuff)
    end
  end

  # later
  # each task is run in a different thread, but the num of simultaneous
  # threads is bounded
  executor.run do |handler|
    # perform task with the handler
    # e.g.
    foo(handler.process(stuff))
  end

  executor.run do |handler|
    # ....
  end

  executor.wait_for_all  # ensure all the tasks scheduled with executor are
                         # finished
  # ....
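The executor itself isn't shown here, but a minimal sketch matching that API
can be built on the standard library's thread-safe Queue (error handling and
shutdown are simplified; this is one possible implementation, not necessarily
the original):

  require 'thread'  # provides Queue on older Rubies; built in on modern ones

  class PoolingExecutor
    def initialize
      @handlers = Queue.new   # idle handlers; its size bounds concurrency
      @threads  = Queue.new   # threads spawned by #run, joined in #wait_for_all
      yield @handlers         # the caller fills the pool with handlers
    end

    # Waits until a handler is free, then runs the block with it in a new
    # thread, so at most as many tasks run at once as there are handlers.
    def run
      handler = @handlers.pop # blocks while all handlers are busy
      @threads << Thread.new do
        begin
          yield handler
        ensure
          @handlers << handler  # return the handler to the pool
        end
      end
    end

    # Blocks until every task scheduled so far has finished.
    def wait_for_all
      @threads.pop.join until @threads.empty?
    end
  end

Keeping the handlers in a queue does double duty: it hands a free handler to
each task, and as a side effect makes run block once all handlers are busy,
which is precisely the bound on simultaneous connections we wanted.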