This post originated from an RSS feed registered with Ruby Buzz
by Adam Green.
Original Post: RubyRiver architecture
Feed Title: ruby.darwinianweb.com
Feed URL: http://www.nemesis-one.com/rss.xml
Feed Description: Adam Green's Ruby development site
I have a couple of days free, so this looks like a good time to dig in and get some work done on the RubyRiver aggregator. I've been thinking about it for a week or more, and have decided to adopt a highly decoupled architecture. The functionality I'm aiming for isn't too rich, so I could build the whole thing as a single module, but since RSS aggregation is something I'm likely to use in multiple applications, a set of loosely coupled programs should give me better reuse in the future.
Another decision is to collect all the feeds in a local cache before aggregating them. For some reason RSS feeds seem to have lower availability than regular pages. I haven't figured out why, but with every aggregator I've used, the collection phase has had a fairly high failure rate, with 5-10% or more of the feeds unavailable at any one time. If I gather the feeds locally first and then combine them, I can always use the most recently retrieved version of each feed, even if I can't get all of them from the Web on every attempt. Another benefit of working with local copies is that people can run sample programs against the local files without repeatedly hitting the feed author's website.
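Here is a minimal sketch of the cache-then-fallback idea, not the actual RubyRiver code. The method name `fetch_with_cache`, its parameters, and the injectable `fetcher` are all hypothetical, and it assumes a current Ruby where `open-uri` provides `URI.open`. A fresh download overwrites the cached copy. When the download fails, the most recently retrieved copy is returned instead.

```ruby
require "fileutils"
require "open-uri"

# Hypothetical helper: fetch one feed and keep a local copy of it.
# `fetcher` is injectable (an assumption made for this sketch) so the
# fallback logic can be exercised without touching the network.
def fetch_with_cache(url, cache_path, fetcher: ->(u) { URI.open(u, &:read) })
  body = fetcher.call(url)
  FileUtils.mkdir_p(File.dirname(cache_path))
  File.write(cache_path, body) # refresh the cached copy
  body
rescue StandardError => e
  # Feed unavailable: reuse the last good copy if one exists,
  # otherwise re-raise the original error.
  raise e unless File.exist?(cache_path)
  File.read(cache_path)
end
```

Because the aggregation step only ever reads the cache directory, a feed that is down during one collection run still appears in the combined output with its last known items.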
Here is a list of the modules I plan on building:
GetParam.rb: To keep the code as generic as possible, all of the installation-specific details, like file and directory names, will be kept in a text file in YAML format. This module will retrieve each value by name.
GatherFeeds.rb: This will open an OPML file that lists all the feeds and fetch each feed from the Web. A local copy will be written into a directory that serves as a cache. I could store the feed data in MySQL, which might improve performance, but since I plan on using this application as the basis for a tutorial for new Ruby programmers, working with text files throughout will make the sample code easier to follow.
CombineFeeds.rb: RubyRiver will be a "river of news" aggregator, also called a planet, so all the feeds will be combined into a single text file with the items in reverse chronological order.
GetFeedText.rb: This will read the combined feed and return a string that can be displayed in the content area of a webpage.
GetFeedList.rb: This will return the list of all the feeds for the navbar of the webpage, with links to each feed and its website.
GeneratePage.rb: The RubyRiver home page will be generated as static HTML by combining a page template with the results of GetFeedText.rb and GetFeedList.rb.
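To make two of the planned modules more concrete, here is a rough sketch of the parameter lookup described for GetParam.rb and the ordering step described for CombineFeeds.rb. Everything in it is an assumption for illustration: the method names `get_param` and `combine_items`, the `Item` struct and its fields, and the setting names in the sample YAML are hypothetical, and parsing the cached feed files is left out entirely.

```ruby
require "yaml"

# Hypothetical stand-in for GetParam.rb: look up one setting by name
# in a YAML file such as:
#   cache_dir: cache
#   opml_file: feeds.opml
def get_param(name, config_file)
  settings = YAML.safe_load(File.read(config_file)) || {}
  settings.fetch(name) { raise KeyError, "missing parameter: #{name}" }
end

# Hypothetical record for one feed item (field names are assumptions).
Item = Struct.new(:feed, :title, :link, :date)

# Hypothetical stand-in for the ordering step of CombineFeeds.rb:
# merge the items of every feed and put the newest first.
def combine_items(items_per_feed)
  items_per_feed.flatten(1).sort_by(&:date).reverse
end

# Usage with made-up data.
feed_a = [Item.new("Feed A", "Older post", "http://example.com/a/1", Time.utc(2006, 3, 1))]
feed_b = [Item.new("Feed B", "Newer post", "http://example.com/b/1", Time.utc(2006, 3, 2))]
river = combine_items([feed_a, feed_b])
puts river.map(&:title).inspect # ["Newer post", "Older post"]
```

Keeping the lookup and the ordering as small, separate methods fits the loosely coupled design: each piece can be run and tested on local files without the others.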