The Artima Developer Community

Ruby Buzz Forum
Merging the feeds


Adam Green


Adam Green is the author of Ruby.Darwinianweb.com
Posted: Jan 7, 2006 2:09 PM

This post originated from an RSS feed registered with Ruby Buzz by Adam Green.
Original Post: Merging the feeds
Feed Title: ruby.darwinianweb.com
Feed URL: http://www.nemesis-one.com/rss.xml
Feed Description: Adam Green's Ruby development site

This ended up being easier than I expected. Restricting the feeds gathered to RSS 2.0 files helped keep it simple; I will add RSS 1.0 and Atom support in later versions. I should note once again that I am aware of Ruby's built-in RSS library and the FeedTools library. Either would have made the code much simpler, but both would also hide details that I want to learn and that I want to cover in the tutorial based on this code. I also realize that structuring this code as classes would make a lot of sense. That is planned for a later version, again for the purposes of the tutorial.

With that said, I welcome any suggestions. I'm especially curious to learn why I can't make the destructive versions of methods work, such as Hash#sort!. When I try them, I get back the original value instead of the modified one.
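A likely explanation for the destructive-method trouble is that Hash does not define sort! at all. Hash#sort is inherited from the Enumerable module and always returns a new Array of [key, value] pairs. Once the data is an Array, reverse! does modify it in place, but slice! is easy to misread: it removes the selected elements from the receiver and returns them, so the variable is left holding the remainder. The short sketch below demonstrates this. Its integer keys are hypothetical stand-ins for the Time keys used in merge_feeds.rb.

```ruby
# Hypothetical stand-in for itemlist: integer keys play the role of Time keys.
items = { 3 => "c", 1 => "a", 2 => "b", 4 => "d" }

# Hash does not define sort!, so calling items.sort! would raise NoMethodError.
has_sort_bang = items.respond_to?(:sort!)

# sort (from Enumerable) returns a new Array of [key, value] pairs, sorted by key.
sorted = items.sort

# reverse! reverses the Array in place and returns that same object.
same_object = sorted.reverse!.equal?(sorted)

# slice! removes the requested elements and RETURNS them;
# the receiver keeps whatever was not sliced off.
newest_two = sorted.slice!(0, 2)

puts "Hash#sort! defined? #{has_sort_bang}"
puts "reverse! returned the receiver? #{same_object}"
puts "slice! returned: #{newest_two.inspect}"
puts "receiver afterwards: #{sorted.inspect}"
```

In other words, itemlist.slice!(0, n) does produce the newest n items, but only as its return value. To trim the array in place and keep the first n elements, one option is to delete the tail instead, for example itemlist.slice!(n..-1).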

merge_feeds.rb

```ruby
#!/usr/bin/ruby

# RubyRiver modules
require 'get_param'
require 'make_xml'

# Ruby library for parsing XML
require 'rexml/document'
include REXML

# Time.parse is defined by the 'time' standard library, not by core Ruby.
require 'time'

# Itemlist is a hash that will hold all the merged feed items.
itemlist = {}

cachedir = get_param("rubyriver.yml", "cachedir")

# Parse each of the feed files from the local cache.
Dir.new(cachedir).each do |filename|
  # Only process files whose names end in ".xml" (the dot is escaped so it is literal).
  if filename =~ /\.xml\z/

    # Read the feed into memory as an XML document.
    doc = Document.new(File.read(File.join(cachedir, filename)))

    # Extract the title and link for the entire feed.
    # This will be added to each item in the merged feed.
    feedtitle = doc.elements["rss/channel/title"].text
    feedlink = doc.elements["rss/channel/link"].text

    # Extract the details of each feed item and add it to the itemlist hash.
    doc.elements.each("rss/channel/item") do |item|
      # pubDate is optional in RSS 2.0, so the element may be missing or empty.
      pubdate_element = item.elements["pubDate"]
      pubdate = pubdate_element ? pubdate_element.text.to_s.strip : ""

      # Each item merged must have a pubdate to allow sorting.
      unless pubdate.empty?

        # Strip HTML tags from the post's text.
        # This is necessary, because only an excerpt will be published.
        description_element = item.elements["description"]
        description = description_element ? description_element.text.to_s : ""
        description = description.gsub(/<[^>]+>/, "")

        # Extract an excerpt of the text.
        max = get_param("rubyriver.yml", "excerptlength")
        excerpt = description[0, max]

        # Don't split words.
        # Add characters until a space or the end of description is reached.
        while excerpt.length < description.length && description[excerpt.length, 1] != " "
          excerpt += description[excerpt.length, 1]
        end

        # Add a link to the original post if there is more in the description.
        if excerpt.length < description.length
          excerpt += ' <a href="' + item.elements["link"].text + '">[more]</a>'
        end

        # Wrap the excerpt in a CDATA section so its HTML markup
        # does not have to be escaped as XML.
        excerpt = "<![CDATA[" + excerpt + "]]>"

        # Build a hash of all feed items.
        # The date/time in pubdate is the key used to sort the items.
        # Items with identical pubdates share a key, so the last one read wins.
        itemlist[Time.parse(pubdate)] = {
          "pubdate"     => pubdate,
          "feedtitle"   => feedtitle,
          "feedlink"    => feedlink,
          "title"       => item.elements["title"].text,
          "link"        => item.elements["link"].text,
          "description" => excerpt
        }
      end
    end
  end
end

# Get the first N items in reverse chronological order.
# Hash has no sort! method: sort comes from Enumerable and returns a new
# Array of [key, value] pairs. On that Array, reverse! would work in place,
# but slice! returns the removed elements and leaves the rest behind, so
# the non-destructive versions are assigned back to itemlist instead.
itemlist = itemlist.sort
itemlist = itemlist.reverse
itemlist = itemlist.slice(0, get_param("rubyriver.yml", "maxposts"))

# Create XML files with the resulting items.
# The published feed will be standard RSS 2.0. This can only
# include a title and link for the individual feed item.
# The internal feed will include the item's overall feed title and feed link for
# use on the RubyRiver page.
make_xml(itemlist, get_param("rubyriver.yml", "publishedfeed"))
make_xml(itemlist, get_param("rubyriver.yml", "internalfeed"))
```

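The excerpt step in merge_feeds.rb cuts the description at excerptlength characters, then keeps adding characters until it reaches a space. That step can also be pulled out into a small method so it can be tested on its own. This is a sketch, not the author's code. The method name excerpt_at_word_boundary, the sample text and the link are hypothetical. It uses String#index to jump straight to the next space rather than appending one character at a time. For a positive length it should produce the same excerpt as the while loop.

```ruby
# Hypothetical helper: returns the first `max` characters of `text`, extended
# to the next space so no word is split, plus a flag saying whether text was cut.
def excerpt_at_word_boundary(text, max)
  # Index of the first space at or after position `max`. String#index returns
  # nil when there is no such space or when `max` is past the end of the string.
  cut = text.index(" ", max) || text.length
  [text[0, cut], cut < text.length]
end

description = "Merging the feeds ended up being easier than expected."
link = "http://example.com/merging-the-feeds"  # hypothetical item link

excerpt, truncated = excerpt_at_word_boundary(description, 10)

# As in the listing, add a [more] link only when the description was shortened.
excerpt += ' <a href="' + link + '">[more]</a>' if truncated

puts excerpt
```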

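Keying itemlist by Time.parse(pubdate) has a side effect: two items with the same timestamp share one hash key, so only one of them survives. One alternative is to collect the item hashes in an Array instead of a Hash and sort them with sort_by. The sketch below assumes that change. The sample items and max_posts are hypothetical stand-ins for the parsed feeds and the maxposts setting. Note that make_xml in the listing receives [time, item] pairs from Hash#sort, so it would need a matching change; its code is not shown in the post.

```ruby
require 'time'

# Hypothetical items, shaped like the hashes built in merge_feeds.rb.
items = [
  { "title" => "First",  "pubdate" => "Sat, 07 Jan 2006 10:00:00 GMT" },
  { "title" => "Second", "pubdate" => "Sat, 07 Jan 2006 14:09:00 GMT" },
  # Same timestamp as "Second": a Time-keyed Hash would keep only one of them.
  { "title" => "Third",  "pubdate" => "Sat, 07 Jan 2006 14:09:00 GMT" },
  { "title" => "Fourth", "pubdate" => "Fri, 06 Jan 2006 09:30:00 GMT" }
]

# Hypothetical stand-in for get_param("rubyriver.yml", "maxposts").
max_posts = 3

# Newest first: sort ascending by the negated timestamp, then keep max_posts items.
newest = items.sort_by { |item| -Time.parse(item["pubdate"]).to_f }.first(max_posts)

newest.each { |item| puts item["pubdate"] + "  " + item["title"] }
```

Ruby's sort_by is not guaranteed to be stable, so items with equal timestamps can come out in either order. If that matters, a secondary key such as the title can be added to the block.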