The Artima Developer Community

Agile Buzz Forum
Feed Scraping Tools

James Robertson

Posts: 29924
Nickname: jarober61
Registered: Jun, 2003

David Buck, Smalltalker at large
Feed Scraping Tools Posted: Nov 27, 2004 4:34 PM

This post originated from an RSS feed registered with Agile Buzz by James Robertson.
Original Post: Feed Scraping Tools
Feed Title: Cincom Smalltalk Blog - Smalltalk with Rants
Feed URL: http://www.cincomsmalltalk.com/rssBlog/rssBlogView.xml
Feed Description: James Robertson comments on Cincom Smalltalk, the Smalltalk development community, and IT trends and issues in general.

A while back Bob mentioned some scraping tools he created for use with BottomFeeder. I had a look at them today because I'd like to have a subscription to User Friendly. I loaded Bob's code (Simple Script Runner from the public Store) and thought it would be more useful with some SAX drivers attached, so I created a new bundle, RSSScriptRunner, that includes a few. I'm planning to enhance this little package some, but in the meantime the following script produces a valid feed with today's User Friendly comic in my local Bf directory:


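| writer content str rest out contentBlock |
"Fill in one RSS item from the scraped chunk of HTML."
contentBlock := [:builder :chunk |
	| stream |
	stream := ReadStream on: chunk.
	"Skip to the image source and use it as the item link."
	stream throughAll: 'SRC="'.
	builder link: (stream upTo: $").
	builder title: 'User Friendly For: ', Date today printString.
	"Reuse the scraped anchor markup as the item description."
	builder description: '<a href="', chunk.
	builder pubDate: Timestamp now].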

out := 'userFriendly.xml' asFilename writeStream.
writer := RSS20_SAXWriter new output: out.
writer prolog.
writer startRSS.
writer startChannel.
writer title: 'User Friendly Feed'.
writer link: 'http://www.userfriendly.org/'.
writer description: 'User Friendly Feed'.
writer pubDate: Timestamp now.
writer startItem.
writer title: 'User Friendly For: ', Date today printString.
content := 'http://www.userfriendly.org/' asURI valueStream contents.
str := content readStream.
str throughAll: 'CARTOON FOR'.
str upToAll: 'href="'.
rest := str throughAll: '</A>'.
contentBlock value: writer value: rest.
writer endItem.
writer endChannel.
writer endRSS.
out close.

Works like a charm.
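
For anyone wondering what the content block actually pulls out, here is a minimal sketch of the same two stream calls run against a made-up chunk. The sample markup and the example.com URL are assumptions for illustration only, not the real User Friendly page source; `throughAll:` and `upTo:` are used exactly as in the script above, and the variable names are hypothetical.

```smalltalk
"Hypothetical sample chunk; the real markup on userfriendly.org may differ."
| chunk stream imageUrl |
chunk := 'href="/cartoons/today.html"><IMG SRC="http://example.com/strip.gif"></A>'.
stream := ReadStream on: chunk.
"Move past everything up to and including the SRC marker."
stream throughAll: 'SRC="'.
"Collect characters up to (not including) the closing double quote."
imageUrl := stream upTo: $".
"imageUrl now holds 'http://example.com/strip.gif'."
Transcript show: imageUrl; cr.
```

Evaluating this in a workspace should be enough to check the markers against a saved copy of the page before wiring them into a feed script.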



Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use