Python Buzz Forum - Parsing RSS-Data

Phillip Pearson

Phillip Pearson is a Python hacker from New Zealand

Parsing RSS-Data

Posted: Oct 2, 2003 2:33 PM

This post originated from an RSS feed registered with Python Buzz by Phillip Pearson.
Original Post: Parsing RSS-Data
Feed Title: Second p0st
Feed URL: http://www.myelin.co.nz/post/rss.xml
Feed Description: Tech notes and web hackery from the guy that brought you bzero, Python Community Server, the Blogging Ecosystem and the Internet Topic Exchange

As a companion to Les Orchard's RSS-Data versus namespace examples, here's some Python code that will parse the RSS-Data version:

import re, urllib, xmlrpclib
from pprint import pprint

# read Les's example
html = urllib.urlopen('http://www.decafbad.com/blog/tech/rss_data_versus_namespace.html').read()

# turn the html-quoted example back into xml
# (&amp; is decoded last so that "&amp;lt;" does not turn into "<")
for entity, char in (('lt', '<'), ('gt', '>'), ('amp', '&')):
    html = html.replace('&%s;' % entity, char)

# rip out the rss-data bit and get rid of the namespace
xml = re.search(r'(<sdl:data>.*</sdl:data>)', html, re.S).group(1)
xml = xml.replace('sdl:', '')

# feed it through xmlrpclib
p, u = xmlrpclib.getparser()
p.feed(xml)
p.close()

# and we have the data! u.close() expects a full XML-RPC response and
# would raise ResponseError here, so read the unmarshaller's internal stack
pprint(u._stack[0])

Here's what you get when you run it:

phil@icefloe:~/projects/rss-data$ python test.py
{'Asin': '0439139597',
 'Authors': ['J. K. Rowling', 'Mary GrandPr'],
 'Availability': 'Usually ships within 24 hours',
 'Catalog': 'Book',
 'ImageUrlLarge': 'http://images.amazon.com/images/P/0439139597.01.LZZZZZZZ.jpg',
 'ImageUrlMedium': 'http://images.amazon.com/images/P/0439139597.01.MZZZZZZZ.jpg',
 'ImageUrlSmall': 'http://images.amazon.com/images/P/0439139597.01.THUMBZZZ.jpg',
 'ListPrice': '$25.95',
 'Manufacturer': 'Scholastic',
 'OurPrice': '$18.16',
 'ProductName': '\n                Harry Potter and the Goblet of Fire (Book 4)\n              ',
 'ReleaseDate': <DateTime 2000-07-08T00:00:00 at 818012c>,
 'UsedPrice': '$3.97',
 'url': 'http://www.amazon.com/exec/obidos/ASIN/0439139597/0xdecafbad-20'}
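
The code above is Python 2 from 2003; in Python 3, xmlrpclib became xmlrpc.client and urllib.urlopen became urllib.request.urlopen. Here's a minimal Python 3 sketch of the same decode step, not the author's original code. Assumptions: the helper names parse_rss_data and strip_strings and the SAMPLE_PAGE fragment are hypothetical, and the fragment only mimics the assumed shape of the example (an XML-RPC struct inside <sdl:data>), not the real contents of Les's page. It swaps the manual entity loop for html.unescape, uses a non-greedy match to grab the first <sdl:data> block, and passes use_datetime=True so ReleaseDate comes back as a datetime rather than a DateTime wrapper. Like the original, it reads the unmarshaller's private _stack, which could change between Python versions.

```python
import html
import re
import xmlrpc.client


def parse_rss_data(page):
    """Pull the first <sdl:data> block out of an HTML-quoted page and decode it."""
    # html.unescape handles &lt;, &gt;, &amp; (and every other entity) in a
    # single pass, so there is no ordering pitfall with &amp;.
    text = html.unescape(page)
    match = re.search(r'<sdl:data>.*?</sdl:data>', text, re.S)
    if match is None:
        raise ValueError('no <sdl:data> element found')
    # Drop the namespace prefix so the markup is plain XML-RPC value syntax.
    xml = match.group(0).replace('sdl:', '')
    # use_datetime=True yields datetime.datetime instead of xmlrpc DateTime.
    parser, unmarshaller = xmlrpc.client.getparser(use_datetime=True)
    parser.feed(xml)
    parser.close()
    # As in the original, read the decoded value from the private _stack:
    # there is no methodResponse wrapper, so unmarshaller.close() would raise.
    return unmarshaller._stack[0]


def strip_strings(value):
    """Recursively trim surrounding whitespace from string values."""
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, list):
        return [strip_strings(v) for v in value]
    if isinstance(value, dict):
        return {k: strip_strings(v) for k, v in value.items()}
    return value


# Hypothetical input: a tiny HTML-quoted RSS-Data fragment, NOT the
# contents of Les's page.
SAMPLE_PAGE = """<pre>&lt;sdl:data&gt;
&lt;struct&gt;
  &lt;member&gt;&lt;name&gt;Asin&lt;/name&gt;
    &lt;value&gt;&lt;string&gt;0439139597&lt;/string&gt;&lt;/value&gt;&lt;/member&gt;
  &lt;member&gt;&lt;name&gt;ProductName&lt;/name&gt;
    &lt;value&gt;&lt;string&gt;
      Harry Potter and the Goblet of Fire (Book 4)
    &lt;/string&gt;&lt;/value&gt;&lt;/member&gt;
  &lt;member&gt;&lt;name&gt;Authors&lt;/name&gt;
    &lt;value&gt;&lt;array&gt;&lt;data&gt;
      &lt;value&gt;&lt;string&gt;J. K. Rowling&lt;/string&gt;&lt;/value&gt;
    &lt;/data&gt;&lt;/array&gt;&lt;/value&gt;&lt;/member&gt;
  &lt;member&gt;&lt;name&gt;ReleaseDate&lt;/name&gt;
    &lt;value&gt;&lt;dateTime.iso8601&gt;20000708T00:00:00&lt;/dateTime.iso8601&gt;&lt;/value&gt;&lt;/member&gt;
&lt;/struct&gt;
&lt;/sdl:data&gt;</pre>"""

if __name__ == '__main__':
    from pprint import pprint
    pprint(strip_strings(parse_rss_data(SAMPLE_PAGE)))
```

strip_strings is there because of the padded ProductName in the output above; whether trimming every string is appropriate depends on the feed, so it's kept as a separate, optional step.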
