The Artima Developer Community

Python Buzz Forum
Parsing RSS-Data

Phillip Pearson

Posts: 1083
Nickname: myelin
Registered: Aug, 2003

Phillip Pearson is a Python hacker from New Zealand
Parsing RSS-Data Posted: Oct 2, 2003 2:33 PM

This post originated from an RSS feed registered with Python Buzz by Phillip Pearson.
Original Post: Parsing RSS-Data
Feed Title: Second p0st
Feed URL: http://www.myelin.co.nz/post/rss.xml
Feed Description: Tech notes and web hackery from the guy that brought you bzero, Python Community Server, the Blogging Ecosystem and the Internet Topic Exchange

As a companion to Les Orchard's RSS-Data versus namespace examples, here's some Python code that will parse the RSS-Data version:

import re, urllib, xmlrpclib
from pprint import pprint

# read les's example
html = urllib.urlopen('http://www.decafbad.com/blog/tech/rss_data_versus_namespace.html').read()

# turn the html-quoted example back into xml
# (&amp; is unescaped last, so that &amp;lt; ends up as &lt; and not <)
for entity, char in (('lt', '<'), ('gt', '>'), ('amp', '&')):
    html = html.replace('&%s;' % entity, char)

# rip out the rss-data bit and get rid of the namespace
xml = re.search(r'(\<sdl\:data\>.*\</sdl\:data\>)', html, re.S).group(1)
xml = xml.replace('sdl:', '')

# feed it through xmlrpclib
p, u = xmlrpclib.getparser()
p.feed(xml)
p.close()

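# and we have the data!  there is no <params> wrapper around it, so
# u.close() would raise ResponseError; peek at the unmarshaller's
# internal stack instead
pprint(u._stack[0])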


Here's what you get when you run it:

phil@icefloe:~/projects/rss-data$ python test.py
{'Asin': '0439139597',
 'Authors': ['J. K. Rowling', 'Mary GrandPr'],
 'Availability': 'Usually ships within 24 hours',
 'Catalog': 'Book',
 'ImageUrlLarge': 'http://images.amazon.com/images/P/0439139597.01.LZZZZZZZ.jpg',
 'ImageUrlMedium': 'http://images.amazon.com/images/P/0439139597.01.MZZZZZZZ.jpg',
 'ImageUrlSmall': 'http://images.amazon.com/images/P/0439139597.01.THUMBZZZ.jpg',
 'ListPrice': '$25.95',
 'Manufacturer': 'Scholastic',
 'OurPrice': '$18.16',
 'ProductName': '\n Harry Potter and the Goblet of Fire (Book 4)\n ',
 'ReleaseDate': <DateTime 2000-07-08T00:00:00 at 818012c>,
 'UsedPrice': '$3.97',
 'url': 'http://www.amazon.com/exec/obidos/ASIN/0439139597/0xdecafbad-20'}
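
The script above uses the Python 2 module names (urllib, xmlrpclib) and reaches into the unmarshaller's private _stack. Here is a rough Python 3 sketch of the same idea that stays on the public API by wrapping the fragment in an XML-RPC response envelope and calling xmlrpc.client.loads. The sample fragment, the names SAMPLE_XML, PAGE and parse_rss_data, and the assumption that sdl:data directly contains a single struct are mine for illustration, not taken from Les's page.

```python
import html
import re
import xmlrpc.client

# Hypothetical RSS-Data fragment (assumed shape: <sdl:data> wrapping one struct).
SAMPLE_XML = """<sdl:data>
<sdl:struct>
  <sdl:member>
    <sdl:name>Asin</sdl:name>
    <sdl:value><sdl:string>0439139597</sdl:string></sdl:value>
  </sdl:member>
  <sdl:member>
    <sdl:name>Authors</sdl:name>
    <sdl:value><sdl:array><sdl:data>
      <sdl:value><sdl:string>J. K. Rowling</sdl:string></sdl:value>
    </sdl:data></sdl:array></sdl:value>
  </sdl:member>
</sdl:struct>
</sdl:data>"""

# Stand-in for the downloaded page, where the example is HTML-quoted.
PAGE = "<pre>" + html.escape(SAMPLE_XML) + "</pre>"


def parse_rss_data(page):
    """Extract the sdl:data block from an HTML-quoted page and decode it."""
    # Turn the HTML-quoted example back into XML.
    text = html.unescape(page)

    # Greedy match, so nested <sdl:data> inside arrays stays in the capture.
    match = re.search(r"<sdl:data>(.*)</sdl:data>", text, re.S)
    if match is None:
        raise ValueError("no <sdl:data> block found")

    # Drop the namespace prefix so the tags are plain XML-RPC tags.
    inner = match.group(1).replace("sdl:", "")

    # Wrap the value in a response envelope so loads() accepts it.
    envelope = (
        "<methodResponse><params><param><value>%s</value></param></params>"
        "</methodResponse>" % inner
    )
    params, _method = xmlrpc.client.loads(envelope)
    return params[0]


if __name__ == "__main__":
    from pprint import pprint
    pprint(parse_rss_data(PAGE))
```

The envelope is only there to keep the parser happy: loads() expects a params or fault element before it will hand anything back, which is why the original script has to read _stack directly.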
