This post originated from an RSS feed registered with Python Buzz
by Phillip Pearson.
Original Post: Parsing namespaced RSS extensions
Feed Title: Second p0st
Feed URL: http://www.myelin.co.nz/post/rss.xml
Feed Description: Tech notes and web hackery from the guy that brought you bzero, Python Community Server, the Blogging Ecosystem and the Internet Topic Exchange
OK, I've shown you how to parse RSS-Data. Now here's how to parse Les Orchard's other example: a namespaced RSS extension. Here's the code:
import re, urllib, xmlrpclib, os.path
from elementtree import ElementTree as et
# read les's example
html = urllib.urlopen('http://www.decafbad.com/blog/tech/rss_data_versus_namespace.html').read()
# turn the html-quoted example back into xml
for entity, char in (('lt', '<'), ('gt', '>'), ('amp', '&')):
    html = html.replace('&%s;' % entity, char)
# rip out the Amazon bit and get rid of the namespace
xml = re.search(r'(\<az\:ProductInfo\>.*\</az\:ProductInfo\>)', html, re.S).group(1)
xml = '<?xml version="1.0" encoding="iso-8859-1"?>' + xml.replace('az:', '')
book = et.XML(xml).find('Details')
# and we have the data!
et.dump(book)
# give out some details
authors = book.find('Authors').findall('Author')
print ("---\n%s, by %s" % (book.find('ProductName').text.strip(),
                           " and ".join([auth.text for auth in authors]),)
       ).encode('latin-1')
print "\nList %s, Amazon %s (Used %s)" % tuple(
    [book.find(x).text for x in ('ListPrice', 'OurPrice', 'UsedPrice')])
print "URL: %s" % book.get('url')
And here are the results:
phil@icefloe:~/projects/rss-data$ python parse-les-orchard-ns-example.py
<Details url="http://www.amazon.com/exec/obidos/ASIN/0439139597/0xdecafbad-20">
  <Asin>0439139597</Asin>
  <ProductName>Harry Potter and the Goblet of Fire (Book 4)</ProductName>
  <Catalog>Book</Catalog>
  <Authors>
    <Author>J. K. Rowling</Author>
    <Author>Mary GrandPré</Author>
  </Authors>
  <ReleaseDate>08 July, 2000</ReleaseDate>
  <Manufacturer>Scholastic</Manufacturer>
  <ImageUrlSmall>http://images.amazon.com/images/P/0439139597.01.THUMBZZZ.jpg</ImageUrlSmall>
  <ImageUrlMedium>http://images.amazon.com/images/P/0439139597.01.MZZZZZZZ.jpg</ImageUrlMedium>
  <ImageUrlLarge>http://images.amazon.com/images/P/0439139597.01.LZZZZZZZ.jpg</ImageUrlLarge>
  <Availability>Usually ships within 24 hours</Availability>
  <ListPrice>$25.95</ListPrice>
  <OurPrice>$18.16</OurPrice>
  <UsedPrice>$3.97</UsedPrice>
</Details>
---
Harry Potter and the Goblet of Fire (Book 4), by J. K. Rowling and Mary GrandPré
List $25.95, Amazon $18.16 (Used $3.97)
URL: http://www.amazon.com/exec/obidos/ASIN/0439139597/0xdecafbad-20
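The script above sidesteps the namespace by deleting the az: prefix with a string replace. If you'd rather keep the namespace, ElementTree can match qualified names directly using its {uri}tag notation. Here's a rough sketch of that approach in Python 3, using the standard library's xml.etree.ElementTree. Note the assumptions: the namespace URI (AZ_NS) is a made-up placeholder, since the xmlns:az declaration never appears in this post, and SAMPLE is a cut-down copy of the output above rather than Les's real page. The tag() and summarize() helpers are names invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the real xmlns:az value is not shown in this
# post, so this is a placeholder for illustration only.
AZ_NS = "http://example.com/ns/amazon-product-info"

# Cut-down sample modelled on the dumped <Details> element above (assumption:
# the real document binds the az: prefix in roughly this way).
SAMPLE = """<az:ProductInfo xmlns:az="%s">
  <az:Details url="http://www.amazon.com/exec/obidos/ASIN/0439139597/0xdecafbad-20">
    <az:ProductName>Harry Potter and the Goblet of Fire (Book 4)</az:ProductName>
    <az:Authors>
      <az:Author>J. K. Rowling</az:Author>
      <az:Author>Mary GrandPré</az:Author>
    </az:Authors>
    <az:ListPrice>$25.95</az:ListPrice>
    <az:OurPrice>$18.16</az:OurPrice>
    <az:UsedPrice>$3.97</az:UsedPrice>
  </az:Details>
</az:ProductInfo>""" % AZ_NS


def tag(name):
    """Build the '{uri}local' name ElementTree uses for namespaced elements."""
    return "{%s}%s" % (AZ_NS, name)


def summarize(xml_text):
    """Pull title, authors, prices and URL out of a namespaced ProductInfo."""
    book = ET.fromstring(xml_text).find(tag("Details"))
    return {
        "title": book.findtext(tag("ProductName")).strip(),
        "authors": [a.text for a in book.iter(tag("Author"))],
        "prices": dict(
            (name, book.findtext(tag(name)))
            for name in ("ListPrice", "OurPrice", "UsedPrice")
        ),
        # The url attribute carries no prefix, so it is looked up unqualified.
        "url": book.get("url"),
    }


info = summarize(SAMPLE)
print("List %s, Amazon %s (Used %s)" % (
    info["prices"]["ListPrice"],
    info["prices"]["OurPrice"],
    info["prices"]["UsedPrice"],
))
print("URL: %s" % info["url"])
```

The trade-off is that every lookup has to carry the namespace URI, which is wordier than the string-replace trick, but it won't mangle an "az:" that happens to appear inside text content, and it keeps working if the feed uses a different prefix for the same namespace.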