Python Buzz Forum - Parsing namespaced RSS extensions



Phillip Pearson

Posts: 1083
Nickname: myelin
Registered: Aug, 2003

Phillip Pearson is a Python hacker from New Zealand

Parsing namespaced RSS extensions

Posted: Oct 2, 2003 4:32 PM

This post originated from an RSS feed registered with Python Buzz by Phillip Pearson.
Original Post: Parsing namespaced RSS extensions
Feed Title: Second p0st
Feed URL: http://www.myelin.co.nz/post/rss.xml
Feed Description: Tech notes and web hackery from the guy that brought you bzero, Python Community Server, the Blogging Ecosystem and the Internet Topic Exchange

OK, I've shown you how to parse RSS-Data. Now here's how to parse Les Orchard's other example: a namespaced RSS extension. Here's the code:

import re, urllib

from elementtree import ElementTree as et

# read les's example
html = urllib.urlopen('http://www.decafbad.com/blog/tech/rss_data_versus_namespace.html').read()

# turn the html-quoted example back into xml
# ('amp' goes last so that '&amp;lt;' is only unescaped once)
for entity, char in (('lt', '<'), ('gt', '>'), ('amp', '&')):
    html = html.replace('&%s;' % entity, char)

# rip out the Amazon bit and get rid of the namespace
xml = re.search(r'(\<az\:ProductInfo\>.*\</az\:ProductInfo\>)', html, re.S).group(1)
xml = '<?xml version="1.0" encoding="iso-8859-1"?>' + xml.replace('az:', '')

book = et.XML(xml).find('Details')

# and we have the data!
et.dump(book)

# give out some details
# (findall is the public way to list the Author elements; _children is private)
authors = book.find('Authors').findall('Author')
print ("---\n%s, by %s" % (book.find('ProductName').text.strip(),
                           " and ".join([auth.text for auth in authors]),)
       ).encode('latin-1')
print "\nList %s, Amazon %s (Used %s)" % tuple(
    [book.find(x).text for x in ('ListPrice', 'OurPrice', 'UsedPrice')])
print "URL: %s" % book.get('url')

And here are the results:

phil@icefloe:~/projects/rss-data$ python parse-les-orchard-ns-example.py
<Details url="http://www.amazon.com/exec/obidos/ASIN/0439139597/0xdecafbad-20">
          <Asin>0439139597</Asin>
          <ProductName>Harry Potter and the Goblet of Fire (Book 4)</ProductName>
          <Catalog>Book</Catalog>
          <Authors>
            <Author>J. K. Rowling</Author>
            <Author>Mary GrandPr&#142;</Author>
          </Authors>
          <ReleaseDate>08 July, 2000</ReleaseDate>
          <Manufacturer>Scholastic</Manufacturer>
          <ImageUrlSmall>http://images.amazon.com/images/P/0439139597.01.THUMBZZZ.jpg</ImageUrlSmall>
          <ImageUrlMedium>http://images.amazon.com/images/P/0439139597.01.MZZZZZZZ.jpg</ImageUrlMedium>
          <ImageUrlLarge>http://images.amazon.com/images/P/0439139597.01.LZZZZZZZ.jpg</ImageUrlLarge>
          <Availability>Usually ships within 24 hours</Availability>
          <ListPrice>$25.95</ListPrice>
          <OurPrice>$18.16</OurPrice>
          <UsedPrice>$3.97</UsedPrice>
        </Details>
      ---
Harry Potter and the Goblet of Fire (Book 4), by J. K. Rowling and Mary GrandPr▒

List $25.95, Amazon $18.16 (Used $3.97)
URL: http://www.amazon.com/exec/obidos/ASIN/0439139597/0xdecafbad-20
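
The script above gets around the namespace by deleting the 'az:' prefix before parsing. If the fragment carries a proper xmlns:az declaration, the prefix can stay and the parser can resolve it. What follows is only a rough sketch of that alternative, not what the script does: it is Python 3 using the standard library's xml.etree.ElementTree rather than the standalone elementtree package, the namespace URI is a made-up placeholder (the post never shows the real one), the sample XML is a trimmed copy of the output above, and describe_book is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Placeholder URI, assumed for illustration only; the real one is not in the post.
AZ_NS = "http://example.com/ns/amazon"

# Trimmed copy of the Amazon data, with the az: prefix kept and declared.
SAMPLE = """<az:ProductInfo xmlns:az="http://example.com/ns/amazon">
  <az:Details url="http://www.amazon.com/exec/obidos/ASIN/0439139597/0xdecafbad-20">
    <az:ProductName>Harry Potter and the Goblet of Fire (Book 4)</az:ProductName>
    <az:Authors>
      <az:Author>J. K. Rowling</az:Author>
    </az:Authors>
    <az:ListPrice>$25.95</az:ListPrice>
    <az:OurPrice>$18.16</az:OurPrice>
    <az:UsedPrice>$3.97</az:UsedPrice>
  </az:Details>
</az:ProductInfo>"""


def describe_book(xml_text, ns_uri=AZ_NS):
    """Pull title, authors, prices and URL out of a namespaced ProductInfo."""
    # Map the prefix used in the search paths to the namespace URI.
    ns = {"az": ns_uri}
    book = ET.fromstring(xml_text).find("az:Details", ns)
    authors = [a.text for a in book.findall("az:Authors/az:Author", ns)]
    prices = tuple(
        book.find("az:" + tag, ns).text
        for tag in ("ListPrice", "OurPrice", "UsedPrice")
    )
    return {
        "title": book.find("az:ProductName", ns).text.strip(),
        "authors": authors,
        "prices": prices,
        # The url attribute has no prefix, so it is looked up by its plain name.
        "url": book.get("url"),
    }


info = describe_book(SAMPLE)
print("%s, by %s" % (info["title"], " and ".join(info["authors"])))
print("List %s, Amazon %s (Used %s)" % info["prices"])
print("URL: %s" % info["url"])
```

The prefix in the search paths only has to match the key in the ns mapping; it is the URI that must match the document. A fragment with no xmlns:az declaration at all would be rejected by the parser as having an unbound prefix, which is one reason stripping the prefix is the quicker hack.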
