This post originated from an RSS feed registered with Python Buzz
by Jarno Virtanen.
Original Post: How to do a Conditional HTTP GET with Python urllib2
Feed Title: Python owns us
Feed URL: http://sedoparking.com/search/registrar.php?domain=&registrar=sedopark
Feed Description: A weblog about Python from the view point of Jarno Virtanen.
Now that we have the modification information, we can use it in
subsequent requests. We also need an error handler for the request:
if the web page has not been modified, the server responds with status
code 304, and by default urllib2 raises an HTTPError for that response.
So we set up an error handler by subclassing urllib2's BaseHandler.
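A minimal sketch of such a handler, following the description in the
next paragraph. The method name uses urllib2's http_error_<code>
convention; the try/except import is an addition of this sketch so that
it also runs on Python 3, where urllib2's classes moved to
urllib.request and urllib.response.

```python
# Sketch of a 304 handler. The try/except import is an assumption added so
# the same code runs on Python 2 (urllib2) and Python 3 (urllib.request).
try:
    import urllib2 as request           # Python 2
    from urllib2 import addinfourl
except ImportError:
    import urllib.request as request    # Python 3
    from urllib.response import addinfourl


class NotModifiedHandler(request.BaseHandler):
    """Turn a 304 Not Modified reply into a normal-looking response."""

    def http_error_304(self, req, fp, code, message, headers):
        # Wrap the (empty) body and the headers in a URL handle, as if the
        # request had succeeded, so opener.open() returns instead of raising.
        handle = addinfourl(fp, headers, req.get_full_url())
        # Keep the status code on the handle so the caller can detect a 304.
        handle.code = code
        return handle
```

Because the handler returns a value instead of raising, the opener
hands the 304 reply back to the caller like any other response.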
The handler's http_error_304() method is called when the server answers
with status code 304. In that case we build a fake URL handle with
urllib2's addinfourl class and pass it back to the caller, so that the
result of open() can be processed as if it were a normal response. We
also store the status code on the fake URL handle.
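Next we make use of these facilities in another request. This time we
build the opener with urllib2's build_opener(), which returns an
OpenerDirector that includes our handler, and we add the two
conditional-request headers: If-None-Match carries the ETag and
If-Modified-Since carries the Last-Modified date: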
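import urllib2

# URL, etag and last_modified come from the first request;
# NotModifiedHandler is the 304 handler described above.
req = urllib2.Request(URL)
if etag:
    req.add_header("If-None-Match", etag)
if last_modified:
    req.add_header("If-Modified-Since", last_modified)
opener = urllib2.build_opener(NotModifiedHandler())
url_handle = opener.open(req)
# If the page did change, these headers hold the new ETag/Last-Modified
headers = url_handle.info() # the addinfourls have the .info() too
if hasattr(url_handle, 'code') and url_handle.code == 304:
    print "the web page has not been modified"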
If the server supports conditional requests and the page has not
changed since the first request, the program now prints "the web page
has not been modified".
In this example both requests were made during the same run of the
program, so the ETag and Last-Modified values could simply be kept in
Python strings. Typically, though, the requests happen in different
runs of the program, so the modification information has to be stored
in between, for example on disk.
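One minimal way to do that, assuming a small JSON file is an acceptable
store. save_validators, load_validators and the file name used below
are hypothetical helpers for illustration, not part of urllib2:

```python
import json
import os


def save_validators(path, headers):
    """Hypothetical helper: store ETag and Last-Modified in a JSON file.

    `headers` is the object returned by url_handle.info(); only its
    .get() method is used, so a plain dict works as well.
    """
    data = {
        "etag": headers.get("ETag"),
        "last_modified": headers.get("Last-Modified"),
    }
    with open(path, "w") as f:
        json.dump(data, f)


def load_validators(path):
    """Hypothetical helper: return (etag, last_modified), or (None, None)."""
    if not os.path.exists(path):
        # First run: nothing stored yet, so send an unconditional GET.
        return None, None
    with open(path) as f:
        data = json.load(f)
    return data.get("etag"), data.get("last_modified")
```

Before the second request, etag, last_modified =
load_validators("validators.json") would supply the two values, and
after a response that is not a 304, save_validators("validators.json",
url_handle.info()) would record the new ones. Saving only after a full
response avoids overwriting a stored Last-Modified with None when a 304
reply omits that header.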