This post originated from an RSS feed registered with Python Buzz
by maxim khesin.
Original Post: lazy Hamming
Feed Title: python and the web
Feed URL: http://feeds.feedburner.com/PythonAndTheWeb
Feed Description: blog dedicated to python and the networks we live in
In last month's C++ user's Journal Andrew Koenig & Barbara Moo covered different ways of doing the "Hamming problem". For the uninitiated (like me), the problem is to produce a sequence of all integers, in increasing order, that only have 2, 3 & 5 as their divisors. One of the nicer solutions presented was a piece of Haskell code that uses lazy evaluation. It looked something like this:
hamming = 1 : merge (f 2) (merge (f 3) (f 5)) where f a = [ n*a | n <- hamming ] merge (a:x) (b:y) = a : merge x (b:y), if a < b = b : merge (a:x) y, if a > b = a : merge x y, otherwise
It is no secret that Python's design has been influenced by Haskell, and remenbering that we already have support for lazy sequences with generators I attempted translating this code into Python. My first take looked like this:
This version has a bit of a 'literal' flavor in the sense that an actual sequence is kept in a list, but basically does the job. After looking around on da net I found the most serious discussion of Hamming on, surprise surprise, the Python mailing list. It offers some interesting alternatives and shows an 'evolutionary' improvment of the code by a number of people, which is nice to see. The excercise was very worthwhile as it pushed me to catch up on some of the itertools stuff, particularly the tee() function. Here is the solution I like best, offered on the list by Nick Craig-Wood:
from itertools import *
def imerge(*generators): values = [ g.next() for g in generators ] while True: x = min(values) yield x for i in range(len(values)): if values[i] == x: values[i] = generators[i].next()
def hamming(): def _hamming(): yield 1 for i in imerge(imap(lambda x:2*x, hamming2), imap(lambda x:3*x, hamming3), imap(lambda x:5*x, hamming5)): yield i
hamming2, hamming3, hamming5, result = tee(_hamming(), 4) return result
for i in hamming(): print i if i > 10000:break
This is very cool. BTW, this is one of the things that confused me about tee(): in my head I kept wondering whether the copies of the iterator are going to evaluate the generator for the second time (if you aren't bothered by this, it is probably my own brain quirk). Looking at the doc. tee() pseudocode really helped: the answer is that only the 'front' iterator is going to evaluate the generator, the rest are simply going to look up the value in the cache. Of course in some algorithms the "front" iterator can fall behind, but then some other is going to assume the "front" position. It is interesting that tee apparently does preserve the entire sequence, which is what I did explicitly. I cannot prove it offhand, but it seems very intuitive that only 3 highest values of the sequence must be kept in memory at the same time. I wonder if there is a of expressing this which is as elegant as the tee() solution...