Kata 19: an optimization anecdote
by Ian Bicking (originally posted to his feed, http://www.ianbicking.org/feeds/atom.xml)
I spent some time last night working on Kata
19 (these Katas are little programming exercises, a nice
programming break from writing Useful Code). This was definitely
the hardest one I've done so far -- you find paths between words,
where words are connected if they only differ by one letter. For
example: beast to least to leant to leans to loads to loons to
loops.
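The connection rule is easy to state directly in code. This isn't from the original program, just a minimal sketch of the adjacency test (the function name is mine):

    def one_letter_apart(a, b):
        # Two words are connected when they have the same length and
        # differ in exactly one position.
        if len(a) != len(b):
            return False
        return sum(1 for x, y in zip(a, b) if x != y) == 1

    # e.g. one_letter_apart("beast", "least") is True,
    #      one_letter_apart("least", "loops") is False.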
The algorithm I used was pretty quick. Searches take at most
0.1sec, return the shortest path, and don't hang on impossible
paths. It does a breadth-first search from both ends at once,
starting from the start word and the end word and stopping where the
two frontiers meet somewhere in the middle.
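The original code is linked below; what follows is only a sketch of that bidirectional search, assuming an adjacent(word) helper that yields the words one letter away (one possible helper is sketched later, after the list of optimizations). The function and variable names are mine, not Ian's.

    from collections import deque

    def find_path(start, goal, adjacent):
        # Grow a frontier from each end, one level at a time, and stop as
        # soon as the two sides discover a common word.  A sketch of the
        # idea, not the original implementation.
        if start == goal:
            return [start]
        seen_fwd = {start: [start]}   # word -> path from the start word
        seen_bwd = {goal: [goal]}     # word -> path from the goal word
        frontier_fwd, frontier_bwd = deque([start]), deque([goal])
        while frontier_fwd and frontier_bwd:
            # Expand the smaller frontier; this keeps both sides shallow.
            if len(frontier_fwd) <= len(frontier_bwd):
                frontier, seen, other = frontier_fwd, seen_fwd, seen_bwd
            else:
                frontier, seen, other = frontier_bwd, seen_bwd, seen_fwd
            for _ in range(len(frontier)):
                word = frontier.popleft()
                for nxt in adjacent(word):
                    if nxt in seen:
                        continue
                    seen[nxt] = seen[word] + [nxt]
                    if nxt in other:
                        # The frontiers met: stitch the two half-paths together.
                        return seen_fwd[nxt] + seen_bwd[nxt][-2::-1]
                    frontier.append(nxt)
        return None   # no path exists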
The problem was that it was really slow to start up. Reading my
dictionary of 96 thousand words took about 53sec (and 166Mb of memory)
on my Athlon 650MHz PC. Once it is loaded it is really fast -- perfect
for a word-path-finding server. But I don't think this justifies a
persistent server.
Here is the original
code. I then set upon optimizing the start-up time. Here are the
things that worked well (a sketch combining them follows this list):
I was creating a dictionary that looked like {'hat':
Set(['_at', 'h_t', 'ha_']), ...}. Instead of calculating all
these permutations up-front, I calculated them as needed. Down to
31sec (22sec gained) and 124Mb.
Garbage collection was kicking in. I could tell because the
reading would stall at random times (when the GC was doing work). I
didn't need garbage collection while I was reading data in,
because I knew no circular garbage was being created (and not much
non-circular garbage either). Using the gc
module, I turned it off. Down to 17sec (14sec gained); it doesn't
help memory.
I used sets
in several places. One of the central data
structures was a dictionary that looked like {'_at': Set(['hat',
'bat', 'sat', ...]), ...}. But I never used set operations on
those values; I only iterated over them. I changed them to
lists. Down to 9sec (8sec gained) and 75Mb of memory.
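This is not the original load_words, just a sketch of what those three changes amount to together; the dictionary path and the names are my own illustration, and the adjacent helper is the one assumed by the search sketch above.

    import gc

    def load_words(path="/usr/share/dict/words"):
        # Build only the wildcard index, e.g. {'_at': ['hat', 'bat', ...], ...},
        # with plain lists as values, and with the cyclic collector paused
        # while the file is read.
        by_pattern = {}
        gc.disable()
        try:
            for line in open(path):
                word = line.strip()
                if word.endswith("'s"):
                    word = word[:-2]
                for i in range(len(word)):
                    key = word[:i] + "_" + word[i+1:]
                    by_pattern.setdefault(key, []).append(word)
        finally:
            gc.enable()
        return by_pattern

    by_pattern = load_words()

    def adjacent(word):
        # The per-word permutations are cheap to recompute on demand, so
        # nothing like {'hat': Set(['_at', 'h_t', 'ha_'])} is stored at all.
        for i in range(len(word)):
            for other in by_pattern.get(word[:i] + "_" + word[i+1:], []):
                if other != word:
                    yield other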
Those were all the effective changes I made. I tried some other
things, to little or no effect (a small benchmark sketch for trying
these comparisons yourself follows the list):
Using .setdefault(word_with_unders, []).append(word)
for creating the dictionary. Gained about 0.5sec over using
.has_key(word_with_unders). Using exceptions here lost me
7sec -- that should only be done when you are expecting very
few misses.
I iterate over range(len(word)) for each word. Since
words show up in the dictionary in sorted order, I tried caching the
range result (since I'd only need a handful of different
ranges for the entire run). No significant effect.
range vs. xrange: no difference.
Avoiding global lookups, by putting range=range in the argument
list (so range is found as a fast local instead of a global). No
significant gain.
"%s_%s" % (word[:i], word[i+1:]) vs. word[:i] +
word[i+1:]. Actually lost 2sec. Probably using %
instead of + for stings will only help when the strings are
long (these are very short).
The dictionary has 's at the end of some words. There
was no significant difference between if "'" in word: word =
word.split("'")[0] and if word.endswith("'s"): word =
word[:-2]. Using line.strip() to get rid of
\n wasn't significantly faster than line[:-1]
either.
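None of the numbers below are from the post; this is just a hypothetical micro-benchmark (the function names and key lengths are mine) for reproducing the dictionary-filling comparison. It mainly illustrates the miss-rate point: exceptions only pay off when nearly every key already exists.

    import random
    import string
    import timeit

    def fill_setdefault(keys):
        d = {}
        for k in keys:
            d.setdefault(k, []).append(k)

    def fill_check_first(keys):
        d = {}
        for k in keys:
            if k not in d:          # the has_key() check, in modern spelling
                d[k] = []
            d[k].append(k)

    def fill_try_except(keys):
        d = {}
        for k in keys:
            try:
                d[k].append(k)
            except KeyError:
                d[k] = [k]

    random.seed(0)
    for key_len in (3, 8):
        # Short keys repeat a lot (few misses); long keys are mostly new
        # (many misses).  The miss rate decides whether try/except is
        # cheap or painfully expensive.
        keys = [''.join(random.choice(string.ascii_lowercase) for _ in range(key_len))
                for _ in range(100000)]
        for fill in (fill_setdefault, fill_check_first, fill_try_except):
            t = timeit.timeit(lambda: fill(keys), number=5)
            print("key length %d, %-16s %.3fs" % (key_len, fill.__name__, t))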
The other big optimization changed the nature of the program slightly.
If you only want to test the path between two words (say hat
and bee), then you only need to load words of that length
(length 3 in this case). You can ignore all the other words. By
doing this I could load the necessary parts of the dictionary in about
1.5sec, and use only 36Mb (it depends somewhat on the word length,
of course). As a baseline, I can read the entire word list into a
Python dictionary in about 0.8sec.
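A sketch of that variant, again with an assumed dictionary path and my own names:

    def load_words_of_length(n, path="/usr/share/dict/words"):
        # Same wildcard index as before, but built only from words of
        # length n (the length of the two query words).
        by_pattern = {}
        for line in open(path):
            word = line.strip()
            if word.endswith("'s"):
                word = word[:-2]
            if len(word) != n:
                continue
            for i in range(n):
                by_pattern.setdefault(word[:i] + "_" + word[i+1:], []).append(word)
        return by_pattern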
By using Psyco I can
cut the 8sec (with all the little optimizations) down to 3.5sec, using
psyco.full(). It only brings the original down to 41sec (and
increases memory to 180Mb), which is still pretty slow. Psyco plus
turning GC off gives us 11sec, which is pretty good -- it does better
when we tell Psyco psyco.bind(load_words) (i.e., tell Psyco
to only try to optimize the one function: 11sec vs 15sec). In
contrast, psyco.bind doesn't even affect our optimized
load_words; only psyco.full helps there.
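For reference, the two ways of applying Psyco compared here look roughly like this. It is only a sketch: Psyco exists only for old Python 2.x interpreters, so on a modern Python the import simply fails, and load_words stands in for whatever function dominates start-up time.

    try:
        import psyco
    except ImportError:
        psyco = None

    if psyco is not None:
        psyco.full()                # specialize the whole program, or...
        # psyco.bind(load_words)    # ...just the one hot function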
I'm pretty happy with the finished program.
For the most part all the important, time-critical parts are already
in C -- they are dictionaries, sets, and lists. These data
structures make algorithm implementation in Python really easy and
scalable all at once. While you could write this in C for more
performance, it would take a lot of work just to get something
that worked, and a lot more work to match the speed of Python's
built-in data structures. Maybe a language like OCaml or Haskell could match (or beat)
Python's agility and performance. I'd be curious.