Summary
Python's BDFL returns from the Python community conference
in Washington, DC with a head full of ideas and plans,
though not necessarily time to work on them yet.
The Python conference is a vague memory already: right afterwards I
went to Oxford, England to attend a C/C++ conference run by the
British Association for C and C++ Users, which since last year has
sprouted a Python track. Notes from that (equally exciting) event
will be a separate blog entry.
PyCon DC 2003 (http://www.python.org/pycon/)
was the first US-based Python community conference in a long time.
With about 240 people registered I'd call it a smashing success; I
think those 240 attendees would agree.
Pre-conference Sprints
Before the conference proper, we had a two-day coding sprint.
What's a coding sprint? It's an event loosely inspired by extreme
programming, where a number of developers get together for a few days
of intense pair programming on a common project. The first time I
heard of sprints was from Zope Corporation's Jim Fulton, who's been
using sprints successfully for the Zope 3 project; maybe he invented
the word. Coding sprint certainly sounds better than coding marathon!
On this occasion, there were at least three separate groups
sprinting in the same space (a classroom at George Washington
University which we were renting for the conference): Jim Fulton was
leading a Zope 3 sprint, there were a number of Twisted programmers
sprinting on Twisted projects, and in the back of the room we were
having a core Python sprint.
The Python sprint quickly split into three groups of two to
three coders each: one group, led by Jeremy Hylton, worked on the new
bytecode compiler which is being developed in a CVS branch; I was
leading the two other groups, which focused on Python speedups.
(There were many other ideas for tasks to sprint on, but not enough
time.)
The first speedup plan, proposed by Ka-Ping Yee and implemented by
him and Aahz (that's really his whole name!), was a scheme to cache
the lookup of object attributes. Python has extremely dynamic rules
for looking up attributes: an instance attribute is first searched for in the
instance dictionary, for instance variables, then in the class, for
methods and class variables, and finally in the successive base
classes, for inherited methods and class variables. In other
languages, this lookup is usually done at compile time, but Python
does it at run-time, using a very efficient dictionary (hash table)
implementation. Nevertheless, finding a method defined in the third
base class costs three failing lookups and one successful one. We've
got to be able to do better, and this is a very common operation in
Python, so a speedup here might cause measurable speedup for all
Python programs.
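To make the lookup order concrete, here is a minimal Python sketch of the search just described. It is only an illustration, not the interpreter's actual code: the helper find_attribute and the classes Base, Mid and Leaf are made up for this example, and the sketch ignores descriptors, __getattr__ hooks and other details of real attribute access.

```python
def find_attribute(obj, name):
    """Return (value, number_of_dictionary_lookups) for obj.name.

    Simplified: checks the instance dictionary, then the class,
    then each base class in method resolution order.
    """
    lookups = 1
    if name in vars(obj):                 # instance dictionary first
        return vars(obj)[name], lookups
    for klass in type(obj).__mro__:       # then the class, then its bases
        lookups += 1
        if name in vars(klass):
            return vars(klass)[name], lookups
    raise AttributeError(name)


class Base:
    def greet(self):
        return "hello"


class Mid(Base):
    pass


class Leaf(Mid):
    pass


func, lookups = find_attribute(Leaf(), "greet")
print(lookups)  # 4
```

For a method inherited from two levels up, the sketch performs three failing lookups (instance, Leaf, Mid) and one successful one (Base).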
Ping's plan was to cache the dictionary where the lookup was
successful, thereby reducing the number of lookups to exactly two: one
in the cache, and another one in the dictionary indicated by the cache.
This seems an obvious optimization, but wasn't done earlier because
there are situations where the cache must be invalidated because one
of the base classes is modified. Part of the project was to do the
invalidation right, and this could only be done with new
infrastructure added in Python 2.2.
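The idea, and the reason invalidation matters, can be sketched in a few lines of Python. This is a toy model, not the code from the sprint: the names _cache, cached_class_lookup and invalidate are hypothetical, and the invalidation here is deliberately crude (it forgets every entry for a name), whereas a real implementation has to work out which classes are affected by a change.

```python
# Toy model: the cache maps (class, attribute name) to the class
# whose dictionary holds the attribute.
_cache = {}


def cached_class_lookup(cls, name):
    owner = _cache.get((cls, name))       # lookup 1: the cache
    if owner is None:
        for klass in cls.__mro__:         # slow path: walk the base classes
            if name in vars(klass):
                owner = klass
                break
        else:
            raise AttributeError(name)
        _cache[(cls, name)] = owner
    return vars(owner)[name]              # lookup 2: the cached dictionary


def invalidate(name):
    """Forget every cached entry for name; call after modifying a class."""
    for key in [k for k in _cache if k[1] == name]:
        del _cache[key]


class Base:
    def greet(self):
        return "base"


class Leaf(Base):
    pass


first = cached_class_lookup(Leaf, "greet")    # fills the cache
Leaf.greet = lambda self: "leaf"              # a class is modified...
stale = cached_class_lookup(Leaf, "greet")    # ...and the cache is now wrong
invalidate("greet")
fresh = cached_class_lookup(Leaf, "greet")

print(first(Leaf()), stale(Leaf()), fresh(Leaf()))  # base base leaf
```

The middle result shows the problem: until the entry is invalidated, the cache keeps pointing at the dictionary of Base even though Leaf now defines its own greet.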
We implemented the whole scheme successfully, but in the end ran
into a snag: there were some common cases where the old scheme did
only one lookup, and there the new scheme was slower than the
old scheme! We tried various refinements, but in the end we didn't
shave off enough to call it an overall win. The code is checked in on
a CVS branch, though, and I'm sure we'll be getting back to it later.
The other speedup team, consisting of Thomas Wouters and Brett
Cannon, was tackling the issue of speeding up method calls. When
Python encounters an expression of the form x.meth(args), the bytecode
compiler first spits out code to construct a temporary object
representing the "bound method" x.meth, after which it produces code
to load the argument list and call the bound method. These are very
powerful semantics: a bound method can also be stored in a variable or
data structure, and can be used as a callback. Other languages call
this "closures". Python unifies closures, plain functions, and a few
other things, including class constructors, as "callables". But
method calls are very common, and the overhead of creating the bound
method object which is thrown away immediately after the call is quite
measurable.
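A short example may help show what a bound method is. It uses present-day Python 3 syntax rather than the Python of this entry, and the class Greeter is invented purely for illustration.

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self, greeting):
        return f"{greeting}, {self.name}"


g = Greeter("world")

bound = g.greet                  # a bound method: the function plus the instance
print(bound("hello"))            # hello, world
print(bound.__self__ is g)       # True

callbacks = [g.greet]            # it can be stored and called later
print(callbacks[0]("goodbye"))   # goodbye, world

# In CPython, each attribute access builds a fresh bound method object,
# which is the overhead paid when the object is used for a single call.
print(g.greet is g.greet)        # False
```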
So Thomas and Brett set out to introduce a new opcode which
implements the method call operation without creating the intermediate
bound method object. There were numerous challenges on the way to
success, such as how to recognize this exact situation in the parser,
and how to implement an opcode taking three arguments when the
bytecode interpreter only supports opcodes with zero or one argument.
But the real challenge was how to quickly decide at run-time
whether this was in fact a method call or not: syntactically,
instance.method(args) looks the same as module.function(args), and the
bytecode compiler doesn't know the type of x in x.attr(args), so it
will generate the new opcode for all expressions of this form,
regardless of whether x is a class instance. Therefore, the opcode
has to deal correctly with method calls as well as with all other
kinds of calls. Fortunately, the slight overhead of the required
generality is offset by the need to decode only one opcode instead of
two, and in the end we measured a decent speedup (in the order of 5%
for a certain benchmark, if I recall correctly).
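As a rough Python-level analogy of that run-time decision (the real work was done in the bytecode compiler and interpreter, and none of this is the sprint's code), consider the hypothetical helper call_method below. It takes a fast path when the attribute turns out to be a plain function defined on the class, and otherwise falls back to the general call, which also covers module.function(args). It is written in present-day Python 3 and ignores descriptor subtleties.

```python
import math
import types


def call_method(obj, name, *args):
    """Call obj.name(*args), skipping the bound method when possible."""
    if name not in getattr(obj, "__dict__", {}):     # not set on the object itself
        for klass in type(obj).__mro__:
            if name in vars(klass):
                attr = vars(klass)[name]
                if isinstance(attr, types.FunctionType):
                    return attr(obj, *args)          # fast path: no bound method
                break
    return getattr(obj, name)(*args)                 # general path: any callable


class Counter:
    def __init__(self):
        self.total = 0

    def add(self, n):
        self.total += n
        return self.total


c = Counter()
print(call_method(c, "add", 5))          # 5, via the fast path
print(call_method(math, "sqrt", 9.0))    # 3.0, via the general path
```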
Despite this clear success, we didn't check the code in yet. There
are really two cases that need to be sped up: classic classes and
new-style classes (the new class implementation introduced in Python
2.2, which will coexist with the original class implementation until
Python 3.0 is released). Thomas and Brett only had time to implement
their code for classic classes. The code is parked on the
SourceForge patch manager until someone has time to complete it.
The Conference Proper
Since my sprint diary ended up much longer than planned, I'll have
to write down more extensive conference notes later. For now, some
highlights:
Paul Graham's keynote on the "100-year language" was entertaining,
although I wished he'd taken the time to say a bit more about Python,
like last year's keynote speaker (Andrew Koenig of C++ fame, who's
become quite the Python evangelist). It's been reviewed already in Ziggy's blog.
Ziggy is a Perl developer, but (a) he's got an open mind, and (b) he's
an experienced conference organizer who helped us find this excellent
venue.
An excellent idea was "open space", suggested and organized by Bob
Payne. This is not quite the same as BOFs, although it is somewhat
similar. The venue really helped, by accidentally setting up the
grand ballroom with circular dinner tables for the lunch arrangement.
Many smaller groups could have discussions or small presentations in
parallel that way.
The Python Software
Foundation (PSF, also financially responsible for the conference)
had its annual member meeting. This was a great success; more than
half of all members were present in person, and half of the others had
sent in their proxy form for the various votes. There was lively
discussion, and after the meeting we all went out for dinner.
Looking at the schedule, I realize that I hardly went to any of the
scheduled presentations! I spent almost all my time talking to
various people about their Python issues, having my picture
taken with attendees, and doing an audio interview with Bruce Eckel.
Well, so it goes.
I have kept my distance from python for some time. The reason is basically that I dislike its use of spacing for indentation, instead of the more conventional bracketing with keywords or tokens.
My reasoning is pretty petty. My opinion though is that I'd have a hard time switching back and forth and that I'd always insert bracketing stuff when I didn't need it in python. But also, I already have several interpretive languages and JIT'd languages at my disposal.
Two questions:
1. What makes you feel bracketing is unnecessary (or why does Python just use indentation)?
2. What would be your idea of the driving reason why Python would provide a better solution to small problems than some of the conventional UNIX tools such as awk, sed and cut in a shell script? And what about large applications? How does Python scale to really huge applications that might need to run in highly available environments?
Okay and perhaps a 3rd...
3. What's going to make python keep going at its amazing pace of adoption?
> 2. What would be your idea of the driving reason why Python would provide a better solution to small problems then some of the conventional UNIX tools such as awk, sed and cut in a shell script. And, what about large applications. How does python scale to really huge applications that might need to run in highly available environments?
Typically, I use Python for small tasks, Java for large ones. I haven't tried using Python for a large task, but I did ask Guido a question similar to yours in this interview:
> 1. What makes you feel bracketing is unnecessary (or why does Python just use indentation)?
Python is about readability for humans. When skimming code I usually rely on the indentation, not on the braces; making the indentation define the grouping prevents overlooking grouping bugs like
    if (condition)
        a = 12;
        b = 42;
It also prevents holy wars on the one right bracing style. Finally, there's this quote from Don Knuth:
"We will perhaps eventually be writing only small modules which are identified by name as they are used to build larger ones, so that devices like indentation, rather than delimiters, might become feasible for expressing local structure in the source language."
--Donald E. Knuth, "Structured Programming with goto Statements", Computing Surveys, Vol 6 No 4, Dec. 1974
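For comparison with the C fragment quoted above, here is a small Python sketch (the function names are made up for the example). In Python the indentation a reader sees is the grouping the interpreter uses, so the two possible intentions have to be written differently and cannot be confused.

```python
def both_conditional(condition):
    a = b = 0
    if condition:
        a = 12
        b = 42      # indented under the if: runs only when condition is true
    return a, b


def only_first_conditional(condition):
    a = b = 0
    if condition:
        a = 12
    b = 42          # dedented: always runs, and it visibly looks that way
    return a, b


print(both_conditional(False))        # (0, 0)
print(only_first_conditional(False))  # (0, 42)
```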
> 2. What would be your idea of the driving reason why Python would provide a better solution to small problems than some of the conventional UNIX tools such as awk, sed and cut in a shell script?
Uniformity rather than a mishmash of tools each with their different syntax, limitations, and conventions. A Python program is more maintainable than a solution built out of many little pieces.
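For instance, a job one might otherwise write as a cut | sort | uniq -c pipeline fits in a few lines of Python. This is just a sketch: the sample data and the helper count_field are invented for the example, and a real script would read from a file or standard input instead.

```python
from collections import Counter

# Made-up sample data in the colon-separated style of /etc/passwd.
SAMPLE = """\
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
"""


def count_field(text, field, sep=":"):
    """Count the values found in one column of delimited text."""
    return Counter(
        line.split(sep)[field] for line in text.splitlines() if line
    )


shells = count_field(SAMPLE, 6)
for shell, count in shells.most_common():
    print(count, shell)
```

Splitting, counting and sorting all happen in one language with one set of conventions, and the helper can be reused or extended without changing tools.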
> And what about large applications? How does Python scale to really huge applications that might need to run in highly available environments?
Have you heard of Zope? It's a content management solution for large websites, and all written in Python. Works very well!
> Okay and perhaps a 3rd...
> 3. What's going to make python keep going at its amazing pace of adoption?
> It also prevents holy wars on the one right bracing style.
Two years ago Matt Gerrans and I were sitting at our computers starting a Python project we were going to work on together. Matt and I had previously argued about where to put the open curly brace in Java code. We both agreed that it was nice to not have to argue about where to put that open curly brace in Python. We then spent the next half hour arguing about whether to indent 3 or 4 spaces in shared Python code.
From day one of writing C code that someone else had to edit, I have just used tabs. I used the vi(1) editor then, and could just use ":set ts=4 sw=4" and not have to worry about tabs/spacing. I just used tabs. Now, that is the driving factor over whether an editor is acceptable to me. If I can control tab expansion, and if it doesn't have a line shifting function, it's just not usable to me.
I'd highly recommend pushing the tab key once instead of the space bar 4 times to anyone who wants to work faster :-)
Oops, I should have said, If I can't control tab expansion, or I can't shift lines by the indentation level indicated by tab expansion, the editor is not acceptable to me.
> > 2. What would be your idea of the driving reason why Python would provide a better solution to small problems than some of the conventional UNIX tools such as awk, sed and cut in a shell script?
> Uniformity rather than a mishmash of tools each with their different syntax, limitations, and conventions. A Python program is more maintainable than a solution built out of many little pieces.
I guess I consider the shell pipeline syntax to be similar in nature to expressions in any language. The arguments to the commands are pretty much similar to the signatures on functions/methods. Once you learn them, it is easy to regurgitate them at will :-)
More than one person, including myself, has been guilty of doing something different, just because the method presented seemed unfamiliar or ineffective.
The Shell has been around for going on 30 years now, and nothing has really changed about the basics. KSH, CSH, TCSH, BASH etc. have all tried to do it differently, or better. KSH added functions and $(( )) and $() and other notations so that `expr args` does not have to be used so often, forking a process just to evaluate an expression. CSH's initial difference was history references. There are lots of other little added things for interactive use.
But the power of the shell is really amazing when everything is a string. Since the shell was initially designed to be used in the text formatting environment that the initial PDP UNIX was structured around, that made a lot of sense. As times change, and a wide range of different problems are being solved by scripting, new, less fragile tools are necessary (see my blog entry about a language I wrote: http://www.artima.com/weblogs/viewpost.jsp?thread=4350).
I think that it is interesting that so many languages that do many similar things keep falling out of the woodwork, so to speak. Objects have become very popular, and dynamic attributes (late binding) of languages provide some very powerful tools.
One big problem with "the shell" is that it is not "the" shell. There are many shells. Python works as well on Windows as it does on Unix or Linux. The Windows shell, however, is much inferior (or at least significantly different) in this respect and using something like cygwin is impractical and clunky.
It's funny. I just can't understand why some people are aghast that Python doesn't use opening and closing braces for scopes. When I first saw this I thought it was brilliant.
I have always thought that the computer should do as much of the busywork as possible. I have always hated it when the compiler says something stupid like "missing ; on line 44" or some cryptic message on line 300 that is a result of a missing } on line 66. Usually when I get the first one, I mumble to myself, "well then put one there, you idiotic compiler!" Beginners are constantly befuddled about where to put semicolons and braces and where not to. This was particularly bad with Pascal and C only improved it a little.
I have seen enough horribly formatted C and C++ code to know that if it is meaningless to the compiler then it will be meaningless to some people. If indentation should be used to show intent, why not have the compiler use it for its intended purpose? And why force the humans to type in superfluous symbols, when the compiler can be smarter? The only arguments I can see for curlies instead of indentation are a) It is easier for the compiler writer to parse, and b) to provide a way for people to write poorly indented code.
By the way, I despise tab characters for a number of reasons, but modern editors let you hit the tab key to insert a tab or a (configurable) number of spaces, so that is kind of a moot point, on the entry side at least.
I think the argument for using a unified "Swiss Army Chainsaw" [TM -- Perl] over a combination of tiny utilities and pipe joiners, is well advanced by the Perl community. Then the logical next question is why might one prefer Python over Perl for similar tasks.
In many ways it's a generational thing: if you grew up on the UNIX command line and became productive with sed and awk etc. in the context of shell scripts, or if you learned Perl really well, then you're set. You know how to get the job done and you feel productive.
But if you're new to the game, as many are (thanks to their being born in the 1980s, for example), then it's more a question of finding a curriculum that'll get you where you want to be with a minimum of fuss. Sitting down with a sed and awk book just may not be where you want to start.
And yes, the *NIX platform isn't the only one worth targeting. Python neatly accommodates differing file path delimiters in the os module, for example. os.sep is '/' if you're on a posix system but '\' if you're on Win32. So if you're writing some simple configurator that you'd like to use across platforms, you have this layer to build on -- makes code more readable, easier to maintain. os.name tells you which platform you're running on.
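A minimal sketch of that layer, using only the standard os module (the directory and file names are made up for the example):

```python
import os

# Build a path from its parts and let Python pick the separator,
# rather than hard-coding '/' or '\'.
path = os.path.join("settings", "app", "config.ini")

print(path)      # settings/app/config.ini on posix, backslashes on Windows
print(os.sep)    # the separator used on the current platform
print(os.name)   # e.g. 'posix' or 'nt'
```

The same script then runs unchanged on either kind of system, with os.path.join supplying whichever separator os.sep names.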
As for using indentation for code blocks, that forces a readable and consistent style (think of it as compile-time style checking) and reduces the number of tokens in the syntax i.e. it's clean. An alternative to braces used in many languages is a lot more keywords, like if/endif, do/enddo -- but that just clutters up the vocabulary (likewise the shell practice of case/esac if/fi and so on).