Artima Weblogs
Guido van Rossum's Weblog
Summary
Here's a long-awaited update on where the Python 3000 project stands. We're looking at a modest two months of schedule slip, and many exciting new features. I'll be presenting this in person several times over the next two months.
The first time I came up with the idea of Python 3000 was probably at a Python conference in the year 2000. The name was a take on Windows 2000. For a long time there wasn't much more than a list of regrets and flaws that were impossible to fix without breaking backwards compatibility. The idea was that Python 3000 would be the first Python release to give up backwards compatibility in favor of making it the best language going forward.
Maybe a year and a half ago (not coincidentally around the time I started working for Google, which gave me more time for work on Python than I had had in a long time) I decided it was time to start designing and planning Python 3000 for real. Together with the Python developer and user community I came up with a Plan. We created a new series of PEPs (Python Enhancement Proposals) whose numbers started with 3000. There was a PEP 3000 already, maintained by others in the community, which was mostly a laundry list of ideas that had been brought up as suitable for implementation in Python 3000. This was renamed to PEP 3100; PEP 3000 became the document describing the philosophy and schedule of the project.
Since then, we have, well, perhaps not moved mountains, but certainly a lot of water has flowed under the bridge of the python-dev mailing list, and later the separate python-3000 mailing list.
A schedule was first published around a year ago; we were aiming for a first 3.0 alpha release by the end of the first half of 2007, with a final 3.0 release a year later. (Python 3.0 will be the version when it is released; "Python 3000" or "Py3k" is the project's code name.)
This schedule has slipped a bit; we're now looking at a first alpha by the end of August, and the final release is pushed back by the same amount. (The schedule slip is largely due to the amount of work resulting from the transition to all-Unicode text strings and mutable raw bytes arrays. Perhaps I also haven't delegated enough of the work to other developers, a mistake I am frantically trying to correct.)
There will be a "companion" release of Python 2.6, scheduled to be released a few months before 3.0, with an alpha release about 4 months before then (i.e., well after the first 3.0 alpha). The next two sections explain its role. If you're not interested in living on the bleeding edge, 2.6 is going to be the next version of Python you'll be using, and it will not be very different from 2.5.
Python 3.0 will break backwards compatibility. Totally. We're not even aiming for a specific common subset. (Of course there will be a common subset, probably quite large, but we're not aiming to make it convenient or even possible to write significant programs in this subset. It is merely the set of features that happen to be unchanged from 2.6 to 3.0.)
Python 2.6, on the other hand, will maintain full backwards compatibility with Python 2.5 (and previous versions to the extent possible), but it will also support forward compatibility, in the following ways:

- It will support a "Py3k warnings mode" that warns, at run time, about constructs whose meaning will change or disappear in Python 3.0.
- It will contain backported versions of many Py3k features, enabled either through __future__ imports or simply by allowing the new syntax alongside the old, where the new syntax would be a syntax error in 2.x.
The recommended development model for a project that needs to support Python 2.6 and 3.0 simultaneously is as follows:

1. Port the project to Python 2.6.
2. Turn on the Py3k warnings mode.
3. Test the project; it should still run.
4. Fix the code that triggers warnings, until no warnings remain and the tests still pass.
5. Run the 2to3 conversion tool over the source tree to produce 3.0 code (don't edit its output by hand).
6. Test the converted code under Python 3.0; fix remaining problems by editing the 2.6 version and re-converting.
The conversion tool produces high-quality source code that in many cases is indistinguishable from manually converted code. Still, it is strongly recommended not to start editing the 3.0 source code until you are ready to reduce 2.6 support to pure maintenance (i.e. the moment when you would normally move the 2.6 code to a maintenance branch anyway).
Step (1) is expected to take the usual amount of effort of porting any project to a new Python version. We're trying to make the transition from 2.5 to 2.6 as smooth as possible.
If the conversion tool and the forward compatibility features in Python 2.6 work out as expected, steps (2) through (6) should not take much more effort than the typical transition from Python 2.x to 2.(x+1).
There are too many changes to list them all here; instead, I will refer to the PEPs. However, I'd like to highlight a number of features that I find to be significant or expect to be of particular interest or controversial.
We're switching to a model known from Java: (immutable) text strings are Unicode, and binary data is represented by a separate mutable "bytes" data type. In addition, the parser will be more Unicode-friendly: the default source encoding will be UTF-8, and non-ASCII letters can be used in identifiers. There is some debate still about normalization, specific alphabets, and whether we can reasonably support right-to-left scripts. However, the standard library will continue to use ASCII only for identifiers, and limit the use of non-ASCII in comments and string literals to unit tests for some of the Unicode features, and author names.
We will use "..." or '...' interchangeably for Unicode literals, and b"..." or b'...' for bytes literals. For example, b'abc' is equivalent to creating a bytes object using the expression bytes([97, 98, 99]).
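To make the distinction concrete, here is a small sketch using the 3.0 semantics described above:

```python
# A bytes literal and its explicit construction are equivalent.
data = b'abc'
same = bytes([97, 98, 99])  # 97, 98, 99 are the ASCII codes for a, b, c
assert data == same

# Text literals produce Unicode strings; the two types stay distinct.
text = 'abc'
assert isinstance(text, str)
assert isinstance(data, bytes)
assert text != data  # a text string never compares equal to a bytes object
```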
We are adopting a slightly different approach to codecs: while in Python 2, codecs can accept either Unicode or 8-bits as input and produce either as output, in Py3k, encoding is always a translation from a Unicode (text) string to an array of bytes, and decoding always goes the opposite direction. This means that we had to drop a few codecs that don't fit in this model, for example rot13, base64 and bz2 (those conversions are still supported, just not through the encode/decode API).
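In code, the model looks like this (a sketch; UTF-8 is just one codec choice, and base64 stands in for the dropped bytes-to-bytes codecs):

```python
# Encoding always goes from text (str) to bytes ...
raw = 'héllo'.encode('utf-8')
assert isinstance(raw, bytes)

# ... and decoding always goes from bytes back to text.
assert raw.decode('utf-8') == 'héllo'

# Bytes-to-bytes transformations like base64 live in their own modules,
# rather than behind the encode/decode API.
import base64
assert base64.b64encode(b'abc') == b'YWJj'
```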
The I/O library is also changing in response to these changes. I wanted to rewrite it anyway, to remove the dependency on the C stdio library. The new distinction between bytes and text strings required a (subtle) change in API, and the two projects were undertaken hand in hand. In the new library, there is a clear distinction between binary streams (opened with a mode like "rb" or "wb") and text streams (opened with a mode not containing "b"). Text streams have a new attribute, the encoding, which can be set explicitly when the stream is opened; if no encoding is specified, a system-specific default is used (which might use guessing when an existing file is being opened).
Read operations on binary streams return bytes arrays, while read operations on text streams return (Unicode) text strings; and similar for write operations. Writing a text string to a binary stream or a bytes array to a text stream will raise an exception.
Otherwise, the API is kept pretty compatible. While there is still a built-in open() function, the full definition of the new I/O library is available from the new io module. This module also contains abstract base classes (see below) for the various stream types, a new implementation of StringIO, and a new, similar class BytesIO, which is like StringIO but implements a binary stream, hence reading and writing bytes arrays.
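A sketch of the text/binary distinction, using the in-memory stream classes from the new io module:

```python
import io

# A text stream reads and writes (Unicode) text strings.
t = io.StringIO()
t.write('hello')
assert t.getvalue() == 'hello'

# A binary stream reads and writes bytes.
b = io.BytesIO()
b.write(b'hello')
assert b.getvalue() == b'hello'

# Mixing the two raises an exception.
try:
    b.write('oops, text into a binary stream')
except TypeError:
    pass
else:
    raise AssertionError('expected a TypeError')
```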
Two more I/O-related features: the venerable print statement now becomes a print() function, and the quirky % string formatting operator will be replaced with a new format() method on string objects.
Turning print into a function usually makes some eyes roll. However, there are several advantages: it's a lot easier to refactor code using print() functions to use e.g. the logging package instead; and the print syntax was always a bit controversial, with its >>file and unique semantics for a trailing comma. Keyword arguments take over these roles, and all is well.
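A quick sketch of how the keyword arguments take over (writing to an in-memory stream so the output can be checked):

```python
import io

buf = io.StringIO()
# file= replaces the old ">>file" syntax; end='' replaces the trailing comma.
print('answer:', 42, file=buf, end='')
assert buf.getvalue() == 'answer: 42'

# sep= controls what goes between the arguments.
buf2 = io.StringIO()
print('2007', '08', '31', sep='-', file=buf2, end='')
assert buf2.getvalue() == '2007-08-31'
```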
Similarly, the new format() method avoids some of the pitfalls of the old % operator, especially the surprising behavior of "%s" % x when x is a tuple, and the oft-lamented common mistake of accidentally leaving off the final 's' in %(name)s. The new format strings use {0}, {1}, {2}, ... to reference positional arguments to the format() method, and {a}, {b}, ... to reference keyword arguments. Other features include {a.b.c} for attribute references and even {a[b]} for mapping or sequence access. Field lengths can be specified like this: {a:8}; this notation also supports passing on other formatting options.
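The features above, sketched (the Point type is a throwaway example, not part of the proposal):

```python
import collections

# Positional and keyword references.
assert '{0} and {1}'.format('spam', 'eggs') == 'spam and eggs'
assert '{a}, {b}'.format(a=1, b=2) == '1, 2'

# Attribute references and item access inside a field.
Point = collections.namedtuple('Point', 'x y')
assert 'x={0.x}'.format(Point(3, 4)) == 'x=3'
assert '{0[1]}'.format(['a', 'b']) == 'b'

# A colon introduces formatting options such as a minimum field width.
assert '[{0:8}]'.format('hi') == '[hi      ]'
```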
The format() method is extensible in a variety of dimensions: by defining a __format__() special method, data types can override how they are formatted, and how the formatting parameters are interpreted; you can also create custom formatting classes, which can be used e.g. to automatically provide local variables as parameters to the formatting operations.
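For instance, a type can interpret its own format spec via __format__(); the Celsius class and its 'F' spec are invented for this sketch:

```python
class Celsius:
    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # Treat the custom spec 'F' as a request for Fahrenheit;
        # delegate anything else to the normal float formatting.
        if spec == 'F':
            return '{0:.1f}F'.format(self.degrees * 9 / 5 + 32)
        return format(self.degrees, spec)

t = Celsius(100)
assert '{0:F}'.format(t) == '212.0F'
assert '{0:.2f}'.format(Celsius(21.5)) == '21.50'
```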
You might have guessed that "classic classes" finally bite the dust. The built-in class object is the default base class for new classes. This makes room for a variety of new features.
Class decorators. These work just like function decorators:
@art_deco
class C:
    ...
Function and method signatures may now be "annotated". The core language assigns no meaning to these annotations (other than making them available for introspection), but some standard library modules may do so; for example, generic functions (see below) can use these. The syntax is easy to read:
def foobar(a: Integer, b: Sequence) -> String: ...
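Since the core language only records the annotations, they can be inspected at run time; a sketch using built-in types in place of the illustrative Integer/Sequence/String above:

```python
def foobar(a: int, b: str) -> bool:
    # Annotations are metadata only; nothing here enforces the types.
    return str(a) == b

# The annotations are gathered into a dictionary for introspection.
assert foobar.__annotations__ == {'a': int, 'b': str, 'return': bool}
assert foobar(3, '3') is True
```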
New metaclass syntax. Instead of setting a variable __metaclass__ in the body of a class, you must now specify the metaclass using a keyword parameter in the class heading, e.g.:
class C(bases, metaclass=MyMeta): ...
Custom class dictionaries. If the metaclass defines a __prepare__() method, it will be called before entering the class body, and whatever it returns will be used instead of a standard dictionary as the namespace in which the class body is executed. This can be used, among other things, to implement a "struct" type where the order in which elements are defined is significant.
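A sketch of such a "struct" type; StructMeta and RecordingDict are invented names, and the example also uses the new metaclass= syntax:

```python
class RecordingDict(dict):
    """A class-body namespace that remembers the order of assignments."""
    def __init__(self):
        super().__init__()
        self.order = []

    def __setitem__(self, key, value):
        if key not in self:
            self.order.append(key)
        super().__setitem__(key, value)

class StructMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases):
        # Called before the class body runs; its result becomes the namespace.
        return RecordingDict()

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Record the user-defined fields in definition order.
        cls._fields = [k for k in namespace.order if not k.startswith('__')]
        return cls

class Point(metaclass=StructMeta):
    x = 0
    y = 0

assert Point._fields == ['x', 'y']
```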
You can specify the bases dynamically, e.g.:
bases = (B1, B2)
class C(*bases):
    ...
Other keyword parameters are also allowed in the class heading; these are passed to the metaclass' __new__ method.
You can override the isinstance() and issubclass() tests, by defining class methods named __instancecheck__() or __subclasscheck__(), respectively. When these are defined, isinstance(x, C) is equivalent to C.__instancecheck__(x), and issubclass(D, C) to C.__subclasscheck__(D).
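As these hooks ended up being implemented, they are looked up on the type of the class, so a runnable sketch puts them on a metaclass; EvenMeta and Even are invented names, and the membership rules are arbitrary, purely for illustration:

```python
class EvenMeta(type):
    def __instancecheck__(cls, obj):
        # isinstance(x, Even) is delegated here.
        return isinstance(obj, int) and obj % 2 == 0

    def __subclasscheck__(cls, sub):
        # issubclass(D, Even) is delegated here; an arbitrary rule.
        return sub is bool

class Even(metaclass=EvenMeta):
    pass

assert isinstance(4, Even)
assert not isinstance(3, Even)
assert issubclass(bool, Even)
assert not issubclass(float, Even)
```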
Voluntary Abstract Base Classes (ABCs). If you want to define a class whose instances behave like a mapping (for example), you can voluntarily inherit from the class abc.Mapping. On the one hand, this class provides useful mix-in behavior, replacing most of the functionality of the old UserDict and DictMixin classes. On the other hand, systematic use of such ABCs can help large frameworks do the right thing with less guesswork: in Python 2, it's not always easy to tell whether an object is supposed to be a sequence or a mapping when it defines a __getitem__() method. The following standard ABCs are provided: Hashable, Iterable, Iterator, Sized, Container, Callable; Set, MutableSet; Mapping, MutableMapping; Sequence, MutableSequence; Number, Complex, Real, Rational, Integer. The io module also defines a number of ABCs, so for the first time in Python's history we will have a specification for the previously nebulous concept of "file-like". The power of the ABC framework lies in the ability (borrowed from Zope interfaces) to "register" a concrete class X as "virtually inheriting from" an ABC Y, where X and Y are written by different authors and appear in different packages. (To clarify, when virtual inheritance is used, the mix-in behavior of class Y is not made available to class X; the only effect is that issubclass(X, Y) will return True.)
To support the definition of ABCs that require concrete classes to actually implement the full interface, the decorator @abc.abstractmethod can be used to declare abstract methods (only in classes whose metaclass is or derives from abc.ABCMeta).
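A sketch combining @abc.abstractmethod with registration; Drawable, Circle, and Square are invented for illustration:

```python
import abc

class Drawable(metaclass=abc.ABCMeta):
    @abc.abstractmethod
    def draw(self):
        ...

# A concrete subclass must implement the abstract method ...
class Circle(Drawable):
    def draw(self):
        return 'circle'

assert Circle().draw() == 'circle'

# ... and the ABC itself cannot be instantiated.
try:
    Drawable()
except TypeError:
    pass
else:
    raise AssertionError('expected a TypeError')

# "Virtual inheritance": register an unrelated class after the fact.
class Square:
    def draw(self):
        return 'square'

Drawable.register(Square)
assert issubclass(Square, Drawable)
assert isinstance(Square(), Drawable)
# Registration only affects these checks; Square inherits no behavior.
```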
Generic Functions. The inclusion of this feature, described in PEP 3124, is somewhat uncertain, as work on the PEP seems to have slowed down to a standstill. Hopefully the pace will pick up again. It supports function dispatch based on the type of all the arguments, rather than the more conventional dispatch based on the class of the target object (self) only.
Here are just the highlights.
I don't want to say too much about the changes to the standard library, as this is a project that will only get under way for real after 3.0a1 is released, and I will not personally be overseeing it (the core language is all I can handle). It is clear already that we're removing a lot of unsupported or simply outdated cruft (e.g. many modules only applicable under SGI IRIX), and we're trying to rename modules with CapWords names like StringIO or UserDict, to conform with the PEP 8 naming standard for module names (which requires a short all-lowercase word).
Did I mention that lambda lives? I still get the occasional request to preserve it, so I figured I'd mention it twice. Don't worry, that request has been granted for over a year now.
I'll be presenting (or have presented) this material in person at several events:
The slides from OSCON are up: http://conferences.oreillynet.com/presentations/os2007/os_vanrossum.ppt (earlier versions were nearly identical).
Guido van Rossum is the creator of Python, one of the major programming languages on and off the web. The Python community refers to him as the BDFL (Benevolent Dictator For Life), a title straight from a Monty Python skit. He moved from the Netherlands to the USA in 1995, where he met his wife. Until July 2003 they lived in the northern Virginia suburbs of Washington, DC with their son Orlijn, who was born in 2001. They then moved to Silicon Valley where Guido now works for Google (spending 50% of his time on Python!).