Summary
Here's a long-awaited update on where the Python 3000 project stands. We're looking at a modest two months of schedule slip, and many exciting new features. I'll be presenting this in person several times over the next two months.
The first time I came up with the idea of Python 3000 was probably at
a Python conference in the year 2000. The name was a take on Windows
2000. For a long time there wasn't much more than a list of regrets
and flaws that were impossible to fix without breaking backwards
compatibility. The idea was that Python 3000 would be the first
Python release to give up backwards compatibility in favor of making
it the best language going forward.
Maybe a year and a half ago (not coincidentally around the time I
started working for Google, which gave me more time for work on Python
than I had had in a long time) I decided it was time to start
designing and planning Python 3000 for real. Together with the Python
developer and user community I came up with a Plan. We created a new
series of PEPs (Python Enhancement Proposals) whose numbers started
with 3000. There was a PEP 3000 already, maintained by others in the
community, which was mostly a laundry list of ideas that had been
brought up as suitable for implementation in Python 3000. This was
renamed to PEP 3100; PEP 3000 became the document describing the
philosophy and schedule of the project.
Since then, we have, well, perhaps not moved mountains, but certainly
a lot of water has flowed under the bridge of the python-dev mailing
list, and later the separate python-3000 mailing list.
A schedule was first published around a year ago; we were aiming for a
first 3.0 alpha release by the end of the first half of 2007, with a
final 3.0 release a year later. (Python 3.0 will be the version when
it is released; "Python 3000" or "Py3k" is the project's code name.)
This schedule has slipped a bit; we're now looking at a first alpha by
the end of August, and the final release is pushed back by the same
amount. (The schedule slip is largely due to the amount of work
resulting from the transition to all-Unicode text strings and mutable
raw bytes arrays. Perhaps I also haven't delegated enough of the work
to other developers; a mistake I am frantically trying to correct.)
There will be a "companion" release of Python 2.6, scheduled to be
released a few months before 3.0, with an alpha release about 4 months
before then (i.e., well after the first 3.0 alpha). The next two
sections explain its role. If you're not interested in living on the
bleeding edge, 2.6 is going to be the next version of Python you'll be
using, and it will not be very different from 2.5.
Python 3.0 will break backwards compatibility. Totally. We're not
even aiming for a specific common subset. (Of course there will be a
common subset, probably quite large, but we're not aiming to make it
convenient or even possible to write significant programs in this
subset. It is merely the set of features that happen to be unchanged
from 2.6 to 3.0.)
Python 2.6, on the other hand, will maintain full backwards
compatibility with Python 2.5 (and previous versions to the extent
possible), but it will also support forward compatibility, in the
following ways:
Python 2.6 will support a "Py3k warnings mode" which will warn
dynamically (i.e. at runtime) about features that will stop working
in Python 3.0, e.g. assuming that range() returns a list.
Python 2.6 will contain backported versions of many Py3k features,
either enabled through __future__ statements or simply by allowing
old and new syntax to be used side-by-side (if the new syntax would
be a syntax error in 2.5).
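To make the second item concrete, here is a sketch of what opting in to a
backported feature might look like in 2.6, assuming the print() function
ends up among the backported features (the exact spelling is not final):

from __future__ import print_function   # opt in to the Py3k print() function
import sys
print("warning: disk almost full", file=sys.stderr)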
Complementary to the forward compatibility features in 2.6, there
will be a separate source code conversion tool. This tool can do
a context-free source-to-source translation. As a (very simple)
example, it can translate apply(f, args) into f(*args).
However, the tool cannot do data flow analysis or type inferencing,
so it simply assumes that apply in this example refers to the
old built-in function.
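For illustration, here is roughly what such a translation might look like
(a sketch of before/after fragments; the tool's exact output is not fixed):

# Before (2.x source):
print >>sys.stderr, "processed %d items" % n
result = apply(f, args)

# After conversion (3.0 source):
print("processed %d items" % n, file=sys.stderr)
result = f(*args)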
The recommended development model for a project that needs to support
Python 2.6 and 3.0 simultaneously is as follows:
0. Start with excellent unit tests, ideally close to full coverage.
1. Port the project to Python 2.6.
2. Turn on the Py3k warnings mode.
3. Test and edit until no warnings remain.
4. Use the 2to3 tool to convert this source code to 3.0 syntax.
   Do not manually edit the output!
5. Test the converted source code under 3.0.
6. If problems are found, make corrections to the 2.6 version
   of the source code and go back to step 3.
7. When it's time to release, release separate 2.6 and 3.0 tarballs
   (or whatever archive form you use for releases).
The conversion tool produces high-quality source code that in many
cases is indistinguishable from manually converted code. Still, it is
strongly recommended not to start editing the 3.0 source code
until you are ready to reduce 2.6 support to pure maintenance
(i.e. the moment when you would normally move the 2.6 code to a
maintenance branch anyway).
Step (1) is expected to take the usual amount of effort of porting any
project to a new Python version. We're trying to make the transition
from 2.5 to 2.6 as smooth as possible.
If the conversion tool and the forward compatibility features in
Python 2.6 work out as expected, steps (2) through (6) should not take
much more effort than the typical transition from Python 2.x to
2.(x+1).
There are too many changes to list them all here; instead, I will
refer to the PEPs. However, I'd like to highlight a number of
features that I find to be significant or expect to be of particular
interest or controversial.
We're switching to a model known from Java: (immutable) text strings
are Unicode, and binary data is represented by a separate mutable
"bytes" data type. In addition, the parser will be more
Unicode-friendly: the default source encoding will be UTF-8, and
non-ASCII letters can be used in identifiers. There is some debate
still about normalization, specific alphabets, and whether we can
reasonably support right-to-left scripts. However, the standard
library will continue to use ASCII only for identifiers, and limit the
use of non-ASCII in comments and string literals to unit tests for
some of the Unicode features, and author names.
We will use "..." or '...' interchangeably for Unicode
literals, and b"..." or b'...' for bytes literals. For
example, b'abc' is equivalent to creating a bytes object using
the expression bytes([97,98,99]).
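A short sketch of the intended semantics:

s = "café"                       # a (Unicode) text string
b = b"abc"                       # binary data
assert b == bytes([97, 98, 99])  # the same three bytes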
We are adopting a slightly different approach to codecs: while in
Python 2, codecs can accept either Unicode or 8-bits as input and
produce either as output, in Py3k, encoding is always a translation
from a Unicode (text) string to an array of bytes, and decoding always
goes the opposite direction. This means that we had to drop a few
codecs that don't fit in this model, for example rot13, base64 and bz2
(those conversions are still supported, just not through the
encode/decode API).
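In other words (a sketch):

data = "Grüße".encode("utf-8")   # encoding: text -> bytes
text = data.decode("utf-8")      # decoding: bytes -> text
# Something like "abc".encode("base64") no longer fits this model, since
# base64 maps bytes to bytes; that conversion stays available through the
# base64 module instead of the encode()/decode() API.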
The I/O library is also changing in response to these changes. I
wanted to rewrite it anyway, to remove the dependency on the C stdio
library. The new distinction between bytes and text strings required
a (subtle) change in API, and the two projects were undertaken hand in
hand. In the new library, there is a clear distinction between binary
streams (opened with a mode like "rb" or "wb") and text streams
(opened with a mode not containing "b"). Text streams have a new
attribute, the encoding, which can be set explicitly when the stream
is opened; if no encoding is specified, a system-specific default is
used (which might use guessing when an existing file is being opened).
Read operations on binary streams return bytes arrays, while read
operations on text streams return (Unicode) text strings; and similar
for write operations. Writing a text string to a binary stream or a
bytes array to a text stream will raise an exception.
Otherwise, the API is kept pretty compatible. While there is still a
built-in open() function, the full definition of the new I/O
library is available from the new io module. This module also
contains abstract base classes (see below) for the various stream
types, a new implementation of StringIO, and a new, similar class
BytesIO, which is like StringIO but implements a binary stream, hence
reading and writing bytes arrays.
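A sketch of the intended usage (the file names are made up):

import io

with open("image.bin", "wb") as f:        # binary stream: read()/write() use bytes
    f.write(b"\x00\x01\x02")

with open("notes.txt", "w", encoding="utf-8") as f:   # text stream
    f.write("héllo\n")                    # accepts (Unicode) text only

buf = io.BytesIO(b"abc")                  # in-memory binary stream
sio = io.StringIO("abc")                  # in-memory text stream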
Two more I/O-related features: the venerable print statement now
becomes a print() function, and the quirky % string formatting
operator will be replaced with a new format() method on string
objects.
Turning print into a function tends to make some eyes roll.
However, there are several advantages: it's a lot easier to refactor
code that uses the print() function to use e.g. the logging package
instead; and the print statement's syntax was always a bit
controversial, with its >>file form and the special semantics of a
trailing comma. Keyword arguments take over these roles, and all is well.
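For example (a sketch):

import sys
print("error: out of cheese", file=sys.stderr)   # replaces: print >>sys.stderr, ...
print("no newline here", end="")                 # replaces the trailing comma
print("a", "b", "c", sep="-")                    # prints a-b-c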
Similarly, the new format() method avoids some of the pitfalls of
the old % operator, especially the surprising behavior of "%s" % x
when x is a tuple, and the oft-lamented mistake of accidentally
leaving off the final 's' in %(name)s. The new format strings use
{0}, {1}, {2}, ... to reference positional arguments to the format()
method, and {a}, {b}, ... to reference keyword arguments. Other
features include {a.b.c} for attribute references and even {a[b]} for
mapping or sequence access. Field lengths can be specified like this:
{a:8}; this notation also supports passing on other formatting options.
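A few examples (a sketch; details of the formatting mini-language may still shift):

"{0} loves {1}".format("Alice", "Bob")          # positional arguments
"{name} is {age}".format(name="Ed", age=42)     # keyword arguments
"{0.imag}".format(3 + 4j)                       # attribute reference -> '4.0'
"{0[2]}".format("xyz")                          # sequence access -> 'z'
"{0:8}".format("hi")                            # minimum field width of 8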
The format() method is extensible in a variety of dimensions: by
defining a __format__() special method, data types can override
how they are formatted, and how the formatting parameters are
interpreted; you can also create custom formatting classes, which can
be used e.g. to automatically provide local variables as parameters
to the formatting operations.
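As a sketch of the extension hook (the Temperature class and its 'F' format
code are made up for illustration):

class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius
    def __format__(self, spec):
        # A custom format code 'F' means "render in Fahrenheit".
        if spec == "F":
            return "{0:.1f}F".format(self.celsius * 9 / 5 + 32)
        return format(self.celsius, spec)

"{0:F}".format(Temperature(20))   # '68.0F'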
You might have guessed that "classic classes" finally bite the dust.
The built-in class object is the default base class for new
classes. This makes room for a variety of new features.
Class decorators. These work just like function decorators:
@art_deco
class C:
    ...
Function and method signatures may now be "annotated". The core
language assigns no meaning to these annotations (other than making
them available for introspection), but some standard library modules
may do so; for example, generic functions (see below) can use these.
The syntax is easy to read:
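# A sketch; Integer, Sequence and String are placeholder annotation objects
# here, not names defined by the core language.
def foobar(a: Integer, b: Sequence) -> String:
    ...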
New metaclass syntax. Instead of setting a variable
__metaclass__ in the body of a class, you must now specify the
metaclass using a keyword parameter in the class heading, e.g.:
class C(bases, metaclass=MyMeta):
    ...
Custom class dictionaries. If the metaclass defines a
__prepare__() method, it will be called before entering the
class body, and whatever it returns will be used instead of a
standard dictionary as the namespace in which the class body is
executed. This can be used, amongst others, to implement a "struct"
type where the order in which elements are defined is significant.
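A sketch of how such a "struct" metaclass might work (all names here are
made up for illustration):

class OrderedNamespace(dict):
    # A dict that remembers the order in which names are first assigned.
    def __init__(self):
        dict.__init__(self)
        self.member_order = []
    def __setitem__(self, key, value):
        if key not in self:
            self.member_order.append(key)
        dict.__setitem__(self, key, value)

class StructMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases):
        return OrderedNamespace()          # used as the class body's namespace
    def __new__(mcls, name, bases, namespace):
        cls = type.__new__(mcls, name, bases, dict(namespace))
        cls._field_order = [k for k in namespace.member_order
                            if not k.startswith('__')]
        return cls

class Point(metaclass=StructMeta):
    x = 0
    y = 0

Point._field_order   # ['x', 'y'] -- definition order preserved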
You can specify the bases dynamically, e.g.:
bases = (B1, B2)
class C(*bases):
    ...
Other keyword parameters are also allowed in the class heading;
these are passed to the metaclass' __new__ method.
You can override the isinstance() and issubclass() tests, by
defining class methods named __instancecheck__() or
__subclasscheck__(), respectively. When these are defined,
isinstance(x, C) is equivalent to C.__instancecheck__(x),
and issubclass(D, C) to C.__subclasscheck__(D).
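A sketch (the Even class is made up; the check is defined on the metaclass,
so it acts as a class method of Even):

class EvenMeta(type):
    def __instancecheck__(cls, obj):
        return isinstance(obj, int) and obj % 2 == 0

class Even(metaclass=EvenMeta):
    pass

isinstance(4, Even)   # True
isinstance(5, Even)   # False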
Voluntary Abstract Base Classes (ABCs). If you want to define a
class whose instances behave like a mapping (for example), you can
voluntarily inherit from the class abc.Mapping. On the one
hand, this class provides useful mix-in behavior, replacing most of
the functionality of the old UserDict and DictMixin classes.
On the other hand, systematic use of such ABCs can help large
frameworks do the right thing with less guesswork: in Python 2, it's
not always easy to tell whether an object is supposed to be a
sequence or a mapping when it defines a __getitem__() method.
The following standard ABCs are provided: Hashable, Iterable,
Iterator, Sized, Container, Callable; Set, MutableSet; Mapping,
MutableMapping; Sequence, MutableSequence; Number, Complex, Real,
Rational, Integer. The io module also defines a number of ABCs,
so for the first time in Python's history we will have a
specification for the previously nebulous concept of a "file-like" object.
The power of the ABC framework lies in the ability (borrowed from
Zope interfaces) to "register" a concrete class X as "virtually
inheriting from" an ABC Y, where X and Y are written by different
authors and appear in different packages. (To clarify, when virtual
inheritance is used, the mix-in behavior of class Y is not made
available to class X; the only effect is that issubclass(X, Y)
will return True.)
To support the definition of ABCs that require concrete
classes to actually implement the full interface, the decorator
@abc.abstractmethod can be used to declare abstract methods
(only in classes whose metaclass is or derives from
abc.ABCMeta).
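A sketch of the machinery (Drawable and Circle are made up):

import abc

class Drawable(metaclass=abc.ABCMeta):
    @abc.abstractmethod
    def draw(self):
        ...

class Circle:                    # possibly written by a different author
    def draw(self):
        print("o")

Drawable.register(Circle)        # "virtual" inheritance: no mix-in behavior,
issubclass(Circle, Drawable)     # ... but this is now True
# Drawable() would raise TypeError, since draw() is abstract.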
Generic Functions. The inclusion of this feature, described in PEP
3124, is somewhat uncertain, as work on the PEP seems to have slowed
down to a standstill. Hopefully the pace will pick up again. It
supports function dispatch based on the type of all the arguments,
rather than the more conventional dispatch based on the class of the
target object (self) only.
All exceptions must derive from BaseException and preferably
from Exception.
We're dropping StandardError.
Exceptions no longer act as sequences. Instead, they have an
attribute args which is the sequence of arguments passed to the
constructor.
The except E, e: syntax changes to except E as e:; this
avoids the occasional confusion caused by except E1, E2:.
The variable named after as in the except clause is
forcefully deleted upon exit from the except clause.
sys.exc_info() becomes redundant (or may disappear): instead,
e.__class__ is the exception type, and e.__traceback__ is
the traceback.
There are additional optional attributes: __context__ is set to the
"previous" exception when an exception occurs in an except or
finally clause; __cause__ can be set explicitly when re-raising
an exception, using raise E1 from E2.
The old raise syntax variants raise E, e and raise E, e, tb
are gone.
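A sketch of the new spellings:

def lookup(table, key):
    try:
        return table[key]
    except KeyError as exc:      # new 'as' spelling; exc is deleted afterwards
        # 'from' sets __cause__ on the new exception; exc.__traceback__
        # carries the traceback, so sys.exc_info() is not needed.
        raise ValueError("unknown key %r" % (key,)) from exc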
Ordering comparisons (<, <=, >, >=) will raise TypeError
by default instead of returning arbitrary results. The default equality comparisons
(==, !=, for classes that don't override __eq__)
compare for object identity (is, is not).
(The latter is unchanged from 2.x; comparisons between compatible
types in general don't change, only the default ordering based on memory address
is removed, as it caused irreproducible results.)
The nonlocal statement lets you assign to variables in outer (non-global)
scopes.
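A sketch:

def make_counter():
    n = 0
    def increment():
        nonlocal n        # rebinds n in the enclosing (non-global) scope
        n += 1
        return n
    return increment

count = make_counter()
count(); count()          # -> 1, then 2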
New super() call: Calling super() without arguments is equivalent to
super(<this_class>, <first_arg>). It roots around in the stack frame to
get the class from a special cell named __class__ (which you can also use
directly), and to get the first argument. __class__ is based on static,
textual inclusion of the method; it is filled in after the metaclass has
created the class object (but before class decorators run). super() works
in regular methods as well as in class methods.
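A sketch:

class Base:
    def greet(self):
        return "hello"

class Sub(Base):
    def greet(self):
        # No arguments needed; equivalent to super(Sub, self).greet().
        return super().greet() + ", world"

Sub().greet()   # 'hello, world'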
Set literals: {1, 2, 3}, and even set comprehensions: {x for x in y if P(x)}.
Note that the empty set is set(), since {} is an empty dict!
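For example:

evens = {x for x in range(10) if x % 2 == 0}   # {0, 2, 4, 6, 8}
letters = {'a', 'b', 'a'}                      # {'a', 'b'}
empty = set()                                  # {} would be an empty dict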
reduce() is gone (moved to functools, really).
This doesn't mean I don't like higher-order functions;
it simply reflects that almost all code that uses
reduce() becomes more readable when rewritten using a plain old
for-loop. (Example.)
lambda, however, lives.
The backtick syntax, often hard to read, is gone (use repr()),
and so is the <> operator (use !=; it was too flagrant a
violation of TOOWTDI).
At the C level, there will be a new, much improved buffer API,
which will provide better integration with numpy. (PEP 3118)
I don't want to say too much about the changes to the standard
library, as this is a project that will only get under way for real
after 3.0a1 is released, and I will not personally be overseeing it
(the core language is all I can handle). It is clear already that
we're removing a lot of unsupported or simply outdated cruft
(e.g. many modules only applicable under SGI IRIX), and we're trying
to rename modules with CapWords names like StringIO or
UserDict, to conform with the PEP 8 naming standard for module
names (which requires a short all-lowercase word).
Did I mention that lambda lives? I still get the occasional
request to preserve it, so I figured I'd mention it twice. Don't
worry, that request has been granted for over a year now.
I live in Phoenix, and the 21st just happens to be my birthday! I'd love to hear you speak, but, pray tell, where *is* the Phoenix office? My Google-fu is weak...
I'm still a bit annoyed that reduce() is gone, but oh well, I'll just system() call Haskell code when I need folds.
On the other hand, I'd like to know if the old printf-format style is completely gone and replaced by format(), or if format() is merely an additional way to do it?
(replying to myself because I don't think I can edit the previous post): as a side note, why isn't reduce() merely moving to the functools module?
functools is a module of "Tools for working with functions and callable objects" after all, moving reduce there would remove it from the global namespace and still allow fold users to have it if they want/need it (plus it makes sense, folds + partial application make for great stuff)
reduce() is hard to use legibly and like sum() it is open to bad big-O solutions. It doesn't even save finger typing. I code golf occasionally (on codegolf.com) and I've never found a way to make code shorter with reduce() versus a loop.
> and like sum() it is open
> to bad big-O solutions.
That's a completely different issue.
> It doesn't even save finger
> typing.
Most fold users find it does, but more importantly it maps differently to the brain. When you're used to thinking recursively, folding just makes sense in many situations. You're not, and you prefer iteration; that's fine. Others don't, and they like the security of a fold's mostly immutable operations (whereas in regular iteration you have to mutate something at some point).
Ok, so class body __metaclass__ goes away. What about the module-level __metaclass__ variable? Do you view it as purely a feature to support the classic/new-style transition?
I personally found the avoidance of data corruption bugs due to the use of int(x, 0) when processing data files with leading zeroes a more convincing rationale for this change...
I'll also second the question about reduce() - are we ditching it entirely, or just moving it to functools?
I so look forward to Py3k! So many great decisions, though a few I'm still getting used to...
1) Making print and format functions is a very good idea. It should teach people that they should just use sys.stdout and sys.stderr directly instead of mucking with those values to get print to go out stderr by setting "sys.stdout = sys.__stderr__". Much improved. So I assume you intend "format('\{{1}, {a}\}', 'first', {'a':'second'})" to produce something like '{first, second}'? Is that the idea? Perhaps just the PEP number for this one so I can read it myself.
2) Generic Functions: Does this mean writing overloaded functions based on type, or are you going to allow C++ style template parameters using function decorations, e.g. @template<class typeA, class typeB> and then use partial template specialization to bind the types to specific function implementations. Again, I suppose I should read the corresponding PEP...
3) map, zip, filter returning iterators: makes me nervous but seems sound in principle. And if you need to map a list, a, to a list, for instance, you could always do 'list(map(f, a))', could you not? So sounds like a plan!
4) Reduce, I shall miss thee, but thine demise was foretold. Not 100% convinced that writing it out in code is cleaner. After all, C++ has a mutable form of reduce in its std algorithms, and C++ is famous for going light on library overkill (with the exception of std::basic_string).
5) How about '{,}' for an empty Set()? I guess it would be a pain to remember '{,}' for Sets and '{}' for Dicts but then again, as new developers come on board, they may not know the history of Python or why (from their point of view) '{}' is chosen arbitrarily to mean Dict. In fact, I would probably assume it meant Set since that is the simpler type and Sets, like Tuples, Strings, Bytes and Lists are all 1 item per element types, so why not '{}' assume the same, that is an empty Set? After all, Py3k is a redesign, so why not either eliminate '{}' all together or just say '{}' mean Set, where maybe an empty Set can be up-converted automatically to a Dict but if a Dict is explicitly desired, you must write 'Dict()'. Just a thought.
NOTE: Bruce reminds me that this means there is technically "No Rule 6."
7) So it has NOTHING to do with Mystery Science Theater? Riiiiight! ;)
8) Hey! We miss you on the East Coast. Gulf Coast Pycon was a blast, and Great Lakes Pycon will be windy swingin' (thanks AMK and Mr. Goodger), but it's been years since Pycon DC (Mr. Holden: when is Pycon London?) and even if the Baltimore Sprint comes to be, we miss you here in NoVa, Guido!
9) Yay! Lambda!
10) Call it like it is: ABCs are Java Interfaces (or C++ Pure Virtual Classes).
There is a reason they call you Benevolent Dictator For Life, sir! In the end, you are always right!! :)
Too bad in the function signatures you didn't include a way to indicate what exceptions can be thrown. A great many times you won't find that information in the documentation and you need to read the code.
"Generic Functions. The inclusion of this feature, described in PEP 3124, is somewhat uncertain, as work on the PEP seems to have slowed down to a standstill. Hopefully the pace will pick up again. It supports function dispatch based on the type of all the arguments, rather than the more conventional dispatch based on the class of the target object (self) only."
I think having both generic functions and member functions in a language is rather confusing since member functions behave just like generic functions that dispatch only on the first argument. I'd either leave out generic functions, or drop the member function syntax and use generic function syntax for everything.
Whatever you do, I'd drop the @overload keyword--generic functions are not the same as overloading.
Will there be a specially used main(argv) method of modules in Python 3000, which is only run if the given module is the main one? That would avoid the bizarre, repetitive, and pointless if __name__ == '__main__': main(sys.argv)