Let's look at a bunch of issues. For some, I have strawman answers.
For others, I'm still groping in the dark. (And I need a re-education
on how some of these things are solved in other languages!)
It would be a shame if declaring gcd() as taking int arguments meant
that gcd(123456789012345, 23456789012355) no longer worked (those are
longs, not ints). This particular case can be solved using
inheritance (let int derive from long, or vice versa), but there are
others that aren't solved so easily. For example:
    from StringIO import StringIO

    def foo(f: StringIO) -> str:
        f.write("hello")
        return f.getvalue()

    f1 = StringIO()
    f1.write("booh+")  # write after creation, so foo()'s write appends
    print foo(f1)      # prints "booh+hello"
What if a cStringIO instance were passed? There are technical reasons
why cStringIO can't or shouldn't inherit from StringIO, but more
importantly, requiring inheritance defeats Python's reliance on duck
typing.
Possible solution?: the compiler derives an inherited interface
(let's call this a duck type) for the argument from the fact that
the only two methods used are write() and getvalue(); the latter
returning a str. It makes sure that these methods are in fact defined
by the StringIO class, and then accepts other class instances that
implement the duck type.
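As a runtime approximation of that inferred duck type, here is a minimal sketch (using Python 3's io.StringIO for the demonstration; the derived interface is modeled as just the pair of method names the function body uses):

```python
import io

def conforms_to_duck_type(obj, method_names):
    """Check that obj defines every method the function body actually uses."""
    return all(callable(getattr(obj, name, None)) for name in method_names)

# The duck type the compiler would derive for foo(): write() and getvalue().
foo_duck_type = ("write", "getvalue")

assert conforms_to_duck_type(io.StringIO(), foo_duck_type)   # accepted
assert not conforms_to_duck_type(42, foo_duck_type)          # rejected
```

Note that this captures only the presence of the methods, not the requirement that getvalue() return a str.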
But...: taking that to the extreme would not flag gcd("x", "y") as
a bug because the operations used (bool-testing, binary %) are
supported. Also, it doesn't help for the return type. What if I had
a UnicodeStringIO class whose getvalue() returned a Unicode string?
Container types offer lots of interesting problems. As an
introduction (not yet using containers), consider a function that
computes the smaller of two values:
    def min(a, b):
        if a < b:
            return a
        else:
            return b
We would like to be able to add type annotations indicating that a and
b should have the same type and that the result also has that type.
Strawman syntax:
    def min(a: T, b: T) -> T:
        if a < b:
            return a
        else:
            return b
As a (rather unpythonic) strawman, I'm using T and T0, T1, T2, etc. as
type variables. You can think of these as the free variables in an
equation of types; we're saying that the types of a, b and the
(unnamed) return value must all be the same. So min(1, 2) is valid
and returns an int; min("a", "b") is valid and returns a string.
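The constraint can be simulated at run time; a hypothetical checked version (a sketch of the equation's meaning, not the strawman's compile-time behavior) might look like:

```python
def min_checked(a, b):
    # Enforce the type-variable constraint: a and b must share a type,
    # as the annotation min(a: T, b: T) -> T requires.
    if type(a) is not type(b):
        raise TypeError("min() arguments must have the same type, got %s and %s"
                        % (type(a).__name__, type(b).__name__))
    return a if a < b else b

assert min_checked(1, 2) == 1        # T = int
assert min_checked("a", "b") == "a"  # T = str
```

Read this strictly and min_checked(1, 2.5) is rejected, which is exactly the coercion question that comes next.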
What about min(1, 2.5)? That ought to be valid and return a float. I
guess this means that there should be some kind of typing hierarchy
that explains how the various numeric types are embedded in each
other. The VM already knows about these coercion rules, but the trick
is to build them into the type system. I think I would like to do
this using a mechanism separate from inheritance, since I really don't
think that it is a good idea to require that int is a subclass of
float and float a subclass of complex. But the mechanism should also
be open to user-defined types; there shouldn't be mechanisms in Python
that the user cannot extend (not many, anyway).
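One way to picture such a mechanism: a user-extensible registry of embeddings, kept separate from the class hierarchy. A sketch, with all names invented here:

```python
# Hypothetical registry: each type maps to the set of types it embeds into.
EMBEDDINGS = {int: {float, complex}, float: {complex}}

def register_embedding(narrow, wide):
    """Let user-defined types join the numeric tower without subclassing."""
    EMBEDDINGS.setdefault(narrow, set()).add(wide)

def common_type(t1, t2):
    """The type both arguments embed into, per the registry."""
    if t1 is t2:
        return t1
    if t2 in EMBEDDINGS.get(t1, ()):
        return t2
    if t1 in EMBEDDINGS.get(t2, ()):
        return t1
    raise TypeError("no common type for %s and %s" % (t1.__name__, t2.__name__))

assert common_type(int, float) is float  # so min(1, 2.5) can return a float
```

(A real mechanism would also need transitivity; this sketch lists each embedding explicitly.)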
Now on to containers. Let's look at a different function that
computes the smallest element of a sequence:
    def min(a):
        it = iter(a)
        x = it.next()  # This raises StopIteration if a is empty
        for y in it:
            if y < x:
                x = y
        return x
How would we write the signature? Clearly a is any iterable; the
function return type is the iterable's element type. Strawman:
    def min(a: iterable(T)) -> T:
        it = iter(a)
        x = it.next()  # This raises StopIteration if a is empty
        for y in it:
            if y < x:
                x = y
        return x
I guess iterable would have to be a new built-in to represent the
concept "anything over which you can iterate". This includes lists
and tuples and strings, but also dictionaries, files, and in general
anything that defines __getitem__ or __iter__.
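That concept is easy to probe at run time, since iter() accepts exactly the objects that define __iter__ or a sequence-style __getitem__. A small demonstration:

```python
def is_iterable(obj):
    """True if obj matches the proposed iterable duck type."""
    try:
        iter(obj)
        return True
    except TypeError:
        return False

class Squares:
    # No __iter__; the old sequence protocol (__getitem__) is enough.
    def __getitem__(self, i):
        if i >= 5:
            raise IndexError
        return i * i

assert is_iterable([1, 2]) and is_iterable("abc") and is_iterable({})
assert is_iterable(Squares())
assert not is_iterable(42)
assert list(Squares()) == [0, 1, 4, 9, 16]
```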
The Boost folks have a habit of Capitalizing these abstract types, so
it would be Iterable. That would perhaps also be a way out of the
dilemma of int vs. long: we could use Integer for idealized integers.
There are other numeric abstractions that we might want to define,
like Exact and Inexact, Rational, Real and Complex (and Quaternion?),
but I don't want to digress into number systems. Type systems are
hard enough without them.
Perhaps we could extend this convention to saying that whenever a
class name starts with a Capital letter it acts as a duck type, and
when it starts with a lower case letter it acts as a concrete type (so
only instances of the type and its subclasses are allowed).
I don't particularly like enforcing naming conventions based on case;
it's one thing to have an agreement amongst a group of Python
developers that Foo is an abstract class and foo is a concrete class,
but it's quite a different thing to have the compiler look at this too
for its type checking.
It would be good to have a set of notations both for the common
(abstract and concrete) container types, just so we can write more
examples. Here's a strawman.
list(T): a list whose elements are all T's
set(T): a set whose elements are all T's (set is a builtin in Python
2.4)
tuple(T0, T1, T2): a tuple of length three whose elements have the
specified types (if you want to use tuples of arbitrary length as
sequences, use Sequence)
dict(T1, T2): a dict whose key type is T1 and whose value type is T2
Iterable(T): an iterable whose elements are all T's
Sequence(T): a sequence whose elements are all T's
Mapping(T1, T2): a mapping whose key type is T1 and whose value type
is T2
union(T1, T2): either T1 or T2 (union is not really a type but an
operator, but I think it could just be a built-in)
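To make the notation concrete, here is a toy runtime checker for a few of these forms. None of this syntax exists, so the parametrized types are spelled as tags; everything here is hypothetical:

```python
def check(value, spec):
    """Toy structural check.  spec is a plain type, or a tagged tuple:
    ('list', T), ('tuple', T0, T1, ...), ('dict', K, V), ('union', T1, T2)."""
    if isinstance(spec, type):
        return isinstance(value, spec)
    tag = spec[0]
    if tag == 'list':
        return isinstance(value, list) and all(check(v, spec[1]) for v in value)
    if tag == 'tuple':
        return (isinstance(value, tuple) and len(value) == len(spec) - 1
                and all(check(v, s) for v, s in zip(value, spec[1:])))
    if tag == 'dict':
        return (isinstance(value, dict)
                and all(check(k, spec[1]) and check(v, spec[2])
                        for k, v in value.items()))
    if tag == 'union':
        return any(check(value, s) for s in spec[1:])
    raise ValueError("unknown spec %r" % (tag,))

assert check([1, 2, 3], ('list', int))
assert check((1, "a", 2.0), ('tuple', int, str, float))
assert check({"x": 1}, ('dict', str, int))
assert check("hi", ('union', int, str))
assert not check([1, "x"], ('list', int))
```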
Do we need to distinguish between mutable and immutable sets and
sequences?
We need a better name for type(None) than types.NoneType; perhaps void
will do?
A common pattern for optional values is to have union(T, void).
Perhaps we could add the notation optional(T) to mean just this?
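As a sketch of what optional(T) would mean, here it is reduced to a simple runtime predicate (the names optional and void are the proposal's, not anything that exists):

```python
NoneType = type(None)  # the type the text suggests renaming to "void"

def optional(t):
    """optional(T): shorthand for union(T, void), i.e. a T or None."""
    return lambda value: value is None or isinstance(value, t)

maybe_int = optional(int)
assert maybe_int(3)
assert maybe_int(None)
assert not maybe_int("3")
```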
I'm handwaving a bit about how the compiler knows all these built-in
names. I think it should assume that names of built-ins that aren't
redefined as globals (or imported, etc.) stand for their built-in
meaning and there's no surreptitious run-time redefinition going on.
So then it can know about int, and about len(x: sequence(T))->int, and
so on.
This is not about operator overloading, which is simple enough, but
about method signature overloading a la Java. While most overloading
is easily expressed using either default argument values or a simple
union type, there are some cases that defeat this approach.
For example, the signatures of the built-in min() and max() functions
cause some grief: they overload two quite different functions:
    def min(a: iterable(T)) -> T:
        ...

    def min(x: T, y: T, *rest: sequence(T)) -> T:
        ...
I don't know how to deal with this except by somehow making this kind
of overloading explicit. Decorators to the rescue perhaps:
    @overloaded
    def min(a: iterable(T)) -> T:
        ...

    @overloaded
    def min(x: T, y: T, *rest: sequence(T)) -> T:
        ...
The overloaded decorator could build a multimethod that dispatches on
the call signature. That's probably quicker than decoding the types
by hand from a varargs argument...
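Nothing like this exists yet, but a bare-bones sketch of what @overloaded might build is short. The version below dispatches on arity only (a real implementation would also check the annotated types); all names and details are invented:

```python
import inspect

_registry = {}

class _MultiMethod:
    """Try registered overloads in order; call the first one whose
    parameter list accepts the supplied positional arguments."""
    def __init__(self, name):
        self.name = name
        self.overloads = []

    def __call__(self, *args):
        for func in self.overloads:
            try:
                inspect.signature(func).bind(*args)
            except TypeError:
                continue  # this overload can't accept these arguments
            return func(*args)
        raise TypeError("no overload of %s() matches %d argument(s)"
                        % (self.name, len(args)))

def overloaded(func):
    mm = _registry.setdefault(func.__name__, _MultiMethod(func.__name__))
    mm.overloads.append(func)
    return mm

@overloaded
def min(a):                 # min(a: iterable(T)) -> T
    it = iter(a)
    x = next(it)
    for y in it:
        if y < x:
            x = y
    return x

@overloaded
def min(x, y, *rest):       # min(x: T, y: T, *rest: sequence(T)) -> T
    return min([x, y] + list(rest))

assert min([3, 1, 2]) == 1  # dispatches to the one-argument overload
assert min(3, 1, 2) == 1    # dispatches to the variadic overload
```

Shadowing the builtin min here is just for the demonstration; the point is that the decorator, not the caller, pays the cost of picking the right body.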