Summary
This essay is intended for Python programmers wanting to understand the concept of cooperative inheritance and the usage of super. It does not require any previous reading. The target is Python 3.0, since it has a nicer syntax for super, even if most of what I say here can be backported down to Python 2.2.
Most languages supporting inheritance support cooperative inheritance,
i.e. there is a language-supported way for child methods
to dispatch to their parent method: this is usually done via a
super keyword. Things are easy when the language supports single
inheritance only, since each class has a single parent and there is a
unique concept of the super method. Things are difficult when the
language supports multiple inheritance: in that case the programmer
has to understand the intricacies of the Method Resolution Order.
This paper is intended to be very practical, so let me start with an example.
Consider the following hierarchy (in Python 3):
class A(object):
    def __init__(self):
        print('A.__init__')
        super().__init__()

class B(object):
    def __init__(self):
        print('B.__init__')
        super().__init__()

class C(A, B):
    def __init__(self):
        print('C.__init__')
        super().__init__()
What is the "superclass" of A? In other words, when I create an
instance of A, which method will be called by
super().__init__()? Notice that I am considering here generic
instances of A, not only direct instances: in particular, an
instance of C is also an instance of A and instantiating C
will call super().__init__() in A.__init__ at some point: the
tricky point is to understand which method will be called
for indirect instances of A.
In a single inheritance language there is a unique answer both for
direct and indirect instances (object is the superclass of A
and object.__init__ is the method called by
super().__init__()). On the other hand, in a multiple inheritance
language there is no easy answer. It is better to say that there is no
superclass and that it is impossible to know which method will be called
by super().__init__() unless the subclass from which super is
called is known. In particular, this is what happens when we instantiate C:
>>> c = C()
C.__init__
A.__init__
B.__init__
As you see, the super call in C dispatches to A.__init__ and then
the super call there dispatches to B.__init__, which in turn dispatches to
object.__init__. The important point is that
the same super call can dispatch to different methods:
when super().__init__() is called directly by instantiating
A it dispatches to object.__init__, whereas when it is called indirectly
by instantiating C it dispatches to B.__init__. If somebody
extends the hierarchy, adds subclasses of A and instantiates them,
then the super call in A.__init__
can dispatch to an entirely different method: the super method call
depends on the instance I am starting from. The precise algorithm
specifying the order in which the methods are called is
the Method Resolution Order algorithm, or MRO for short. It
is discussed in detail in an old essay I wrote years ago, and
interested readers are referred to it (see the references below).
Here I will take the easy way and I will ask Python.
Given any class, it is possible to extract its linearization, i.e. the
ordered list of its ancestors plus the class itself: the super call
follows this list to decide which is the right method to dispatch
to. For instance, for a direct instance of A,
object is the only class the super call can dispatch to, whereas
a super call in C will look first at A, then at B and finally at object.
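We can check this by asking Python for the linearization of C directly:

>>> for c in C.mro():
...     print(c.__name__)
C
A
B
object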
Finding out the linearization of a larger hierarchy is non-trivial; just to give
an example, suppose we add to our hierarchy three classes D, E and F
in this way:
>>> class D: pass
>>> class E(A, D): pass
>>> class F(E, C): pass
>>> for c in F.mro():
... print(c.__name__)
F
E
C
A
D
B
object
As you see, for an instance of F a super call in A.__init__
will look for an __init__ in D before reaching B: had D defined one,
it would have been called instead of B.__init__!
I have just shown that one cannot tell in advance
where the super call will dispatch, unless one knows the whole hierarchy:
this is quite different from the single inheritance situation, and it is
also very much error-prone and brittle.
When you design a hierarchy you will expect, for instance, that
A.__init__ will call B.__init__, but adding classes (and such
classes may be added by a third party) may change the method chain. In this
case A.__init__ (when invoked from an F instance) will look up the next
__init__ through D before B. This is dangerous: for instance,
if the behavior of your code depends on the ordering of the
methods you may get in trouble. Things are worse if one of the methods
in the cooperative chain does not have a compatible signature, since the
chain will break.
This problem is not theoretical and it happens even in very trivial
hierarchies. For instance, here is an example of incompatible
signatures in the __init__ method (this problem
affects even Python 2.6, not only Python 3.X):
class X(object):
    def __init__(self, a):
        super().__init__()

class Y(object):
    def __init__(self, a):
        super().__init__()

class Z(X, Y):
    def __init__(self, a):
        super().__init__(a)
Here instantiating X and Y works fine, but as soon as you
introduce Z you get in trouble, since super().__init__(a) in
Z.__init__ will call super().__init__() in X, which in
turn will call Y.__init__ with no arguments, resulting in a
TypeError! In older Python versions (from 2.2 to 2.5) such a
problem could be avoided by leveraging the fact that
object.__init__ accepts any number of arguments (ignoring them), by
replacing super().__init__() with super().__init__(a). In Python
2.6+, instead, there is no real solution for this problem, except avoiding
super in the constructor or avoiding multiple inheritance.
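You can see the breakage directly (the exact wording of the error message may vary with the Python version):

>>> Z(1)
Traceback (most recent call last):
  ...
TypeError: __init__() missing 1 required positional argument: 'a'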
In general, if you want to support multiple inheritance you should use
super only when the methods in a cooperative chain
have consistent signatures: that means that you
will not use super in __init__ and __new__, since likely your
constructors will have custom arguments, whereas object.__init__
and object.__new__ take no arguments. However, in practice you may
inherit from third party classes which do not obey this rule, or
others could derive from your classes without following this rule, and
breakage may occur. For instance, I have used super for years in my
__init__ methods and I never had problems, because in older Python
versions object.__init__ accepted any number of arguments: but in Python 3
all that code is fragile under multiple inheritance. I am left with
two choices: removing super, or telling people that
those classes are not intended to be used in multiple inheritance
situations, i.e. the constructors will break if they do that.
Nowadays I tend to favor the second choice.
Luckily, multiple inheritance is usually used with mixin classes, and mixins do
not have constructors, so that in practice the problem is mitigated.
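For instance, here is a minimal sketch of what I mean by a mixin class (the names are purely illustrative):

class ReprMixin(object):
    # a typical mixin: it adds behavior but defines no __init__,
    # so it cannot interfere with the constructor chain
    def __repr__(self):
        return '<%s %r>' % (self.__class__.__name__, vars(self))

class Point(ReprMixin):
    def __init__(self, x, y):
        self.x = x
        self.y = y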
Even if super has its shortcomings, there are meaningful use cases for
it, assuming you think multiple inheritance is a legitimate design technique.
For instance, if you use metaclasses and you want to support multiple
inheritance, you must use super in the __new__ and __init__
methods: there is no problem there, since the constructor for
metaclasses has a fixed signature (name, bases, dictionary). But metaclasses
are extremely rare, so let me give a more meaningful example for an application
programmer, where a design based on cooperative
multiple inheritance could be reasonable.
Suppose you have a bunch of Manager classes which
share many common methods and which are intended to manage different resources,
such as databases, FTP sites, etc. To be concrete, suppose there are
two common methods: getinfolist, which returns a list of strings
describing the managed resource (containing information such as the URI, the
tables in the database or the files in the site, etc.), and close,
which closes the resource (the database connection or the FTP connection).
You can model the hierarchy with a Manager abstract base class:
class Manager(object):
    def close(self):
        pass
    def getinfolist(self):
        return []
and two concrete classes DbManager and FtpManager:
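A minimal sketch of such classes (the implementation details here are my own assumption; the essential point is that close and getinfolist cooperate via super) could be:

class DbManager(Manager):
    def __init__(self, dsn):
        self.dsn = dsn
    def close(self):
        print('closing db connection %s' % self.dsn)
        super().close()
    def getinfolist(self):
        return ['db info: %s' % self.dsn] + super().getinfolist()

class FtpManager(Manager):
    def __init__(self, url):
        self.url = url
    def close(self):
        print('closing ftp connection %s' % self.url)
        super().close()
    def getinfolist(self):
        return ['ftp info: %s' % self.url] + super().getinfolist()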
Now suppose you need to manage both a database and an FTP site:
then you can define a MultiManager as follows:
class MultiManager(DbManager, FtpManager):
    def __init__(self, dsn, url):
        DbManager.__init__(self, dsn)
        FtpManager.__init__(self, url)
Everything works: calling MultiManager.close will in turn call
DbManager.close and FtpManager.close. There is no risk of
running into trouble with the signatures, since the close and getinfolist
methods all have the same signature (actually they take no arguments at all).
Notice also that I did not use super in the constructor.
You see that super is essential in this design: without it,
only DbManager.close would be called and your FTP connection would leak.
The getinfolist method works similarly: forgetting super would
mean losing some information. An alternative not using super would require
defining an explicit method close in the MultiManager, calling
DbManager.close and FtpManager.close explicitly, and an explicit
method getinfolist calling DbManager.getinfolist and
FtpManager.getinfolist:
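A sketch of this explicit alternative could look as follows; note that this variant only makes sense if DbManager and FtpManager do not themselves dispatch with super, otherwise each base method would run more than once:

class MultiManager(DbManager, FtpManager):
    def __init__(self, dsn, url):
        DbManager.__init__(self, dsn)
        FtpManager.__init__(self, url)
    def close(self):
        # explicit dispatch to each parent, no MRO involved
        DbManager.close(self)
        FtpManager.close(self)
    def getinfolist(self):
        return DbManager.getinfolist(self) + FtpManager.getinfolist(self)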
This would be less elegant but probably clearer and safer, so you can always
decide not to use super if you really hate it. However, if you have
N common methods, there is some boilerplate to write; moreover, every time
you add a Manager class you must add a call to it in each of the N common methods,
which is ugly. Here N is just 2, so not using super may work well,
but in general it is clear that the cooperative approach is more effective.
Actually, I strongly believe (and always have) that super and the
MRO are the right way to do multiple inheritance: but I also believe
that multiple inheritance itself is wrong. For instance, in the
MultiManager example I would not use multiple
inheritance but composition, and I would probably use a generalization
such as the following:
class MyMultiManager(Manager):
    def __init__(self, *managers):
        self.managers = managers
    def close(self):
        for mngr in self.managers:
            mngr.close()
    def getinfolist(self):
        # sum needs an explicit empty list as start value to concatenate lists
        return sum((mngr.getinfolist() for mngr in self.managers), [])
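With the managers sketched earlier, an illustrative usage could be:

multi = MyMultiManager(DbManager('mydsn'), FtpManager('ftp://example.com'))
multi.close()  # closes both resources, with plain composition and no MRO magic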
There are languages that do not provide inheritance at all (not even single
inheritance!) and are perfectly fine, so you should always question
whether you should use inheritance or not. There are always many options
and the design space is rather large. Personally, I always use
super, but I use single inheritance only, so that my cooperative
hierarchies are trivial.
Deep down, super in Python 3 is the same as in Python 2.X.
However, on the surface - at the syntactic level, not at the semantic level -
there is a big difference: Python 3 super is smart enough to figure out
the class it is invoked from and the first argument of the containing
method. Actually it is so smart that it works also for inner classes
and even if the first argument is not called self.
In Python 2.X super is dumber and you must pass the class and the first
argument explicitly: for instance, our first example must be written as follows:
class A(object):
    def __init__(self):
        print('A.__init__')
        super(A, self).__init__()
By the way, this syntax works both in Python 3 and in Python 2, which is
why I said that deep down super is the same. The new feature in
Python 3 is that there is a shortcut notation super() for
super(A, self). In Python 3 the (bytecode) compiler is smart enough
to recognize that the super call is performed inside the class A, so
it inserts the reference to A automagically; moreover, it inserts
the reference to the first argument of the current method too. Typically
the first argument of the current method is self, but it may be
cls or any identifier: super will work fine in any case.
Since super() knows the class it is invoked from and the class of
the original caller, it can walk the MRO correctly. Such information
is stored in the attributes .__thisclass__ and .__self_class__
and you may understand how it works from the following example:
class Mother(object):
    def __init__(self):
        sup = super()
        print(sup.__thisclass__)
        print(sup.__self_class__)
        sup.__init__()
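To see the difference between the two attributes, you can instantiate Mother both directly and through a subclass (the name Child is my own choice):

class Child(Mother):
    pass

Mother()  # __thisclass__ is Mother, __self_class__ is Mother
Child()   # __thisclass__ is still Mother, but __self_class__ is Child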
Here .__self_class__ is just the class of the first argument (self),
but this is not always the case. The exception is the case of classmethods and
staticmethods taking a class as first argument, such as __new__.
Specifically, super(cls, x) checks if x is an instance
of cls and in that case sets .__self_class__ to x.__class__; otherwise
(and that happens for classmethods and for __new__) it checks if x
is a subclass of cls and then sets .__self_class__ to x directly.
For instance, consider the following example:
class C0(object):
    @classmethod
    def c(cls):
        print('called classmethod C0.c')
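The behavior is easier to see with a cooperative classmethod in a subclass; here is a minimal sketch (the subclass name C1 is my own choice):

class C1(C0):
    @classmethod
    def c(cls):
        sup = super()
        print(sup.__thisclass__)   # C1, the class containing the method
        print(sup.__self_class__)  # C1 again: the class itself, not its type
        sup.c()                    # dispatches to C0.c

Calling C1.c() prints the class C1 twice and then 'called classmethod C0.c': since the first argument cls is a class, .__self_class__ is set to cls itself and not to its type.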
There is a lot of magic going on in Python 3's super, and sometimes the
magic shows its limits. For instance, this is a syntax that cannot work:
def __init__(self):
    print('calling __init__')
    super().__init__()

class C(object):
    __init__ = __init__

if __name__ == '__main__':
    c = C()
If you try to run this code you will get a
SystemError: super(): __class__ cell not found, and the reason is
obvious: since the __init__ method is defined outside the class, the
compiler cannot infer to which class it will be attached at runtime.
On the other hand, if you are completely explicit and use the full
two-argument syntax in the external method,
everything will work, because you are stating explicitly that the method
will be attached to the class C.
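A sketch of this explicit version (just the external function above rewritten with the two-argument form) could be:

def __init__(self):
    print('calling __init__')
    super(C, self).__init__()

class C(object):
    __init__ = __init__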
I will close this section by noticing a wart of super in Python 3,
pointed out by Armin Ronacher and others: the fact that super
should be a keyword but it is not. Therefore horrors like the
following are possible:
def super():
    print("I am evil, you are NOT calling the supermethod!")

class C(object):
    def __init__(self):
        super().__init__()

if __name__ == '__main__':
    c = C()  # prints "I am evil, you are NOT calling the supermethod!"
DON'T DO THAT! Here the called __init__ is the __init__ method
of the object None!!
Of course, only an evil programmer would shadow super on purpose,
but that may happen accidentally. Consider for instance this use case:
you are refactoring an old code base written before the existence of
super and using from mod import * (this is ugly, but we know
that there are code bases written this way), with mod defining a
function super which has nothing to do with the super
builtin. If in this code you replace Base.method(self, *args) with
super().method(*args) you will introduce a bug. This is not common
(it never happened to me), but still it is a bug that could not happen if
super were a keyword.
Moreover, super is special and it will not work if
you change its name as in this example:
# from http://lucumr.pocoo.org/2010/1/7/pros-and-cons-about-python-3
_super = super

class Foo(Bar):
    def foo(self):
        _super().foo()
Here the bytecode compiler will not treat _super specially, only the
name super. It is unfortunate that we missed the opportunity to make super
a keyword in Python 3: there were no good reasons not to, since Python 3
was expected to break compatibility anyway.
There is plenty of material about super and multiple inheritance. You
should probably start from the MRO paper, then read Super
considered harmful by James Knight. A lot of the issues with
super, especially in old versions of Python, are covered in Things
to know about super. I did spend some time thinking about ways to
avoid multiple inheritance; you may be interested in reading my series
Mixins considered harmful.