As you may know, for some time I have been running a campaign against multiple inheritance and mixins - you may want to read the conclusion of the third article of the series Things to Know About Super and the first and the second article of the series Mixins considered harmful. Following that route, last week I decided to release a module I wrote this summer, the strait module, which implements traits for Python.

The implementation is inspired by the 2003 paper Traits - Composable Units of Behavior. Traits are simple since they cannot have common methods and the method resolution order is trivial. There is an implementation of the concept in the Smalltalk implementation Squeak.

The documentation of the strait module is intended for language designers, framework writers and advanced Python programmers. It was actually written for the guys of the python-dev list, as a companion to a thread about my articles on super. It is not intended for the average Joe programmer, and it is somewhat technical, focusing on the details of the Python implementation. On the other hand, in my opinion it is good for everybody to know that alternatives to multiple inheritance and mixins exist.

Thus, I have decided to supplement the documentation of the strait module with a few notes explaining what traits are, the differences with multiple inheritance and mixins and what we mean by method cooperation. The notes here are intended for any programmer with experience in OOP, and they are not Python specific at all.

Multiple inheritance and method cooperation

Multiple inheritance, mixins and traits are usually considered advanced techniques of object oriented programming, since the most popular languages (Java, C#, VisualBasic, PHP) do not support them, or support them in a poor way (C++). On the other hand, those techniques are pretty common in the coolest languages out there, such as Python (featuring multiple inheritance), Ruby (featuring mixins) and Scala (featuring "traits"). I put the term "traits" in quotes when referring to Scala, since Scala traits are more similar to Python mixins than to Squeak traits. In fact, Scala traits can be composed even when they override the same method, and the order of the composition determines the resulting pattern of super calls: that means that Scala traits have basically all the complications of Python mixins, which I would rather avoid.

Multiple inheritance is the most general technique among the three cited before: mixins can be seen as a restricted form of multiple inheritance and traits as a restricted form of mixins. Multiple inheritance is available in various languages, such as C++, Common Lisp, Python, Eiffel, and others. In a multiple inheritance language, a class can have more than one parent and thus can inherit methods and attributes from several sources at the same time. Maintaining code that takes advantage of (multiple) inheritance is nontrivial, since in order to understand how a class works, one needs to study all of its parents (and the parents of the parents, recursively).

That means that there is a strong coupling of the code: changing any method in any ancestor has an effect on the class. To some extent this is inevitable, since the other face of code reuse is code coupling (you cannot have one without the other) and one has to cope with that. Also, you have the same problem even with single inheritance, when you have a deep hierarchy. However, multiple inheritance adds another level of complication.

For instance, the order of the parents is significant: a class C1 inheriting from P1 and P2 does not necessarily behave the same as a class C2 inheriting from P2 and P1, where the order of the parents is inverted. The reason is that for common methods, i.e. methods with the same name, the methods of P1 take precedence over the methods of P2 for the class C1(P1, P2), but not for the class C2(P2, P1). Since the common methods are silently overridden and programmers are not very good at remembering the ordering, that may give rise to subtle bugs.
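To make the point concrete, here is a minimal Python sketch using the names from the paragraph above. The method name m and its return values are only illustrative.

```python
class P1:
    def m(self):
        return "P1.m"


class P2:
    def m(self):
        return "P2.m"


class C1(P1, P2):  # P1 comes first, so P1.m silently wins
    pass


class C2(P2, P1):  # same parents, inverted order: P2.m silently wins
    pass


print(C1().m())
print(C2().m())
```

Nothing warns you that one of the two m methods has been shadowed: the only difference between C1 and C2 is the order in which the parents are listed.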

The situation is worse if one looks at the higher order ancestors: the order of overriding (the so called MRO, Method Resolution Order) is definitely non trivial: I actually wrote a long essay on the subject, describing the Python MRO and I refer to that reference for the details. While that reference is Python specific, the concept of method resolution (also called linearization in the Lisp world) is general and applies to many languages, including Dylan and Common Lisp.
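As a small illustration of what a linearization looks like in Python, the sketch below builds a diamond hierarchy (the class names A, B, C, D and X are arbitrary) and inspects the standard __mro__ attribute. It also shows that Python refuses to create a class when no consistent ordering exists.

```python
class A:
    pass


class B(A):
    pass


class C(A):
    pass


class D(B, C):
    pass


# The linearization of D: each class appears before its parents,
# and B appears before C because it is listed first in D's bases.
mro_names = [klass.__name__ for klass in D.__mro__]
print(mro_names)

# Listing A before its own subclass B is contradictory, so class
# creation fails with a TypeError.
rejected = False
try:
    class X(A, B):
        pass
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```

Even in this tiny example you have to look at the whole hierarchy to predict the order; with deeper hierarchies it quickly becomes hard to do by eye.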

If you want to know more about the linearizations of Dylan and Common Lisp, you should look at this paper. On the other hand, if you are a reader of my Scheme series or a Scheme practitioner, I suggest you read the paper Scheme with Classes, Mixins, and Traits, which describes the object system of PLT Scheme, which supports both cooperative mixins and traits in the Squeak sense.

Scala does not support full multiple inheritance, but its traits are nearly as powerful (the only difference between a trait and a regular class is that the trait does not define a constructor) and nearly as complicated, therefore I would put Scala in the same class as Python and Common Lisp. If you want to see how Scala works, you may look at the Scala Overview paper. Theoretically, the Python MRO is the best one, since it is monotonic, but in practice all MROs are quite complicated.

The point to notice is that the complication of the MRO is by design: languages with a non-trivial MRO were designed this way to make method cooperation via super calls possible. That means that if both parents P1 and P2 define a method m, a child class C can override it and still have access to the m methods of the parents via super: C.m will call first P1.m and then P2.m, provided that P1.m features a super call itself.
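Here is a minimal sketch of cooperation in Python 3 syntax. The common ancestor Base is my own addition for the example: it is needed so that the chain of super calls has somewhere to stop. Each method returns the list of methods that have been called.

```python
class Base:
    def m(self):
        # End of the chain: no further super call here.
        return ["Base.m"]


class P1(Base):
    def m(self):
        # In an instance of C, this super call dispatches to P2.m,
        # not to Base.m, because P2 follows P1 in the MRO of C.
        return ["P1.m"] + super().m()


class P2(Base):
    def m(self):
        return ["P2.m"] + super().m()


class C(P1, P2):
    def m(self):
        return ["C.m"] + super().m()


calls = C().m()
print(calls)
```

Notice that P1 knows nothing about P2, and yet the super call inside P1.m ends up in P2.m: which method is called next depends on the class of the instance, not on the class where the call is written.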

Of course, this is just one possible design: different languages may adopt different designs. For instance the Eiffel language implements multiple inheritance, but it raises an exception when two methods with the same name are present: the programmer is forced to specify an explicit renaming (this is basically what happens for traits).

Years ago, I thought such a design to be simplistic (even stupid) and very much inferior to the Python cooperative design: since then I have had more experience with real life large object oriented systems using multiple inheritance and I have come to appreciate "stupid" designs. Actually, nowadays I think Smalltalk made the right choice thirty years ago, deciding to support neither multiple inheritance nor mixins.

Mixins and traits without multiple inheritance

In practice, the overriding problem is not very frequent (it is serious when it happens, but it rarely happens) since usually frameworks are designed to mix independent sets of functionality. Usually one does not need the full power of multiple inheritance: mixins or traits are powerful enough to implement most frameworks.

In a language with multiple inheritance it is natural to implement mixins as classes. However, this is not the only solution. In general, we can speak of mixin programming in any language where it is possible to inject methods into the namespace of a class, either statically before class creation or dynamically after class creation.

For instance, Ruby does not support multiple inheritance, but it does support mixins since it is possible to include methods coming from a module:

class C_with_mixin < C
   include M # M is a module
end

There is an advantage in this approach: modules have no parents and there is no concept of method resolution order, so it is much easier to figure out what a mixin does, as compared to figuring out what a mixin implemented as a class in a multiple inheritance hierarchy does. On the other hand, there is no method cooperation in the sense of Python or Scala super or CLOS call-next-method. There is limited cooperation between parents and children only, since Ruby super (like Java super) is able to dispatch to the parent class only. This is not necessarily a bad thing, though.

Ruby mixins are much simpler than Scala traits or Python mixins, but they still suffer from the ordering problem: mixing in the module M1 and then the module M2 is different from mixing in the module M2 and then the module M1. If the modules contain methods with the same name, changing the composition order affects the resulting class.
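The same ordering problem can be sketched in Python, modelling each module as a plain dictionary of methods. The helper mix and the dictionaries M1 and M2 are hypothetical names made up for this example; this is an imitation of the idea, not of how Ruby actually implements include.

```python
def mix(base, *mixins):
    """Return a subclass of base with the mixin methods injected in order."""
    cls = type(base.__name__ + "WithMixins", (base,), {})
    for methods in mixins:
        for name, func in methods.items():
            # A later mixin silently replaces an earlier method of the same name.
            setattr(cls, name, func)
    return cls


def greet_1(self):
    return "hello from M1"


def greet_2(self):
    return "hello from M2"


M1 = {"greet": greet_1}
M2 = {"greet": greet_2}

print(mix(object, M1, M2)().greet())
print(mix(object, M2, M1)().greet())
```

The two resulting classes behave differently and nothing signals the clash: the last mixin applied wins.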

Traits were invented just to solve this problem: common methods raise an error unless the programmer specifies the precedence explicitly, or she renames the methods. After that, traits commute. Traits are therefore the most explicit and safest technique, whereas multiple inheritance is the most fragile technique, with mixins in between.
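The idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the API of the strait module: traits are modelled as dictionaries of functions, and compose, TraitConflict and the overrides argument are hypothetical names.

```python
class TraitConflict(Exception):
    """Raised when two traits define the same name and nobody resolved it."""


def compose(*traits, overrides=None):
    """Merge traits (dicts of methods), refusing unresolved name clashes."""
    overrides = dict(overrides or {})
    merged = {}
    for trait in traits:
        for name, func in trait.items():
            if name in overrides:
                continue  # the programmer resolved this name explicitly
            if name in merged and merged[name] is not func:
                raise TraitConflict(name)
            merged[name] = func
    merged.update(overrides)
    return merged


def hello_1(self):
    return "hello from T1"


def hello_2(self):
    return "hello from T2"


def bye(self):
    return "bye"


T1 = {"hello": hello_1}
T2 = {"hello": hello_2, "bye": bye}

try:
    compose(T1, T2)
except TraitConflict as exc:
    print("conflict on:", exc)

# Once the clash is resolved explicitly, the order no longer matters.
resolved_12 = compose(T1, T2, overrides={"hello": hello_1})
resolved_21 = compose(T2, T1, overrides={"hello": hello_1})
print(resolved_12 == resolved_21)
```

The important design choice is that a clash is an error by default: the programmer has to state which method she wants, instead of relying on the composition order.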

A proper implementation of traits should also include introspection tools such that a class can be seen both as a flat collection of methods and as a composite entity (the original paper about traits explains this point pretty well). That should help with the namespace pollution problem by giving the developer the ability to see the class as a composition of traits (one could argue that in Python pydoc allows you to see the origin of the methods as coming from parent classes, but that support is insufficient to manage situations with complicated inheritance hierarchies and lots of methods).

In Python you can also implement mixins without inheritance simply by dynamically adding methods to a class, starting from a method dictionary M:

class C_with_mixin(C):
   pass

for name in M: # M is a dictionary of methods
   setattr(C_with_mixin, name, M[name])

Implementing Ruby mixins in Python is therefore trivial: you can just read the methods from a module dictionary. Implementing traits is a bit less trivial, since you must check for common names and raise an error in that case. Moreover, you must be careful with ordering issues: the traits paper says that methods coming from a trait must take precedence over methods coming from the base class, but they must not take precedence over methods defined in the class itself.
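A minimal sketch of that precedence rule could look like the class decorator below. Let me stress that this is not how the strait module is implemented and not its API: with_traits is a hypothetical helper, and traits are again modelled as plain dictionaries of functions.

```python
def with_traits(*traits):
    """Class decorator injecting trait methods with the precedence
    class body > traits > base classes."""
    def decorator(cls):
        collected = {}
        for trait in traits:
            for name, func in trait.items():
                if name in collected and collected[name] is not func:
                    raise TypeError("conflicting trait method: %r" % name)
                collected[name] = func
        for name, func in collected.items():
            # vars(cls) holds only what the class body itself defined,
            # so inherited methods are overridden, own methods are not.
            if name not in vars(cls):
                setattr(cls, name, func)
        return cls
    return decorator


class Base:
    def describe(self):
        return "Base.describe"

    def ident(self):
        return "Base.ident"


def trait_describe(self):
    return "trait.describe"


def trait_ident(self):
    return "trait.ident"


T = {"describe": trait_describe, "ident": trait_ident}


@with_traits(T)
class C(Base):
    def ident(self):
        return "C.ident"


print(C().describe())  # the trait overrides the base class
print(C().ident())     # the class's own method is left alone
```

The sketch ignores many details that a real implementation has to face (introspection, renaming, special methods), but it shows where the check for common names and the ordering rule fit.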

I took some liberties with my own implementation of traits, which was inspired by the Squeak implementation but is not the same. In particular, I added some support for cooperation of traits, i.e. there is a kind of super, but its functionality is limited with respect to the regular super, and it is a bit more awkward to use: this is on purpose, to discourage designs based on method cooperation, which I think are fragile and not to be recommended (again, see the third article in my series about super). Still, in the very few special cases where one wants cooperation, it is possible.

All those points are explained in the documentation of the strait module, so you should look there if you are really interested in the subject. Here I will just add that I still prefer generic functions to traits. Nevertheless, traits may have a role if you want to follow the traditional route of having methods inside classes, and they are a smaller leap from traditional object oriented programming. Moreover, it is much easier to convert a pre-existing framework from using mixins to using traits than to convert it to generic functions.

If you want to play with traits in Python, you are welcome to try the strait module. Enjoy!

Post scriptum. I realize that this post may be misinterpreted, so let me make clear a couple of points:

I am not asking for multiple inheritance to be removed from Python and replaced with traits. However, I am saying to people writing new languages: think twice before adding multiple inheritance. Certainly it is more difficult to implement than traits; moreover, I am arguing that it makes life more difficult for your users too.
I do not think traits are the best thing since sliced bread. They are a bit better than multiple inheritance, but I still recommend keeping things simple and using both (single) inheritance and traits as little as possible.