"Pure Virtual Function Called": An Explanation

by Paul S. R. Chisholm

February 26, 2007

Summary

"Pure virtual function called" is the dying message of the occasional crashed C++ program. What does it mean? You can find a couple of simple, well-documented explanations out there that apply to problems easy to diagnose during postmortem debugging. There is also another rather subtle bug that generates the same message. If you have a mysterious crash associated with that message, it might well mean your program went indirect on a dangling pointer. This article covers all these explanations.

Object-Oriented C++: The Programmer's View

(If you know what pure virtual functions and abstract classes are, you can skip this section.)

In C++, virtual functions let instances of related classes behave differently at run time (also known as runtime polymorphism):

class Shape {
public:
        virtual double area() const;
        double value() const;
        // Meyers 3rd Item 7:
        virtual ~Shape();
protected:
        Shape(double valuePerSquareUnit);
private:
        double valuePerSquareUnit_;
};

class Rectangle : public Shape {
public:
        Rectangle(double width, double height, double valuePerSquareUnit);
        virtual double area() const;
        // Meyers 3rd Item 7:
        virtual ~Rectangle();
// ...
};

class Circle : public Shape {
public:
        Circle(double radius, double valuePerSquareUnit);
        virtual double area() const;
        // Meyers 3rd Item 7:
        virtual ~Circle();
// ...
};

double
Shape::value() const
{
        // Area is computed differently, depending
        // on what kind of shape the object is:
        return valuePerSquareUnit_ * area();
}

(The comments before the destructors refer to Item 7 in the third edition of Scott Meyers's Effective C++: "Declare destructors virtual in polymorphic base classes." This code follows a convention used on several projects, where references like this are put in the code, serving as reminders to maintainers and reviewers. To some people, the point is obvious and the reminder is distracting; but one person's distraction is another person's helpful hint, and programmers in a hurry often forget what should be "obvious.")

In C++, a function's interface is specified by declaring the function. Member functions are declared in the class definition. A function's implementation is specified by defining the function. Derived classes can redefine a function, specifying an implementation particular to that derived class (and classes derived from it). When a virtual function is called, the implementation is chosen based not on the static type of the pointer or reference, but on the type of the object being pointed to, which can vary at run time:

print(shape->area());  // Might invoke Circle::area() or Rectangle::area().
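As a rough, self-contained sketch of that dispatch, the following simplifies the classes above (the valuePerSquareUnit member is dropped, and the helper areaOf() is a hypothetical name that is not part of the sample programs). The static type of the parameter never changes; which area() runs depends on the object passed in.

```cpp
// Simplified stand-ins for the article's classes; valuePerSquareUnit is omitted.
class Shape {
public:
    virtual ~Shape() {}
    virtual double area() const { return 0.0; }
};

class Rectangle : public Shape {
public:
    Rectangle(double width, double height) : width_(width), height_(height) {}
    virtual double area() const { return width_ * height_; }
private:
    double width_;
    double height_;
};

class Circle : public Shape {
public:
    explicit Circle(double radius) : radius_(radius) {}
    virtual double area() const { return 3.14159265358979 * radius_ * radius_; }
private:
    double radius_;
};

// Hypothetical helper: the static type of `shape` is always const Shape&,
// but the implementation of area() is chosen by the dynamic type.
double areaOf(const Shape& shape)
{
    return shape.area();
}
```

Passing a Rectangle(2.0, 3.0) to areaOf() invokes Rectangle::area(); passing a Circle invokes Circle::area(), even though areaOf() itself only knows about Shape.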

A pure virtual function is declared, but not necessarily defined, by a base class. A class with a pure virtual function is "abstract" (as opposed to "concrete"), in that it's not possible to create instances of that class. A derived class must define all inherited pure virtual functions of its base classes to be concrete.

class AbstractShape {
public:
        virtual double area() const = 0;
        double value() const;
        // Meyers 3rd Item 7:
        virtual ~AbstractShape();
protected:
        AbstractShape(double valuePerSquareUnit);
private:
        double valuePerSquareUnit_;
};

// Circle and Rectangle are derived from AbstractShape.

// This will not compile, even if there's a matching public constructor:
// AbstractShape* p = new AbstractShape(value);

// These are okay:
Rectangle* pr = new Rectangle(width, height, value);
Circle* pc = new Circle(radius, value);

// These are okay, too:
AbstractShape* p = pr;
p = pc;
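The phrase "not necessarily defined" above leaves room for a pure virtual function that does have a definition. The sketch below illustrates that under a few assumptions: Square and baseArea() are hypothetical names, the classes are trimmed down from the ones above, and the static_assert lines need a C++11 or later compiler.

```cpp
#include <type_traits>

class AbstractShape {
public:
    virtual ~AbstractShape() {}
    virtual double area() const = 0;   // Pure: AbstractShape remains abstract.
};

// A pure virtual function may still be given an out-of-line definition.
double AbstractShape::area() const
{
    return 0.0;
}

// Hypothetical concrete class, for illustration only.
class Square : public AbstractShape {
public:
    explicit Square(double side) : side_(side) {}
    virtual double area() const { return side_ * side_; }
    // A qualified call is not dispatched through the vtbl, so it reaches
    // the base class definition directly.
    double baseArea() const { return AbstractShape::area(); }
private:
    double side_;
};

static_assert(std::is_abstract<AbstractShape>::value,
              "AbstractShape cannot be instantiated");
static_assert(!std::is_abstract<Square>::value,
              "Square overrides every pure virtual function");
```

Defining the function does not make the class concrete; only overriding it in a derived class does.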

Object-Oriented C++: Under the Covers

(You can skip this section if you already know what a "vtbl" is.)

How does all this run time magic happen? In the usual implementation, every class with any virtual functions has an array of function pointers, called a "vtbl". Every instance of such a class has a pointer to its class's vtbl, as depicted below.

Figure 1. A class's vtbl points to the class's instance member functions.

If an abstract class with a pure virtual function doesn't define the function, what goes in the corresponding place in the vtbl? Traditionally, C++ implementors have provided a special function, which prints "Pure virtual function called" (or words to that effect), and then crashes the program.

Figure 2. An abstract class's vtbl can have a pointer to a special function.
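The two figures can be approximated by hand. What follows is only a sketch of the idea, not what any particular compiler emits: Object, Vtbl, callArea(), and pureVirtualCalled() are all hypothetical names, and the stand-in handler throws an exception where a real runtime would print its message and terminate the program.

```cpp
#include <stdexcept>

struct Object;                                  // Forward declaration.
typedef double (*AreaFn)(const Object*);

struct Vtbl {
    AreaFn area;                                // One slot per virtual function.
};

struct Object {
    const Vtbl* vptr;                           // Each instance points to its class's vtbl.
    double width;
    double height;
};

// Stand-in for the special function the runtime supplies for pure virtual slots.
double pureVirtualCalled(const Object*)
{
    throw std::logic_error("pure virtual function called");
}

double rectangleArea(const Object* self)
{
    return self->width * self->height;
}

const Vtbl abstractShapeVtbl = { &pureVirtualCalled };
const Vtbl rectangleVtbl     = { &rectangleArea };

// Roughly what a call such as p->area() turns into.
double callArea(const Object* p)
{
    return p->vptr->area(p);
}
```

An Object whose vptr is &rectangleVtbl computes its area normally; point the same Object at abstractShapeVtbl and callArea() lands in the handler instead.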

Build 'em Up, Tear 'em Down

When you construct an instance of a derived class, what happens, exactly? If the class has a vtbl, the process goes something like the following.

Step 1: Construct the top-level base part:

(a) Make the instance point to the base class's vtbl.
(b) Construct the base class instance member variables.
(c) Execute the body of the base class constructor.

Step 2: Construct the derived part(s) (recursively):

(a) Make the instance point to the derived class's vtbl.
(b) Construct the derived class instance member variables.
(c) Execute the body of the derived class constructor.

Destruction happens in reverse order, something like this:

Step 1: Destruct the derived part:

(a) The instance already points to the derived class's vtbl.
(b) Execute the body of the derived class destructor.
(c) Destruct the derived class instance member variables.

Step 2: Destruct the base part(s) (recursively):

(a) Make the instance point to the base class's vtbl.
(b) Execute the body of the base class destructor.
(c) Destruct the base class instance member variables.
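The construction and destruction steps above can be observed with a small experiment. This is a sketch for demonstration only: Base, Derived, trace, and traceLifetime() are hypothetical names, and the virtual calls inside the constructors and destructors are there purely to expose which class the object "is" at each stage. Because name() is not pure, those calls are well defined.

```cpp
#include <string>

// Hypothetical trace buffer, used only by this sketch.
static std::string trace;

class Base {
public:
    // During these bodies the object behaves as a Base, whatever it
    // will become (or used to be).
    Base()          { trace += "Base() sees " + name() + "; "; }
    virtual ~Base() { trace += "~Base() sees " + name() + "; "; }
    virtual std::string name() const { return "Base"; }
};

class Derived : public Base {
public:
    Derived()          { trace += "Derived() sees " + name() + "; "; }
    virtual ~Derived() { trace += "~Derived() sees " + name() + "; "; }
    virtual std::string name() const { return "Derived"; }
};

// Constructs and destroys one Derived object and returns what was recorded.
std::string traceLifetime()
{
    trace.clear();
    {
        Derived d;   // Constructed here, destroyed at the closing brace.
    }
    return trace;
}
```

The returned string is "Base() sees Base; Derived() sees Derived; ~Derived() sees Derived; ~Base() sees Base; ", matching the order of the steps listed above.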

Two of the Classic Blunders

What if you try to call a virtual function from a base class constructor?

// From sample program 1:
AbstractShape(double valuePerSquareUnit)
        : valuePerSquareUnit_(valuePerSquareUnit)
{
        // ERROR: Violation of Meyers 3rd Item 9!
        std::cout << "creating shape, area = " << area() << std::endl;
}

(Meyers, 3rd edition, Item 9: "Never call virtual functions during construction or destruction.")

This is obviously an attempt to call a pure virtual function. The compiler could alert us to this problem, and some compilers do. If a base class destructor calls a pure virtual function directly (sample program 2), you have essentially the same situation.

If the situation is a little more complicated, the error will be less obvious (and the compiler is less likely to help us):

// From sample program 3:
AbstractShape::AbstractShape(double valuePerSquareUnit)
        : valuePerSquareUnit_(valuePerSquareUnit)
{
        // ERROR: Indirect violation of Meyers 3rd Item 9!
        std::cout << "creating shape, value = " << value() << std::endl;
}

The body of this base class constructor is in step 1(c) of the construction process described above, which calls an instance member function (value()), which in turn calls a pure virtual function (area()). The object is still an AbstractShape at this point. What happens when it tries to call the pure virtual function? Your program likely crashes with a message similar to, "Pure virtual function called."

Similarly, calling a virtual function indirectly from a base class destructor (sample program 4) results in the same kind of crash. The same goes for passing a partially-constructed (or partially-destructed) object to any function that invokes virtual functions.

These are the most commonly described root causes of the "Pure Virtual Function Called" message. They're straightforward to diagnose from postmortem debugging; the stack trace will point clearly to the problem.
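One possible way to sidestep both blunders is to keep virtual calls out of the base class constructor and destructor altogether, and do any reporting once the object is fully constructed. The sketch below assumes a hypothetical free function, describe(), that is not part of the sample programs; the classes are trimmed versions of the ones used throughout.

```cpp
#include <sstream>
#include <string>

class AbstractShape {
public:
    virtual ~AbstractShape() {}
    virtual double area() const = 0;
    double value() const { return valuePerSquareUnit_ * area(); }
protected:
    // No virtual calls, direct or indirect, in the constructor body.
    explicit AbstractShape(double valuePerSquareUnit)
        : valuePerSquareUnit_(valuePerSquareUnit) {}
private:
    double valuePerSquareUnit_;
};

class Rectangle : public AbstractShape {
public:
    Rectangle(double width, double height, double valuePerSquareUnit)
        : AbstractShape(valuePerSquareUnit), width_(width), height_(height) {}
    virtual double area() const { return width_ * height_; }
private:
    double width_;
    double height_;
};

// Hypothetical helper: called only on a fully constructed object, so
// value() dispatches to the derived class's area().
std::string describe(const AbstractShape& shape)
{
    std::ostringstream out;
    out << "created shape, value = " << shape.value();
    return out.str();
}
```

The caller constructs the Rectangle first and then passes it to describe(); by then step 2 of construction has finished and the vtbl pointer refers to Rectangle.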

Pointing Out Blame

There's at least one other problem that can lead to this message, which doesn't seem to be explicitly described anywhere in print or on the net. (There have been some discussions on the ACE mailing list that touch upon the problem but they don't go into detail.)

Consider the following (buggy) code:

        // From sample program 5:
        AbstractShape* p1 = new Rectangle(width, height, valuePerSquareUnit);
        std::cout << "value = " << p1->value() << std::endl;
        AbstractShape* p2 = p1;  // Need another copy of the pointer.
        delete p1;
        std::cout << "now value = " << p2->value() << std::endl;

Let's consider these lines one at a time.

        AbstractShape* p1 = new Rectangle(width, height, valuePerSquareUnit);

A new object is created. It's constructed in two stages: Step 1, where the object acts like a base class instance, and Step 2, where it acts like a derived class instance.

        std::cout << "value = " << p1->value() << std::endl;

Everything's working fine.

        AbstractShape* p2 = p1;  // Need another copy of the pointer.

Something odd might happen to p1, so let's make a copy of it.

        delete p1;

The object is destructed in two stages: Step 1, where the object acts like a derived class instance, and Step 2, where it acts like a base class instance.

Note that the value of p1 might change after the call to delete. Compilers are allowed to "zero out" (i.e., render unusable) pointers after destructing their pointed-to data. Lucky (?) for us, we have another copy of the pointer, p2, which didn't change.

        std::cout << "now value = " << p2->value() << std::endl;

Uh oh.

This is another classic blunder: going indirect on a "dangling" pointer. That's a pointer to an object that's been deleted, or memory that's been freed, or both. C++ programmers never write such code ... unless they're clueless (unlikely) or rushed (all too likely).

So now p2 points to an ex-object. What does that thing look like? According to the C++ standard, it's "undefined". That's a technical term that means, in theory, anything can happen: the program can crash, or keep running but generate garbage results, or send Bjarne Stroustrup e-mail saying how ugly you are and how funny your mother dresses you. You can't depend on anything; the behavior might vary from compiler to compiler, or machine to machine, or run to run. In practice, there are several common possibilities (which may or may not happen consistently):

The memory might be marked as deallocated. Any attempt to access it would immediately be flagged as the use of a dangling pointer. That's what some tools (BoundsChecker, Purify, valgrind, and others) try to do. As we'll see, the Common Language Runtime (CLR) from Microsoft's .NET Framework, and Sun Studio 11's dbx debugger, work this way.
The memory might be deliberately scrambled. The memory management system might write garbage-like values into the memory after it's freed. (One such value is "dead beef": 0xDEADBEEF, unsigned decimal 3735928559, signed decimal -559038737.)
The memory might be reused. If other code was executed between the deletion of the object and the use of the dangling pointer, the memory allocation system might have created a new object out of some or all of the memory used by the old object. If you're lucky, this will look enough like garbage that the program will crash immediately. Otherwise the program will likely crash sometime later, possibly after curdling other objects, often long after the root cause problem occurred. This is the kind of problem that drives C++ programmers crazy (and makes Java programmers overly smug).
The memory might have been left exactly the way it was.

The last is an interesting case. What was the object "exactly the way it was"? In this case, it was an instance of the abstract base class; certainly that's the way the vtbl was left. What happens if we try to call a pure virtual member function for such an object?

"Pure virtual function called".

(Exercise for the reader: Imagine a function that, unwisely and unfortunately, returned a pointer or reference to a local variable. This is a different kind of dangling pointer. How could this also generate this message?)

Meanwhile, Back in the Real World

Nice theory. What happens in practice?

Consider five test programs, each with its own distinctive defect:

Directly calling a virtual function from a base class constructor.
Directly calling a virtual function from a base class destructor.
Indirectly calling a virtual function from a base class constructor.
Indirectly calling a virtual function from a base class destructor.
Calling a virtual function via a dangling pointer.

These were built and tested with several compilers (running on x86 Windows XP unless stated otherwise):

Visual C++ 8.0
Digital Mars C/C++ compiler version 8.42n
Open Watcom C/C++ version 1.4
SPARC Solaris 10, Sun Studio 11
gcc:

x86 Linux (Red Hat 3.2), gcc 2.96 / 3.0 / 3.2.2
x86 Windows XP (Cygwin), gcc 3.4.4
SPARC Solaris 8, gcc 3.2.2
PowerPC Mac OS X 10.4 (Tiger), gcc 3.3 / 4.0

Direct Invocation

Some compilers recognized what was happening in the first two examples, with various results.

Visual C++ 8.0, Open Watcom C/C++ 1.4, and gcc 4.x recognize that a base class's constructor or destructor can't possibly invoke a derived class's member function. As a result, these compilers optimize away any runtime polymorphism, and treat the call as an invocation of the base class member function. If that member function is not defined, the program doesn't link. If the member function is defined, the program runs without problems. gcc 4.x produces a warning ("abstract virtual 'virtual double AbstractShape::area() const' called from constructor" for the first program, and similarly for the destructor for the second program). Visual C++ 8.0 built the programs without any complaint, even at the maximum warning level (/Wall); similarly for Open Watcom C/C++ 1.4.

gcc 3.x and Digital Mars C/C++ compiler 8.42n rejected these programs, complaining, respectively, "abstract virtual `virtual double AbstractShape::area() const' called from constructor" (or "from destructor") and "Error: 'AbstractShape::area' is a pure virtual function".

Sun Studio 11 produced a warning, "Warning: Attempt to call a pure virtual function AbstractShape::area() const will always fail", but builds the programs. As promised, both crash, with the message, "Pure virtual function called".

Indirect Invocation

The next two examples built without warning for all compilers. (That's to be expected; this is not the kind of problem normally caught by static analysis.) The resulting programs all crashed, with various error messages:

Visual C++ 8.0: "R6025 - pure virtual function call (__vfptr[0] == __purecall)".
Digital Mars C/C++ compiler 8.42n: did not generate an error message when the program crashed. (That's fine; this is "undefined" behavior, and the compiler is free to do whatever it wants.)
Open Watcom C/C++ 1.4: "pure virtual function called!".
Sun Studio 11: "Pure virtual function called" (same as for the first two programs).
gcc: "pure virtual method called".

Invocation via a Dangling Pointer

The fifth example in the previous list always built without warning and crashed when run. Again, this is to be expected. For all compilers except Microsoft's, the error message was the same as for the third and fourth examples. Sun's compiler generated the same message, but Sun's debugger provided some additional information.

Microsoft Visual C++ 8.0 has a number of runtime libraries. Each handles this error in its own way.

Win32 console application:

When run without the debugger, the program crashes silently.
When run in the debugger, a program built in debug mode generates the message, "Unhandled exception ... Access violation reading location 0xfeeefeee." This is clearly "dead beef" behavior; when memory was freed, the runtime overwrote it with garbage.
When built in release mode and run in the debugger, the program produces the message, "Unhandled exception ... Illegal Instruction".

CLR console application:

When built in debug mode, the message is, "Attempted to read or write protected memory. This is often an indication that other memory is corrupt." The debug runtime system has marked the freed memory, and terminates the program when it tries to use that memory.
When built in release mode, the program crashes with the message, "Object reference not set to an instance of an object."

When compiled with Sun Studio 11, and run in dbx with Run-Time Checking, the program died with a new error: "Read from unallocated (rua): Attempting to read 4 bytes at address 0x486a8 which is 48 bytes before heap block of size 40 bytes at 0x486d8". This is the debugger's way of saying, "You just used something in a block of memory, but this isn't a block of memory I think you should be using." Once the object was destructed and its memory deallocated, the program could no longer (legally) use that object, or that memory, again.

Owning Up

How can you avoid these kinds of problems?

It's easy for the problems in the first four example programs. Pay attention to Scott Meyers, and (for the first two examples) pay attention to any warning messages you get.

What about the "dangling pointer" problem in the fifth example? Programmers, in any language, need to design in terms of object ownership. Something (or some collection of things) owns an object. Ownership might be:

transferred to something else (or some other collection of things), or
"loaned" without transferring ownership, or
shared, by using reference counts or garbage collection.

What kind of "thing" can own an object?

Another object, obviously.
A collection of objects; for example, all the smart pointers that point to the owned object.
A function. When a function is called, it may assume ownership (transferred) or not (loaned). Functions always own their local variables, but not necessarily what those local variables point or refer to.

In our example, there was no clear ownership. Some function created an object, and pointed two pointers at it. Who owns the object? Probably the function, in which case, it should be responsible for avoiding the problem somehow. It could have used one "dumb" pointer (and explicitly zeroed it out after deletion) instead of two, or used some sort of smart pointers.
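Here is one way the "loaned" and "transferred" cases might look in code. This sketch assumes a C++11 or later compiler and uses std::unique_ptr, which postdates the compilers tested above; inspect() and consume() are hypothetical names, and the classes are trimmed versions of the article's.

```cpp
#include <memory>
#include <utility>

class AbstractShape {
public:
    virtual ~AbstractShape() {}
    virtual double area() const = 0;
};

class Rectangle : public AbstractShape {
public:
    Rectangle(double width, double height) : width_(width), height_(height) {}
    virtual double area() const { return width_ * height_; }
private:
    double width_;
    double height_;
};

// Loan: the caller keeps ownership; this function only looks at the object.
double inspect(const AbstractShape& shape)
{
    return shape.area();
}

// Transfer: the function takes ownership, and the object is destroyed
// when the parameter goes out of scope.
double consume(std::unique_ptr<AbstractShape> shape)
{
    return shape->area();
}
```

A caller holding std::unique_ptr<AbstractShape> owner(new Rectangle(2.0, 3.0)) can call inspect(*owner) as often as it likes. After consume(std::move(owner)), owner is null, so there is no second pointer left dangling.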

In real life, it's never that simple, except sometimes in retrospect. Objects can be passed from one module to one very different module, written by another person or another organization. Object ownership issues span equally long chasms.

Any time you pass an object around, you always need to know the answer to the ownership question. It's a simple issue, sometimes with a simple answer, but never a question that magically answers itself. There is no substitute for thought.

Thinking for yourself doesn't mean thinking by yourself, however; there is some good existing work that can help you. Tom Cargill wrote up a pattern language, "Localized Ownership," that describes strategies for these alternatives. Scott Meyers also addresses this in Item 13, "Use objects to manage resources," and Item 14, "Think carefully about copying behavior in resource-managing classes," in the third edition of Effective C++. See References for details.

No Smart Pointer Panacea

Reference-counted smart pointers are very helpful in avoiding these kinds of problems. With smart pointers, ownership belongs to the set of smart pointers that point to the object. When the last such smart pointer stops pointing to that object, the object is deleted. That would certainly solve the problem we've seen here.
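As an illustration of that shared ownership, the sketch below uses std::shared_ptr and std::weak_ptr from C++11 (an assumption; the article predates them, and other reference-counted pointers behave similarly). Shape here is a bare hypothetical struct, and areaOrMinusOne() is a made-up helper showing how an observer can check whether the object still exists instead of going indirect on a dangling pointer.

```cpp
#include <memory>

// Hypothetical minimal type, standing in for a real shape class.
struct Shape {
    double area;
};

// Returns the shape's area, or -1.0 if the last owner has already let go
// and the object has been deleted.
double areaOrMinusOne(const std::weak_ptr<Shape>& observer)
{
    std::shared_ptr<Shape> locked = observer.lock();   // Null if the object is gone.
    return locked ? locked->area : -1.0;
}
```

With two shared_ptr owners of the same Shape, resetting the first leaves the object alive; only when the second is reset is the object deleted, and from then on the weak_ptr reports it as expired.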

But many programmers are just beginning to use smart pointers, and just beginning to learn how to use them. Even with smart pointers, you can still run into these kinds of problems ... if you use smart pointers in dumb ways.

But that's another problem for another day.

References

Tom Cargill, "Localized Ownership: Managing Dynamic Objects in C++"; in Vlissides, Coplien, and Kerth, Pattern Languages of Program Design 2, 1996, Addison-Wesley.

Scott Meyers, Effective C++, Third Edition: 55 Specific Ways to Improve Your Programs and Designs, 2005, Addison-Wesley.

Resources

Scott Meyers’ home page:
http://www.aristeia.com/
