Most of what is written about error and exception handling is
fairly abstract and vague. When specific recommendations are
made, those usually consist of examples given for specific circumstances. Yet, error handling is a fundamental task for developers. This brief review attempts to categorize the different types of error conditions a program can encounter. By describing these categories, I also provide suggestions on how to handle each type of error condition.
In general, errors a program can encounter tend to be the result of one of three things:
Restrictions: Arguments to a routine that can never work, and always result in an error, define a restriction on the use of the routine. Ensuring that only correct arguments are passed to a routine is the type of thing that programming by contract is meant to address.
Inconsistencies: When values or resources are not what they are expected to be, or are missing, that creates an inconsistency between the expected state of the environment and the actual state. This may be the internal environment, such as a null pointer, or the external environment, such as a corrupt file. It doesn't
encompass inconsistencies in the data model, which often needs to be temporarily inconsistent during an operation (e.g. adding a node to a linked list).
Failures: When an operation simply does not work, and it's out of the program's control, this is a failure. For example, a pulled network cable.
These types of errors overlap to an extent (say, a working network could be considered part of the expected state, making it an inconsistency error). In general, though, most errors can fall into one of these categories.
Sometimes failures are not errors, and are just a way of detecting the current external state. For example, opening a file that doesn't exist may fail, but results in an error only when that file is actually needed.
Error Handling Responsibilities
Program code is responsible for the consistency of the internal program state. Generally certain code has primary (ideally, exclusive) responsibility for
parts of the internal state. Inconsistency errors that occur within
the code responsible for that state are bugs.
Sometimes the state management responsibility is shared between different sections of code. This is a bad idea, because it makes assigning
responsibility for an inconsistency error harder, but it does happen in practice.
It's important to make a distinction between error detection and debugging. Often, data generated in the process of error handling is mixed together with diagnostic information. If possible, these types of information should be kept completely separate—at least conceptually, even if combined in a single data structure.
Safe Zones
Restrictions can be checked before calling a routine, or within a routine.
It seems a waste of time to check arguments every time a routine is called
when you already know those arguments are correct. One strategy is to separate
parameter checking from parameter usage. This doesn't work reliably for
library code, where anything can happen between the check and the use of
the parameters, but within a base of code for a particular application or
within a library, you can restrict the code to not change a value known to
be safe.
The code between a parameter check and the next change to a parameter
variable is a safe zone, where parameters don't have to be re-checked.
This is only valid for restriction errors, because
inconsistency and failure errors can be caused by things
outside the code's safe zone. Things like critical sections (in multithreaded
environments), semaphores and file locks are meant to create a very limited
kind of safe zone for inconsistency and failure errors.
The code safe zones for parameters can overlap with others, and may not
be well defined. One way to deal with this is to assign known safe values
to variables which indicate this safety. Joel Spolsky wrote about one way
to do this using variable naming conventions in "Making Wrong Code Look Wrong."
Safe values should be assigned to variables declared constant.
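One way to mark a value as already checked, sketched here in Python, is to wrap it in an immutable type whose constructor performs the restriction check. This is only an illustration: the names CheckedIndex and nth_item are hypothetical, and a wrapper type is a different technique from the naming convention in Spolsky's article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the checked value cannot be changed afterwards
class CheckedIndex:
    """Hypothetical wrapper: holding one means the restriction was checked."""
    value: int

    def __post_init__(self):
        # The restriction is enforced once, at construction time.
        if self.value < 0:
            raise ValueError(f"index must be non-negative, got {self.value}")


def nth_item(items, index: CheckedIndex):
    # Inside the safe zone: the non-negative restriction is not re-checked.
    # Whether the index is in range depends on `items`, which is a separate
    # inconsistency check and not part of the restriction.
    return items[index.value]
```

Code that receives a CheckedIndex can rely on the check having happened, because the only way to obtain one is through the constructor.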
Reporting Errors
Code calling a routine needs to know three things to decide how to proceed: first, whether the call succeeded and returned its data, if any; second, whether an error occurred; and
third, whether the error is permanent or transitory. This defines the following possible error states returned from a routine:
Successful
Restriction error (always permanent)
Permanent (bug) inconsistency
Transitory (detected) inconsistency
Failure (transitory for all we know)
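The five states above could be written down as an enumeration. The following Python sketch is one possible encoding; the names ErrorState, is_error and may_be_worth_retrying are assumptions made for illustration.

```python
from enum import Enum


class ErrorState(Enum):
    SUCCESSFUL = "successful"
    RESTRICTION = "restriction error"
    BUG_INCONSISTENCY = "permanent (bug) inconsistency"
    DETECTED_INCONSISTENCY = "transitory (detected) inconsistency"
    FAILURE = "failure"

    @property
    def is_error(self) -> bool:
        return self is not ErrorState.SUCCESSFUL

    @property
    def may_be_worth_retrying(self) -> bool:
        # Only the transitory states can change on a later attempt.
        return self in (ErrorState.DETECTED_INCONSISTENCY, ErrorState.FAILURE)
```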
It's often a bad idea to mix an error code with a return value, such as designating specific values (say, 0 or -1) as invalid.
Some languages, like Python, allow multiple values to be returned from a method as
tuples. A tuple is basically an anonymous class, and can be implemented in a language like Java by defining a class
for objects returned by a method, or in C by defining a struct which is passed as a parameter and is updated by the function. But in many cases, exceptions
are a much better way to separate error information from return values.
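The tuple approach can be sketched in Python as follows. The function find_index is hypothetical. It returns a separate "found" flag instead of reserving -1 as an invalid index, which matters in Python because -1 is itself a valid index (the last element).

```python
def find_index(items, target):
    """Return (found, index) instead of using -1 as an 'invalid' index."""
    for i, item in enumerate(items):
        if item == target:
            return True, i
    return False, None


letters = ["a", "b"]
found, index = find_index(letters, "z")
if found:
    print(letters[index])
else:
    # With a -1 sentinel, letters[-1] would silently return "b" here.
    print("not found")
```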
Exceptions transmit an object from the exception
location to a handler in a scope surrounding it, or surrounding the point
where the routine was called. The exception object carries information in its type and class, and debugging information in the data contained within
that type.
Exceptions by themselves don't indicate the error state, so that must be
included as an attribute of the exception object, or the error
state must be deduced from the debugging information (object type and
data).
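Carrying the error state as an attribute might look like the Python sketch below. All class and function names here (OperationError, RestrictionError, FailureError, call_with_retry) are hypothetical, and a single boolean is a simplification of the states listed earlier.

```python
class OperationError(Exception):
    """Hypothetical base exception carrying the error state as an attribute."""

    def __init__(self, message, *, transitory):
        super().__init__(message)
        self.transitory = transitory


class RestrictionError(OperationError):
    def __init__(self, message):
        super().__init__(message, transitory=False)  # always permanent


class FailureError(OperationError):
    def __init__(self, message):
        super().__init__(message, transitory=True)  # transitory for all we know


def call_with_retry(operation, attempts=3):
    """Retry only when the exception says the error may be transitory."""
    for attempt in range(attempts):
        try:
            return operation()
        except OperationError as err:
            if not err.transitory or attempt == attempts - 1:
                raise
```

The caller decides what to do from the attribute alone and does not need to inspect the message or other debugging data.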
Java introduced the controversial notion of checked exceptions, which must
either be caught or declared to be thrown by a method in order to compile,
while unchecked (or runtime) exceptions behave like exceptions in other
languages. The main cause of the controversy is that there has been no good
definition of why there should be a difference and, as a result, no
consistent strategy in the implementation in various libraries, including standard parts of the different Java runtime libraries.
In general, unchecked exceptions are meant for bugs, where an error indicates that the
code is simply wrong and must be fixed (restriction and bug
inconsistency errors). An example is a NullPointerException. Checked
exceptions are for detected inconsistency and failure errors,
where the program may have a strategy of handling the error. An example is
an I/O error.
Transactional Operations
One strategy to handle errors is to make all operations transactional, so that if they fail,
it's as if the operation was never tried. One way to implement this is to define an
"undo" operation for every change:
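A minimal Python sketch of the undo approach, not the author's original listing. It keeps a list of completed undo actions instead of nesting if/else or try/catch blocks. The function run_transaction and the (do, undo) pair convention are assumptions made for illustration.

```python
def run_transaction(steps):
    """Run (do, undo) pairs in order; on failure, undo completed steps in reverse."""
    completed = []
    try:
        for do, undo in steps:
            do()  # assumed transactional itself: a failed step changes nothing
            completed.append(undo)
    except Exception:
        for undo in reversed(completed):
            undo()  # assumed never to fail in this sketch
        raise
```

The step that fails is not undone, only the steps that completed before it.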
In this example, the functions are also transactional, and thus don't need to be
rolled back if they fail. This can be done with nested if/else blocks, or with
nested try/catch blocks. If the "undo" operations themselves have errors, the result looks more like this:
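A hedged Python sketch of what happens once undo steps can fail too; this is not the author's original listing. The names RollbackError and run_transaction_checked are hypothetical. The extra error path and the "state unknown" outcome are the complexity the paragraph describes.

```python
class RollbackError(Exception):
    """Hypothetical: raised when at least one undo step failed."""

    def __init__(self, original, undo_errors):
        super().__init__(f"rollback incomplete after: {original!r}")
        self.original = original
        self.undo_errors = undo_errors


def run_transaction_checked(steps):
    completed = []
    try:
        for do, undo in steps:
            do()
            completed.append(undo)
    except Exception as original:
        undo_errors = []
        for undo in reversed(completed):
            try:
                undo()
            except Exception as undo_error:
                # Keep going: the remaining undo steps may still succeed.
                undo_errors.append(undo_error)
        if undo_errors:
            # State is now unknown: neither committed nor cleanly rolled back.
            raise RollbackError(original, undo_errors) from original
        raise
```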
One way of dealing with this is to modify a copy of the program state, and if all operations succeed, only then commit the changes. The commit may fail,
but this isolates the possible state changing errors to one point, and is
similar to how databases implement transactions.
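The copy-then-commit idea can be sketched in Python as below, assuming the program state is a plain dictionary. The names apply_all, inventory, take_apples and take_pears are made up for the example, and the "commit" is simply rebinding the variable.

```python
import copy


def apply_all(state, operations):
    """Apply operations to a copy; return the new state only if all succeed."""
    working = copy.deepcopy(state)
    for operation in operations:
        operation(working)  # may raise; `state` is untouched if it does
    return working


def take_apples(s):
    s["apples"] -= 2


def take_pears(s):
    if s["pears"] < 5:
        raise ValueError("not enough pears")
    s["pears"] -= 5


inventory = {"apples": 3, "pears": 1}
try:
    # The commit is the single assignment on the next line.
    inventory = apply_all(inventory, [take_apples, take_pears])
except ValueError:
    pass  # inventory still holds the original values
```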
Another way to implement transactional operations is to make a copy of the state before any of it is changed, and use that copy to restore the expected state in case of an error.
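A snapshot-and-restore version might look like this in Python, again assuming the state is a dictionary that is modified in place. The helper name with_restore is hypothetical.

```python
import copy


def with_restore(state, operation):
    """Mutate `state` in place; put the snapshot back if the operation raises."""
    snapshot = copy.deepcopy(state)
    try:
        operation(state)
    except Exception:
        # Restore the contents of the same dict object, so other references
        # to `state` see the original values again.
        state.clear()
        state.update(snapshot)
        raise
```

Unlike the copy-then-commit approach, other code holding a reference to the state can observe the half-finished changes until the restore happens.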
In summary, having a clear taxonomy of error conditions that code may encounter helps develop better strategies for dealing with, and possibly recovering from, those errors.
A deep analysis of error handling and exception safety can be found at http://www.boost.org/more/generic_exception_safety.html. Understanding your goal in error handling for any operation is the first and necessary step to designing that operation properly. From the Abrahams article:
* The basic guarantee: that the invariants of the component are preserved, and no resources are leaked.
* The strong guarantee: that the operation has either completed successfully or thrown an exception, leaving the program state exactly as it was before the operation started.
* The no-throw guarantee: that the operation will not throw an exception.
Your article talked mostly about the strong guarantee, with some interesting stuff about dealing with the states of multiple objects, but there is a place for the no-throw and the basic guarantee in any non-trivial application. For example, a Release()-type method, that frees a resource, should always be no-throw. If you can't rely on cleanup code to actually clean up, then it is not possible to write exception-safe code at all. The basic guarantee is necessary for just about any operation that interacts with serial I/O or anything other than thread-safe random-access memory. You can't in general roll back serial operations.
The Abrahams article was written from the perspective of a class library designer in C++, but the concepts are broadly applicable. I personally found it tremendously enlightening and useful.
I don't have anything to add, other than to say thanks for starting this topic and posting your information; this is something I've been thinking about on and off over the years, and would really like to understand better. (I wish all developers would ;-)
When reporting a restriction error, why is it always permanent? You did not provide an argument why this error state is always permanent. I think always is a bit too much, especially since, as you say, "these types of errors overlap to an extent". I might be nitpicky here, so let's move on to non-nitpicky stuff:
> One strategy to handle errors is to make all operations transactional, so that if they fail, it's as if the operation was never tried. One way to implement this is to define an "undo" operation for every change:
When I read the introduction to your article, I was under the impression you believed it was unnecessary to treat all errors the same way. While "let's make all operations transactional" certainly isn't your thesis, you do not say anything about why you shouldn't make all operations transactional. Also, what does it really mean to treat a failed operation "as if the operation was never tried"? I feel this is the hard question.
Come to that, in my experience, one of the greatest fouls Java textbook writers make is poorly explaining the convenience of the finally clause. They give more attention to detail to the try-catch idiom, but lip service to the finally clause.
Finally, you say, "In general, unchecked exceptions are meant for bugs, where an error indicates that the code is simply wrong and must be fixed (restriction and bug inconsistency errors). An example is a NullPointerException." The NullPointerException is the crudest example imaginable and should be avoided at all costs as a teaching tool. NullPointerException examples are not even necessarily transferable to other languages, because it's not necessary for a language to provide null pointers. Really, at the implementation level, we should be concerned about whether or not something is defined and the constraints on that resource's definition. I.e., is it a Singleton? That's logistics. NullPointerExceptions are physical design, not logical design.
Most people seem to make logical design mistakes, which also seems to be the great commotion over Checked Exceptions in Java and the constant debate over their inclusion. Any feature in any programming language should factor out accidental complexity. Most imperative programming languages allow programmers to factor out type information when declaring variables.
I want a programming language that can help me describe logistics, not tie me up in error and exception handling. I think in terms of logistics. Sometimes, when I write code in C, I want to "just say something" in code, but there is no succinct way to say it because of physical design constraints. That is a trade-off for writing code in C, and in the right circumstances, I will gladly accept that trade-off.
Thanks for the link. I do remember reading that, but when I was writing this I couldn't find it again and had to rely on my fuzzy memory. I'm glad someone else had the link.
> When reporting a restriction error, why is it always permanent? You did not provide an argument why this error state is always permanent. I think always is a bit too much, especially since, as you say, "these types of errors overlap to an extent".
By its nature, a restriction error is the result of checking parameters for validity, normally before they are used. If a parameter is invalid once, the same value will be invalid again. If a parameter is invalid because it conflicts with some other state (e.g. no elements in buffer) then that's an inconsistency error.
> One strategy to handle errors is to make all operations transactional, so that if they fail, it's as if the operation was never tried. One way to implement this is to define an "undo" operation for every change:
>
> When I read the introduction to your article, I was under the impression you believed it was unnecessary to treat all errors the same way. While "let's make all operations transactional" certainly isn't your thesis, you do not say anything about why you shouldn't make all operations transactional. Also, what does it really mean to treat a failed operation "as if the operation was never tried"? I feel this is the hard question.
Ultimately, you want your software to be safe to use in the widest range of situations, and this means that it will not destroy your data or environment. In this sense you want all errors to be handled at some level in a way that ensures that they do no harm. However this may be at a program level rather than a subroutine level, which I guess is my real point - there are less fine-grained ways of preserving safety if it's more convenient.
> Finally, you say, "In general, unchecked exceptions are meant for bugs, where an error indicates that the code is simply wrong and must be fixed (restriction and bug inconsistency errors). An example is a NullPointerException." The NullPointerException is the crudest example imaginable and should be avoided at all costs as a teaching tool.
I mentioned null pointer exception because it's widely understood, and it's an error that cannot happen due to anything other than incorrect program code (providing the language throws an exception when new fails rather than returning null).
> Most people seem to make logical design mistakes, which also seems to be the great commotion over Checked Exceptions in Java and the constant debate over their inclusion. Any feature in any programming language should factor out accidental complexity. Most imperative programming languages allow programmers to factor out type information when declaring variables.
I was trying to explain checked exceptions rather than defend them, but in their defence they are part of a method's expected channel of communication to the calling code, so it's legitimate to need to declare them. Unchecked exceptions are unexpected because they are bugs, not status information and calling code should never need to know about them. An alternative for returning expected status information (as I pointed out) would be to use output parameters (not supported in Java) or multiple return parameters (like tuples in Python or Ruby, also not supported in Java). Either one of those would also have to be declared in a statically typed language.
> I want a programming language that can help me describe logistics, not tie me up in error and exception handling. I think in terms of logistics. Sometimes, when I write code in C, I want to "just say something" in code, but there is no succinct way to say it because of physical design constraints. That is a trade-off for writing code in C, and in the right circumstances, I will gladly accept that trade-off.
One way I think that can be improved is to recognize that what are all called "errors" are sometimes different things. Ideally, how to handle some types of errors could be simplified or automated by a language or library design, which would reduce the number of "errors" that the developer needs to worry about manually.
It's not clear to me what's the real, effective difference between a Restriction and an Inconsistency. Keeping your null example, if foo is null, and I go
foo.hashCode(); that's an inconsistency
but if I go
UtilityMethod.hashCode(foo) that's a restriction error.
Yet, to my mind, it's the same error - somebody forgot or failed to initialize foo.
> It's not clear to me what's the real, effective difference between a Restriction and an Inconsistency. Keeping your null example, if foo is null, and I go
>
> foo.hashCode(); that's an inconsistency
>
> but if I go
>
> UtilityMethod.hashCode(foo) that's a restriction error.
>
> Yet, to my mind, it's the same error - somebody forgot or failed to initialize foo.
The latter is a restriction error because it's up to UtilityMethod.hashCode(foo) to decide whether null is an error or not (maybe just a no-op) - that is, the routine is placing restrictions on what it will accept. But from the view of the calling code, if it is an error, it's an inconsistency error.
The distinction isn't useful farther away from the point of the error; it's mostly useful for deciding immediately what to do next (e.g. can it be fixed and retried or not?), if you want to make the decision there.
Very nice and thorough article, thanks. Sorry I discovered the article late but it kinda strikes a nerve so I would like to follow up anyway, with a little bit of advice.
Exception handling is over-engineered more often than not.
Resist the temptation and keep it simple. If the exception happens, there is that handler somewhere far up the execution stack; it knows how to print stack traces and there is not much else you can do to help. Don't write long explanations or collect tons of diagnostic information -- the logs will boomerang to you anyway and you will be helplessly looking at the code trying to guess what has happened, with or without megabytes of irrelevant diagnostic gibberish.
Same on the receiving end. Of course there are valid cases where you want to process a specific type of exception or bail out in an unusual way, but these happen far, far more seldom than most people think (once every 50,000 lines if I have to throw a number). Your gut will tell when this is the case.
In all other cases, just relax and make sure you don't leak.