Artima Developer Spotlight Forum - A Taxonomy of Error Types and Error Handling

Articles |
News |
Weblogs |
Books |
Forums

Artima Forums | Articles | Weblogs | Java Answers | News

Error Handling Responsibilities

Program code is responsible for the consistency of the internal program state. Generally certain code has primary (ideally, exclusive) responsibility for parts of the internal state. Inconsistency errors that occur within the code responsible for that state are bugs.

Sometimes the state management responsibility is shared between different sections of code. This is a bad idea, because it makes assigning responsibility for an inconsistency error harder, but it does happen in practice.

It's important to make a distinction between error detection and debugging. Often, data generated in the process of error handling is mixed together with diagnostic information. If possible, these types of information should be kept completely separate—at least conceptually, even if combined in a single data structure.

Safe Zones

Restrictions can be checked before calling a routine, or within a routine. It seems a waste of time to check arguments every time a routine is called when you already know those arguments are correct. One strategy is to separate parameter checking from parameter usage. This doesn't work reliably for library code, where anything can happen between the check and the use of the parameters, but within a base of code for a particular application or within a library, you can restrict the code to not change a value known to be safe.

The code between a parameter check and the next change to a parameter variable is a safe zone, where parameters don't have to be re-checked. This is only valid for restriction errors, because inconsistency and failure errors can be caused by things outside the code's safe zone. Things like critical sections (in multithreaded environments), semaphores and file locks are meant to create a very limited kind of safe zone for inconsistency and failure errors.

The code safe zones for parameters can overlap with others, and may not be well defined. One way to deal with this is to assign known safe values to variables which indicate this safety. Joel Spolsky wrote about one way to do this using variable naming conventions in Making Wrong Code Look Wrong. Safe values should be assigned to variables declared constant.

Reporting Errors

Code calling a routine needs to know three things to decide how to proceed: First, whether the data is returned, if any, or if the method invocation succeeded; second, whether an error occurred; and, third, whether the error is permanent or transitory. This defines the following possible error states returned from a routine:

Successful
Restriction error (always permanent)
Permanent (bug) inconsistency
Transitory (detected) inconsistency
Failure (transitory for all we know)

It's often a bad idea to mix an error code with a return value, such as designating a specific values—say, 0 or -1—to be invalid. Some languages, like Python, allow multiple values to be returned from a method as tuples. A tuple is basically an anonymous class, and can be implemented in a language like Java by defining a class for objects returned by a method, or in C by defining a struct which is passed as a parameter and is updated by the function. But in many cases, exceptions are a much better way to separate error information from return values.

Exceptions transmit an object from the exception location to a handler in a scope surrounding it, or surrounding the point where the routine was called. The exception objects include information by the object type and class, and debugging information the data contained within that type. Exceptions by themselves don't indicate the error state, so that must be included as an attribute of the exception object, or the error state must be deduced from the debugging information (object type and data).

Java introduced the controversial notion of checked exceptions, which must either be caught or declared to be thrown by a method in order to compile, while unchecked (or runtime) exceptions behave like exceptions in other languages. The main cause of the controversy is that there has been no good definition of why there should be a difference and, as a result, no consistent strategy in the implementation in various libraries, including standard parts of the different Java runtime libraries.

In general, unchecked exceptions are meant for bugs, where an error indicates that the code is simply wrong and must be fixed (restriction and bug inconsistency errors). An example is a NullPointerException. Checked exceptions are for detected inconsistency and failure errors, where the program may have a strategy of handling the error. An example is an I/O error.

Transactional Operations

One strategy to handle errors is to make all operations transactional, so that if they fail, it's as if the operation was never tried. One way implement this is to define an "undo" operation for every change:

Ideal transactional function

In this example, the functions are also transactional, and thus don't need to be rolled back if they fail. This can be done with nested if/else blocks, or with nested try/catch blocks. If the "undo" operations themselves have errors, the result looks more like this:

Realistic transactional function

One way of dealing with this is to modify a copy of the program state, and if all operations succeed, only then commit the changes. The commit may fail, but this isolates the possible state changing errors to one point, and is similar how databases implement transactions. Another way to implement transactional operations is to make a copy of before any state is changed, and use that copy to restore the expected state, in case of an error.

In summary, having a clear taxonomy of error conditions that code may encounter helps develop better strategies for dealing with, and possibly recovering from, those errors.

David Gladfelter

Posts: 1
Nickname: tattva
Registered: Mar, 2006