Summary
In a recent IBM DeveloperWorks article, Elliotte Rusty Harold presents the concept of fuzz testing: a technique that checks program behavior in the presence of arbitrary input.
One seldom-mentioned side effect of developers testing their own code is that most unit tests tend to focus on conditions the developer anticipates. According to Elliotte Rusty Harold's IBM DeveloperWorks article, Fuzz Testing, real-world conditions throw a more diverse, and less predictable, set of errors at code.
While the article mainly focuses on detecting errors when reading from files, the technique of "fuzz testing" can be applied more generally:
In fuzz testing, you attack a program with random bad data (aka fuzz), then wait to see what breaks. The trick of fuzz testing is that it isn't logical: Rather than attempting to guess what data is likely to provoke a crash (as a human tester might do), an automated fuzz test simply throws as much random gibberish at a program as possible. The failure modes identified by such testing usually come as a complete shock to programmers because no logical person would ever conceive of them.
Harold outlines a simple process for performing fuzz testing in the context of reading from input files (a sketch in code follows the list):
1. Prepare a correct file as input to your program.
2. Replace some part of the file with random data.
3. Open the file with the program.
4. See what breaks.
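A minimal sketch of those four steps in Java might look like the following; the target program name, the seed file, and the number of mutated bytes are illustrative assumptions, not details from Harold's article:

```java
import java.nio.file.*;
import java.util.Random;

public class FileFuzzer {
    public static void main(String[] args) throws Exception {
        // Step 1: start from a known-good input file (name assumed for illustration).
        byte[] data = Files.readAllBytes(Path.of("correct-input.dat"));

        Random random = new Random();
        // Step 2: overwrite a handful of bytes at random positions with random values.
        for (int i = 0; i < 16; i++) {
            data[random.nextInt(data.length)] = (byte) random.nextInt(256);
        }
        Path fuzzed = Path.of("fuzzed-input.dat");
        Files.write(fuzzed, data);

        // Steps 3 and 4: open the file with the program under test and see what breaks.
        // "target-program" is a placeholder for the application being fuzzed.
        Process process = new ProcessBuilder("target-program", fuzzed.toString())
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        System.out.println("Exit code: " + exitCode);
    }
}
```

In practice one would run this in a loop, saving any mutated input that provokes a crash or hang so the failure can be reproduced later.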
This concept can be extended to, say, Web applications, by feeding an input stream containing arbitrary data to a servlet.
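As a hedged sketch of that idea, the following stand-alone client throws random request bodies at a servlet and flags server errors; the URL, the number of runs, and the body-size cap are assumptions for illustration only:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Random;

public class ServletFuzzer {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        Random random = new Random();

        for (int run = 0; run < 100; run++) {
            // Build a body of random gibberish, up to 64 KB per request.
            byte[] fuzz = new byte[1 + random.nextInt(64 * 1024)];
            random.nextBytes(fuzz);

            // The URL is a placeholder for the servlet under test.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8080/app/target"))
                    .POST(HttpRequest.BodyPublishers.ofByteArray(fuzz))
                    .build();

            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());

            // A 5xx response suggests the servlet crashed rather than
            // rejecting the malformed input gracefully.
            if (response.statusCode() >= 500) {
                System.out.println("Run " + run + " provoked HTTP " + response.statusCode());
            }
        }
    }
}
```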
The key point in Harold's article is that providing arbitrary data to an application approximates possible real-life input and can be a valuable addition to unit tests. Such inputs also force the developer to code defensively by anticipating all sorts of invalid input.
One question that comes to mind when reading Harold's article is just how far one should go in testing an application with arbitrarily complex input. In a Web application, the main input comes via an input stream, and the application can respond to different kinds of errors in widely differing ways.
For instance, most application frameworks provide an automatic conversion of HTTP parameters to Strings—but how many frameworks actually check the potential size of input that would be converted to a String?
To what extent do you check your application's behavior, for instance, in the presence of an infinitely long input parameter? And to what extent do you rely on frameworks, such as the Servlet API, to do the right thing with that sort of input?
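One defensive answer, sketched below without relying on the framework, is to bound how much of the request body is buffered before anything is converted to a String; the servlet itself, the 1 MB cap, and the 413 response are illustrative assumptions:

```java
import java.io.IOException;
import java.io.InputStream;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class BoundedInputServlet extends HttpServlet {

    // Illustrative cap: refuse to buffer more than 1 MB of request data.
    private static final int MAX_BODY_BYTES = 1024 * 1024;

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // Reject oversized requests up front when the client declares a length.
        int declared = request.getContentLength();
        if (declared > MAX_BODY_BYTES) {
            response.sendError(HttpServletResponse.SC_REQUEST_ENTITY_TOO_LARGE);
            return;
        }

        // Read the stream with an explicit cap rather than trusting the declared
        // length, so an "infinitely long" parameter cannot exhaust memory.
        InputStream in = request.getInputStream();
        byte[] buffer = new byte[8192];
        int total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read;
            if (total > MAX_BODY_BYTES) {
                response.sendError(HttpServletResponse.SC_REQUEST_ENTITY_TOO_LARGE);
                return;
            }
            // ... hand the bytes to the real parameter parsing here ...
        }
        response.setStatus(HttpServletResponse.SC_OK);
    }
}
```

The point of the sketch is simply that the size check happens before any String is built, which is exactly the step a framework's automatic parameter conversion may or may not perform on your behalf.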