In this interview with Artima, Agitar founder and CTO Alberto Savoia talks about the importance of mock objects in stubbing out external dependencies, and his company's new mock object technology that automates that process. He also explains how that new technology enables Agitar to guarantee at least eighty percent test coverage for a legacy code base:
Frank Sommers: What testing problem does your latest release try to solve?
Alberto Savoia: The new version of AgitarOne solves the biggest, nastiest, hairiest, and most unpleasant problem in software development: legacy code, or, to put it differently, the steaming pile of legacy code.
Agitar started by developing Agitator as an interactive tool. It works great, but customers came back and told us, "I want to start being agile, but I'm stuck with this steaming pile of legacy code: 250,000 lines of Java, zero tests. It is so hard for me to get my head above water to do the virtuous thing. I am just barely keeping up."
That is what we have been focusing on. Our first task was to start generating unit tests, and we have had some very good results in release 4.0. But customers kept coming back and telling us, "We also have a lot of untestable code in our application. Can you test that for us?"
As an example of untestable code, think of a class that cannot even be initialized unless a server is running. Untestable code is a very tough problem, but a problem that people want solved. So we put our entire engineering team on it, and developed a batch of new technologies in code analysis and mock objects.
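To make that concrete, here is an illustrative sketch, not code from the interview, of the kind of class being described; the OrderService name and connection details are hypothetical. Because the constructor opens a live connection, the class cannot even be instantiated in a unit test without a running server:

import java.io.IOException;
import java.net.Socket;

// Hypothetical example of "untestable" code: the constructor opens a
// live connection, so the class cannot even be instantiated in a unit
// test unless a server is actually listening on that host and port.
public class OrderService {
    private final Socket connection;

    public OrderService(String host, int port) throws IOException {
        // Throws immediately when no server is running, which ties
        // every test that touches OrderService to the environment.
        this.connection = new Socket(host, port);
    }

    public boolean isConnected() {
        return connection.isConnected();
    }
}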
The first result of that is that we can achieve, and in fact guarantee, 80 percent code coverage on a per-project basis for our customers. You can take a big, hairy piece of code, run it through AgitarOne, and if you don't get 80 percent code coverage, we will give you back some money.
If you are familiar with cyclomatic complexity and look at a method with a cyclomatic complexity of, say, 7, that means there are, in theory, seven independent paths through that method and seven tests that you have to write. If you run other coverage tools with less severe metrics than ours, you can get even closer to 90 percent coverage by those measures. The reason we can reach such high coverage is this new mocking technology.
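As a purely illustrative sketch of the metric (the AgeClassifier class is hypothetical): each decision point in a method adds one to its cyclomatic complexity, and each independent path, in theory, needs its own test.

// Illustrative only: cyclomatic complexity = decision points + 1.
public final class AgeClassifier {
    // Two 'if' decisions give this method a cyclomatic complexity of 3,
    // so full path coverage requires, in theory, three tests.
    public static String classify(int age) {
        if (age < 0) {               // decision point 1
            throw new IllegalArgumentException("negative age");
        }
        if (age < 18) {              // decision point 2
            return "minor";
        }
        return "adult";
    }
}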
There is a lot of code, especially legacy code, where in order to generate tests you need to mock the database, the file system, or some other aspect of the environment. Those things are pretty hard to do by hand. In fact, we found that a huge percentage of our customers' code is literally not testable: you cannot test it unless you break some of the laws of physics. Our mocking technology helps with those cases.
The other result of this work is performance. We did further optimization in AgitarOne, and it can now generate up to 250,000 lines of JUnit per hour. You can take one of those big, hairy applications and, in a matter of hours, develop a code base of characterization tests.
We have found over the years that the ratio of code to test code very rarely changes: typically, for every line of Java it takes 3 to 5 lines of JUnit to achieve 80 to 90 percent code coverage. If you have 250,000 lines of Java code, you probably need about one million lines of JUnit. Writing tests is hard for anybody, and writing tests for legacy code, especially that many tests, is no fun. You are looking at many engineer-years of effort.
Frank Sommers: You said that your tool generates characterization tests. What are characterization tests?
Alberto Savoia: Characterization test is a term introduced by Michael Feathers in his book, Working Effectively with Legacy Code. If you have some legacy code, chances are you do not have a specification that tells you what the code should do, and even if you do, it is not obvious how to translate that specification into tests. Characterization tests give you a way of preserving the previous, presumably desired, behavior of the code, and that allows you to evolve the code without breaking backward compatibility.
For most people, the main concern in dealing with legacy code is adding functionality without breaking the whole thing. What a characterization test does is record the actual behavior of the code: not the intended behavior, but what the code actually does. When you go and make a change, it lets you know which previous behaviors you have broken. At that point you can decide whether the breakage was intentional, or whether it came about as an unintended side effect of your change.
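A minimal sketch, assuming a JUnit 4 setup, of what such a test might look like; the PriceCalculator class and the recorded value are hypothetical. The point is that the assertion pins down observed behavior, not a specification:

import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Hypothetical legacy class whose behavior is being pinned down.
class PriceCalculator {
    double discount(int quantity) {
        // Legacy logic, possibly buggy; its actual output is what the
        // characterization test records.
        return quantity >= 10 ? quantity * 0.45 : 0.0;
    }
}

public class PriceCalculatorCharacterizationTest {

    @Test
    public void discountForQuantityTenMatchesRecordedBehavior() {
        PriceCalculator calculator = new PriceCalculator();
        // 4.5 is the recorded actual output, bugs and all. If a later
        // change alters it, this test fails and flags the behavior change.
        assertEquals(4.5, calculator.discount(10), 0.0001);
    }
}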
With characterization tests, you are really not testing for correctness. You care more about having bit-for-bit compatibility. The other day there was a guest here from M.I.T. who developed a tool that went and found a lot of bugs in some Microsoft libraries. When he presented the bugs to the Microsoft team, their response was, "Oh, this is great. Unfortunately, we cannot fix them because these bugs have been around for two years, and we have to keep those bugs to be compatible."
Frank Sommers: What's new in the mock objects technology you developed for AgitarOne?
Alberto Savoia: First of all, mocking is an absolute necessity if you want to write unit tests. Without mock objects, you get stuck very quickly, or you end up with non-portable unit tests. If you need a database around to run the tests, you are pretty much out of luck.
Even so, we found that less than 5 percent of all unit tests use mocking technology, because mocking is so hard to do by hand. The first thing we did was to automatically generate some mocks. As I said, however, we ran into problems where the code was not mockable using standard techniques. We had to push it a step further.
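To illustrate the manual effort involved (a sketch with hypothetical names, not Agitar's generated code): even a single small dependency needs a hand-written stand-in before a unit test can run without a database, and a real application has many such dependencies.

import java.util.HashMap;
import java.util.Map;

// Hypothetical dependency that, in production, would hit a database.
interface UserRepository {
    String findNameById(int id);
}

// A hand-rolled mock: an in-memory stand-in that lets a unit test run
// without a database. Writing and maintaining one of these for every
// external dependency is the manual effort that keeps mock usage rare.
class InMemoryUserRepository implements UserRepository {
    private final Map<Integer, String> users = new HashMap<>();

    void addUser(int id, String name) {
        users.put(id, name);
    }

    @Override
    public String findNameById(int id) {
        return users.get(id);
    }
}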
One way to think about this is that we intercept method calls at the bytecode level, figure out what resources or outside objects are used in each method, and determine whether execution requires the presence of those resources. A simple example: somewhere inside a method there is a reference to a file at a particular path. The execution path that assumes the presence of that file cannot proceed without the file, or without something that stands in for it. What we can do, however, is supply that stand-in at runtime, providing the information the code needs to continue down that execution path.
Without that runtime modification of the code, if the file doesn't exist at that particular path, you always get the exception path. But because we now control the file object, we can also test the path you get when the file is there. Normally, you can't break the laws of physics and say that something that's not there is actually there. And, normally, you can't mock a file. What we're saying here is that we know it's a file, and we're going to go ahead and do something different at that point in the execution path. The result is that your tests achieve much higher code coverage, and that they are truly portable.
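A sketch of the kind of code being described (the ConfigLoader class is hypothetical): because the method constructs the File itself, there is no seam for a conventional hand-written mock, and the file-present branch is unreachable in an ordinary unit test. That is exactly the branch the bytecode-level interception opens up.

import java.io.File;
import java.io.FileNotFoundException;

// Hypothetical code under test. On a machine where the file is absent,
// an ordinary unit test can only ever exercise the exception branch.
public class ConfigLoader {

    public String describe(String path) throws FileNotFoundException {
        File config = new File(path);
        if (!config.exists()) {
            // The only branch reachable when the file is missing.
            throw new FileNotFoundException(path);
        }
        // Unreachable in an ordinary unit test unless the file really
        // exists, or unless something intercepts File.exists() at
        // runtime and answers "true", as described above.
        return "config: " + config.length() + " bytes";
    }
}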
Frank Sommers: What are the limitations of this technique? What types of objects can't be mocked this way?
Alberto Savoia: We don’t want to oversell what we do. We generate unit tests, and that means we assume that the file system does what it is supposed to do, and the database does what it is supposed to do. There is a limit to what we can simulate.
This kind of test does not eliminate the need for integration and end-to-end testing, since you are still relying on the environment your code runs in. What it gives you, though, is this: if you are working on one class at a time in the legacy code base and make a change, you expect that change to have some consequences, to break some behavior. By detecting everything that changed within the scope of what we know about the code, you can determine whether what broke was just what you expected, or whether other things broke, too, that you didn't expect.
What do you think of Agitar's approach to working with mock objects and testing legacy code?