Software development is full of best practices which are often talked about but seem to be rarely done. One of the most basic, and valuable, of these is a fully automated build and test process that allows a team to build and test their software many times a day.
The process of an automated build-and-test cycle is continuous integration. It works by having an integration server check out code from the version control system at set time intervals, or at certain times of the day, build that code, run unit tests on the build, and report back build or test results to developers. By providing early build and test reports, developers can fix problems quickly, allowing the project to move forward in an agile fashion: Such failures will not block the progress of the team for long.
Continuous integration, however, assumes that the code builds cleanly on the integration server. While in principle no developer would wish to check in code that breaks the build, in practice build failures still happen. Slava Imeshev, whose company, ViewTier, provides a build automation tool, studied build problems, and found that most build problems emanate from a handful of development anti-patterns (patterns of what not to do). In a recent conversation, Imeshev highlighted three such anti-patterns:
Five o'clock check-in
Based on Imeshev and his team's statistical analysis of build failures at various clients, they noted that the probability of build failure doubles with code checked in after five o'clock in the afternoon.
The reason may be psychological, more than technical. "At five o'clock in the afternoon, most developers are thinking of going home for the day. If at that point you make a change [to the code], you may not be as careful to ensure that everything builds cleanly as you were at nine in the morning," says Imeshev.
The remedy: Don't check code in after five p.m. Instead, have another look at your code with a fresh mind in the morning.
But is that really an option? At Artima, we prefer to check code in as often as possible—after ensuring a clean build, of course. Not checking code in for longer time periods leads to conflicts, especially if another developer touches code you work on.
But what if you work on something, and you really need to leave the office at a certain time, and just don't have enough time to run a clean build? Should you check code in risking a break in the integration build, or should you instead risk possible conflicts with someone else's changes?
"If you are really afraid to break the build, then you likely will not check things in for long periods of time, which can also hinder your team's progress," says Imeshev. "You need to have a culture where it's OK for a developer to break the build on occasion, but then that developer is also responsible for fixing the breakage as soon as possible."
Spoiled fruit
This anti-pattern is the result of the increasingly popular technique of code generation. "When you generate code, you often place the generated artifacts in the same directory as the hand-written code," notes Imeshev. "That's very convenient, since build tools such as Ant can just use the current directory [of the source code] for code generation output. But remember that the integration server also must build that code and run the tests on that code. If your build process does not clean up all generated [artifacts], the outdated generated code lingering around will break your build." That old generated code is like spoiled fruit on a dinner table—hence Imeshev's name for the anti-pattern.
The solution is to place all generated code in a separated directory. For instance, if your source code resides in the source top-level project directory, you might want generated code to be placed in a generated directory, and cause the build script to completely remove that directory before a clean build.
It's a small change
The name of this anti-pattern speaks for itself: A developer thinks that there is no way a tiny change can break the build. "Who would think that reformatting code, or adding a comment can break the build?," asks Imeshev. "Because you don't think your change can break anything, you don't do a clean build before checking the small change in. Yet, small changes are frequent causes of build breakage."
What you may think of as a small change, may not be that small after all. For instance, when adding a comment, you may notice a variable or method name being misspelled. Since you already have the code open in the editor, you just fix that problem, often relying on the refactoring tools of the IDE. If you check that change in without doing a clean build and running the test suite, it may subsequently break the integration build nevertheless.
Should you do a clean build after even the tiniest code change? That appears to be the best solution to this problem, according to Imeshev. But you need to balance the time it takes to run a clean build and test locally with the possibility of breaking the integration build. "The build-and-test cycle may take a long time, so you may decide to not do that after a small change. Breaking the build this way may be OK on occasion—but don't be surprised to be awakened at four in the morning requesting that you fix the build."
What build anti-patterns have you observed in your projects? How often do you check code in? Do you always run a clean build-and-test cycle before checking changes in? To what extent does that impact you and your team's effectiveness?
> I think you've forgotten to post the link. > I think he meant the idea he expressed in succinct text, not a forgotten link.
I had to laugh when I read about "spoiled fruit." We have some of that sitting around in our build, and just a few days ago I had a problem getting the latest code to build on the server for release, because it was complaining about some old generated code.
I had considered moving the generated code to a different directory hierachy before, but not for the spoiled fruit reason. I just wanted it out of our face when we are looking around at the files we actually write by hand. Some of our generated code we call GeneratedX, and make it a package access superclass of the real class named X. Those I thought I could move to a generated directory hierarchy. But then we have some generated artifacts that are actually just plain old X, such as our POJO entity classes, which I kind of like in place.
Regardless, if I want to keep the generated code in place, I think I need to at least come up with a simple Ruby script that cleans up all generated code on a make clean.
> Regardless, if I want to keep the generated code in place, > I think I need to at least come up with a simple Ruby > script that cleans up all generated code on a make clean.
Generating code into the code line directory structure is not such a good idea. The solution is to have a dedicated directory where all the generated stuff goes into. this directory wold be cleaned up for any build:
<!-- Cleans up temp build dir --> <target name="cleanup.generated"> <delete dir="temp/generated"/> </target>
> Generating code into the code line directory structure is > not such a good idea. The solution is to have a dedicated > directory where all the generated stuff goes into. this > directory wold be cleaned up for any build: > Well, maybe we will do that. Putting them all in a different directory hierarchy not only makes it easier to delete them all, but it also makes it a bit more obvious what is and is not generated. But either way, I won't delete the generated files on any build, just from the clean target.
One other kind of generated file we have is a starter file. When we want to generate part of a class, we generate a superclass called GeneratedX.java, and then we peak to see if an X.java exists. If not, we generate a starter X.java. If X.java is already there, we do nothing. (X.java extends GeneratedX.java, and GeneratedX.java is package access so it doesn't show up in the JavaDoc.)
The starter classes should go into the main source directory. We'll then check them in and edit them by hand. If they need to be removed, we need to remove them from CVS anyway. So in theory they shouldn't show up as spoiled fruit.