This post originated from an RSS feed registered with Agile Buzz
by Jared Richardson.
Original Post: The Copy/Paste Detector
Feed Title: Jared's Weblog
Feed URL: http://www.jaredrichardson.net/blog/index.rss
Feed Description: Jared's weblog.
The web site was created after the launch of the book "Ship It!" and discusses issues from Continuous Integration to web hosting providers.
Last week on the Agile Testing mailing list the topic of software metrics came up... several were mentioned and I wanted to follow up on one of them here. There's a nifty tool available to you called CPD. It's simply a Copy/Paste detector but it works on Java, C, C++ and PHP code.
The idea is pretty simple. When someone sees a block of code they want to use, sometimes they create a utility class or a common routine from the code and then use that common code. It's a nice, clean, maintainable solution.
However, most of the time, we just copy over the code and move on.
Why is that a problem? After all, it's quicker to just move forward isn't it? Copying the code into my class makes me productive right now.
The problem comes when the copied code needs maintenance. Perhaps it needs to be faster or tweaked to run with the latest toolkit (JDK 1.5 anyone?). Maybe the original code has a bug.
For whatever reason, you need to update the code. If there is a common routine to use, the code changes go in one place. If the code has been copied a dozen times, someone will waste a lot time trying to find all the code that needs updating. And odds are that some of the code will be missed.
Think about it this way. The last time you fixed a bug, did you search the rest of your codebase to see if the bug existed in other code? I know I usually don't.
Another more subtle reason to use CPD is to keep your code more readable. When your code is littered with aptly-named utility classes, it reads easier and is easier to understand. Instead of reading a for loop to understand what it's doing, you'd see a routine called sort_users_alphabetically or select_unique_values. Any of us would instantly understand what the routine was doing but understanding the for loop might be more of a challenge. Especially given how some people like to write code. ;)
They even have a GUI that runs using Java Web Start. If you have a recent version of Java, you don't even have to install anything. Just click this link and point it at your local code tree. You can't get much easier than that.
Is it a tool you need? I've been amazed at how often blocks of code have been copied not once but dozens of times. I've seen this in code from teams that turn out very good code and teams that don't. From what I've seen it's much more common than most people think.
Here's my challenge to you. Try it. See what it finds.
Then try to justify each copied block to yourself. Does the code really need to be copied or do you have an easy refactoring waiting to happen?
Reducing your lines of code, creating more readable code, and easing your maintenance work all at the same time. Sounds like a good day to me.