Summary
I just read a good introduction to what is arguably the most important part of the agile development process, when it comes to quality. It does a good job of explaining that the real goal is not to “test” your code after you write it, but rather to create a “runnable specification” before you write it.
Designing Programs with RSpec and Cucumber gives a good introduction to what is arguably the most important part of the agile development process, when it comes to quality. It does a good job of explaining that the real goal is not to “test” your code after you write it, but rather to create a “runnable specification” before you write it.
Note:
Ruby’s RSpec just happens to be my favorite tool for that. And while the “R” in RSpec is generally taken to mean “Ruby”, I like to think of it as a Readable, Runnable, Reviewable Specification. So I’ll use the term “RSpec”, regardless of which tool is used to create one.
Like any specification, an RSpec is something you create before coding. Because it is a specification, it is readable. You can skim down the page to see how the API works. It doesn’t give the “big picture” that good API documentation can provide, but it does a better job of describing the details, particularly in corner cases that API docs generally leave out. For example: What happens if you pass zero, -1, or null? Does the program crash? Do you get an exception? Do you get a return value? If so, what is it?
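To make that concrete, here is a minimal sketch of what such a specification might look like in RSpec. The discount method, its argument, and its behavior are invented purely for illustration; the point is that each corner-case question gets an explicit, runnable answer:

require 'rspec'

# Hypothetical discount(amount) method -- not from the book or the article.
# Each example documents one corner case as a runnable expectation.
RSpec.describe 'discount' do
  it 'returns zero for an amount of zero' do
    expect(discount(0)).to eq(0)
  end

  it 'raises ArgumentError for a negative amount' do
    expect { discount(-1) }.to raise_error(ArgumentError)
  end

  it 'raises ArgumentError when passed nil' do
    expect { discount(nil) }.to raise_error(ArgumentError)
  end
end

Until discount is actually implemented, every example fails, which is exactly the starting point that writing the specification first is meant to produce.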
But because an RSpec is also runnable, it is in effect a unit test, ensuring that the implementation matches the specification, and that it continues to match the specification in future iterations, thereby preventing regressions.
In addition, as Martin Fowler pointed out so well in his seminal work on Refactoring, a comprehensive test suite lets you refactor with impunity, secure in the knowledge that the new version will work exactly the same as it did before. The highly desirable result is that you can engineer elegance into the product, instead of watching bug-fix entropy reduce the original beautiful design into a wad of ugly patches.
Of course, you only get that kind of confidence when your test suite is comprehensive--and you only get that kind of comprehensive coverage when you write the tests first, as part of the specification process.
But here is the key point, one that needs to be written in flaming letters, ten feet tall:
Because an RSpec is readable, it can be reviewed--before the code meets the road.
That is a key thought, because early specification review is likely to prove a key component of product quality.
It has long been known that early design reviews are the single most significant predictor of project success. It makes sense: if the basic design is good, then fixes will be limited to correcting oversights and making minor tweaks. But if the design is flawed, any bug could be the one that sends the developers back to the drawing board for the kind of costly redesign that imperils a project.
Ongoing specification reviews, on the other hand, are likely to be an even more significant indicator of project quality. For example, when writing a routine to convert a path to a URL, it was pretty easy to come up with a couple of dozen variations that the API could conceivably handle: input containing forward or back slashes, absolute or relative paths, paths with "up" (..) or "here" (.) segments, as well as paths with URL prefixes like http:// or ftp://.
It didn't make sense to code for all of those cases, of course. That's where the "engineering" part of software engineering came into play: Deciding which cases were important enough to handle. But the fact that the cases were enumerated at all was a significant determinant of quality.
For one thing, it was possible to document which cases were covered and which weren't. That documentation made it possible for reviewers to see exactly how the implementation worked, identify corner cases that might have been overlooked, and lobby for implementation of cases that might otherwise have been deferred indefinitely. And, of course, the fact that the specification was runnable ensured that it was 100% accurate, and that it remained so as the implementation evolved.
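As a sketch of what that spec might have looked like (the path_to_url name and the expected URLs are invented here, not taken from the actual routine): covered cases get real expectations, while cases that were enumerated but deliberately deferred are left as bodiless examples that RSpec reports as "not yet implemented", so the coverage decisions stay visible to reviewers.

require 'rspec'

# Hypothetical path_to_url routine; the expected values are illustrative only.
RSpec.describe 'path_to_url' do
  it 'converts backslashes to forward slashes' do
    expect(path_to_url('docs\\index.html')).to eq('file:///docs/index.html')
  end

  it 'passes an existing http:// prefix through unchanged' do
    expect(path_to_url('http://example.com/a.html')).to eq('http://example.com/a.html')
  end

  it 'collapses "here" (.) segments' do
    expect(path_to_url('docs/./index.html')).to eq('file:///docs/index.html')
  end

  # Enumerated but deferred: no block, so RSpec flags these as pending
  # instead of letting them be forgotten.
  it 'resolves "up" (..) segments in relative paths'
  it 'handles ftp:// prefixes'
end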
It was a small case, but it illustrated the potential benefit of "specification review"--as long as those specifications are (a) easy to write, (b) easy to read, and (c) runnable. If every significant API in an application is subject to such peer review, there is no doubt that reliability will improve significantly.
> It does a good job of explaining that the real goal is not to “test” your code after you write it, but rather to create a “runnable specification” before you write it.
It is arguably the greatest "merit" of Extreme Programming that its author creatively merged, or confused(?), the test spec with the systems spec. In industrial practice a separate test-case spec is often omitted, but each test case is nevertheless linked to some requirement in a systems spec, often written by technical people other than the programmers.
BTW, who reads a javadoc-generated API doc to get the "big picture"? It never worked for me - I just failed. Such documentation can serve as a manual ("how to use a class") but it doesn't substitute for a design document, in particular when the API grows beyond the size of a few classes (the same goes for UML diagrams, which can be elements of a design documentation, just like code or pseudo-code).
While I think automated testing has a lot of value, does it "Ensure Software Quality"? Absolutely not. Does it guarantee that "the new version will work exactly the same as it did before"? Nope.
The only way that could be true is if the automated test confirmed the result of every possible input (and state, if you have state). Except for the most trivial methods, this just isn't feasible.
Again, I'm not saying there is no value in automated testing, only that it is no magic bullet.
> It is arguably the greatest "merit" of Extreme Programming that its author creatively merged, or confused(?), the test spec with the systems spec.
I'm all for executable, or at least testable (compilable?), specifications. The big danger in XP is that people confuse having test specs which provide executable specs with those being the full requirements. There's been interesting discussion on the Agile Alliance list on LinkedIn recently about how usability was missed in the framing of the original Agile Manifesto.
> BTW, who reads a javadoc-generated API doc to get the "big picture"?
One of the reasons I use doxygen is that it makes it very easy to write additional pages of documentation that end up hyperlinked to the source. It is much like a wiki that travels with the source code, in the same repository. Couple that with additional diagrams from Graphviz, which Doxygen supports as inclusions, and you can have a lot of design documentation generated at the same time as your normal class docs.
> I'm all for executable, or at least testable (compilable?), specifications. The big danger in XP is that people confuse having test specs which provide executable specs with those being the full requirements. There's been interesting discussion on the Agile Alliance list on LinkedIn recently about how usability was missed in the framing of the original Agile Manifesto.
The term "specification" seems to mean lots of different things to different people. It needs disambiguation. It might become obvious that certain aspects can be easily separated and expressed using RSpec-style DSLs, while others are not.
> One of the reasons I use doxygen is that it makes it very easy to write additional pages of documentation that end up hyperlinked to the source. It is much like a wiki that travels with the source code, in the same repository. Couple that with additional diagrams from Graphviz, which Doxygen supports as inclusions, and you can have a lot of design documentation generated at the same time as your normal class docs.
I used doxygen/javadoc-style documentation extractors for a long time but moved away from them recently in favour of Sphinx (also for Java and C++ projects). The documentation contains everything that is needed: "getting started" docs, system design documentation, tutorials, and the API description. The integration with the code is less tight, but I realized I like it better because I can more cleanly separate what is intended to be a proper API from what is merely "public" in the sense of access modifiers. So the API description becomes a manual, something directed at the user, not a huge overload of implementation details (lots of infrastructure crap which might just go away after the next refactoring).
> I used doxygen/javadoc-style documentation extractors for a long time but moved away from them recently in favour of Sphinx (also for Java and C++ projects)...
> The integration with the code is less tight
Unless I'm missing something, it seems like the Sphinx (rant below) support for C++ is actually pretty minimal? If you have more information I'd love to hear it.
The environment I'm in also uses a lot of C# which Doxygen supports nicely so I'm probably stuck with Doxygen for now, although I agree the markup of Sphinx with reST is appealing.
I most miss tables in Doxygen (you have to fall back on HTML tags), but the @page, @subpage, @section and @subsection tags make it very easy to structure the additional documentation. I also really love being able to use @dot..@enddot to include a small fragment of Graphviz dot code that generates a diagram, or to move a bigger diagram out to a separate file.
(rant) I am extremely fed up with people using "real world" names for projects especially when there's already heavy use of a name. Sphinx is a very annoying name for something to do with code, try Googling it.
> The only way that could be true is if the automated test confirmed the result of every possible input (and state, if you have state). Except for the most trivial methods, this just isn't feasible.
If not every possible input value is tested, but instead a range of values is tested all at once, could the test be feasible?
For example, a function that takes an integer from 0 to 100 as input does not need one test for each value in the range 0 to 100; if we ensure that the input is correct, we can use a single value from the range [0,100] to test our function.
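A common refinement of this idea is equivalence partitioning with boundary values: one representative for each class of inputs that should behave the same, plus the values where the behaviour changes. A sketch in RSpec, to stay with the article's theme; the grade function and its 60-point cut-off are invented for illustration:

require 'rspec'

# Hypothetical grade(score) accepting an integer score from 0 to 100.
RSpec.describe 'grade' do
  it('fails the lower boundary')     { expect(grade(0)).to eq(:fail) }
  it('fails just below the cut-off') { expect(grade(59)).to eq(:fail) }
  it('passes at the cut-off')        { expect(grade(60)).to eq(:pass) }
  it('passes the upper boundary')    { expect(grade(100)).to eq(:pass) }
  it('rejects out-of-range input')   { expect { grade(101) }.to raise_error(ArgumentError) }
end

This keeps the test count small, but only to the extent that the chosen partitions really match how the implementation behaves.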
> > I used doxygen/javadoc-style documentation extractors for a long time but moved away from them recently in favour of Sphinx (also for Java and C++ projects)... The integration with the code is less tight
>
> Unless I'm missing something, it seems like the Sphinx (rant below) support for C++ is actually pretty minimal? If you have more information I'd love to hear it.
Yes, C++ support is incomplete. Notice that the project I referred to is tiny:
> The environment I'm in also uses a lot of C# which Doxygen supports nicely so I'm probably stuck with Doxygen for now, although I agree the markup of Sphinx with reST is appealing.
I was thinking about adding a Silverlight directive for Sphinx that would allow executing demo code for DLR languages, so people would have an interactive online tutorial embedded in the docs. That effort has stalled because I'm too busy with other things; but maybe next year.
> (rant)
> I am extremely fed up with people using "real world" names for projects especially when there's already heavy use of a name. Sphinx is a very annoying name for something to do with code, try Googling it.
> > The only way that could be true is if the automated test confirmed the result of every possible input (and state, if you have state). Except for the most trivial methods, this just isn't feasible.
>
> If not every possible input value is tested, but instead a range of values is tested all at once, could the test be feasible?
Automated tests can be feasible, but they are far from complete. Notably, they fail badly with code that can have race conditions of any kind, such as multi-threaded or networking code. It takes a long time to test for race conditions.
> If not every possible input value is tested, but instead a range of values is tested all at once, could the test be feasible?
>
> For example, a function that takes an integer from 0 to 100 as input does not need one test for each value in the range 0 to 100; if we ensure that the input is correct, we can use a single value from the range [0,100] to test our function.
For that one case it seems fine. For most real applications, though, the range of inputs and values is so large that we can consider it infinite.
Consider a method that checks whether a number is really prime. Say that the input range is any positive integer that can be represented in 128 bits. That's roughly 3*10^38 different inputs, according to my computer's calculator.
How are you going to automate testing for that? In reality, you would likely build an input file of numbers and whether they are prime, and that's a really good idea. I fully support it. But it's very dangerous to confuse that with 'proving' or 'ensuring' that the code is correct. You could spend years building a file of test inputs and expected results and you'd still have covered an insignificant fraction of the range.
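One middle ground, without claiming it proves anything, is to check the implementation against a slower trusted oracle on a sample of inputs. A sketch in Ruby: fast_prime? is a hypothetical method under test, and the standard-library Prime module serves as the oracle:

require 'rspec'
require 'prime'  # standard-library primality test, used here as a trusted oracle

# Sampling raises confidence but covers a vanishing fraction of a 128-bit
# input space; it is a spot check, not a proof of correctness.
RSpec.describe 'fast_prime?' do
  it 'agrees with the standard-library oracle on sampled inputs' do
    1_000.times do
      n = rand(1..10**12)
      expect(fast_prime?(n)).to eq(Prime.prime?(n)), "disagreement on #{n}"
    end
  end
end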
> > > The only way that could be true is if the automated test confirmed the result of every possible input (and state, if you have state). Except for the most trivial methods, this just isn't feasible.
> >
> > If not every possible input value is tested, but instead a range of values is tested all at once, could the test be feasible?
>
> Automated tests can be feasible, but they are far from complete. Notably, they fail badly with code that can have race conditions of any kind, such as multi-threaded or networking code. It takes a long time to test for race conditions.
For multithreaded code, I think a good approach would be to allow the testing tool to 'see' which variables are used in which threads, by "labeling" the variables used by more than one thread.
This check can also be done through type systems: if a type is declared as visible to more than one thread, then access to variables of that type can be checked by an automated tool.
Deadlocks can be checked for by ensuring that the ordering of locks is always the same.
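As a sketch of that lock-ordering discipline in Ruby (the Account class and transfer method are invented for illustration): if every code path acquires locks in one agreed order, a circular wait can never form, and a deadlock requires a circular wait.

# Each account is guarded by its own Mutex.
class Account
  attr_accessor :balance
  attr_reader :lock

  def initialize(balance)
    @balance = balance
    @lock = Mutex.new
  end
end

# Always lock the account whose Mutex has the smaller object_id first,
# regardless of transfer direction, so the classic A-then-B versus
# B-then-A deadlock cannot occur.
def transfer(from, to, amount)
  first, second = [from, to].sort_by { |acct| acct.lock.object_id }
  first.lock.synchronize do
    second.lock.synchronize do
      from.balance -= amount
      to.balance += amount
    end
  end
end

A test can then hammer transfers in both directions from many threads, though, as noted above, a run that happens not to deadlock is evidence rather than proof.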
> > If not every possible input value is tested, but instead a range of values is tested all at once, could the test be feasible?
> >
> > For example, a function that takes an integer from 0 to 100 as input does not need one test for each value in the range 0 to 100; if we ensure that the input is correct, we can use a single value from the range [0,100] to test our function.
>
> For that one case it seems fine. For most real applications, though, the range of inputs and values is so large that we can consider it infinite.
>
> Consider a method that checks whether a number is really prime. Say that the input range is any positive integer that can be represented in 128 bits. That's roughly 3*10^38 different inputs, according to my computer's calculator.
>
> How are you going to automate testing for that? In reality, you would likely build an input file of numbers and whether they are prime, and that's a really good idea. I fully support it. But it's very dangerous to confuse that with 'proving' or 'ensuring' that the code is correct. You could spend years building a file of test inputs and expected results and you'd still have covered an insignificant fraction of the range.
Isn't it enough to check the subroutine using a prime and a non-prime?
The range-based checking would be used inside the prime checking function, so if all the functions inside the prime checking function are correct, then the prime checking function is correct as well, assuming it passes the test for one prime and one non-prime.
> For multithreaded code, I think a good approach would be to allow the testing tool to 'see' which variables are used in which threads, by "labeling" the variables used by more than one thread.
OK, and then what? One of the big problems with testing multi-threaded code is that it's very hardware dependent. Your code might work great on a single core machine and break on a dual core. Or, it might work on a dual core but fail on a quad-core, etc.
> This check can also be done through type systems: if a type is declared as visible to more than one thread, then access to variables of that type can be checked by an automated tool.
That sounds like static analysis, not testing.
> Deadlocks can be checked for by ensuring that the ordering of locks is always the same.
How do you test that they are 'always' the same?
> Isn't it enough to check the subroutine using a prime and a non-prime?
Uh, no.
A trivial example:
def isPrime(x):
    # Deliberately broken: treats every odd number as prime.
    return (x % 2) != 0
Test:
isPrime(3) -> True
isPrime(4) -> False
Both tests pass, yet the function is wrong: isPrime(9) returns True and isPrime(2) returns False.
> The range-based checking would be used inside the prime checking function, so if all the functions inside the prime checking function are correct, then the prime checking function is correct as well, assuming it passes the test for one prime and one non-prime.
> > For multithreaded code, I think a good approach would be to allow the testing tool to 'see' which variables are used in which threads, by "labeling" the variables used by more than one thread.
>
> OK, and then what? One of the big problems with testing multi-threaded code is that it's very hardware dependent. Your code might work great on a single core machine and break on a dual core. Or, it might work on a dual core but fail on a quad-core, etc.
The compiler could automatically insert synchronization primitives.
> > This check can also be done through type systems: if a type is declared as visible to more than one thread, then access to variables of that type can be checked by an automated tool.
>
> That sounds like static analysis, not testing.
Part of testing is static analysis.
> > Deadlocks can be checked for by ensuring that the ordering of locks is always the same.
>
> How do you test that they are 'always' the same?
By making locks different types.