Summary
If you read Computer Science textbooks you might imagine external program documentation actually exists. Yet, when I asked many of my colleagues if they had ever had such documentation throughout a project's life cycle, they laughed a bit uncomfortably and said, "well, no, not really." What's going on here?
Documentation
"Do you have written documentation for your project? You know, all that
stuff the books say you're supposed to have: theory of operations documents
and high-level designs and stuff like that?", I asked.
"You're kidding, right?", was the response.
The Myth of External Program Documentation
In this blog I'd like to pick up on something I ended with last time. Here's
the final note from
The Future of UML:
I ran into somebody who had still been associated with the project this summer,
some 4+ years after the first new Prescriber had shipped. "Had the documentation
been helpful?", I asked. "Are you kidding?!", my friend replied. "It scared the
hell out of 'em." I looked aghast. He continued, "they take one look at the size
of the notebook you left and they say, 'I'm not touching that! It looks complicated!'"
If you read Computer Science textbooks you might imagine that external program
documentation such as high-level designs, specifications, and theory of operation
descriptions actually exists in most projects. Yet, when I asked many of my friends
and colleagues if they had ever had such documentation throughout a project's
life cycle, they laughed a bit uncomfortably and said, "well, no, not really."
What's going on here?
While we're on it, how do you explain the reaction to the project documentation
I'd left for the Prescriber? Is this common? The answer to that appears to be
"yes". External program documentation, despite what you may have learned
in all those high-minded computer science textbooks, remains one of the weak links
in software development. How did this come to be and what is it costing us? And,
why, if the industry standard is to eschew external program documentation, do we
continue to preach the importance of such things?
Obsolescence
In his book
Extreme Programming Explained: Embrace Change, Kent Beck expands on the original
programmer adage "good, fast, cheap: pick two" and claims customers actually choose
three of four factors: cost, schedule, features, and quality. Developers choose
(or are at least stuck with) the last variable. In either
case I think we have, at best, done a poor job of identifying costs or, at worst,
have openly lied about costs, thereby skewing what customers might have selected.
Program maintenance constitutes 40-80% of software costs, yet I recall
few discussions during program planning or development about such things.
Further, maintenance costs fall largely into the category of enhancements.
As Robert Glass, in his book
Facts and Fallacies of Software Engineering, puts it:
The 60/60 rule: 60 percent of software's dollar is spent on
maintenance, and 60 percent of that maintenance is enhancement.
Enhancing old software is, therefore, a big deal.
Given this large number (60 percent of 60 percent means roughly 36 percent of every
software dollar goes to enhancing existing systems) one might expect that we'd be looking for ways
to manage this cost and mitigate any associated risks. Steve Rakitin, in
his book
Software Verification and Validation for Practitioners and Managers,
makes the observation (also quoting from DeMarco and Lister) that
"Turnover is incredibly expensive." Rakitin's book focuses squarely on
software quality and I believe he's right to discuss turnover in his
"Balancing People, Process, and Product" chapter. But, why is
turnover so expensive?
It is estimated that roughly 30% of the total maintenance time is spent
"understanding the existing product". (Again, I turned to Glass's book
for this figure since it was handy.) This figure relates directly to the
cost of turnover, as illustrated by a 1983 Air Force study in which
researchers found that the "biggest problem of software maintenance"
was "high [staff] turnover" (rated 8.7 on a scale of 1 to 10), followed by "understanding
and lack of documentation" (7.5) and "determining the place to make a
change" (6.9). I contend they are all related. If you have no *usable*
documentation then all of the information is in people's heads. If the
heads walk out the door (turnover) then the information needs to be
rediscovered. That is not cheap.
Of course, useless documentation is no help. Since a small
percentage of the software life cycle is dedicated to the creation of
documentation, the quality of it is immediately suspect. After all,
if the documentation is already untrustworthy, why maintain it?
Why throw good money after bad? But, is a given document useless
10 minutes after it is written? How about 10 days? How about 10
weeks? The tendency is to dismiss anything outside the code as
"out-of-sync with reality" whether, objectively, that is true or
not.
Finally, there is a notion that the code is golden (or at least that it
documents itself) even if no other external document is present
purporting to do so. Tools like JavaDOC, which scrape the Java
source code of your project and create a hierarchy of web pages,
can create the documentation at the push of a button. The bits
will be new and fresh, but are they *right*? That is, why would the comments
describing the inner workings of a particular subsystem be any more accurate,
descriptive, or insightful just because they were pulled from the Java code?
Put another way, can source code and its associated comments get "out-of-sync"
with each other? Of course they can. At best, the co-location of the documentation and the
code can eliminate the need for "finding the right place to update the
documentation", but it isn't a panacea. It still takes work, and
discipline, to ensure the documentation is correct.
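To make the point concrete, here is a minimal sketch (the class and the 30-versus-15-minute rule are invented for illustration) of a doc comment quietly drifting away from the code it sits beside:

```java
public class SessionPolicy {

    /**
     * Returns true if the session has been idle for more than 30 minutes.
     * (Written when the method was first added; never revisited.)
     */
    public boolean isExpired(long idleMillis) {
        // The rule was later tightened to 15 minutes; the comment above was not updated.
        return idleMillis > 15 * 60 * 1000;
    }
}
```

Running the javadoc tool over this file produces perfectly fresh-looking HTML, but that HTML still carries the stale 30-minute description.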
Assume for a moment we're willing to maroon the maintenance
programmer with no (useful) documentation. Can documentation created early in
the software life cycle help mitigate risk? It can't get too far out-of-sync
while we're still writing it, can it?
Failure to Drive Out Risk
One of the arguments for external program documentation, especially before the
coding stage, is that it should help drive out risk. I actually agree with this,
at least in principle, but have also been forced to recognize this activity's
limits. For example, it isn't unheard of to have a feasibility study to
determine if a project is even worth attempting. Certainly this would qualify
as a risk reducing activity. Yet, as Glass recounts in his book, he was in the
audience at the International Conference on Software Engineering (ICSE) in Tokyo in
1987 when Jerry Weinberg presented his keynote. Weinberg asked how many in the
audience had ever participated in a feasibility study where the answer
came back "No". Not a single hand of the 1,500 in attendance was raised.
Which begs the question: how many of these documents are good science,
and how many are simply position papers?
Before we have too much fun beating up management on their feasibility studies
I think we should take a hard look at our own writings. That is, how many of
our (scant few) documents are intended to be good science and how many
have simply been constructed to deflect attacks from our critics, get our way on
some technical issue, select our favorite vendor, choose our favorite product,
or simply embarrass those who dared to disagree with us?
Glass quotes a fellow named Bill Curtis who said "in a room full of top software
designers, if any two of them agree, that's a majority." We programmers do like
to have our opinions! But, it is sometimes difficult to even have documentation
crisp enough to know what it is we are arguing about. I've seen poor designs
win out over better alternatives simply because the advocate for the better
alternative didn't have the presentation skills to get the facts out there. Winning
the argument and getting the right answer are two different things. In the case
where the poorer design won, it is difficult to justify the documentation process
as risk mitigation.
Literature
In many ways, a good design document is like a novel. It speaks of a world
thus far only fictitious, in a tense that makes it sound like it already
exists. "The program does this and this and that..." Since most engineers
are poor writers (or, at least inexperienced novelists), it is no
wonder good design documents are hard to find.
I think it goes beyond that, however. It takes some courage to put things
in writing and then live with the consequences. Going on the record
may not be in your (personal) best interest -- even if you're right. And,
of course, being wrong, in writing, can hang about your neck like the
proverbial albatross.
In some ways software design documents differ from novels:
usually the novelist knows how the story is going to end. Such is not always
the case in software development. Yet, the design decisions made early in a
project will affect the software throughout its life cycle. Have you ever worked
on something and asked, either under your breath or even out loud, "what were
they thinking?!" Documentation of the thinking at the time, even if it drew
incorrect conclusions, would be revealing to a maintenance programmer later.
Such revelations could save that programmer hours or even days:
"Oh, I see where they were going with this, and I can see
why it didn't work out." Still, with all the value that might give, we don't
do it.
The final comparison to literature is, I believe, the most apt: writers
make little money. The same may be true of documentation in the software world. Our boss or
our customer needs the software; the documentation is our problem, or
just some internal software concern. Customers and management don't
put any emphasis on it, so software developers concentrate on what they're
being measured on: delivering code.
Myth vs. Reality
My problem with all this is that I can't reconcile the software engineering
book world with the real world. The software books blithely
continue to tell us about external program documentation and how it is used
throughout the software life cycle. For the vast majority of projects I have
worked on, there are no such documents. I contend we should fix one or the
other. The Extreme Programming crew has made their choice: they don't pretend
to create such artifacts. (Castigate them if you must, but at least they
are honest about what they do.) For the rest of us, it is a world of denial.
I can't help but wonder if we couldn't do better.
References
Facts and Fallacies of Software Engineering by Robert L. Glass.
Find it on amazon.com.
Extreme Programming Explained: Embrace Change by Kent Beck.
Find it on amazon.com.
Curtis, B., R. Guindon, H. Krasner, D. Waltz, J. Elam, and N. Iscoe. 1987.
Empirical Studies of the Design Process: Papers for the Second Workshop
on Empirical Studies of Programmers. MCC Technical Report Number STP-260-87.
I wasn't able to find this on the web. The reference is from the Glass book
(above). It sounds interesting, though. If somebody has a pointer to it, please
let me know. I'd like to read the original.
Software Verification and Validation for Practitioners and Managers, Second Edition
by Steven R. Rakitin
Find it on amazon.com.
My review of that work appears on that page as well.
Peopleware : Productive Projects and Teams, 2nd Edition by
Tom DeMarco & Timothy Lister
Find it on amazon.com.
The Psychology of Computer Programming Silver Anniversary Edition by
Gerald M. Weinberg.
Find it on amazon.com.
My review of the work may also be found on that page. Weinberg explores
what does, and does not, motivate software professionals, which, of course,
is directly related to this topic.
First, since we're discussing writing, a nitpick: "Begs the question" does not mean "Prompts one to ask."
Second, most projects I've worked on had copious amounts of documentation. Mostly it concerned what a customer was expecting, if and when various features would be included, and requirements for interacting with other, existing systems. But, sadly, the docs for the code itself were often scant and, increasingly, wrong. Tools such as JavaDoc have helped, but rely on developers writing and maintaining comments, a chore treated by many with disdain.
There's a cultural barrier that presents writing documentation as less than noble. *Real* hackers will simply tell you to read the code, but the code can only tell you what the program does, not what it should be doing, or why.
I think documentation is underrated, not just because of its ability to make future maintenance easier, but because it is an important part of the design process. It can be a time of reflection, when you consider what you've done and describe what that means. Underdocumented interfaces and projects tend, in my experience, to be poorly designed. The interfaces are often too bulky -- both programming and graphical interfaces. They don't predict actual usage, they support theoretical needs that don't actually occur, or they lack symmetry and orthogonality in their design.
But insofar as documentation is useful as a process more than a product, I think XP does attempt to cover this in some of their ideas with use cases -- producing ephemeral documentation. This may or may not be sufficient.
Other important areas of documentation are generally encountered when the underlying technology is flawed in some fashion. Perhaps there's no way to persistently annotate the source, or the structure of the source is forced into a restricted and unexpressive form. Or the source is simply opaque -- usually a persistent structure besides plain text.
Otherwise, good documentation can successfully take the form of careful comments, good naming conventions and good names, and localized documentation like JavaDOC which helps programmers learn where to start reading the code. Of course, all of these are documentation efforts that are also frequently ignored. Poorly factored code is common, and it can be difficult to justify fixing -- unlike external documentation, it is not well understood or measured by non-programmers.
> the code can only tell you what the program does, not what it should be doing, or why.
Well factored code *can* tell you what it should be doing and why. It can be difficult, and not always possible for all kinds of code and environments, but it is not without hope. Names of variables, objects, and functions all express intention. And like comments, they can also obscure intention and mislead.
> Program maintenance constitutes 40-80% of software costs, yet I recall few discussions during program planning or development about such things.
One of the nice things I've found about working with more agile methodologies is that they address this straight on.
For example if you're doing XP then every release after the first iteration is a maintenance release where you're adding new features.
> If you have no *usable* documentation then all of the information is in people's heads. If the heads walk out the door (turnover) then the information needs to be rediscovered. That is not cheap.
An alternative to more documentation is to get the information into more heads. Practices like common code ownership and pair programming can help there.
> The tendency is to dismiss anything outside the code as "out-of-sync with reality" whether, objectively, that is true or not.
True. However the dismissal tendency exists because of the common experience of documentation being radically out of sync with the codebase.
If it has been my experience that, nine times out of ten, the documentation isn't going to help, then instant dismissal is my best approach.
If only there was a way of automagically finding out whether documentation was good or bad :-)
> Finally, there is a notion that the code is golden (or at least that it documents itself) even if no other external document is present purporting to do so. Tools like JavaDOC, which scrape the Java source code of your project and create a hierarchy of web pages, can create the documentation at the push of a button. The bits will be new and fresh, but are they *right*? That is, why would the comments describing the inner workings of a particular subsystem be any more accurate, descriptive, or insightful just because they were pulled from the Java code? Put another way, can source code and its associated comments get "out-of-sync" with each other? Of course they can. At best, the co-location of the documentation and the code can eliminate the need for "finding the right place to update the documentation", but it isn't a panacea. It still takes work, and discipline, to ensure the documentation is correct.
Comments can get out of sync. They are just documentation in another place.
However, by definition, the code cannot get out of sync. The code describes accurately what the program currently does.
Now there may be an argument that separate documentation describes what the codebase should do, as opposed to what it actually does, but that's a different kettle of fish.
I prefer to spend time getting the code to match the requirements (using, for example, automated acceptance tests) than spend time documenting design decisions and then having to keep them in sync with the code. As ever YMMV :-)
Although you don't mention them explicitly, I hope you're including tests when you talk about the code, since they are often better than the code itself at answering questions about why the code does things in a certain way.
> The Extreme Programming crew has made their choice: they don't pretend to create such artifacts. (Castigate them if you must, but at least they are honest about what they do.)
XP myth alert! XP people don't create documentation until it is necessary - very different statement from never creating documentation artefacts at all.
It's just that we find that documentation is necessary in fewer places than many people think.
> but the code can only tell you what the program does, not what it should be doing, or why.
Depends on the code ;-)
It has been my experience that a lot of what the program should be doing and why can be placed in the code if you refactor well and use good naming practices.
You can, of course, argue that bad code doesn't perform this task - but neither does bad documentation.
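As a rough sketch of what I mean (the discount rule and every name here are invented), compare the same check before and after refactoring toward intention-revealing names:

```java
public class DiscountRules {

    // Before: the reader has to reverse-engineer the business rule.
    static boolean check(int d, int t) {
        return d > 3 && t < 500;
    }

    // After: the rule, and a hint of the reason behind it, travel with the code.
    static final int LOYALTY_MINIMUM_YEARS = 3;
    static final int DISCOUNT_CEILING_CENTS = 500;

    static boolean qualifiesForLoyaltyDiscount(int yearsAsCustomer, int requestedDiscountCents) {
        return yearsAsCustomer > LOYALTY_MINIMUM_YEARS
                && requestedDiscountCents < DISCOUNT_CEILING_CENTS;
    }
}
```

Neither version needs an external document to say what is allowed, but only the second tells the next reader why.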
Good acceptance tests and unit tests act as very good descriptions of what the code should do - and have the advantage of producing an immediate list of failures if they become out of sync with the code.
In my experience there are very few things that you can't make explicit in the code with some effort. "Global" requirements like "error rate of new users must be less than 60%" or "all transactions must complete in under 15 seconds" are hard - most everything else isn't.
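For instance, here is a minimal sketch of a test acting as documentation (ShoppingCart and its rules are hypothetical, and this assumes JUnit 4):

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Each test name states a requirement; the assertions pin it down.
// If the code drifts, the build produces an immediate list of failures.
public class ShoppingCartTest {

    @Test
    public void emptyCartTotalsToZero() {
        ShoppingCart cart = new ShoppingCart();
        assertEquals(0, cart.totalInCents());
    }

    @Test
    public void discountCodeIsIgnoredBelowMinimumSpend() {
        ShoppingCart cart = new ShoppingCart();
        cart.addItem("book", 1999);              // below the (hypothetical) 5000-cent threshold
        cart.applyDiscountCode("SAVE10");
        assertEquals(1999, cart.totalInCents()); // total unchanged: no discount applied
    }
}
```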
I have a hard time believing that any team could create a substantial product with no documentation at all. That would be like building a 50-storey office tower with no blueprints. You might not need sophisticated theory of operations documentation, but there is far more to building a complex product than any humans I have ever met could keep in their heads.
Start off with the detailed analysis of what the customers need: all the many customers of a product, from CEO to data entry clerk. Go on to the technical implications of resource load, existing infrastructure, competition, existing skills, and expected business direction. Work out all the myriad details of designing a product that is efficient, reliable, robust, maintainable, flexible, and powerful.
When the product is complete, there had better be documentation where it's needed, which is usually with the source code, not in a separate hard-copy book that can easily be mislaid. That isn't to say that documentation is ever a substitute for a good, clean, usable design. If software can't be used without an accompanying tome, then it probably should scare a sensible user.
I'm amused by JavaDoc. The idea that you could abstract out source code comments into a separate file and use them as product documentation was commonplace at one time. It didn't work then, and it doesn't work now, because quality programmers write good documentation and don't need such utilities, and hackers (i.e. bad programmers) can't write good comments any more than they can write good documentation.
Here is my little theory of why documentation gets out of sync:
1. Programmer A sits alone and writes a document.
2. Programmer B needs to change the described code.
3. Programmer B reads A's documentation, which at some point turns out to be unclear or just a little bit outdated.
The "right" way of handling this would be to correct the documentation first, looking at the original code, then update both. However, programmer B does not feel like as much of an expert as A, who was the original author of the code and docs. Therefore, she has several choices:
1) "fix" the documentation and possibly make some mistakes,
2) contact A and ask for help, in the hope that she still knows better, or
3) just let it be ("for now") and change the code.
The third alternative requires the least responsibility and is the fastest one to implement. Once it is chosen, the documentation becomes even more outdated, strengthening the repeated choice of this option on any later occasions.
Unlike buggy code, wrong documentation does not cause immediately apparent failures. I'd also argue that writing excellent documentation is more of an art than writing working code. Excellent documentation can speed up solving real problems. Bad documentation can mislead, raise questions, annoy or bore its reader. No documentation is better than bad documentation. A very easy choice, indeed.
Why is most documentation inadequate? Because we don't test it at all. Documentation, like any writing, is about communication. However, at the time when it is written down there is usually no one to assess its understandability and usefulness, just a lone expert spilling her current thoughts on paper.
Couldn't agree more on the point that developers are not the best of "novelists". However, I still feel that documenting what you design is a worthwhile effort. Especially in a world of dynamic changes, you may be in a distributed environment where documentation and communication are the best tools you have. Adding to this, UML is an excellent medium for communicating technical mumbo-jumbo in a language-independent manner. With IT companies in a perpetual race for CMM Levels 1, 2, 3, 4 and so on, documentation is only going to increase, not decrease.
> The 60/60 rule: 60 percent of software's dollar is spent on maintenance, and 60 percent of that maintenance is enhancement. Enhancing old software is, therefore, a big deal.
Ada has demonstrated over 20 years that programming language design can halve lifecycle costs. Ada encourages its users to spend more time in writing to describe their code. That extra effort communicates information to the tools and to all future readers of that code. Future readers derive the benefits.
> However, by definition, the code cannot get out of sync.
What is true is that code is never out of sync with what it does. But it does get out of sync with what it should be doing. Code that gets upgraded, loses functionality, and is extended ends up doing things simply because that was the voodoo that fixed the bug. The subsequent maintenance programmer then reads the code, becomes very confused, and the pattern named Lava Flow comes into being.
Without information about what the code is supposed to do and why, as opposed to what the code is actually doing, refactoring becomes a very dangerous exercise.
> The code describes accurately what the program currently does.
If one couples documentation to semantics, then one cannot let the documentation get "out of date", because then your system is no longer correct - it cannot be compiled, tested, or verified.
For our Java-based systems, we use the Java Modeling Language (JML), a behavior interface specification language (BISL) for Java.
With JML you write contracts and models for your software using Java plus a set of constructs specific to writing specifications (e.g., pre-/postconditions, invariants, model variables, frame axioms, etc.). This means that the programmer can use a language with Java syntax and semantics (no need to learn some new ambiguous or weak language like UML or OCL, or something arcane like VDM or Z) to write testable, checkable, formal documentation.
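For a flavor of what that looks like, here is a minimal sketch (the BankAccount class and its rules are invented for illustration; the constructs shown are standard JML):

```java
public class BankAccount {

    private /*@ spec_public @*/ int balanceInCents;

    //@ public invariant balanceInCents >= 0;

    //@ requires amountInCents > 0;
    //@ ensures balanceInCents == \old(balanceInCents) + amountInCents;
    public void deposit(int amountInCents) {
        balanceInCents += amountInCents;
    }

    //@ requires amountInCents > 0 && amountInCents <= balanceInCents;
    //@ ensures balanceInCents == \old(balanceInCents) - amountInCents;
    public void withdraw(int amountInCents) {
        balanceInCents -= amountInCents;
    }
}
```

The annotations are ordinary comments to the Java compiler, but JML-aware tools can check them, render them as documentation, or generate tests from them.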
The various tools that understand JML (of which there are around 10) can automatically transform your JML-annotated Java programs into formal documentation, generate unit test code, generate invariants for you, generate verification conditions, statically verify some of those conditions, and so on.
These technologies are being used in industry and academic settings with great success. We at KindSoftware are also applying these general principles to other systems (e.g., Eiffel, ML, etc.).
I strongly suggest that Java developers and managers who care about the correctness and clarity of their systems spend a bit of time and look into JML.
> What is true is that code is never out of sync with what it does. But it does get out of sync with what it should be doing. Code that gets upgraded, loses functionality, and is extended ends up doing things simply because that was the voodoo that fixed the bug. The subsequent maintenance programmer then reads the code, becomes very confused, and the pattern named Lava Flow comes into being.
SOOO true. Here, we emphasize writing code that "documents itself" through the use of variable names that make sense and commenting only the stuff that isn't immediately obvious.
Now that we have a project that is reusing code from a previous one, this technique has made integrating old code into the new system a lot easier.
Lava flow is a very good phrase to describe what happens when code isn't documented or even written properly. Some programmers at our work believed that by making poorly documented code they could ensure job security by being the only one who understood how something worked. Those guys were fired and their code rewritten...
The problem with documentation, in my experience, is that it normally only specifies what the code does, while it should specify why the code does what it does.
Any competent developer can see from the code what is done, but it is not possible to derive (at least not without a lot of effort and re-discovering of what was discovered in the original project) why the system was implemented the way it ended up.
> Any competent developer can see from the code what is done, but it is not possible to derive (at least not without a lot of effort and re-discovering of what was discovered in the original project) why the system was implemented the way it ended up.
That is exactly why I evolved my diary-driven software process with the minimal requirement being to document decisions - the alternatives considered and why a particular approach was used. http://www.oofile.com.au/adsother/sse.html
Note that this is documentation which, as it is historical, can never go out of date - at the time a decision was taken, you can see what was considered and what was adopted. If the design has later varied from that, hopefully there's another diary entry. Even if there isn't an explanation as to why it has changed, you at least have a checkpoint in the history of the design.