On February 11, 2003, Reasoning
published a study comparing the Linux TCP/IP stack to commercially developed
TCP/IP stacks. This comparison showed that an active, mature Open Source
project may have fewer defects than similar commercial projects (the study is
available for download at http://www.reasoning.com/downloads/opensource.html).
The report generated
numerous requests for more information about Open Source development and how it
compares to commercial development. In response to those inquiries, Reasoning
inspected an Open Source project in an earlier phase of the development
lifecycle. The selection was based on the following criteria, chosen because
they represent the most appropriate comparison with the company's existing
knowledge base of code inspections:
1. An active Open Source community, which shows that there are stakeholders in the development effort
2. Code in development, which allows us to see the state of code earlier in the lifecycle
3. Stable code base (i.e. 2.x or 3.x release), to avoid the vagaries of a starter project
4. Usage within the software industry, to ensure that there is pressure from the customer community
Based on these criteria, the
Apache Web Server was a natural selection. This is the most widely used Open
Source web server available, and it is essential to many commercial web-based
applications running today. The code that was inspected is part of Apache HTTP
Server 2.1, dated January 31, 2003. This specific version represents active
development of approximately two months.
Automated Software Inspection
Software inspection, the process of examining source code to identify defects,
is a standard practice in development organizations and is widely recognized as
the best way to find defects. Inspection is hardware-independent, does not
require a runnable application or a suite of test cases, and does not affect
code size or execution speed. The majority of code inspections are performed
manually. In theory, the greatest number of defects can be uncovered when a
developer reads through the code line by line. However, this process is slow,
painstaking, and fraught with inconsistency, and it does not scale to the
growing number of multimillion-line applications.
As a code base grows, the cost of a complete manual inspection becomes
prohibitive, and the sheer volume of code is intimidating to developers. As a
result, manual inspections are performed on only subsets of the source code.
Inspection tools are available, but they can perform only a portion of the
inspection process and require significant further manual review. These tools
generate a large volume of defect warning messages, many of which are false
positives: the inspection tool thinks that it has found a defect, but a deeper
manual analysis of the context shows that it has not. This false positive
problem is so severe that the rate frequently exceeds 50 false positives for
each true positive. In other words, roughly two percent of warning messages
(about one in fifty-one) represent actual defects.
Automated software inspection (ASI) services provide many of the benefits of a
manual code review in significantly less time, and at a dramatically lower
cost, than manual inspection or internal use of inspection tools. ASI is
typically conducted as an outsourced service, which prevents the diversion of
in-house development resources from current development projects.
Automated software
inspection typically identifies defects that cause application crashes and data
corruption, and provides actionable reports for the development team. With the
expertise gained by running and analyzing the output of automated tools on a
daily basis, ASI is able to mitigate the problem of false positives.
The results of automated code inspection are reports that:
· Make defect removal fast and simple by identifying the location and describing the circumstances under which the defects will occur
· Identify the parts of the code with the greatest risk, enabling the development organization to focus QA and testing resources where they are most needed
· Compare code quality with a benchmark
Apache Inspection Results
The inspected Apache code base consisted of 58,944 lines of source code (not
counting blank lines or comment lines). In total, we found 31 code defects. A
key reliability indicator is the defect density, defined as the number of
defects per thousand lines of source code (KLSC). The defect density of the
Apache code inspected comes to 0.53 defects per KLSC. Comparing this defect
density to the commercial applications we have inspected, we find that the
Apache code has a defect density very similar to the average we have found in
commercial applications.
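For reference, that figure follows directly from the counts above: 31 defects divided by 58.944 thousand lines of source code gives approximately 0.53 defects per KLSC.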
Based on our experience, Apache 2.1 is starting at a defect level similar to
that of commercial code. This is not unexpected. Development teams working on
Open Source are likely to have the same experience level as those on commercial
projects. This is particularly true for projects that are widely used within
industry, since they will often include input from those same commercial
developers.
Another component of our analysis is the breakdown of defects by defect class
and the insights that can be drawn from this information. We divide the defects
found into five specific categories:
· Memory leaks: 0
· NULL pointer dereferences: 29
· Bad deallocations: 0
· Out-of-bounds array accesses: 0
· Uninitialized variables: 2
Defects found in the Apache code are consistent with new development, as seen
in the relatively high percentage of NULL pointer dereferences. In a different
application, we would expect to see some memory leaks and out-of-bounds array
accesses. However, given the nature of Apache, these two categories have likely
received far more attention. In particular, software that has not gone through
as rigorous a security cleaning is much more likely to contain out-of-bounds
array accesses (since buffer overflows are a subset of these). The difference
in defect distribution, though interesting, is just a natural tradeoff based on
the application's functionality.
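To make these defect classes concrete, the fragment below is an illustrative sketch, not code taken from Apache, showing the kinds of constructs an inspection of C code would flag: a NULL pointer dereference, a read of an uninitialized variable, and an out-of-bounds array write of the kind that underlies buffer overflows.

    #include <stdlib.h>
    #include <string.h>

    /* Illustrative defects only; this code is not drawn from Apache. */
    void example_defects(const char *input)
    {
        /* NULL pointer dereference: malloc can return NULL, but the
           result is used without a check. */
        char *copy = malloc(strlen(input) + 1);
        strcpy(copy, input);          /* crashes if malloc returned NULL */

        /* Uninitialized variable: 'count' is read on the path where
           it was never assigned. */
        int count;
        if (input[0] == 'x')
            count = 1;
        copy[0] = (char)count;        /* may read an uninitialized value */

        /* Out-of-bounds array access: the loop writes one element past
           the end of the buffer (the root cause of buffer overflows). */
        char buf[8];
        for (int i = 0; i <= 8; i++)
            buf[i] = 'a';

        free(copy);
    }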
Unlike our findings in the
Linux TCP/IP inspection, Apache 2.1 does not show an appreciable difference in
code quality (as measured by defect density) when compared to commercial
development.
Observation
Given that this code has a defect density similar to that of commercial code,
what can be learned about the differences between Open Source and commercial
development? From the previous Open Source inspection,
Reasoning determined that the Linux TCP/IP stack showed a lower number of
defects than the majority of the commercial TCP/IP stacks inspected. This was a
comparison of very mature software and, on average, showed Open Source to be
superior. However, it only looked at the result of many years of development.
Because the Apache code is relatively early in its development lifecycle, this
current inspection allows an estimate of the initial defect density achieved by
Open Source. Though more data will be needed to support a firm conclusion,
plotting the Open Source and commercial data points allows a rough
approximation of the change in defect density over time.
Given the limited data, Reasoning sees Open Source as being faster, on average,
than commercial efforts at removing defects from software. This is not what one
would expect, since commercial software companies often invest considerable
time and money in testing tools in order to meet the reliability requirements
demanded by their customers.
It should be noted that Open Source can nonetheless end up with fewer defects.
Because the new evidence shows that both development environments are likely to
start with a similar number of defects, the core of the difference must arise
after development starts. In that period, the main difference is the number of
developers actually looking at the code. Commercial software companies are more
likely to have sophisticated tools; however, they are unlikely to achieve the
level of peer review that Open Source code can achieve, and peer review is
critical in finding defects.
This is certainly not a new concept. Software engineering leaders such as
Michael Fagan and Capers Jones have pointed out the advantages of software
inspection. The ability of the inspection process to identify and remove
defects is likely to be heightened in Open Source projects, since the reviewers
are often quite independent of the contributors. This provides the autonomous
review that Fagan's guidelines aim for. While such independence may be
difficult to arrange within a company, it is a natural consequence of Open
Source.
It has been clearly demonstrated that incorporating
inspection into the software development process results in significant
reductions in defect rates. Capers Jones's data show a six-fold acceleration in
defect removal when inspection is employed in conjunction with testing.
The greatest challenge for development organizations is that the amount of
effort needed to manually review code, and to manage the process so as to
guarantee review independence, often leads to abandonment of the effort, or
worse, to inspections that become a virtual rubber stamp.
Considerations
As stated earlier, Reasoning chose Open Source projects that best compare
against commercial development. Clearly the data presented here cannot be
extrapolated to all Open Source efforts, or to all commercial efforts. Other
Open Source projects may or may not show the same improvement in defect
densities as they mature. Key drivers are:
· The number of active developers reviewing the project
· Maturity level since the last major release (i.e. later dot releases)
· Strong customer use
However, when selecting an Open
Source (or any third party) product, we recommend a thorough analysis of its
reliability.
This comparison relies on the
assumption that the Apache 2.1 project will attain the same defect densities
found in the mature Linux TCP/IP stack. Though this is a reasonable assumption, it
will require future study to verify.