This post originated from an RSS feed registered with Python Buzz
by Andrew Dalke.
Original Post: Open source != peer review
Feed Title: Andrew Dalke's writings
I gave two presentations at the German Conference
on Chemoinformatics (GCC) in Goslar, Germany. One was an update of
my EuroSciPy presentation, Python
Tools in Computational Chemistry (and Biology). I included more on
the history of Python and why I think it became widely used in
cheminformatics. At the end I gave some ideas of what I want for the
future. I'll elaborate more on that in another posting here.
The second presentation was about some of the difficulties I've seen
in doing open source cheminformatics development. I tried a different
presentation style: black background with one or only a few words on
the slide in white font. It required more practice, but came out pretty
nicely, I think. I started by writing down everything I wanted to
say. I'll post that text here soon.
Open Source != peer review
One of the slides is titled "Open Source != peer review". I'm breaking
that out because it's something I
want feedback on, or at least opposing arguments. Here's the short
version of that part:
Some argue that doing good computation-based science requires open
source. The argument is that scientists need to review the source code
in order to verify that it works correctly. How, they argue, can you
review someone else's paper if you can't review the source code used
to make that paper?
I like open source. (My talk goes into the philosophical differences
between "open source" and "free software.") I think there should be
support for peer review. But I don't understand why the ability to see
the source code, in order to review it for scientific quality,
requires the right to redistribute the source code to others.
CHARMm
I gave CHARMm as an example. It's a molecular dynamics program from
the Karplus group at Harvard. The academic license costs $600 and all
licensees get the source code. People can review the source code,
modify it, and, I believe, even send modifications to others who have a
CHARMm license. But it's not open source.
What additional peer review could be done on CHARMm if it were
distributed with an open source license? Please bear in mind that the
time needed to get up to speed on CHARMm is quite large, and using it
for interesting science likely requires good hardware, so $600 isn't
that much. The license fees go to supporting CHARMm development, so
there are scientific benefits to having the fee.
Yes, I know that free software allows selling the software so it's
possible to charge the $600 even for open source code. My talk goes
into the difficulties of actually doing that.
My point here is to get feedback about why the right to redistribute
software is a requirement for effective peer review, and to tie it
down to specific examples. Mine is just one; feel free to use your
own.
Clean room development
Once you've done that, please also explain how to solve this
problem. Suppose I review someone else's source code, either as part
of the peer-review process for a publication or because I want to
verify that code I got really does work.
Several months later I write a program which has similar functionality
and my implementation looks very much like the code in the first
program. Perhaps I deliberately structured it after the first program,
or perhaps that form just makes sense. Perhaps I forgot that I had
even seen the code in the first program. Perhaps many things.
The author of the first package finds out that my code is similar.
Suspiciously similar. Was there a copyright violation? A license
violation? What should I do? Change my license? Rewrite the source?
Apologize profusely? Claim there was no violation?
Further complications: what if the original source code was submitted
with an article for peer review and I was a reviewer? If that paper was
rejected, so that the original source code was never actually
published, then what are the license terms of the software? E.g., it
might be "BSD upon publication" but it was never published.
(The peer review system has long since figured out how to handle the
problem of ideas passing from rejected papers to their reviewers, but
this essay is about copyright and licenses, not ideas.)
What if there are multiple sections of my code, each vaguely like code
in other projects? If I want to be safe and generous I could change my
license to match all of them, but even free software licenses can be
incompatible. What if I had actually recommended that code structure
to others at a conference but there's no paper trail showing the
history and we forgot?
The industry solution to this is clean room
design. One person or team reviews the code and describes how
things work through a specification document. That specification does
not contain copyrighted material. Another person or team, who never
saw the original source code, takes the specification and implements
it.
There's no way we can do full clean-room development in this
field. That would mean some people only read code and others only
write code, which rather defeats the point of doing code review.
If we encourage peer review of source code, which I think we should
do, then how do we deal with this issue? How can I be sure that if I
review someone's program then in the future they won't accuse me of
taking their code and using it in violation of its license agreement?
Or what if I did directly take 20 lines of code, figuring it was too
small to count? What recourse do they have?
Most of my publicly available software is under the BSD license. Hence,
if someone uses my software in violation of the license, the fix is to
add a simple copyright statement to the source code. The violator need
make no other changes.