This post originated from an RSS feed registered with Python Buzz
by Titus Brown.
Original Post: 7 Dec 2004
Feed Title: Advogato diary for titus
Feed URL: http://advogato.org/person/titus/rss.xml
Feed Description: Advogato diary for titus
I've been noticing a fair amount of commentary on Python and Java lately:
I particularly enjoyed Bruce Eckel's take on Static vs Dynamic typing, and Phillip Eby's Python Is Not Java (and Java Is Not Python, either). Phillip Eby
makes the point that the Python and Java mindsets are quite different
when it comes to frameworks: Python programmers tend to build out the
structure as they need it, while Java designers try to specify the
framework's structure first & then fill in specific implementations.
Isn't the specify-first approach antithetical to the agile programming
paradigm that's been gaining popularity lately?
Jython does a nice job of
mingling Java libraries with Python coding; pure-Python modules can
generally be loaded directly by Jython, too, although C extension
modules built for CPython cannot. Is this a possible solution to the
question of static vs dynamic typing -- build your software in a
language like Jython, and then slowly solidify it into Java?
I primarily do research programming, in which the specific goals of
the software are largely undefined & the flexibility of the code should
be one of the proximal design considerations, so I definitely prefer
the Python(/Perl/Ruby) mindset in day-to-day work. There is a
question in my mind, though, about where future bioinformatics
software efforts will aim: I doubt that the current
loosely-coupled/badly-specified project-specific protocols for genome
databases and service frameworks will last, so where next? We could
either start developing specifications (e.g. the distributed
annotation system (DAS) or MAGE) or implementations (e.g. GMOD). If
the former, there will be a significant barrier to entry for new
projects, as they will need to spend time developing to the standard
and confirming adherence. (This is the primary reason why DAS is a
failure, I think.) If the latter, I predict a general tendency
towards complexity of internal design as different projects try to
cram all their needs into a single system. Either situation would be
bad.
My preference is for what I think is a middle ground: the development
of APIs around common tasks, in a variety of languages. The idea
would be to take protocols like DAS and provide fairly simple library
implementations that give you 90% of the needed functionality with 10%
of the code complexity (based on the well known 90%/10% rule ;).
The key is to make sure the implementations work well enough to do
something useful & are available in enough languages that e.g. the
lone maverick Python/OCaml/Ruby programmer in the sea of Perl & Java
programmers can play as well (just as one example!).
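
As a rough illustration of how small such a library could be, here is a minimal Python sketch that pulls the feature list out of a DAS-style XML response. The sample document is made up and only loosely follows the layout of a DAS features response; the `Feature` tuple and the `parse_features` function are hypothetical names, and a real client would also have to deal with fetching, error handling, and the rest of the protocol.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

# Hypothetical, simplified record: just the fields most callers want.
Feature = namedtuple("Feature", "segment id type start end strand")

# Made-up sample, loosely modelled on a DAS features response.
SAMPLE = """<DASGFF>
  <GFF version="1.0">
    <SEGMENT id="chr1" start="1" stop="10000">
      <FEATURE id="gene1" label="gene1">
        <TYPE id="gene">gene</TYPE>
        <START>100</START>
        <END>900</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
      <FEATURE id="exon1" label="exon1">
        <TYPE id="exon">exon</TYPE>
        <START>100</START>
        <END>250</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text):
    """Return a list of Feature tuples found in a DAS-style XML string."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feat in segment.iter("FEATURE"):
            features.append(Feature(
                segment=segment.get("id"),
                id=feat.get("id"),
                type=feat.find("TYPE").get("id"),
                start=int(feat.findtext("START")),
                end=int(feat.findtext("END")),
                # Treat a missing orientation as "0" (an assumption of this sketch).
                strand=feat.findtext("ORIENTATION", default="0").strip(),
            ))
    return features


for f in parse_features(SAMPLE):
    print(f.segment, f.id, f.type, f.start, f.end, f.strand)
```

The point is less the parsing itself than the shape of the result: the caller gets back plain tuples and never has to touch the XML.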
At the moment there are few tasks generic enough to be encapsulated by
such an approach: the two that I can think of are annotation &
microarray data presentation. Annotation suffers from a general lack
of interoperability: not only does everyone have their own standards,
but features don't transfer well between standards. I hear microarray
data is the same, although I don't work with it much. It'd be
interesting to try to work around the ontology problems (do you
*really* want to define an ontology before getting your work done!?)
to produce a genuinely useful annotation UI that interoperates. I
don't see one out there that's usable by "mere" biologists, and I
think that's the right target audience...
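
One small, concrete instance of features not transferring cleanly is coordinates: GFF uses 1-based start and end positions with the end included, while BED uses a 0-based start with the end excluded. Below is a hedged Python sketch of a neutral in-memory record with converters in both directions. The `Interval` class and the function names are hypothetical, only the first few columns of each format are handled, and the conversion is deliberately lossy (strand, score and attributes are dropped) to show how quickly information falls away.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Neutral record: 0-based start, exclusive end (a choice made for this sketch)."""
    seqid: str
    start: int
    end: int
    name: str = "."


def from_gff_line(line):
    """Parse one tab-separated GFF line; the feature type is kept as the name."""
    # GFF columns: seqid, source, type, start, end, score, strand, phase, attributes
    cols = line.rstrip("\n").split("\t")
    # GFF start is 1-based, so shift it down by one; the inclusive end
    # already equals the exclusive end in 0-based terms.
    return Interval(cols[0], int(cols[3]) - 1, int(cols[4]), cols[2])


def to_bed_line(iv):
    """Emit the first four BED columns: chrom, start, end, name."""
    return "\t".join([iv.seqid, str(iv.start), str(iv.end), iv.name])


def to_gff_line(iv, source="sketch"):
    """Emit a GFF line, filling the columns this sketch does not track with '.'."""
    return "\t".join([iv.seqid, source, iv.name,
                      str(iv.start + 1), str(iv.end), ".", ".", ".", "."])


gff = "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=g1"
iv = from_gff_line(gff)
print(to_bed_line(iv))
print(to_gff_line(iv))
```

Round-tripping the sample line through `Interval` gets the coordinates back intact but loses the strand and the ID, which is the interoperability problem in miniature.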
Why not use, say, XML? Well, properly grokking XML is burdensome and
the whole standardization process is pretty legalistic (lots of people
yakking etc.). Since the goal is to lower the barrier to entry I think
it's important to have some functioning libraries as soon as possible
-- that way people
can get the thrill of having the code actually work. When the library
moves towards a standard, projects that are already functioning will
at least have some reason to move with that library...