This post originated from an RSS feed registered with Python Buzz
by Andrew Dalke.
Original Post: PyDaylight 1.0 released
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.
I just updated PyDaylight with
support for v4.91 of the Daylight toolkit. It ships with
backwards-compatible support for v4.8x. I tested it with all 4
toolkit versions under Linux, Solaris and IRIX, using Python 2.4.
Thanks to Daylight for letting me use their machines for the upgrade
and testing.
The new version is called 1.0. I was about to call this version 0.91.
The previous ones were 0.9, 0.85, 0.8, and so on. I was holding off
on using the 1.0 name until someone tested the Thor and Merlin
support, but as most people are migrating to the Daylight Cartridge
that isn't really important. PyDaylight supports pretty much all of
the toolkit and its core is now some 7 years old so it's time to
commit and not hide behind the "still under development" numbering.
The upgrade started with me compiling Python 2.4 for Daylight's
machines. I've tried for years but they are a C shop and don't even
use Python for in-house use, so their machines had some rather old
Python versions, if Python was installed at all. There weren't any problems
with the builds, though I didn't run the regression self-tests. (Once
upon a time the SGI optimizer couldn't handle the regular expression
module.)
PyDaylight uses a modified version of dayswig to build the C extension
for Python. This uses SWIG, which also wasn't installed on the
Daylight machines. NOTE: the distribution includes pre-swig'ed files
for versions 4.8x and 4.91 of the toolkit so you likely don't need
SWIG on your machines.
I had to change dayswig slightly to support DX_API_PUBLIC in the
function signature. This is __stdcall when the toolkit is
compiled for MS Windows, empty on Unix machines. Daylight no longer
supports the monomer toolkit so dayswig only includes that interface
if dt_monomer.h exists under $DY_ROOT/include. This
is perhaps doing too much work for myself since after all no one I
know uses the monomer toolkit - that's why Daylight's no longer
supporting it!
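The header check described above could look roughly like the following. This is only a sketch: the function name has_monomer_toolkit is made up for illustration, and the real dayswig may do the check quite differently.

```python
import os

def has_monomer_toolkit(dy_root=None):
    """Return True if dt_monomer.h exists under <dy_root>/include.

    Hypothetical helper, not part of dayswig. If dy_root is not given,
    fall back to the DY_ROOT environment variable.
    """
    if dy_root is None:
        dy_root = os.environ.get("DY_ROOT", "")
    header = os.path.join(dy_root, "include", "dt_monomer.h")
    return os.path.isfile(header)
```

A build script could call this once and leave out the monomer interface when it returns False.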
The latest SWIG release is 1.3.25 but that didn't work quite right.
It didn't like the wrapped Daylight handles. The SWIGged interface
now requires that the objects be derived from Python integers, whereas
once upon a time it did coercion via int(obj). A nice
feature of PyDaylight is that its objects and integer handles are
intermixable, and much of the code depends on that flexibility.
After digging for a while I couldn't find an easy fix, so I did what I
should have done a couple of hours earlier and used an older version of
SWIG. For the 0.9 release I used 1.3.11, which is still available.
I switched to that and a few minutes later dayswig_python worked as
expected.
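To make the difference concrete, here is a small sketch of the two behaviors. The Handle class and the function names are invented for illustration. They are not the PyDaylight or SWIG code, only a model of "coerce with int(obj)" versus "must be an integer instance".

```python
class Handle:
    """Hypothetical wrapper object that carries an integer toolkit handle."""

    def __init__(self, handle):
        self.handle = handle

    def __int__(self):
        # Lets int(obj) recover the underlying handle
        return self.handle


def accept_by_coercion(obj):
    """Older style: anything that int() can convert is accepted."""
    return int(obj)


def accept_by_type(obj):
    """Stricter style: only real integers (or int subclasses) are accepted."""
    if not isinstance(obj, int):
        raise TypeError("expected an integer handle")
    return obj
```

With the coercing style, accept_by_coercion(Handle(42)) and accept_by_coercion(42) both give 42, so wrapper objects and raw handles can be mixed freely. With the strict style the wrapper object is rejected with a TypeError.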
Next I made sure that each of the new functions was supported. I
wanted the PyDaylight code (on top of the dayswig_python level) to
support 4.8x and 4.9x transparently, so I created a new internal
variable, daylight._toolkit_version, which can have integer values like
4830 (for 4.83) and 4910 (for 4.91). I decided to make a new
variable, different from but based on the value of DX_TOOLKIT_VERSION,
because then I could control its value and ensure it was appropriately
comparable.
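A sketch of how such an integer could be derived and used follows. It assumes the version is available as a dotted string like "4.83". The actual format of DX_TOOLKIT_VERSION is not given here, and toolkit_version_as_int is a hypothetical name.

```python
def toolkit_version_as_int(version):
    """Convert a dotted version string such as "4.83" to an integer like 4830.

    Assumes a "major.minor" string; this is an illustration, not the
    PyDaylight implementation.
    """
    major, _, minor = version.partition(".")
    # Pad the minor part to three digits so "4.9" and "4.91" compare sensibly
    minor = (minor + "000")[:3]
    return int(major) * 1000 + int(minor)


# Feature tests then become plain integer comparisons
if toolkit_version_as_int("4.91") >= 4910:
    has_fp_similarity = True
else:
    has_fp_similarity = False
```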
The README for the release lists the API features. The functions
dt_molgraph(3),
dt_addh(3),
and dt_suppressh(3)
are available as the new Molecule and Reaction methods
molgraph(), addh(), suppressh(). I added a
default so the last two are applied to all atoms and not just chiral
ones.
In my testing I found there was a bug in dt_addh and reported
it to Jack, who tracked it down and fixed it. The bug looked
something like this, though the following comes from memory:
Apparently the count used to figure out the number of branches used
data fields which weren't set right for the newly created hydrogens.
The new dt_molgraph is kind of strange. It's the only
toolkit function which modifies a molecule but doesn't check the mod
flag. It always sets the mod bit, does its work, and turns the mod
bit off. I talked with Jack and I think that will be changed in a
future toolkit release so that molgraph only works when the mod bit is
already set and doesn't set the bit itself. Watch the release notes! :)
As a side effect of my testing I found a fun new way to wreak havoc on
the toolkit. Consider this:
>>> from daylight import Smiles, Bond
>>> mol = Smiles.smilin("C"*200)
>>> atoms = mol.atoms
>>> for atom in atoms[2:3+N]:
...     Bond.add(atoms[0], atom)
...
>>> mol.mod = False
>>> mol.cansmiles()
There are two failure modes. If N == 100 (so there are 100 bonds to
the first atom) then the toolkit hangs in the cansmiles() code. It's
most likely looking for an available ring closure number, but none are
available. If N == 128 then dt_mod_off fails. It looks like
there's an internal table that expects no more than 128 bonds for an
atom. Interestingly, if dt_mod_off fails then as a side
effect it deletes the molecule or reaction object. PyDaylight doesn't
catch that error condition, so it keeps the now dead handle around.
Future use of the object will have strange side effects and the
garbage collection will say something about an uncaught exception
because the dt_dealloc fails.
The new function dt_smilin_addh(3)
is available as daylight.Smiles.smilin_addh(). SMILES errors
are now appended to the regular error queue instead of the special
SMILES error queue (finally!) so I tweaked the code used to get the
last error so it does the right thing. There's still a toolkit bug
where some invalid SMILES strings don't cause an error. The one example I
found was ">". Jack's going to look into it.
There's a new fingerprint function in v4.91, dt_fp_similarity(3).
This takes two fingerprint handles and an expression(5).
The expression strings look like "c/sqrt((a+c)*(b+c))" where
the variables a, b, c, and d are
the number of bits on only in fp1, on only in fp2, on in both, and
off in both, respectively. There are a few functions like sqrt(),
min() and max() and the normal operators "+-/*^". Constants can be
written as integers or simple floats (exponential notation like
2.3E-09 isn't supported). The internal evaluation uses doubles but
the result is returned via a dt_Real, which is only a float. A
float seems rather small these days, since it only has 6 base-10
digits of precision.
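To show what the four variables mean, here is a pure-Python sketch that counts them for two fingerprints stored as plain integers and then evaluates the example expression from above. It does not use the toolkit at all. The names bit_counts, cosine and as_c_float are invented for this illustration, and the fingerprints here are ordinary ints rather than Daylight handles.

```python
import struct
from math import sqrt

def bit_counts(fp1, fp2, nbits):
    """Return (a, b, c, d) for two nbits-wide fingerprints given as ints."""
    mask = (1 << nbits) - 1
    fp1 &= mask
    fp2 &= mask
    a = bin(fp1 & ~fp2 & mask).count("1")  # on only in fp1
    b = bin(~fp1 & fp2 & mask).count("1")  # on only in fp2
    c = bin(fp1 & fp2).count("1")          # on in both
    d = nbits - a - b - c                  # off in both
    return a, b, c, d

def cosine(a, b, c, d):
    """The expression c/sqrt((a+c)*(b+c)); d is unused by this measure."""
    return c / sqrt((a + c) * (b + c))

def as_c_float(value):
    """Round a Python float (a C double) to 32-bit float precision."""
    return struct.unpack("f", struct.pack("f", value))[0]

counts = bit_counts(0b1100, 0b1010, 4)   # (1, 1, 1, 1)
score = cosine(*counts)                  # 0.5
```

Passing a value like 1.0/3 through as_c_float shows the loss of precision that comes from returning a double through a float. Note that cosine raises ZeroDivisionError if either fingerprint has no bits set.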
The new PyDaylight function for this is
Fingerprint.similarity. The Daylight function supports a few
hard-coded expression strings like "COSINE" and
"TANIMOTO". These are tested via exact string matches and
cannot be used as variables in the expressions. If you want the
latter I copied the definitions from the documentation into the table
Fingerprint.expressions.
One problem I have with dt_fp_similarity is that I can't tell
if there was a syntax error in the expression. The function will
return -1.0 in that case but it's possible that the expression is
supposed to return -1.0. There are a couple of other places where
it's hard to tell if a return value is an error indicator or not, but
in those cases I can check if there's a new message in the error
queue. Not here. The error message goes to the terminal and not to
the error queue. I hope they fix this for the next release.
I suppose you could implement your own parser in Python. It shouldn't be
too hard, either using eval (despite the potential security problems)
or using the parser generator included with PyDaylight for the MCL
support.
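A minimal sketch of the eval approach follows. It is not a reimplementation of the toolkit's expression grammar: it borrows Python's own syntax and precedence, translates "^" to "**", and accepts things the toolkit does not (exponential notation, for one). The name evaluate_similarity is hypothetical. Restricting the names helps, but eval on untrusted input is still risky.

```python
from math import sqrt

# Functions allowed inside an expression
_FUNCTIONS = {"sqrt": sqrt, "min": min, "max": max}

def evaluate_similarity(expression, a, b, c, d):
    """Evaluate an expression such as "c/sqrt((a+c)*(b+c))" using eval.

    Raises ValueError for a syntax error or an unknown name, so an error
    cannot be confused with a legitimate result of -1.0.
    """
    # The toolkit writes exponentiation as "^"; Python uses "**"
    source = expression.replace("^", "**")
    names = dict(_FUNCTIONS, a=a, b=b, c=c, d=d)
    try:
        code = compile(source, "<similarity>", "eval")
    except SyntaxError:
        raise ValueError("bad similarity expression: %r" % (expression,))
    # Reject any name (or attribute) that is not one of ours
    for name in code.co_names:
        if name not in names:
            raise ValueError("unknown name %r in expression" % (name,))
    return float(eval(code, {"__builtins__": {}}, names))
```

The point of doing it this way is that a malformed expression raises an exception instead of printing to the terminal and returning -1.0.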
The new function dt_ischiral(3)
tests whether or not the given atom is chiral. This is available as a
new read-only attribute of Atom instances named ischiral.
>>> mol = Smiles.smilin("OC(Cl)=[C@]=C(C)F")
>>> [atom.ischiral for atom in mol.atoms]
[0, 0, 0, 1, 0, 0, 0]
>>>
The new function dt_setbondstyle(3)
is available as the setbondstyle method of Depiction
instances.
I moved some of the module self-tests into the test/
directory and created a new test file, test_v491.py, which covers each
of the new features. You can look at that file for examples of use.
Finally, there was a bug fix to handle a case found by Terry Brunck.
I didn't deallocate temporary streams for looping over atoms.
Normally that's okay because the toolkit deallocates those when the
molecule is deallocated. But his algorithm made thousands of streams
and the Daylight deallocator used a garbage collection algorithm which
doesn't scale well for that case. The fix was to delete the temporary
stream once it's no longer needed. There was even a comment in the
code questioning why there wasn't a dealloc.
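
The general pattern, freeing the temporary stream as soon as the loop is done instead of waiting for the molecule to go away, can be sketched like this. FakeToolkit is a stand-in written for this example so the code runs without the Daylight libraries. It is not the toolkit API, and this is not the actual PyDaylight fix.

```python
class FakeToolkit:
    """Stand-in for the C toolkit; it only tracks which streams are live."""

    def __init__(self):
        self.live_streams = set()
        self._next_handle = 1

    def stream(self, items):
        # Allocate a new stream handle over the given items
        handle = self._next_handle
        self._next_handle += 1
        self.live_streams.add(handle)
        return handle, iter(items)

    def dealloc(self, handle):
        self.live_streams.discard(handle)


def iter_atoms(toolkit, atoms):
    """Yield each atom, then release the temporary stream."""
    handle, stream = toolkit.stream(atoms)
    try:
        for atom in stream:
            yield atom
    finally:
        # Runs when the loop finishes or the generator is closed early
        toolkit.dealloc(handle)


toolkit = FakeToolkit()
for _ in range(1000):
    for atom in iter_atoms(toolkit, "CCO"):
        pass
leftover = len(toolkit.live_streams)
```

After a thousand loops no stream handles are left over, so nothing accumulates for the deallocator to wade through later.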
"Daylight", "Daylight toolkit", "Thor" and "Merlin" are registered
trademarks of Daylight Chemical Information Systems, Inc. Daylight
C.I.S. is neither affiliated with nor responsible for PyDaylight. Are
you kidding? They don't think that Python can be used for real
programming. (Hi Daylight krewe! :)