Python Buzz Forum - Combinatorial Library Generation with SMILES

Andrew Dalke


Andrew Dalke is a consultant and software developer in computational chemistry and biology.

Combinatorial Library Generation with SMILES

Posted: Dec 13, 2004 12:14 AM

This post originated from an RSS feed registered with Python Buzz by Andrew Dalke.
Original Post: Combinatorial Library Generation with SMILES
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.

Someone recently asked me how to generate a combinatorial library given a set of fragments.

For the non-chemist readers, combinatorial chemistry uses a core structure and reactions that can attach fragments to the core at a given point. This lets chemists search a structure family to find a compound that is "better" in a chemistry space with dimensions including effectiveness, toxicity, digestibility, and ability to reach the right part of the body. (This is for pharmaceutical chemistry; combinatorial chemistry can be used in other domains.)

A core may have 1, 2, or more fragment attachment points, so many new compounds can be created with this technique. Companies use robots to generate the new compounds and test them against the target, which might be a protein or a cell. There can be well over 100,000 tests in an assay. I've worked with a couple of companies to develop tools that help the scientists better understand these sorts of data sets.

To limit the number of compounds created, many people will generate virtual libraries and use software to pick the compounds that will be tested via the robots. If the software were good enough we wouldn't need the robots. We've a long way to go.

The email asked if any software is available to generate the virtual libraries. The sender had been using SMILES strings for the core and fragments and simply concatenating them together. This doesn't work in general because it allows at most two attachment points on the core: one at the front and one at the back of the SMILES string.

The easiest way to do this is with ring closures. Suppose the core structure is O1CNCCC1 with attachment points on the 3rd and 5th atoms (the N and the third C). Pick very high ring closure numbers not seen in real life, like 90 and 91, and add them to the appropriate atoms. The '%' is needed in SMILES for closure numbers greater than 9.

The result is O1CN%90CC%91C1.
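The labeling step can be sketched in a few lines of Python. This is only an illustration, not a real SMILES parser: the `label_core` name and the `ATOM` pattern are my own assumptions, and the pattern only recognizes bracket atoms and the organic-subset symbols, which is enough for simple cores like the one above.

```python
import re

# Assumed, simplified atom pattern: bracket atoms, then the two-letter
# halogens, then single-letter organic-subset and aromatic atoms.
ATOM = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")

def label_core(smiles, labels):
    """Insert ring closure labels right after the chosen atoms.

    labels maps a 1-based atom position to a label such as '%90'.
    """
    out = []
    last = 0
    for position, match in enumerate(ATOM.finditer(smiles), start=1):
        # Copy everything up to and including this atom symbol.
        out.append(smiles[last:match.end()])
        last = match.end()
        if position in labels:
            out.append(labels[position])
    out.append(smiles[last:])  # whatever follows the last atom
    return "".join(out)

labeled = label_core("O1CNCCC1", {3: "%90", 5: "%91"})
print(labeled)
```

For the example core this prints the same string as above. For anything beyond toy input, a cheminformatics toolkit is the safer way to find atom positions.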

Use the same sort of trick to label the fragments. Suppose a fragment is OC=CC=C- and the terminal carbon (the "C-") is to be attached to the nitrogen. The ring closure number for the N is 90 so label the terminal carbon the same, as OC=CC=C%90. To make it easier on me, assume a methyl is attached at the core's C attachment point labeled 91. The corresponding fragment in SMILES is C%91.

To make it all work, concatenate the three strings using the dot disconnect character. The result is

 O1CN%90CC%91C1.OC=CC=C%90.C%91

That's all that's required. When the SMILES parser puts the molecule together it matches the two %90 and the two %91 ring closures to stitch the three parts together.

The dot disconnect only says there isn't an implicit bond between the atoms on either side of it. It doesn't mean that the two atoms can't be covalently bonded through ring closures, nor that they must be parts of different connected subgraphs. ("Connected subgraph" is another way of saying "covalently bonded molecule".)
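Here is a minimal Python sketch of the concatenation step, using the core and fragments from above. The `assemble` and `unpaired_labels` helpers are hypothetical names of mine; the second is a simple sanity check, assuming that each %nn label should appear exactly twice in the final string.

```python
import re
from collections import Counter

def assemble(core, fragments):
    """Join a labeled core and its labeled fragments with the dot disconnect."""
    return ".".join([core] + list(fragments))

def unpaired_labels(smiles):
    """Return the %nn labels that do not occur exactly twice."""
    counts = Counter(re.findall(r"%\d{2}", smiles))
    return sorted(label for label, n in counts.items() if n != 2)

combined = assemble("O1CN%90CC%91C1", ["OC=CC=C%90", "C%91"])
print(combined)                   # O1CN%90CC%91C1.OC=CC=C%90.C%91
print(unpaired_labels(combined))  # []
```

The check only looks at '%' style labels, so the ordinary single-digit closure in the core's own ring is left alone.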

The same fragment library might be used for two different attachment points. Because the '%' character only occurs in SMILES before a two-digit ring closure, you can label all your fragment terminals with, say, "%99" and use simple text substitution as needed for the given core attachment point.
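As a sketch of that substitution, the following Python enumerates every combination of fragments over the attachment points of one core. The function name, the `sites` mapping, and the extra ethyl fragment are assumptions for illustration, and it assumes the core itself never uses %99.

```python
from itertools import product

PLACEHOLDER = "%99"  # generic label on every fragment terminal

def enumerate_library(core, sites):
    """Yield one dot-joined SMILES per combination of fragments.

    sites maps a core closure label (e.g. '%90') to the list of
    %99-labeled fragments allowed at that attachment point.
    """
    labels = sorted(sites)
    for combo in product(*(sites[label] for label in labels)):
        parts = [core]
        for label, fragment in zip(labels, combo):
            # Plain text substitution: retarget %99 to this attachment point.
            parts.append(fragment.replace(PLACEHOLDER, label))
        yield ".".join(parts)

fragment_pool = ["OC=CC=C%99", "C%99", "CC%99"]
library = list(enumerate_library(
    "O1CN%90CC%91C1",
    {"%90": fragment_pool, "%91": fragment_pool},
))
print(len(library))  # 9
```

With three fragments at each of two attachment points this gives 3 x 3 = 9 virtual compounds, which shows how quickly the library grows as fragments and attachment points are added.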

Make sure the bond types match across the ring closure. C%90.C%90 and C%90.C-%90 are the same as CC, and C=%90.C%90 is the same as C=C, but C=%90.C-%90 is illegal because the two explicit bond types conflict. You'll need to be even more careful with chiral bonds to make sure the order of the core and fragments is correct.

It's very cool that a text editor and a couple of shell commands are all that's needed to make a virtual library using SMILES.
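A rough way to catch conflicting bond types before handing the string to a parser is sketched below in Python. It is an assumption-laden check of my own, not part of any toolkit: it only looks at two-digit '%' labels, assumes each label is used for a single closure in the string, and ignores the directional '/' and '\' bonds entirely.

```python
import re
from collections import defaultdict

# Optional explicit bond symbol immediately before a two-digit closure label.
CLOSURE = re.compile(r"([-=#$:]?)%(\d{2})")

def conflicting_closures(smiles):
    """Return closure numbers whose two ends give different explicit bonds."""
    bonds = defaultdict(set)
    for bond, label in CLOSURE.findall(smiles):
        if bond:  # an end without a symbol takes its type from the other end
            bonds[label].add(bond)
    return sorted(label for label, symbols in bonds.items() if len(symbols) > 1)

print(conflicting_closures("C%90.C-%90"))   # []
print(conflicting_closures("C=%90.C%90"))   # []
print(conflicting_closures("C=%90.C-%90"))  # ['90']
```

A real SMILES parser remains the final authority; this only flags the obvious mismatch early.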
