The Artima Developer Community
Python Buzz Forum
Paper databases -- History of Chemical Nomenclature

Andrew Dalke


Andrew Dalke is a consultant and software developer in computational chemistry and biology.
Paper databases -- History of Chemical Nomenclature Posted: Oct 14, 2003 8:20 AM

This post originated from an RSS feed registered with Python Buzz by Andrew Dalke.
Original Post: Paper databases -- History of Chemical Nomenclature
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.
The Geneva Congress of 1892 defined a nomenclature for international use. This was a formalization of systems already in use. One of the most important was Beilstein ("Beilstein Handbook of Organic Chemistry"), first published in 1881. It was used by chemists who wanted to find more information about a compound or related compounds. I am not a chemist and I've thankfully never had to use Beilstein but I think I've figured it out enough to give a sense of what life was like before computers. If you want a real guide, with more details and even helpful pictures, try other sites.

(The closest I've come to something like Beilstein is looking through Gradshteyn and Ryzhik or Abramowitz and Stegun for the solution to a math equation.)

In modern-day speak, Beilstein is a database of chemical records. Each record has information about a compound, including its name, molecular formula, and a relevant publication reference, and possibly a depiction and physical properties like boiling point. It's all in German because Germany dominated the field of organic chemistry in the 1800s.

The entries are sorted by structural type into volumes (and subvolumes, with new volumes added over time). The acyclic compounds are in volumes 1, 2, 3, and 4. Acyclic compounds with no functional group are in volume 1, as are hydroxy-, oxo-, and hydroxy-oxo compounds. Acyclic carboxylic acids are in volume 2 unless they also have hydroxy- and oxo-functions, in which case they are in volume 3. And so on. This ordering gives chemists a way to browse for other structurally similar compounds with similar function.

If the systematic name is known, use the General-Sachregister (name index), which maps from name to record location (volume, subvolume, page number).

If the systematic name isn't known for a compound, first determine its molecular formula in Hill order. Go to the General-Formelregister (formula index) of Beilstein for a list of compounds with that formula. Es gibt viele Verbindungen mit ... Sorry, got carried away trying to remember enough college German to read some of the Beilstein examples. There can be many compounds with the same molecular formula. Even something simple like C2H6O could be either ethanol or dimethyl ether. All the molecular formula does is greatly reduce the number of compounds to consider. It can be reduced even more by using knowledge of German chemistry nomenclature (perhaps with the help of a handy German/English chemistry dictionary) to figure out which of those compound names are most likely to correspond to the structure.
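To make the formula lookup concrete, here is a minimal Python sketch of Hill ordering and a formula index. The function name `hill_formula`, the `formula_index` dictionary, and the element-count inputs are all made up for illustration; the real General-Formelregister is a printed index, not a data structure. The sketch only shows why one formula can point to several compounds.

```python
from collections import defaultdict

def hill_formula(counts):
    """Build a molecular formula string in Hill order from element counts.

    Hill order: if carbon is present, C comes first, then H, then the
    remaining elements alphabetically. Without carbon, every element
    (including H) is listed alphabetically.
    """
    elements = sorted(counts)
    if "C" in counts:
        rest = [e for e in elements if e not in ("C", "H")]
        elements = ["C"] + (["H"] if "H" in counts else []) + rest
    # A count of 1 is conventionally left unwritten.
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in elements)

# Hypothetical formula index: Hill formula -> names of compounds with it.
formula_index = defaultdict(list)
for name, counts in [
    ("ethanol", {"H": 6, "C": 2, "O": 1}),
    ("dimethyl ether", {"O": 1, "C": 2, "H": 6}),
]:
    formula_index[hill_formula(counts)].append(name)

print(dict(formula_index))
```

Both compounds land under the same key, so the formula narrows the search but the names still have to be read to pick the right one.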

Here's where using a line notation really pays off. There are generally about 60 lines per page. From pictures I've seen, structure formulas, even when compressed for space, look like they take up about 5 lines of text. Displaying the structure in the index would require at least quintupling the number of pages used for the index. Since a record itself is only about 5 lines long, it would mean doubling the page count of an already large publication.
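The page arithmetic can be checked with a few lines of Python. The 60 lines per page and 5 lines per structure or record come from the estimates above; the one-line name entry and the record count of 120,000 are assumptions picked only to make the numbers round, not figures from Beilstein.

```python
LINES_PER_PAGE = 60            # estimate from the text
LINES_PER_RECORD = 5           # estimate from the text
LINES_PER_STRUCTURE_ENTRY = 5  # estimate from the text
LINES_PER_NAME_ENTRY = 1       # assumption: a line-notation entry fits on one line

def pages(n_entries, lines_per_entry):
    """Whole pages needed for n_entries, rounding up."""
    total_lines = n_entries * lines_per_entry
    return (total_lines + LINES_PER_PAGE - 1) // LINES_PER_PAGE

n_records = 120_000  # hypothetical record count, for illustration only

body_pages = pages(n_records, LINES_PER_RECORD)
name_index_pages = pages(n_records, LINES_PER_NAME_ENTRY)
structure_index_pages = pages(n_records, LINES_PER_STRUCTURE_ENTRY)

print(structure_index_pages / name_index_pages)               # index growth factor
print((body_pages + structure_index_pages) / body_pages)      # growth relative to the records alone
```

Under these assumptions the index grows fivefold and ends up as long as the records themselves, which is where the doubling comes from.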

If that fails, the compound might still be in Beilstein. Some compounds aren't listed in the formula index but can be found by a combination of a structure-based decision and leafing through pages. For an example of the joys of a search, take a look at this page where it describes looking for the aluminum salt of 8-hydroxyquinoline (a laser dye).

It isn't listed in the formula index so the way to find it is to use knowledge of how record entries are laid out. The dye is a heterocyclic system with one nitrogen, so should be found in volume 21. That contains information about how to get to the correct subvolume, which lists the start page for compounds of the form CnH2n-11NO and more specifically the start page for C9H7NO. From that page, manually search starting from page 1057 until it's found on page 1144. Yowza!
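As a quick check that C9H7NO really belongs to the CnH2n-11NO class, and of how much leafing the search involved, here is a small Python sketch. The function names are invented for this example and the heteroatom suffix is passed in by hand; it is not how Beilstein itself classifies anything.

```python
def hydrogen_offset(carbons, hydrogens):
    """Return k such that the formula fits the pattern CnH(2n+k)."""
    return hydrogens - 2 * carbons

def class_label(carbons, hydrogens, suffix=""):
    """Format the general formula class, e.g. 'CnH2n-11NO'."""
    k = hydrogen_offset(carbons, hydrogens)
    offset = f"{k:+d}" if k else ""  # omit the offset entirely when k is 0
    return f"CnH2n{offset}{suffix}"

# 8-hydroxyquinoline is C9H7NO: 9 carbons, 7 hydrogens.
print(class_label(9, 7, suffix="NO"))

# Pages turned by hand, using the page numbers quoted above.
pages_leafed = 1144 - 1057
print(pages_leafed)
```

With n = 9, 2n - 11 = 7, so the formula matches the class, and the manual search covers 87 pages past the starting point.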

There are other chemistry indices like CAS, which indexes journal publications and uses its own nomenclature and search system. Given the last few essays, you should now be as competent as I at understanding summaries of how they work.

That describes how to use Beilstein, and should provide clues as to how the database was generated. While I don't know if this is what they did, this is my best guess. There was a team of chemists trained in the nomenclature system (it took about three years to become an adept). They read the raw sources (books, journals, etc.), converted the information to a structure diagram, and applied the nomenclature to get the systematic name. They then searched their notecards to see if they knew about the compound already, updating the cards if they did. If not, they created new cards: one for the record, one by name, and one by formula. When it was time to publish they went through the cards and created the printing plates, which were used to print the book. Very labor intensive, but that was state of the art. There were only minor improvements in the process, like improved printing press technology which made it easier to include depictions, until the 1940s.
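The guessed notecard workflow maps neatly onto a "look up, then update or create" routine. The Python below is only a sketch of that guess: `RecordCard`, `CardFile`, and the placeholder references are invented here, and nothing about the real editorial process is implied.

```python
from dataclasses import dataclass, field

@dataclass
class RecordCard:
    """One hypothetical notecard for a compound."""
    name: str
    formula: str
    references: list = field(default_factory=list)

class CardFile:
    """Hypothetical card file: record cards plus name and formula lookups."""

    def __init__(self):
        self.by_name = {}     # systematic name -> RecordCard
        self.by_formula = {}  # Hill formula -> list of systematic names

    def file(self, name, formula, reference):
        """Update the existing card for a compound, or create new cards."""
        card = self.by_name.get(name)
        if card is None:
            # Unknown compound: new record card, name entry, and formula entry.
            card = RecordCard(name, formula)
            self.by_name[name] = card
            self.by_formula.setdefault(formula, []).append(name)
        card.references.append(reference)
        return card

cards = CardFile()
cards.file("ethanol", "C2H6O", "source A")         # placeholder references
cards.file("dimethyl ether", "C2H6O", "source B")
cards.file("ethanol", "C2H6O", "source C")         # already known: card is updated

print(cards.by_formula["C2H6O"])
print(cards.by_name["ethanol"].references)
```

Publishing would then amount to walking the cards in volume order and typesetting each one, which is the labor-intensive part.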


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use