This post originated from an RSS feed registered with Python Buzz
by Andrew Dalke.
Original Post: Paper databases -- History of Chemical Nomenclature
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.
The Geneva Congress of 1892 defined a nomenclature for international
use. This was a formalization of systems already in use. One of the
most important was Beilstein ("Beilstein Handbook of Organic
Chemistry"), first published in 1881. It was used by chemists who
wanted to find more information about a compound or related compounds.
I am not a chemist and I've thankfully never had to use Beilstein,
but I think I've figured it out well enough to give a sense of
what life was like before computers. If you want a real guide,
with more details and even helpful pictures,
try other sites.
In modern terms, Beilstein is a database of chemical records. Each
record entry has information about a compound, including its name,
molecular formula, and a relevant publication reference, and possibly
a depiction and physical properties like boiling point. All in German because
Germany dominated the field of organic chemistry in the 1800s.
The entries are sorted by structural type into volumes (and
subvolumes, with new volumes added over time). The acyclic compounds
are in volumes 1, 2, 3, and 4. Acyclic compounds with no functional
group are in volume 1, as are hydroxy-, oxo-, and hydroxy-oxo
compounds. Acyclic carboxylic acids are in volume 2 unless they also
have hydroxy- and oxo-functions, in which case they are in volume 3.
And so on. This ordering gives chemists a way to browse for other
structurally similar compounds with similar function.
If the systematic name is known, use the General-Sachregister (name
index), which maps from name to record location (volume, subvolume,
page number).
If the systematic name isn't known for a compound, first determine
its molecular formula in
Hill order.
Go to the General-Formelregister (formula index) of Beilstein for a
list of compounds with that formula. Es gibt viele Verbindungen mit
... Sorry, got carried away trying to remember enough college German
to read some of the Beilstein examples. There can be many compounds
with the same molecular formula. Even something simple like
C2H6O could be either ethanol or dimethyl ether.
All the molecular formula does is greatly reduce the number of
compounds to consider. It can be reduced even more by using knowledge
of German chemistry nomenclature (perhaps with the help of a handy
German/English chemistry dictionary) to figure out which of those
compound names are most likely to correspond to the structure.
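
To make the formula lookup concrete, here is a minimal Python sketch of Hill ordering and a toy formula index. The names `hill_formula` and `formula_index` are made up for this illustration, and the three compounds are small examples I picked, not a transcription of anything in Beilstein. It assumes the usual Hill convention: carbon first, hydrogen second, everything else alphabetically, and plain alphabetical order when there is no carbon.

```python
from collections import defaultdict

def hill_formula(counts):
    """Return the Hill-order formula string for a mapping of element -> count."""
    counts = {element: n for element, n in counts.items() if n > 0}
    if "C" in counts:
        # Carbon first, hydrogen second, then the rest alphabetically.
        order = ["C"]
        if "H" in counts:
            order.append("H")
        order += sorted(el for el in counts if el not in ("C", "H"))
    else:
        # No carbon: every element, hydrogen included, is alphabetical.
        order = sorted(counts)
    # A count of 1 is left implicit, as in "C2H6O".
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)

# Toy stand-in for the formula index: Hill formula -> compound names.
formula_index = defaultdict(list)
for name, atoms in [
    ("ethanol", {"C": 2, "H": 6, "O": 1}),
    ("dimethyl ether", {"O": 1, "H": 6, "C": 2}),
    ("acetic acid", {"H": 4, "O": 2, "C": 2}),
]:
    formula_index[hill_formula(atoms)].append(name)

print(formula_index["C2H6O"])  # ['ethanol', 'dimethyl ether']
```

The lookup only narrows the search to the compounds sharing a formula. Picking the right one from that list is still up to the reader.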
Here's where using a line notation really pays off. There are
generally about 60 lines per page. From pictures I've seen,
structure formulas, even when compressed for space, look like they take up
about 5 lines of text. Displaying the structure in
the index would require at least quintupling the number of pages used for
the index. Since a record itself is only about 5 lines long, it would
mean roughly doubling the number of pages in
an already large publication.
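
Here is that page arithmetic as a rough Python sketch. The 60 lines per page and the 5 lines for a structure or a record come from the estimates above. The one-line index entry and the compound count are my assumptions, there only to give the numbers some scale.

```python
import math

LINES_PER_PAGE = 60      # estimate from the text
STRUCTURE_LINES = 5      # estimate from the text: a compressed structure diagram
RECORD_LINES = 5         # estimate from the text: a typical record
NAME_ENTRY_LINES = 1     # assumption: a name-only index entry fits on one line
N_COMPOUNDS = 120_000    # hypothetical count, for illustration only

def pages(lines_per_entry, n_entries=N_COMPOUNDS):
    """Pages needed to print n_entries entries of the given height."""
    return math.ceil(n_entries * lines_per_entry / LINES_PER_PAGE)

record_pages = pages(RECORD_LINES)
text_index_pages = pages(NAME_ENTRY_LINES)
drawn_index_pages = pages(STRUCTURE_LINES)  # a structure in place of each name

print(record_pages, text_index_pages, drawn_index_pages)  # 10000 2000 10000
```

Under these assumptions the drawn index is five times the size of the text index and about as large as the record section itself, which is where the rough "doubling" comes from.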
If that fails, the compound might still be in Beilstein. Some
compounds aren't listed in the formula index but can be found by a
combination of a structure-based decision and leafing through pages.
For an example of the joys of a search, take a look at
this page,
which describes looking for the aluminum salt of
8-hydroxyquinoline (a laser dye).
It isn't listed in the formula index so the way to find it is to use
knowledge of how record entries are laid out. The dye is a
heterocyclic system with one nitrogen, so should be found in volume
21. That contains information about how to get to the correct subvolume,
which lists the start page for compounds of the form
CnH2n-11NO and more specifically the start page
for C9H7NO. From that page, manually search
starting from page 1057 until it's found on page 1144. Yowza!
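
The general-formula step is simple arithmetic: for a formula written as CnH(2n+k), k is the hydrogen count minus twice the carbon count. Here is a small Python sketch that checks C9H7NO really belongs under CnH2n-11NO. The function names are my own, and passing the other elements as a string is just a shortcut for the illustration.

```python
def hydrogen_offset(n_carbon, n_hydrogen):
    """Return k such that the formula has the form CnH(2n+k)."""
    return n_hydrogen - 2 * n_carbon

def series_label(n_carbon, n_hydrogen, rest=""):
    """Build a general-formula label such as 'CnH2n-11NO'."""
    k = hydrogen_offset(n_carbon, n_hydrogen)
    hydrogen_part = "H2n" if k == 0 else f"H2n{k:+d}"
    return f"Cn{hydrogen_part}{rest}"

# C9H7NO: 7 - 2*9 = -11
print(series_label(9, 7, "NO"))  # CnH2n-11NO
```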
There are other chemistry indices like CAS, which indexes journal
publications and uses its own nomenclature and search system.
Given the last few essays, you should now be as competent as I at
understanding summaries of how they work.
That describes how to use Beilstein, and should provide clues as to
how the database was generated. While I don't know if this is what
they did, this is my best guess. There was a team of chemists trained
in the nomenclature system (it took about three years to become an
adept). They read the raw sources (books, journals, etc.), converted
the information to a structure diagram, and applied the nomenclature to
get the systematic name. They then searched their notecards to see if
they knew about it already, updating the cards if they did. If not,
they created new cards: one for the record, one by name, and one by
formula. When it was time to publish, they went through the cards and
created the printing plates, which were used to print the book. Very
labor intensive, but that was state of the art. There were only minor
improvements in the process, like improved printing press technology
which made it easier to include depictions, until the 1940s.
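
For the programmers in the audience, the card workflow I guessed at above maps onto a record table with two secondary indexes. This is only a sketch of my guess, not a description of the real Beilstein process. `CardFile`, `RecordCard`, and the source strings are all hypothetical names for the illustration.

```python
from dataclasses import dataclass, field

@dataclass
class RecordCard:
    name: str                      # systematic name
    formula: str                   # molecular formula in Hill order
    references: list = field(default_factory=list)

class CardFile:
    """Three sets of cards: the records, a name index, and a formula index."""

    def __init__(self):
        self.records = {}      # record id -> RecordCard
        self.by_name = {}      # systematic name -> record id
        self.by_formula = {}   # Hill formula -> list of record ids

    def file_compound(self, name, formula, reference):
        """Update the existing card for a known compound, else create new cards."""
        record_id = self.by_name.get(name)
        if record_id is not None:
            # Already known: just add the new literature reference.
            self.records[record_id].references.append(reference)
            return record_id
        # New compound: one card for the record, one by name, one by formula.
        record_id = len(self.records) + 1
        self.records[record_id] = RecordCard(name, formula, [reference])
        self.by_name[name] = record_id
        self.by_formula.setdefault(formula, []).append(record_id)
        return record_id

cards = CardFile()
first = cards.file_compound("ethanol", "C2H6O", "hypothetical source A")
second = cards.file_compound("dimethyl ether", "C2H6O", "hypothetical source B")
again = cards.file_compound("ethanol", "C2H6O", "hypothetical source C")
print(first == again, cards.by_formula["C2H6O"])  # True [1, 2]
```

The name lookup comes first because the systematic name is the key that decides whether a compound is already on file. The formula index can hold many compounds per entry, so it stores a list.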