This post originated from an RSS feed registered with Python Buzz
by Andrew Dalke.
Original Post: Systematic Name - History of Chemical Nomenclature
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.
Note: I am not a chemist nor have I researched this thoroughly. Don't
trust your term paper on what I've written. Please
let me know
of any mistakes I've made.
The
molecular formula
is great for the task of listing proportions. It isn't enough.
Chemists build on the work of others. For that to work, a chemist
needs to describe a compound and other chemists need to know that that
description exists. The old solution was to come up with a new,
unique name for a compound, based often on its origin. Everyone
simply memorizes that list of names. But while chemists have
exceptional memory for chemistry terms, it's impossible to memorize
millions of names if there is no underlying meaning in the name or
relationship between names.
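To make the point about the molecular formula concrete, here is a minimal Python sketch. The function names and the hand-entered atom and bond lists are my own illustration, not taken from any toolkit. It shows that ethanol and dimethyl ether have the same formula even though their atoms are connected differently, so the formula alone can't tell them apart.

```python
from collections import Counter

def molecular_formula(atoms):
    """Carbon first, hydrogen second, then the rest alphabetically."""
    counts = Counter(atoms)
    ordered = [e for e in ("C", "H") if e in counts]
    ordered += sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in ordered)

def bond_types(atoms, bonds):
    """Count bonds by the pair of elements they join, e.g. ('C', 'O')."""
    return Counter(tuple(sorted((atoms[a], atoms[b]))) for a, b in bonds)

# Atoms and bonds (pairs of atom indices), entered by hand for illustration.
ethanol_atoms = ["C", "C", "O"] + ["H"] * 6
ethanol_bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]

ether_atoms = ["C", "O", "C"] + ["H"] * 6
ether_bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (2, 6), (2, 7), (2, 8)]

print(molecular_formula(ethanol_atoms))  # C2H6O
print(molecular_formula(ether_atoms))    # C2H6O
# Same formula, different connectivity:
print(bond_types(ethanol_atoms, ethanol_bonds) == bond_types(ether_atoms, ether_bonds))  # False
```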
Chemists of the 19th century came up with two ways
to describe a compound which did work. The preferred way, still
paramount, is through the
chemical diagram. It appears to have been
started by Archibald Scott Couper
in 1858 (need to do
more research).
This is the familiar two-dimensional (also called topological)
depiction of the molecule. It is very simple to understand after some
training, and it builds on the ability of the human mind to interpret
images. It's especially powerful when comparing multiple compounds
which are part of a series and where the core structure is aligned.
(It really is built on the human ability to remember pictures, and
less on the ability to interpret graph topologies. The easiest way to
test that is to take a depiction a chemist usually uses and flip it
upside down. It will likely take a little bit for that chemist to
recognize it. That's why the
standard nomenclature for steroids
specifically shows the preferred orientation in depictions
and states that "[p]rojections of steroid formulae should not be
oriented as in formulae 2c, 2d or 2e unless circumstances make it
obligatory, e.g. in dimers formed photochemically."
Is it immediately obvious to you that all those structures are the same?
If so, are you also a chemist?)
Depictions are a very good way for chemists to describe a compound but
they suffer from several notable problems. First, they are big.
Consider what a chemistry paper would look like if it had to use
Rinse with [structure diagram of ethanol]
instead of
Rinse with ethanol.
That's a tiny molecule. Imagine using one of the steroid images
instead.
This problem could be, and is, remedied in part by depicting the graph
once in a paper, assigning it a name, and referring to the graph
through its name. For common compounds, like ethanol, there's no need
to show the graph. There's still the problem of coming up with a
good name.
Another problem is that it's hard to say a graph. Try giving a talk
with circumlocutions like "okay, there's a six-element ring fused to
another six-element ring fused to another six-element ring (two bonds
from the first fuse) which is itself fused to a five-element ring
(four bonds from the second fuse)." Again, it needs a name.
The other big problem, especially in the pre-computer days, was the
ability to search for a graph. How are graphs sorted into a list? If
there's an index, is there a (large) depiction for the index as well
as the record itself? A picture is said to be worth a thousand words,
but surely there must be a way to describe a picture as a much
shorter word, one which can fit on a line of text: a line notation.
Chemists aren't stupid. They quickly noticed that carbon played an
important role in all sorts of interesting compounds, which became the
field of organic chemistry. (Inorganic chemistry has its own way of
doing things in part because of the increased role of describing the
crystal structure and the decreased complexity of the compounds.
Carbon, unlike almost every other element, easily forms long chains.)
Various sorts of nomenclature systems were used to turn the graph into
a word. International codification occurred with the Geneva Congress
of 1892, which evolved into what is now called the IUPAC nomenclature.
Its general approach starts by finding the parent. In the
simplest case it's a matter of identifying the longest chain of
carbons, figuring out which end is the start end, then naming the bits
and pieces which come off each side. This is recursive because those
bits and pieces may have more bits and pieces.
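Here is a minimal Python sketch of that simplest case, assuming an acyclic skeleton made only of carbon atoms. The names `longest_chain`, `parent_name` and the prefix table are my own illustration, not part of any IUPAC tool. It ignores rings, ties between chains of equal length, which end to start numbering from, and the naming of the side pieces.

```python
# Prefixes for chain lengths one through eight; the table is illustrative.
CHAIN_PREFIXES = {1: "meth", 2: "eth", 3: "prop", 4: "but",
                  5: "pent", 6: "hex", 7: "hept", 8: "oct"}

def longest_chain(bonds):
    """Return the longest simple path in an acyclic carbon skeleton.

    `bonds` maps each carbon's index to the list of carbons bonded to it.
    """
    def walk(atom, visited):
        # Longest path which starts at `atom` and avoids visited atoms.
        best = [atom]
        for neighbor in bonds[atom]:
            if neighbor not in visited:
                candidate = [atom] + walk(neighbor, visited | {neighbor})
                if len(candidate) > len(best):
                    best = candidate
        return best

    best = []
    for start in bonds:  # try every atom as a possible chain end
        path = walk(start, {start})
        if len(path) > len(best):
            best = path
    return best

def parent_name(bonds):
    """Name the parent alkane from the length of the longest chain."""
    return CHAIN_PREFIXES[len(longest_chain(bonds))] + "ane"

# Carbon skeleton of 2-methylbutane: a chain 0-1-2-3 with carbon 4 on atom 1.
skeleton = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
print(longest_chain(skeleton))  # [0, 1, 2, 3]
print(parent_name(skeleton))    # butane
```

The leftover carbon (atom 4) is one of the "bits and pieces". Naming it means running the same idea again on the side branch, which is where the recursion comes in.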
Life (which is made of chemistry) isn't so simple. What about rings?
What if there are several chains of equal length? These can all be
figured out, and there is a way to do so which generates a unique word.
The spanner in the works is that chemists don't want just a name for
the compound; they want the name to indicate functionality. (This
makes it easier to figure out what something does, and gives the
ability to catalog, say, all compounds with the same function
together.) Substructures, like a ketone or alcohol or aldehyde, are
strong indicators of functionality so are used to determine the
parent.
There end up being many complications in making chemistry nomenclature
fit the chemist's model of chemistry. For example, a compound can
have multiple functions. Nicotinoyl morpholine and
pyridyl morpholinyl ketone are both names for the same compound
(according to Garfield's thesis; remember, I'm not a chemist). In the
first case, the morpholine is regarded as the parent structure, while
the ketone is the parent for the second case. Garfield says it took
three years to train a chemist in how to use the system, and the result
is a systematic name which may be quite different than the name
the chemist uses.
The term "systematic name" is quite interesting. It's meant as the
word created when turning the graph into a unique name following the
rules of a nomenclature system, but it's different than the term
canonical name used for line notations like SMILES. I don't
know why.
Garfield points out that while all compounds may have a systematic
name, getting chemists to always use the systematic name is
impossible. Some compounds are known by a trade name, like
formaldehyde, and others because the compound was identified long
before the structure was determined, like insulin. These are called
trivial names, but one chemist's trivial name is another
chemist's systematic name. A steroid chemist will use the term
androstane and not cyclopentanophenanthrene, because the
first is part of the systematic name for steroids, and because the
second is just too long.
The nomenclature system then is very much like a dictionary. Some
nations (France) have an official language (French) with an institute
(L'Académie française) which prescribes all the words in
the general language ("email", non; "courriel", oui). Other
dictionaries (OED) describe how
words are used but are not meant to enforce use and have liberal rules
for accepting new words. In any case, you might think you're a real
hep cat making up some fly words but it's pretty pants if no one gets
your groove.
The major difference is that chemical nomenclature can be used to
describe any compound (up to limits of the chemical model used;
organic chemistry nomenclature can't be used to describe iron ore) and
is not limited to a finite set of words.
I really like Garfield's use of linguistics to recast the
trivial/systematic dichotomy. He uses the word idiom. A term
is an idiom if it can't be understood from its morphemes. I take that
to mean that all morphemes (in chemistry these include eth,
an, and ol) are idioms as is insulin while
ethanol is not (being composed purely of morphemes). Seriously.
Read
his paper.
It's quite comprehensible even to a non-chemist, and it gives a
good history of the problem and (in retrospect) a perspective of the
state of computers in the 1950s.
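My reading of the idiom idea above can be sketched in a few lines of Python. The morpheme table, its simplified meanings, and the function names are my own assumptions for illustration; they don't come from Garfield's paper. A name counts as an idiom here if it can't be split into two or more known morphemes.

```python
# Tiny, illustrative morpheme table; the meanings are simplified.
MORPHEMES = {
    "meth": "one carbon",
    "eth": "two carbons",
    "prop": "three carbons",
    "an": "single bonds only",
    "ol": "alcohol (-OH) group",
}

def split_morphemes(name):
    """Split `name` into known morphemes, or return None if that fails."""
    if not name:
        return []
    for size in range(len(name), 0, -1):  # try the longest prefix first
        head = name[:size]
        if head in MORPHEMES:
            rest = split_morphemes(name[size:])
            if rest is not None:
                return [head] + rest
    return None

def is_idiom(name):
    """True if `name` can't be built from two or more known morphemes."""
    parts = split_morphemes(name)
    return parts is None or len(parts) < 2

print(split_morphemes("ethanol"))  # ['eth', 'an', 'ol']
print(is_idiom("ethanol"))         # False
print(is_idiom("insulin"))         # True
print(is_idiom("eth"))             # True, a single morpheme
```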