One of the problems with curating chemical databases of small organic molecules is separating bogus connection tables from legitimate molecules. One aspect of this challenge has been termed [Ed: “I” have termed] EOCWR, standing for “Explodes On Contact With Reality”.
An interesting class of broken molecules, often overlooked by reasonableness filters, consists of those that defy the standard model of physics. In this view of matter, atoms are composed of whole numbers of protons, neutrons and electrons. To paraphrase Democritus, “all that exists are groupings of protons, neutrons and electrons in empty space, all else is opinion”. Naturally, under this model the formal charge on an atom cannot exceed its atomic number. Whilst [Ed: “While”] an arbitrary number of electrons may be associated with an atom, it cannot have fewer than zero electrons; hence its positive charge is bounded by the number of protons in its nucleus. However, many cheminformatics file formats record the formal charge rather than the electron count, which makes it possible to represent impossible molecules. Checking for these is relatively trivial, and allows compounds such as [H+2] (one proton, charge +2) or [C+7] (six protons, charge +7) to be flagged as erroneous.
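As a rough illustration, the check can be sketched in Python by treating the electron count as the atomic number minus the formal charge and flagging any atom where that comes out negative. This is a minimal sketch under stated assumptions: it parses only a simplified subset of SMILES bracket-atom syntax with a regular expression, uses a small illustrative table of atomic numbers, and the function names (`parse_charge`, `negative_electron_atoms`) are hypothetical rather than taken from any toolkit.

```python
import re

# Illustrative subset of atomic numbers (Z); a real filter would cover the periodic table.
ATOMIC_NUMBER = {
    "H": 1, "He": 2, "Li": 3, "B": 5, "C": 6, "N": 7, "O": 8, "F": 9,
    "Na": 11, "P": 15, "S": 16, "Cl": 17, "Fe": 26, "Br": 35, "I": 53,
}

# Simplified SMILES bracket atom: [isotope symbol @-chirality Hcount charge :class].
# Assumption: extended chirality forms such as @TH1 are not handled.
BRACKET_ATOM = re.compile(
    r"\[\d*(?P<sym>[A-Z][a-z]?|[a-z]{1,2})@*(?:H\d*)?"
    r"(?P<charge>\++|-+|[+-]\d+)?(?::\d+)?\]"
)


def parse_charge(text):
    """Convert a SMILES charge string ('+', '++', '+2', '-3', ...) to an int."""
    if not text:
        return 0
    sign = 1 if text[0] == "+" else -1
    if text[1:].isdigit():
        return sign * int(text[1:])
    return sign * len(text)  # repeated signs, e.g. '++' means +2


def negative_electron_atoms(smiles):
    """Return (element, charge) for bracket atoms that would need fewer than zero electrons."""
    flagged = []
    for match in BRACKET_ATOM.finditer(smiles):
        element = match.group("sym").capitalize()  # aromatic 'c' -> 'C'
        z = ATOMIC_NUMBER.get(element)
        if z is None:
            continue  # element not in the illustrative table; skip it
        charge = parse_charge(match.group("charge"))
        electrons = z - charge  # a neutral atom has Z electrons
        if electrons < 0:
            flagged.append((element, charge))
    return flagged


for smi in ["[H+2]", "[C+7]", "[H+]", "[He++]", "[NH4+].[Cl-]"]:
    print(smi, negative_electron_atoms(smi))
```

Only the first two inputs are flagged: a bare proton ([H+]) or alpha particle ([He++]) has exactly zero electrons, which is physically allowed, so the comparison is strictly "fewer than zero".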
Another example of testing for EOCWR is the work of Dr Jonathan Goodman and colleagues at the University of Cambridge on the challenges of embedding alkanes in three dimensions (here and here). Their work explains that in some molecules, although all atomic valences are reasonable, steric crowding would produce so much strain that the molecule would fall apart. Hence, although graph-theoretically an sp3 carbon may have four neighbours that each have three further (all distinct) neighbours, giving twelve second neighbours, in reality there is an energetic upper bound of 10 second neighbours in alkanes. As with the charge check above, counting the second (and third) neighbours of an atom provides a convenient and efficient way of distinguishing plausible molecules from the artifacts of erroneous molecule processing (termed “robochemistry” by NCBI PubChem’s Evan Bolton).
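A purely topological version of the second-neighbour test might look like the Python sketch below. It is not Goodman and colleagues' embedding method; it simply counts, on a hydrogen-suppressed carbon skeleton given as a list of bonds, the atoms at exactly two bonds' distance from each atom, and flags any atom exceeding the limit of 10 quoted above. The function names, the bond-list input format and the `tert_butyl_star` helper (a central carbon carrying several tert-butyl groups) are hypothetical, chosen only for illustration, and the threshold is stated for alkanes, so applying it to other molecules would be an extrapolation.

```python
from collections import defaultdict

# Upper bound on second neighbours of an alkane carbon, as quoted in the text.
MAX_SECOND_NEIGHBOURS = 10


def second_neighbour_counts(bonds):
    """Count atoms at exactly two bonds' distance from each atom.

    `bonds` is a list of (i, j) pairs describing a hydrogen-suppressed
    carbon skeleton.
    """
    adjacency = defaultdict(set)
    for i, j in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)
    counts = {}
    for atom, neighbours in adjacency.items():
        second = set()
        for n in neighbours:
            second |= adjacency[n]
        second -= neighbours  # in rings, a neighbour's neighbour may itself be a neighbour
        second.discard(atom)
        counts[atom] = len(second)
    return counts


def overcrowded_atoms(bonds, limit=MAX_SECOND_NEIGHBOURS):
    """Return the atoms whose second-neighbour count exceeds `limit`."""
    counts = second_neighbour_counts(bonds)
    return sorted(a for a, c in counts.items() if c > limit)


def tert_butyl_star(arms):
    """Hypothetical helper: bonds for a central carbon (atom 0) bearing `arms` tert-butyl groups."""
    bonds = []
    for k in range(arms):
        quaternary = 1 + 4 * k
        bonds.append((0, quaternary))
        for m in range(1, 4):
            bonds.append((quaternary, quaternary + m))
    return bonds


print(overcrowded_atoms(tert_butyl_star(3)))  # tri-tert-butylmethane skeleton
print(overcrowded_atoms(tert_butyl_star(4)))  # tetra-tert-butylmethane skeleton
```

With three tert-butyl groups the central carbon has 9 second neighbours and nothing is flagged; with four it reaches the graph-theoretic maximum of 12 and atom 0 is reported. A third-neighbour count could be added the same way by extending the search one more bond, but the text gives no numeric bound for it, so none is assumed here.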
Image credit: Graham Lees (Tram Painter on Flickr)