The IUPAC International Chemical Identifier (InChI), developed by IUPAC and NIST, is a digital equivalent of the IUPAC name for any particular covalent compound. Chemical structures are expressed in terms of five layers of information: connectivity, tautomeric, isotopic, stereochemical, and electronic.
The InChI algorithm converts input structural information into the InChI identifier in a three-step process: normalization (to remove redundant information), canonicalization (to generate a unique set of atom labels), and serialization (to give a string of characters).

Examples
ethanol (CH3CH2OH) | InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3
L-ascorbic acid | InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1
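
The three-step process named above (normalization, canonicalization, serialization) can be illustrated with a toy sketch. This is not the InChI algorithm and does not produce InChI strings: the brute-force relabeling, the output format, and every function name here are assumptions made purely for illustration. It only shows why canonicalization makes the result independent of how the input atoms happened to be numbered.

```python
from itertools import permutations

def normalize(bonds):
    """Drop bond direction and duplicate entries (redundant information)."""
    return sorted({tuple(sorted(bond)) for bond in bonds})

def canonicalize(elements, bonds):
    """Try every atom relabeling and keep the lexicographically smallest one."""
    best = None
    for perm in permutations(range(len(elements))):
        # perm[old] is the new label assigned to atom `old`
        relabeled = [None] * len(elements)
        for old, new in enumerate(perm):
            relabeled[new] = elements[old]
        new_bonds = sorted(tuple(sorted((perm[a], perm[b]))) for a, b in bonds)
        candidate = (tuple(relabeled), tuple(new_bonds))
        if best is None or candidate < best:
            best = candidate
    return best

def serialize(canonical):
    """Write the canonical description as a string with 1-based atom labels."""
    elements, bonds = canonical
    return "".join(elements) + "/" + ",".join(f"{a + 1}-{b + 1}" for a, b in bonds)

def toy_identifier(elements, bonds):
    return serialize(canonicalize(elements, normalize(bonds)))

# Two different numberings of the heavy atoms of ethanol (C-C-O).
# The second input also lists one bond twice and in reversed direction.
first = toy_identifier(["C", "C", "O"], [(0, 1), (1, 2)])
second = toy_identifier(["O", "C", "C"], [(1, 0), (2, 1), (1, 2)])
print(first == second)
```

Trying every permutation is only feasible for a handful of atoms; it is used here because it is easy to verify by hand, not because real canonicalization works this way.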

Layer types
There are six InChI layer types:
- Main layer
- Charge layer
- Stereochemical layer
- Isotopic layer
- Fixed-H layer
- Reconnected layer
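
As a rough illustration of the six layer types, the sketch below reports which of them appear in a given string. The prefix-to-layer table is an assumption: only "c" and "h" are named in this article, and the remaining letters follow the common reading of the InChI documentation. The function name `layer_types` is hypothetical, and the sketch classifies each segment by its first letter alone, ignoring context (for instance, an "h" segment that follows an isotopic or fixed-H segment is still counted as main layer here).

```python
# Assumed mapping from sub-layer prefix letter to layer type.
LAYER_OF_PREFIX = {
    "c": "main", "h": "main",
    "q": "charge", "p": "charge",
    "b": "stereochemical", "t": "stereochemical",
    "m": "stereochemical", "s": "stereochemical",
    "i": "isotopic",
    "f": "fixed-H",
    "r": "reconnected",
}

def layer_types(inchi):
    """Return the layer types seen in an InChI string, in order of appearance."""
    found = []
    for segment in inchi.split("/")[1:]:  # skip the leading "InChI=1" part
        if not segment:
            continue
        first = segment[0]
        if first.islower():
            name = LAYER_OF_PREFIX.get(first, "unknown")
        else:
            name = "main"  # the chemical formula carries no prefix
        if name not in found:
            found.append(name)
    return found

ethanol = "InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"
ascorbic = ("InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5"
            "/h2,5,7-10H,1H2/t2-,5+/m0/s1")
print(layer_types(ethanol))
print(layer_types(ascorbic))
```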

Sub-layers
Each layer can be split into sub-layers. For example, the main layer can be split into three sub-layers:
- Chemical formula (no prefix)
- Atom connections (prefix: "c")
- Hydrogen atoms (prefix: "h")
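
The chemical formula sub-layer is the easiest of the three to read mechanically. The following minimal sketch counts atoms in a simple, single-component formula such as the two examples in this article. The function name `parse_formula` is hypothetical, and the sketch assumes the formula contains only element symbols followed by optional counts; it does not handle multi-component formulas or leading multipliers.

```python
import re

def parse_formula(formula):
    """Count atoms in a simple formula such as 'C2H6O'."""
    counts = {}
    # An element symbol is one capital letter plus an optional lower-case
    # letter; a missing count means one atom.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

print(parse_formula("C2H6O"))
print(parse_formula("C6H8O6"))
```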

Notation
Layers and sub-layers are both separated by the "/" delimiter. All layers and sub-layers (except for the chemical formula sub-layer of the main layer) start with a lower-case letter indicating the type of information held in that layer.
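
The notation rules above are enough to split a string into its pieces without interpreting them. A minimal sketch, assuming the string starts with "InChI=" followed by a version field; the function name `split_inchi` and the returned tuple shape are hypothetical choices, and no validation of the sub-layer contents is attempted.

```python
def split_inchi(inchi):
    """Split an InChI string into (version, formula, [(prefix, content), ...])."""
    head = "InChI="
    if not inchi.startswith(head):
        raise ValueError("not an InChI string")
    version, *segments = inchi[len(head):].split("/")
    formula = None
    # The chemical formula is the only sub-layer without a lower-case prefix.
    if segments and segments[0] and not segments[0][0].islower():
        formula = segments.pop(0)
    # Every remaining segment starts with a one-letter type prefix.
    sublayers = [(seg[0], seg[1:]) for seg in segments if seg]
    return version, formula, sublayers

version, formula, sublayers = split_inchi("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3")
print(version, formula)
for prefix, content in sublayers:
    print(prefix, content)
```

A list of pairs is used rather than a dictionary so that a prefix letter appearing more than once in a string is not silently overwritten.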

Limitations
The only documentation for how to generate InChI strings is the InChI implementation itself, available from the SourceForge site. The details of each sub-layer format have not been documented well enough for others to implement independent parsers. The closest is BKChem, which claims a 98.5% success rate reading the InChI strings for the NCI data set. Although those format details are mostly reverse-engineerable, they are not sufficient for other software to generate InChI-like strings and have the InChI software do the canonicalization.

The IUPAC InChI software is designed to convert MDL molfiles into InChI strings; parsing InChI strings is of lesser importance. For example, some deliberately constructed InChI-like strings, when passed through the InChI algorithm, produce incorrect (non-canonical) results. (Details were reported on the InChI discussion list hosted at SourceForge, in several postings around early July 2007.) Because there is no such thing as a "non-canonical" but InChI-like string, it is not possible, except for the most trivial molecules, to generate or validate an InChI string manually.

The InChI software was not designed to be robust against hostile use. Several denial-of-service and buffer overflow attacks exist in the InChI parser of the code base (for example, InChI=1/65536C65536 and InChI=1/C/q2987987*-1). Because there is only one widely used implementation, essentially all software accepting InChI strings from untrusted sources is potentially open to attack. At present this exploit is theoretical.

The implementation can be used as a library, but it was not designed that way. For example, function names have no special prefix, so namespace collisions are more likely. The public release supports only the Microsoft and gcc compilers.

IUPAC continues to develop and refine the code, but those releases take a long time, so third-party patches to fix security holes and improve portability are not quickly folded back into the public code base.
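
Software that must accept InChI strings from untrusted sources could apply a conservative pre-filter before handing them to the parser. The sketch below is one possible approach, not a fix for the underlying issues: the length cap, the digit-run cap, and the character whitelist are arbitrary assumptions, the function name `looks_plausible` is hypothetical, and a string that passes is not thereby shown to be safe or valid.

```python
import re

MAX_LENGTH = 2000  # assumed cap on total string length
MAX_DIGITS = 4     # assumed cap on the length of any run of digits

# "InChI=1" or "InChI=1S", then only characters from a small whitelist.
ALLOWED = re.compile(r"InChI=1S?/[A-Za-z0-9/.,;()*+\-?]*")
LONG_NUMBER = re.compile(r"\d{%d,}" % (MAX_DIGITS + 1))

def looks_plausible(inchi):
    """Reject strings that are oversized, oddly shaped, or contain huge numbers."""
    if len(inchi) > MAX_LENGTH:
        return False
    if ALLOWED.fullmatch(inchi) is None:
        return False
    if LONG_NUMBER.search(inchi):
        return False
    return True

print(looks_plausible("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"))
print(looks_plausible("InChI=1/65536C65536"))
print(looks_plausible("InChI=1/C/q2987987*-1"))
```

Both problem strings quoted above are rejected by the digit-run check, while the ethanol and L-ascorbic acid examples pass. Legitimate strings for very large structures could also be rejected by limits this tight, so the thresholds would need tuning for any real data set.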