Simplified molecular input line entry specification
The simplified molecular input line entry specification, or SMILES, is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.
The original SMILES specification was developed by Arthur Weininger and David Weininger in the late 1980s. It has since been modified and extended by others, most notably by Daylight Chemical Information Systems Inc. Other line notations include the Wiswesser Line Notation (WLN), ROSDAL, and SLN (Tripos Inc.). More recently, IUPAC has introduced the InChI as a standard for formula representation. SMILES is generally considered slightly more human-readable than InChI; it also has a wide base of software support and an extensive theoretical grounding in graph theory.
Canonical SMILES and Isomeric SMILES
The term Canonical SMILES refers to SMILES written according to additional rules that ensure each distinct chemical molecule has a single, unique SMILES representation. A common application of Canonical SMILES is indexing molecules in a database and ensuring that each molecule is stored only once.
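The idea of a canonical form can be illustrated with a deliberately naive sketch: enumerate every string a depth-first walk could write for a molecule and keep the lexicographically smallest one. This is not the ranking algorithm used by Daylight or any other toolkit. The function names `encodings` and `canonical_smiles` are hypothetical, and the sketch assumes acyclic, hydrogen-suppressed graphs with single bonds only. Its cost grows factorially with branching, so it only suits tiny molecules.

```python
from itertools import permutations


def encodings(atoms, adj, node, parent=None):
    """Return every string a depth-first walk starting at `node` can write.

    Assumes an acyclic graph with single bonds only (no ring closures).
    """
    children = [n for n in adj[node] if n != parent]
    results = set()
    for order in permutations(children):
        partial = {atoms[node]}
        for i, child in enumerate(order):
            last = i == len(order) - 1
            extended = set()
            for sub in encodings(atoms, adj, child, node):
                # All but the last child are written as parenthesized branches.
                piece = sub if last else f"({sub})"
                extended.update(p + piece for p in partial)
            partial = extended
        results |= partial
    return results


def canonical_smiles(atoms, edges):
    """Pick the lexicographically smallest encoding over all start atoms."""
    adj = {i: [] for i in range(len(atoms))}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return min(s for start in adj for s in encodings(atoms, adj, start))


# Ethanol entered with two different atom numberings yields one key.
key1 = canonical_smiles(["C", "C", "O"], [(0, 1), (1, 2)])
key2 = canonical_smiles(["O", "C", "C"], [(0, 1), (1, 2)])
print(key1, key2)  # C(C)O C(C)O
```

Because both numberings produce the same string, that string could serve as a dictionary key when de-duplicating database records.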
The term Isomeric SMILES refers to the version of the SMILES specification that includes extensions for specifying isotopes, chirality, and configuration about double bonds. A notable feature of these rules is that they allow rigorous partial specification of chirality.
Graph-based definition
In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbols of the nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms, and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes. Parentheses are used to indicate points of branching on the tree.
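The procedure above can be sketched in Python. This is a minimal illustration, not a production writer: it assumes a connected, hydrogen-suppressed graph given as a list of element symbols plus an edge list, treats every bond as single, starts the traversal at atom 0, and gives each broken cycle its own ring-closure digit (so it handles at most nine rings). The function name `to_smiles` is hypothetical.

```python
def to_smiles(atoms, edges, start=0):
    """Write a SMILES-like string by depth-first traversal of a molecular graph."""
    adj = {i: [] for i in range(len(atoms))}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    children = {i: [] for i in adj}   # spanning-tree edges, parent -> children
    ring_bonds = set()                # edges removed to break cycles
    visited = set()

    def dfs(u, parent):
        visited.add(u)
        for v in adj[u]:
            if v == parent:
                continue
            if v in visited:
                ring_bonds.add(frozenset((u, v)))  # non-tree edge closes a cycle
            else:
                children[u].append(v)
                dfs(v, u)

    dfs(start, None)

    # Label each broken cycle with a digit written on both of its atoms.
    digits = {i: [] for i in adj}
    for label, bond in enumerate(sorted(ring_bonds, key=sorted), start=1):
        for atom in bond:
            digits[atom].append(str(label))

    def emit(u):
        out = atoms[u] + "".join(digits[u])
        kids = children[u]
        for k in kids[:-1]:
            out += "(" + emit(k) + ")"   # side branches go in parentheses
        if kids:
            out += emit(kids[-1])        # the last child continues the main chain
        return out

    return emit(start)


print(to_smiles(["C", "C", "O"], [(0, 1), (1, 2)]))                # CCO
print(to_smiles(["C"] * 6, [(i, (i + 1) % 6) for i in range(6)]))  # C1CCCCC1
```

Real writers reuse ring digits once a ring is closed and encode bond orders, charges, and stereochemistry, all of which are omitted here for brevity.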
Examples
Atoms are represented by the standard abbreviations of the chemical elements in square brackets, such as [Au] for gold. The hydroxide anion is [OH-]. Brackets can be omitted for the "organic subset" of B, C, N, O, P, S, F, Cl, Br, and I; all other elements must be enclosed in brackets. When the brackets are omitted, the proper number of implicit hydrogen atoms is assumed; for instance, the SMILES for water is simply O and that for ethanol is CCO.
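The implicit-hydrogen rule can be sketched as follows, assuming the convention commonly used by SMILES readers: an unbracketed organic-subset atom takes the smallest of its normal valences that is at least the sum of its explicit bond orders, and the difference is filled with hydrogens. The valence table and the function name `implicit_hydrogens` are assumptions for illustration; aromatic atoms and charges are not handled.

```python
# Assumed normal valences for the organic subset, smallest first.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol, bond_order_sum):
    """Hydrogens implied for an unbracketed atom with the given explicit bond-order sum."""
    if symbol not in NORMAL_VALENCES:
        raise ValueError(f"{symbol} is not in the organic subset; write it in brackets")
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # bonds exceed every normal valence: no implicit hydrogens


# Water: a lone O has no explicit bonds, so it carries two hydrogens.
print(implicit_hydrogens("O", 0))  # 2
# Ethanol, CCO: the bond-order sums are 1, 2 and 1 for C, C and O.
print([implicit_hydrogens(s, n) for s, n in [("C", 1), ("C", 2), ("O", 1)]])  # [3, 2, 1]
```

An element such as gold has no entry in the table, which mirrors the rule that it must always be written in brackets, as in [Au].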
Double bonds are written with "=" and triple bonds with "#": carbon dioxide is represented as O=C=O and hydrogen cyanide as C#N.
Branches are described with parentheses, as in CCC(=O)O for propionic acid and C(F)(F)F for fluoroform, which could also be written as the equivalent, non-canonical string FC(F)F.
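A reader for these bond symbols and branches can be sketched with a stack: an opening parenthesis remembers the current atom, and the matching closing parenthesis returns to it. The helper `parse_chain` is hypothetical and assumes unbracketed organic-subset atoms, the bond symbols "-", "=" and "#", and no ring closures or aromatic atoms.

```python
ORGANIC_SUBSET = {"B", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I"}
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}


def parse_chain(smiles):
    """Return (atoms, bonds), where each bond is (i, j, order)."""
    atoms, bonds = [], []
    branch_stack = []   # atoms to resume from when a branch closes
    prev = None         # atom the next atom will bond to
    order = 1           # order of the next bond (single by default)
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            branch_stack.append(prev)
            i += 1
        elif ch == ")":
            prev = branch_stack.pop()
            i += 1
        elif ch in BOND_ORDERS:
            order = BOND_ORDERS[ch]
            i += 1
        else:
            # Two-letter halogens must be tried before single letters.
            symbol = smiles[i:i + 2] if smiles[i:i + 2] in ("Cl", "Br") else ch
            if symbol not in ORGANIC_SUBSET:
                raise ValueError(f"unsupported token {symbol!r} at position {i}")
            atoms.append(symbol)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, order))
            prev, order = len(atoms) - 1, 1
            i += len(symbol)
    return atoms, bonds


def neighbours(atoms, bonds, index):
    """Sorted element symbols bonded to atom `index`."""
    found = [atoms[j] for i, j, _ in bonds if i == index]
    found += [atoms[i] for i, j, _ in bonds if j == index]
    return sorted(found)


print(parse_chain("CCC(=O)O"))
# (['C', 'C', 'C', 'O', 'O'], [(0, 1, 1), (1, 2, 1), (2, 3, 2), (2, 4, 1)])
print(parse_chain("O=C=O")[1], parse_chain("C#N")[1])
# [(0, 1, 2), (1, 2, 2)] [(0, 1, 3)]
# Both fluoroform strings describe a carbon bonded to three fluorines.
print(neighbours(*parse_chain("C(F)(F)F"), 0), neighbours(*parse_chain("FC(F)F"), 1))
# ['F', 'F', 'F'] ['F', 'F', 'F']
```

The two fluoroform strings number their atoms differently but describe the same graph, which is exactly the ambiguity that canonical SMILES removes.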
Cyclohexane is represented as C1CCCCC1: the two occurrences of the label 1 mark the two atoms joined by a ring-closing bond, forming a ring of six carbons. Note that the label is the numeral (in this case 1) rather than the combination "C1".
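Ring-closure labels can be read with a small table of open labels: the first occurrence of a digit records the current atom, and the second occurrence bonds the current atom back to it and frees the label for reuse. The function `parse_rings` is a hypothetical sketch that assumes single-letter uppercase atoms, single-digit labels, and no branches or bond symbols.

```python
def parse_rings(smiles):
    """Parse single-letter atoms and ring-closure digits into (atoms, bonds)."""
    atoms, bonds = [], []
    open_rings = {}   # ring label -> atom index waiting for its partner
    prev = None
    for ch in smiles:
        if ch.isdigit():
            if ch in open_rings:
                # Second occurrence: close the ring and free the label.
                bonds.append((open_rings.pop(ch), prev))
            else:
                # First occurrence: remember which atom opened the ring.
                open_rings[ch] = prev
        elif ch.isalpha() and ch.isupper():
            atoms.append(ch)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1))
            prev = len(atoms) - 1
        else:
            raise ValueError(f"unsupported character {ch!r}")
    if open_rings:
        raise ValueError(f"unclosed ring label(s): {sorted(open_rings)}")
    return atoms, bonds


print(parse_rings("C1CCCCC1")[1])
# [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
# Two cyclopropane rings joined by a single bond, both labelled 1:
print(parse_rings("C1CC1C1CC1")[1])
# [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
```

The second example shows why the label is the numeral alone: once a ring closes, the same digit can open a new ring elsewhere in the string.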
Aromatic C, O, S, and N atoms are written in lower case as 'c', 'o', 's', and 'n' respectively. Bonds in an aromatic ring are rarely marked explicitly except in SMARTS search patterns. Thus benzene is c1ccccc1.
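Because aromaticity is carried by letter case, a tokenizer must distinguish "Cl" (chlorine) from "C" followed by something else, and must treat a lowercase "c" as an aromatic carbon. The sketch below is a hypothetical `classify_atoms` helper that assumes unbracketed atoms only (bracket atoms such as [nH] are not handled); it reports what the string declares and does not verify that a ring is actually aromatic.

```python
import re

# Two-letter halogens come first so "Cl" is not read as "C" followed by "l".
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[cnos]")


def classify_atoms(smiles):
    """List (symbol, is_aromatic) for each unbracketed organic-subset atom."""
    return [(m.group(), m.group().islower()) for m in ATOM_PATTERN.finditer(smiles)]


print(classify_atoms("c1ccccc1"))    # six ('c', True) entries: benzene
print(classify_atoms("C1CCCCC1"))    # six ('C', False) entries: cyclohexane
print(classify_atoms("Clc1ccccc1"))  # ('Cl', False) followed by six aromatic carbons
```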
Isomeric SMILES
(Figure: representation of cis-difluoroethene.)
Configuration around double bonds is specified using the characters "/" and "\". For example, F/C=C/F is one representation of trans-difluoroethene, in which the fluorine atoms are on opposite sides of the double bond, whereas F/C=C\F is one possible representation of cis-difluoroethene, in which the fluorine atoms are on the same side of the double bond, as shown in the figure.
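For the simple strings in this example, the geometry can be read off by comparing the two directional bond symbols: matching symbols mean trans, differing symbols mean cis. The helper below is hypothetical and assumes exactly one substituent written before the first carbon and one after the second; substituents written inside branches change how the symbols must be read, so such strings are rejected rather than handled.

```python
import re

# Hypothetical pattern: substituent, direction, C=C, direction, substituent.
SIMPLE_ALKENE = re.compile(r"^([A-Z][a-z]?)([/\\])C=C([/\\])([A-Z][a-z]?)$")


def double_bond_geometry(smiles):
    match = SIMPLE_ALKENE.match(smiles)
    if match is None:
        raise ValueError("only the pattern X/C=C/Y (with / or \\) is handled")
    before, after = match.group(2), match.group(3)
    # Matching symbols put the substituents on opposite sides (trans);
    # differing symbols put them on the same side (cis).
    return "trans" if before == after else "cis"


print(double_bond_geometry("F/C=C/F"))   # trans
print(double_bond_geometry("F/C=C\\F"))  # cis (the Python literal spells F/C=C\F)
```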
Extensions
SMARTS is a modification of SMILES that allows, in addition to the SMILES elements, the specification of wildcard atoms and bonds. It is used to specify search structures and is widely used in chemical database search applications. This practice has led to a common misconception that chemical substructure search is achieved computationally by matching SMILES/SMARTS strings, when in fact it is achieved by the computationally more intensive search for subgraph isomorphism in the graphs reconstructed from the SMILES representations.
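To make the distinction concrete, here is a minimal sketch of substructure search as a graph problem rather than string matching: a backtracking search for a one-to-one mapping of pattern atoms onto target atoms that preserves every pattern bond and its order. The graph format (element symbols plus (i, j, order) bond triples), the "*" wildcard atom, and the function name `substructure_match` are assumptions for illustration; practical toolkits use far more efficient algorithms and support many more SMARTS primitives.

```python
def substructure_match(pattern, target):
    """Return a dict mapping pattern atom -> target atom, or None if no match.

    Each graph is (atoms, bonds): atoms is a list of element symbols and bonds
    is a list of (i, j, order). A '*' pattern atom matches any element.
    """
    p_atoms, p_bonds = pattern
    t_atoms, t_bonds = target
    t_order = {}
    for i, j, order in t_bonds:
        t_order[(i, j)] = t_order[(j, i)] = order

    def consistent(p, t, mapping):
        # Every pattern bond from p to an already-mapped atom must exist
        # in the target with the same bond order.
        for a, b, order in p_bonds:
            if p in (a, b):
                q = b if a == p else a
                if q in mapping and t_order.get((mapping[q], t)) != order:
                    return False
        return True

    def extend(mapping):
        if len(mapping) == len(p_atoms):
            return dict(mapping)
        p = len(mapping)                     # map pattern atoms in index order
        used = set(mapping.values())
        for t in range(len(t_atoms)):
            if t in used or p_atoms[p] not in ("*", t_atoms[t]):
                continue
            if not consistent(p, t, mapping):
                continue
            mapping[p] = t
            found = extend(mapping)
            if found is not None:
                return found
            del mapping[p]                   # backtrack
        return None

    return extend({})


ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
propionic_acid = (["C", "C", "C", "O", "O"],
                  [(0, 1, 1), (1, 2, 1), (2, 3, 2), (2, 4, 1)])
carbonyl = (["C", "O"], [(0, 1, 2)])
any_to_oxygen = (["*", "O"], [(0, 1, 1)])

print(substructure_match(carbonyl, ethanol))         # None
print(substructure_match(carbonyl, propionic_acid))  # {0: 2, 1: 3}
print(substructure_match(any_to_oxygen, ethanol))    # {0: 1, 1: 2}
```

In a real search, the target graphs would first be reconstructed by parsing each stored SMILES string; the match itself never compares the strings.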
Conversion
SMILES can be converted back to two-dimensional representations using structure diagram generation algorithms (Helson, 1999). This conversion is not always unambiguous. Conversion to a three-dimensional representation is achieved by energy-minimization approaches.
See also
- SYBYL line notation (SLN), another specification for describing chemical structures using short ASCII strings
- Molecular Query Language (MQL), designed to allow more complex and problem-specific search methods
- Chemistry Development Kit, an open-source Java library for chemoinformatics and bioinformatics
- IUPAC International Chemical Identifier (InChI), developed by IUPAC and NIST
- OpenBabel, free software mainly used for converting chemical file formats
- JOELib, a freeware chemical expert system mainly used for converting chemical file formats
- OELib, an open-source cheminformatics library
References
- Anderson, E., Veith, G. D., and Weininger, D. (1987). SMILES: A line notation and computerized interpreter for chemical structures. Report No. EPA/600/M-87/021. U.S. EPA, Environmental Research Laboratory-Duluth, Duluth, MN 55804.
- Weininger, D. (1988). SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31-36.
- Helson, H. E. (1999). Structure diagram generation. In Reviews in Computational Chemistry, edited by K. B. Lipkowitz and D. B. Boyd. Wiley-VCH, New York, pp. 313-398.