Entry: International Chemical Identifier
Definition

  1. Overview

  2. Format and layers

  3. Examples

  4. InChIKey

     InChI resolvers 

  5. Name

  6. Continuing development

  7. Adoption

  8. See also

  9. Notes and references

  10. External links

{{Infobox software
| name = InChI
| logo =
| logo caption =
| screenshot =
| caption =
| collapsible =
| author =
| developer = InChI Trust
| released = {{Start date|2005|04|15}}[1][2]
| discontinued =
| latest release version = 1.05
| latest release date = {{Start date and age|2017|03}}
| latest preview version =
| latest preview date =
| programming language =
| operating system = Microsoft Windows and Unix-like
| platform = IA-32 and x86-64
| size = 4.3 MB
| language = English
| language count =
| language footnote =
| status = Active
| genre =
| license = IUPAC / InChI Trust Licence
| alexa =
| website = http://www.iupac.org/home/publications/e-resources/inchi.html
}}

The IUPAC International Chemical Identifier (InChI, pronounced /ˈɪntʃiː/ IN-chee or /ˈɪŋkiː/ ING-kee) is a textual identifier for chemical substances, designed to provide a standard way to encode molecular information and to facilitate the search for such information in databases and on the web. It was initially developed by IUPAC (International Union of Pure and Applied Chemistry) and NIST (National Institute of Standards and Technology) from 2000 to 2005, and its format and algorithms are non-proprietary.

The continuing development of the standard has been supported since 2010 by the not-for-profit InChI Trust, of which IUPAC is a member. The current software version is 1.05 and was released in January 2017.

Prior to 1.04, the software was freely available under the open-source LGPL license,[3] but it now uses a custom license called IUPAC-InChI Trust License.[4]

Overview

The identifiers describe chemical substances in terms of layers of information — the atoms and their bond connectivity, tautomeric information, isotope information, stereochemistry, and electronic charge information.[5]

Not all layers have to be provided; for instance, the tautomer layer can be omitted if that type of information is not relevant to the particular application.

InChIs differ from the widely used CAS registry numbers in three respects: 1) they are freely usable and non-proprietary; 2) they can be computed from structural information and do not have to be assigned by some organization; and 3) most of the information in an InChI is human readable (with practice).

InChIs can thus be seen as akin to a general and extremely formalized version of IUPAC names. They can express more information than the simpler SMILES notation and differ in that every structure has a unique InChI string, which is important in database applications. Information about the 3-dimensional coordinates of atoms is not represented in InChI; for this purpose a format such as PDB can be used.

The InChI algorithm converts input structural information into a unique InChI identifier in a three-step process: normalization (to remove redundant information), canonicalization (to generate a unique number label for each atom), and serialization (to give a string of characters).

The InChIKey, sometimes referred to as a hashed InChI, is a fixed-length (27-character) condensed digital representation of the InChI that is not human-understandable. The InChIKey specification was released in September 2007 to facilitate web searches for chemical compounds, since these were problematic with the full-length InChI.[6] Unlike the InChI, the InChIKey is not unique: collisions can be calculated to be very rare, but they do occur.[7]

In January 2009 the final 1.02 version of the InChI software was released. This provided a means to generate the so-called standard InChI, which does not allow user-selectable options in dealing with the stereochemistry and tautomeric layers of the InChI string. The standard InChIKey is then the hashed version of the standard InChI string. The standard InChI simplifies comparison of InChI strings and keys generated by different groups and subsequently accessed via diverse sources such as databases and web resources.

Format and layers

{{Infobox file format
| name = InChI format
| extension =
| mime = chemical/x-inchi
| owner =
| creatorcode =
| genre = chemical file format
| container for =
| contained by =
| extended from =
| extended to =
}}

Every InChI starts with the string "InChI=" followed by the version number, currently 1. This is followed by the letter S for standard InChIs, which is a fully standardized InChI flavor maintaining the same level of attention to structure details and the same conventions for drawing perception. The remaining information is structured as a sequence of layers and sub-layers, with each layer providing one specific type of information. The layers and sub-layers are separated by the delimiter "/" and start with a characteristic prefix letter (except for the chemical formula sub-layer of the main layer). The six layers with important sublayers are:

  1. Main layer
    • Chemical formula (no prefix). This is the only sublayer that must occur in every InChI.
    • Atom connections (prefix: "c"). The atoms in the chemical formula (except for hydrogens) are numbered in sequence; this sublayer describes which atoms are connected by bonds to which other ones.
    • Hydrogen atoms (prefix: "h"). Describes how many hydrogen atoms are connected to each of the other atoms.
  2. Charge layer
    • proton sublayer (prefix: "p" for "protons")
    • charge sublayer (prefix: "q")
  3. Stereochemical layer
    • double bonds and cumulenes (prefix: "b")
    • tetrahedral stereochemistry of atoms and allenes (prefixes: "t", "m")
    • type of stereochemistry information (prefix: "s")
  4. Isotopic layer (prefixes: "i", "h", as well as "b", "t", "m", "s" for isotopic stereochemistry)
  5. Fixed-H layer (prefix: "f"); contains some or all of the above types of layers except atom connections; may end with "o" sublayer; never included in standard InChI
  6. Reconnected layer (prefix: "r"); contains the whole InChI of a structure with reconnected metal atoms; never included in standard InChI

The delimiter-prefix format has the advantage that a user can easily use a wildcard search to find identifiers that match only in certain layers.
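The layer structure described above can be illustrated with a short Python sketch. It is not the official InChI software: the function names `split_inchi` and `main_layer` are hypothetical, and the split is deliberately flat, so the nested contents of the fixed-H ("f") and reconnected ("r") layers are not grouped.

```python
def split_inchi(inchi):
    """Split an InChI string into (version, is_standard, formula, layers).

    `layers` is a list of (prefix, body) pairs rather than a dict, because a
    prefix letter such as "h" can occur more than once in one identifier.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("an InChI must start with 'InChI='")
    head, *parts = inchi[len("InChI="):].split("/")
    is_standard = head.endswith("S")          # "1S" marks a standard InChI
    version = head[:-1] if is_standard else head
    formula = parts[0] if parts else ""       # the only sublayer without a prefix
    layers = [(seg[0], seg[1:]) for seg in parts[1:] if seg]
    return version, is_standard, formula, layers


def main_layer(inchi):
    """Return the formula plus the leading "c" and "h" sublayers."""
    _, _, formula, layers = split_inchi(inchi)
    kept = [formula]
    for prefix, body in layers:
        if prefix not in "ch":                # first non-main sublayer ends the main layer
            break
        kept.append(prefix + body)
    return "/".join(kept)


if __name__ == "__main__":
    print(split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
    print(main_layer("InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1"))
```

Comparing the result of `main_layer` for two identifiers is one simple way to match them on connectivity alone while ignoring stereochemistry, which is the kind of layer-restricted search the delimiter-prefix format makes possible.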

Examples

CH3CH2OH
ethanol
InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3

InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3 (standard InChI)


L-ascorbic acid
InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1

InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1 (standard InChI)
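As a small illustration of how the chemical formula sublayer in the examples above can be read programmatically, the following Python sketch counts the atoms of each element. The helper names `formula_of` and `formula_counts` are hypothetical, and the sketch assumes a simple single-component formula such as those shown here; formulas with component separators or leading multipliers are not handled.

```python
import re


def formula_of(inchi):
    """Return the formula sublayer: the first segment after the version."""
    return inchi.split("/")[1]


def formula_counts(formula):
    """Count atoms per element in a simple formula such as 'C6H8O6'."""
    counts = {}
    # An element symbol is one capital letter, optionally followed by a
    # lowercase letter; a missing number means a single atom.
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


if __name__ == "__main__":
    ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
    print(formula_counts(formula_of(ethanol)))
```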

InChIKey

The condensed, 27-character InChIKey is a hashed version of the full InChI (using the SHA-256 algorithm), designed to allow for easy web searches of chemical compounds.[6] The standard InChIKey is the hashed counterpart of the standard InChI. Most chemical structures on the Web up to 2007 were represented as GIF files, which are not searchable for chemical content. The full InChI turned out to be too lengthy for easy searching, and therefore the InChIKey was developed. There is a very small but nonzero chance of two different molecules having the same InChIKey; the probability of a duplication in the first 14 characters alone has been estimated as one duplication in 75 databases, each containing one billion unique structures. With all databases currently holding fewer than 50 million structures, such duplication appears unlikely at present. A 2012 study examined the collision rate more extensively and found the experimental collision rate to be in agreement with theoretical expectations.[8]
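The order of magnitude of the collision estimate above can be checked with the usual birthday approximation. The Python sketch below assumes that the first 14-character block carries 65 bits of the hash; that figure is not stated in this article and is used here only as an assumption, and the function name `collision_probability` is hypothetical.

```python
import math


def collision_probability(n_items, hash_bits):
    """Birthday approximation: chance of at least one collision among
    n_items values drawn uniformly from a space of 2**hash_bits."""
    space = 2 ** hash_bits
    expected_pairs = n_items * (n_items - 1) / (2 * space)
    # 1 - exp(-x), computed in a numerically stable way for small x
    return -math.expm1(-expected_pairs)


if __name__ == "__main__":
    # One database of one billion structures, 65-bit first block (assumed).
    p = collision_probability(10**9, 65)
    print(f"about 1 in {1 / p:.0f}")
```

Under that assumption the result is of the same order as the one-in-75 figure quoted above.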

InChIKeys consist of 14 characters resulting from a hash of the connectivity information of the InChI, followed by a hyphen, followed by 8 characters resulting from a hash of the remaining layers of the InChI, followed by a single character indicating the kind of InChIKey, followed by a single character indicating the version of InChI used, another hyphen, and finally a single character indicating protonation.[9]

Example: The standard InChI for morphine is InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1

and the standard InChIKey for morphine is BQJCRHHNABKAKU-KBQPJGBKSA-N.[10]
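The field layout described above (14 + 8 + 1 + 1 characters, then 1 after the second hyphen) can be checked with a short Python sketch. The function name `split_inchikey` and the dictionary keys are hypothetical, and the pattern assumes that every position is an uppercase letter.

```python
import re

# 14 letters, hyphen, 8 letters + kind flag + version flag, hyphen, protonation flag
_INCHIKEY = re.compile(r"^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])$")


def split_inchikey(key):
    """Split a 27-character InChIKey into its five fields."""
    match = _INCHIKEY.match(key)
    if match is None:
        raise ValueError("not a well-formed 27-character InChIKey")
    connectivity, other_layers, kind, version, protonation = match.groups()
    return {
        "connectivity_hash": connectivity,   # hash of the connectivity information
        "other_layers_hash": other_layers,   # hash of the remaining layers
        "kind": kind,                        # kind of InChIKey
        "version": version,                  # version of InChI used
        "protonation": protonation,          # protonation indicator
    }


if __name__ == "__main__":
    print(split_inchikey("BQJCRHHNABKAKU-KBQPJGBKSA-N"))
```

For the morphine key this yields the blocks `BQJCRHHNABKAKU` and `KBQPJGBK`, followed by the flags `S`, `A` and `N`.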

InChI resolvers

As the InChI cannot be reconstructed from the InChIKey, an InChIKey always needs to be linked to the original InChI to get back to the original structure. InChI resolvers act as a lookup service to make these links, and prototype services are available from the National Cancer Institute, the [https://www.ebi.ac.uk/unichem/ UniChem service] at the European Bioinformatics Institute, and PubChem. ChemSpider had a resolver until July 2015, when it was decommissioned.[11]
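A lookup against such a resolver typically amounts to requesting a URL that contains the identifier. The Python sketch below only builds a request URL for the NCI/CADD Chemical Identifier Resolver and does not contact the network. The path pattern `<base>/<identifier>/<representation>` and the representation name `stdinchi` are assumptions about that service and should be checked against its documentation; `resolver_url` is a hypothetical helper.

```python
from urllib.parse import quote

# Base address of the NCI/CADD Chemical Identifier Resolver
BASE_URL = "https://cactus.nci.nih.gov/chemical/structure"


def resolver_url(identifier, representation="stdinchi"):
    """Build a lookup URL; the path layout is an assumed convention."""
    # Percent-encode the identifier so it is safe as a single path segment.
    return f"{BASE_URL}/{quote(identifier, safe='')}/{representation}"


if __name__ == "__main__":
    # Morphine's standard InChIKey as the lookup identifier.
    print(resolver_url("BQJCRHHNABKAKU-KBQPJGBKSA-N"))
```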

Name

The format was originally called IChI (IUPAC Chemical Identifier), then renamed in July 2004 to INChI (IUPAC-NIST Chemical Identifier), and renamed again in November 2004 to InChI (IUPAC International Chemical Identifier), a trademark of IUPAC.

Continuing development

Scientific direction of the InChI standard is carried out by the IUPAC Division VIII Subcommittee, and funding of subgroups investigating and defining the expansion of the standard is carried out by both IUPAC and the InChI Trust. The InChI Trust funds the development, testing and documentation of the InChI. Current extensions are being defined to handle polymers and mixtures, Markush structures, reactions[12] and organometallics, and once accepted by the Division VIII Subcommittee will be added to the algorithm.

Adoption

The InChI has been adopted by many larger and smaller databases, including ChemSpider, ChEMBL, Golm Metabolome Database, OpenPHACTS, and PubChem.[13] However, the adoption is not straightforward, and many databases show a discrepancy between the chemical structures and the InChI they contain, which is a problem for linking databases.[14]

See also

  • Molecular Query Language
  • Simplified molecular-input line-entry system (SMILES)
  • Molecule editor
  • SYBYL Line Notation
  • Bioclipse generates InChI and InChIKeys for drawn structures or opened files
  • The Chemistry Development Kit uses JNI-InChI to generate InChIs, can convert InChIs into structures, and can generate tautomers based on the InChI algorithms

Notes and references

1. ^ "IUPAC International Chemical Identifier Project Page". IUPAC. http://www.iupac.org/home/projects/project-db/project-details.html?tx_wfqbe_pi1%5bproject_nr%5d=2000-025-1-800. Archived at https://web.archive.org/web/20120527162256/http://www.iupac.org/home/projects/project-db/project-details.html?tx_wfqbe_pi1%5Bproject_nr%5D=2000-025-1-800 on 27 May 2012. Retrieved 5 December 2012.
2. ^ Heller, S.; McNaught, A.; Stein, S.; Tchekhovskoi, D.; Pletnev, I. (2013). "InChI - the worldwide chemical structure identifier standard". Journal of Cheminformatics. 5 (1): 7. doi:10.1186/1758-2946-5-7. PMID 23343401. PMC 3599061.
3. ^ McNaught, Alan (2006). "The IUPAC International Chemical Identifier: InChI". Chemistry International. 28 (6). IUPAC. http://www.iupac.org/publications/ci/2006/2806/4_tools.html. Retrieved 2007-09-18.
4. ^ http://www.inchi-trust.org/download/104/LICENCE.pdf
5. ^ Heller, S.R.; McNaught, A.; Pletnev, I.; Stein, S.; Tchekhovskoi, D. (2015). "InChI, the IUPAC International Chemical Identifier". Journal of Cheminformatics. 7. doi:10.1186/s13321-015-0068-4. PMC 4486400.
6. ^ "The IUPAC International Chemical Identifier (InChI)". IUPAC. 5 September 2007. http://www.iupac.org/inchi/release102.html. Archived at https://web.archive.org/web/20071030202540/http://www.iupac.org/inchi/release102.html on October 30, 2007. Retrieved 2007-09-18.
7. ^ Willighagen, E.L. (17 September 2011). "InChIKey collision: the DIY copy/pastables". http://chem-bla-ics.blogspot.nl/2011/09/inchikey-collision-diy-copypastables.html. Retrieved 2012-11-06.
8. ^ Pletnev, I.; Erin, A.; McNaught, A.; Blinov, K.; Tchekhovskoi, D.; Heller, S. (2012). "InChIKey collision resistance: An experimental testing". Journal of Cheminformatics. 4 (1): 39. doi:10.1186/1758-2946-4-39. PMID 23256896. PMC 3558395.
9. ^ "Technical FAQ - InChI Trust". inchi-trust.org. http://www.inchi-trust.org/technical-faq/#13.1. Retrieved 14 April 2018.
10. ^ "InChI=1/C17H19NO3/c1-18...". Chemspider. http://www.chemspider.com/RecordView.aspx?id=5760. Retrieved 2007-09-18.
11. ^ InChI Resolver, 27 July 2015, http://www.chemspider.com/InChiResolverDecommissioned.aspx
12. ^ Grethe, Guenter; Blanke, Gerd; Kraut, Hans; Goodman, Jonathan M. (9 May 2018). "International chemical identifier for reactions (RInChI)". Journal of Cheminformatics. 10 (1). doi:10.1186/s13321-018-0277-8.
13. ^ Warr, W.A. (2015). "Many InChIs and quite some feat". Journal of Computer-Aided Molecular Design. doi:10.1007/s10822-015-9854-3. Bibcode:2015JCAMD..29..681W.
14. ^ Akhondi, S. A.; Kors, J. A.; Muresan, S. (2012). "Consistency of systematic chemical identifiers within and between small-molecule databases". Journal of Cheminformatics. 4 (1): 35. doi:10.1186/1758-2946-4-35. PMID 23237381. PMC 3539895.

External links

  • IUPAC InChI site
  • Description of the canonicalization algorithm
  • Googling for InChIs, a presentation to the W3C.
  • [https://web.archive.org/web/20100330213717/http://www.iupac.org/inchi/release102final.html InChI Release 1.02] InChI final version 1.02 and explanation of Standard InChI, January 2009
  • [https://cactus.nci.nih.gov/chemical/structure NCI/CADD Chemical Identifier Resolver] Generates and resolves InChI/InChIKeys and many other chemical identifiers
  • [https://pubchem.ncbi.nlm.nih.gov/edit/index.html PubChem online molecule editor] that supports SMILES/SMARTS and InChI
  • ChemSpider Services that allows generation of InChI and conversion of InChI to structure (also SMILES and generation of other properties)
  • [https://web.archive.org/web/20070404073952/http://www.chemaxon.com/demosite/marvin/index.html MarvinSketch] from ChemAxon, implementation to draw structures (or open other file formats) and output to InChI file format
  • BKchem implements its own InChI parser and uses the IUPAC implementation to generate InChI strings
  • CompoundSearch implements an InChI and InChI Key search of spectral libraries
  • JSME is a free JavaScript based molecular editor that generates InChI and InChI Key in a web browser, which allows for easy web searches of chemical compounds

