CATH database

  1. Hierarchical organization

  2. Releases

  3. Open source software

  4. References

{{Infobox biodatabase
|title = CATH
|logo =
|description =Protein Structure Classification
|scope =
|organism =
|center =University College London
|laboratory = Institute of Structural and Molecular Biology
|author =
|citation = Dawson et al. (2016) [1]
|released = 1997
|standard =
|format =
|url = {{URL|cathdb.info}}
|download = {{URL|cathdb.info/download}}
|webservice =
|sql =
|sparql =
|webapp =
|standalone =
|license =
|versioning =
|frequency = CATH-B is released daily. Official releases are approximately annual.
|curation =
|bookmark =
|version = 4.1
}}

The CATH Protein Structure Classification database is a free, publicly available online resource that provides information on the evolutionary relationships of protein domains. It was created in the mid-1990s by Professor Christine Orengo and colleagues including Janet Thornton and David Jones,[2] and continues to be developed by the Orengo group at University College London. CATH shares many broad features with the SCOP resource; however, the detailed classification differs greatly in many areas.[3][4][5][6]

Hierarchical organization

Experimentally determined three-dimensional protein structures are obtained from the Protein Data Bank and, where applicable, split into their constituent polypeptide chains. Protein domains are identified within these chains using a mixture of automatic methods and manual curation.

The domains are then classified within the CATH structural hierarchy. At the Class (C) level, domains are assigned according to their secondary structure content, i.e. all alpha, all beta, a mixture of alpha and beta, or little secondary structure. At the Architecture (A) level, assignment uses information on how the secondary structure elements are arranged in three-dimensional space. At the Topology/fold (T) level, assignment uses information on how the secondary structure elements are connected and arranged. Domains are assigned to the Homologous superfamily (H) level if there is good evidence that they are related by evolution,[2] i.e. they are homologous.

The four main levels of the CATH hierarchy:

#  Level                   Description
1  Class                   the overall secondary-structure content of the domain. (Equivalent to the SCOP Class)
2  Architecture            a large-scale grouping of topologies which share particular structural features
3  Topology/fold           high structural similarity but no evidence of homology. (Equivalent to the 'fold' level in SCOP)
4  Homologous superfamily  indicative of a demonstrable evolutionary relationship. (Equivalent to SCOP superfamily)
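
As an illustration of how the four levels nest, the sketch below treats a domain's classification as a dotted four-number code, one number per level (C.A.T.H), such as "3.40.50.300". That notation and the numeric class codes 1 to 4 are not stated in the text above and are assumptions here; the names `CathCode`, `parse_cath_code` and `shared_depth` are hypothetical helpers, not part of any CATH tool.

```python
from typing import NamedTuple

# Class labels mirror the four classes described in the text. The numeric
# codes 1-4 are an assumption about how the classes are numbered.
CLASS_NAMES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "mixed alpha and beta",
    4: "little secondary structure",
}


class CathCode(NamedTuple):
    """One number per hierarchy level: Class, Architecture, Topology, Homologous superfamily."""
    class_: int
    architecture: int
    topology: int
    superfamily: int

    def level(self, depth: int) -> str:
        """Return the dotted prefix down to the given depth (1 = C ... 4 = H)."""
        if not 1 <= depth <= 4:
            raise ValueError("depth must be between 1 and 4")
        return ".".join(str(n) for n in self[:depth])


def parse_cath_code(code: str) -> CathCode:
    """Parse a dotted code such as '3.40.50.300' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def shared_depth(a: CathCode, b: CathCode) -> int:
    """Deepest level at which two codes agree (0 if even the class differs)."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    first = parse_cath_code("3.40.50.300")
    second = parse_cath_code("3.40.50.720")
    print(CLASS_NAMES[first.class_], first.level(3), shared_depth(first, second))
```

Under these assumptions, two domains whose codes agree to depth 3 share a topology but have not been placed in the same homologous superfamily.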

Additional sequence data for domains with no experimentally determined structures are provided by CATH's sister resource, Gene3D, and are used to populate the homologous superfamilies. Protein sequences from UniProtKB and Ensembl are scanned against CATH HMMs to predict domain sequence boundaries and make homologous superfamily assignments.
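
Scanning one sequence against many HMMs usually yields overlapping candidate matches, which must be reduced to a single set of domain boundaries. The sketch below shows one simple way this could be done: a greedy, score-ordered selection of non-overlapping hits. It is an illustrative heuristic only and is not claimed to be the method Gene3D uses; the `Hit` record, its fields and the example scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A hypothetical HMM match of one superfamily model to a region of a sequence."""
    superfamily: str  # identifier of the matching superfamily model
    start: int        # first matched residue (1-based, inclusive)
    end: int          # last matched residue (inclusive)
    score: float      # match score; higher is better


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def resolve_hits(hits: list[Hit]) -> list[Hit]:
    """Greedily keep the best-scoring hits that do not overlap a hit already kept."""
    chosen: list[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, kept) for kept in chosen):
            chosen.append(hit)
    # Report the surviving domains in sequence order.
    return sorted(chosen, key=lambda h: h.start)


if __name__ == "__main__":
    candidates = [
        Hit("3.40.50.300", 10, 120, 85.0),
        Hit("3.40.50.720", 100, 200, 40.0),
        Hit("1.10.8.10", 130, 210, 60.0),
    ]
    for hit in resolve_hits(candidates):
        print(hit.superfamily, hit.start, hit.end)
```

A greedy pass is easy to follow but can miss the best overall combination of hits; an exact approach would instead search for the non-overlapping subset with the highest total score.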

Releases

The CATH team aim to provide official releases of the CATH classification every 12 months. This release process is important because it allows for internal validation, extra annotations and analysis. However, it can mean that there is a delay between new structures appearing in the PDB and the latest official CATH release.

To address this issue, CATH-B provides a limited amount of information on the very latest domain annotations (e.g. domain boundaries and superfamily classifications).

The latest release of CATH-Gene3D (v4.1), published in July 2016, consists of:

  • 308,999 structural protein domain entries [1]
  • 53,479,436 non-structural protein domain entries [1]
  • 2,737 homologous superfamily entries [1]
  • 92,882 functional family entries [1]

Open source software

CATH is an open source software project whose developers build and maintain a number of open source tools.[7] CATH maintains a to-do list on GitHub that lets external users create and track issues relating to the CATH protein structure classification.

References

1. ^ Dawson, NL; Lewis, TE; Das, S; Lees, JG; Lee, D; Ashford, P; Orengo, CA; Sillitoe, I (28 November 2016). "CATH: an expanded resource to predict protein function through structure and sequence". Nucleic Acids Research. 45: D289–D295. doi:10.1093/nar/gkw1098. PMC 5210570. PMID 27899584.
2. ^ Orengo, CA; Michie, AD; Jones, S; Jones, DT; Swindells, MB; Thornton, JM (1997). "CATH – a hierarchic classification of protein domain structures". Structure. 5 (8): 1093–1109. doi:10.1016/S0969-2126(97)00260-8. ISSN 0969-2126. PMID 9309224.
3. ^ "CATH: Protein Structure Classification Database at UCL". Cathdb.info. http://www.cathdb.info. Retrieved 9 March 2017.
4. ^ "CATH". Cathdb.info. http://www.cathdb.info/wiki/doku/?id=tutorials:index. Retrieved 9 March 2017.
5. ^ "CATH Database (@CATHDatabase)". Twitter. https://twitter.com/CATHDatabase. Retrieved 9 March 2017.
6. ^ Pearl, F. M. G. (2003). "The CATH database: an extended protein family resource for structural and functional genomics". Nucleic Acids Research. 31 (1): 452–455. doi:10.1093/nar/gkg062. ISSN 1362-4962.
7. ^ "Tools". cathdb.info. http://www.cathdb.info/wiki/doku/?id=cath_tools. Retrieved 18 December 2016.

