Sequence database

  1. Search

  2. Current issues

  3. See also

  4. References

  5. External links

See also: Protein structure database

In bioinformatics, a sequence database is a type of biological database composed of a large collection of digitally stored nucleic acid sequences, protein sequences, or other polymer sequences. The UniProt database is an example of a protein sequence database. As of 2013 it contained over 40 million sequences and was growing at an exponential rate.[1] Historically, sequences were published in paper form, but as the number of sequences grew, this storage method became unsustainable.

Search

Sequence databases can be searched in a variety of ways. The most common use is probably a search for sequences similar to a target protein or gene whose sequence the user already knows. The BLAST program is a popular tool for this type of search.
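
As a rough illustration of what a similarity search does, the sketch below ranks the entries of a tiny in-memory database by the fraction of k-mers (short substrings) they share with a query. This is not how BLAST works internally; BLAST uses seeded local alignment with statistical scoring. The function names, the toy database, and the choice of k = 3 are assumptions made for this example.

```python
def kmers(seq, k=3):
    """Return the set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(query, database, k=3):
    """Rank database entries by k-mer Jaccard similarity to the query."""
    q = kmers(query, k)
    hits = [(jaccard(q, kmers(seq, k)), name) for name, seq in database.items()]
    return sorted(hits, reverse=True)


# Hypothetical toy database; identifiers and sequences are made up.
database = {
    "seq_a": "MKTAYIAKQRQISFVKSHFSRQ",
    "seq_b": "MKTAYIAKQRQISFVKSHFSRL",
    "seq_c": "GSSGSSGLVPRGSHMASMTGGQ",
}

results = search("MKTAYIAKQRQISFVKSHFSRQ", database)
for score, name in results:
    print(f"{name}\t{score:.2f}")
```

An identical entry scores 1.0, a near-identical one scores slightly lower, and an unrelated one scores near zero. A set-based score like this ignores residue order and substitution scores, which is one reason alignment-based tools are used in practice.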

Current issues

Records in sequence databases are deposited from a wide range of sources, from individual researchers to large genome sequencing centers. As a result, the sequences themselves, and especially the biological annotations attached to these sequences, may vary in quality. There is much redundancy, as multiple labs may submit numerous sequences that are identical, or nearly identical, to others in the databases.[2]
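
The simplest form of redundancy, exact duplicates, can be illustrated with a short sketch that groups record identifiers by identical sequence. It is a minimal example under stated assumptions: the record identifiers and sequences are made up, the function name is hypothetical, and nearly identical sequences are not merged, which would require clustering by similarity rather than exact matching.

```python
from collections import defaultdict


def collapse_exact_duplicates(records):
    """Group record identifiers by identical (case-normalized) sequence."""
    groups = defaultdict(list)
    for identifier, sequence in records:
        groups[sequence.upper()].append(identifier)
    return dict(groups)


# Hypothetical submissions from different labs; all values are made up.
records = [
    ("lab1_001", "ATGGCGTACGTTAG"),
    ("lab2_417", "atggcgtacgttag"),
    ("lab3_052", "ATGGCGTACGTTAA"),
]

nonredundant = collapse_exact_duplicates(records)
for sequence, identifiers in nonredundant.items():
    print(sequence, identifiers)
```

Here the first two submissions collapse into one entry because they differ only in letter case, while the third is kept separate because it differs by one base.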

Many sequence annotations are based not on laboratory experiments but on the results of sequence similarity searches against previously annotated sequences. Once a sequence has been annotated based on similarity to others and deposited in the database, it can itself become the basis for future annotations. This can lead to a transitive annotation problem, because several such annotation transfers by sequence similarity may separate a particular database record from actual wet-lab experimental information.[3] Therefore, care must be taken when interpreting annotation data from sequence databases.
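
One way to picture the transitive annotation problem is to count how many similarity-based transfers separate a record from experimental evidence. The sketch below assumes a simplified, hypothetical model in which each record stores either the identifier of the record its annotation was copied from, or `None` when the annotation comes directly from an experiment. Real databases record provenance differently, and the identifiers here are made up.

```python
def hops_to_experiment(record_id, source_of):
    """Count annotation transfers between a record and experimental evidence.

    source_of maps each record to the record its annotation was copied from,
    or to None when the annotation comes directly from an experiment.
    Returns None if the chain is broken or loops back on itself.
    """
    hops = 0
    seen = set()
    current = record_id
    while current in source_of and current not in seen:
        seen.add(current)
        parent = source_of[current]
        if parent is None:
            return hops
        hops += 1
        current = parent
    return None


# Hypothetical provenance table: P1 is experimentally characterized,
# P2 was annotated from P1, P3 from P2; P4 and P5 only cite each other.
source_of = {"P1": None, "P2": "P1", "P3": "P2", "P4": "P5", "P5": "P4"}

for record in source_of:
    print(record, hops_to_experiment(record, source_of))
```

A larger hop count suggests weaker support, since each transfer is an opportunity for an error to propagate. A result of `None` marks annotations that cannot be traced back to any experiment in this model.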

See also

  • FASTA format
  • SIMAP

References

1. Cochrane, G.; Karsch-Mizrachi, I.; Nakamura, Y. (23 November 2010). "The International Nucleotide Sequence Database Collaboration". Nucleic Acids Research. 39 (Database): D15–D18. doi:10.1093/nar/gkq1150.
2. Sikic, K.; Carugo, O. (2010). "Protein sequence redundancy reduction: comparison of various method". Bioinformation. 5 (6): 234–9. doi:10.6026/97320630005234. PMID 21364823. PMC 3055704.
3. Iliopoulos, I.; Tsoka, S.; Andrade, MA.; Enright, AJ.; Carroll, M.; Poullet, P.; Promponas, V.; Liakopoulos, T.; et al. (April 2003). "Evaluation of annotation strategies using an entire genome sequence". Bioinformatics. 19 (6): 717–26. doi:10.1093/bioinformatics/btg077. PMID 12691983.

External links

  • European Bioinformatics Institute databases
  • NCBI completely sequenced genomes (https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome)
  • Stanford Saccharomyces Genome Database
  • Protein (https://www.ncbi.nlm.nih.gov/protein), the NIH protein database, a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB

