Digital transcriptome subtraction

The use of computational subtraction to discover novel pathogens was first proposed in 2002 by Meyerson et al.,^[2] who worked with human expressed sequence tag (EST) datasets. In a proof-of-principle experiment, Meyerson et al. demonstrated that the approach was feasible using Epstein-Barr virus-infected lymphocytes from post-transplant lymphoproliferative disorder (PTLD).^[3]

In 2007, the term "Digital Transcriptome Subtraction" (DTS) was coined by the Chang-Moore group,^[4] and the method was used to discover Merkel cell polyomavirus (MCV) in Merkel cell carcinoma.^[1]

At about the same time as the MCV discovery, this approach was used to implicate a novel arenavirus as the cause of death in a case where three patients died of similar illnesses shortly after receiving organ transplants from a single donor.^[5]

Method

Construction of cDNA library

After treatment with DNase I to eliminate human genomic DNA, total RNA is extracted from primary infected tissue. Messenger RNA is then purified using an oligo-dT column that binds the poly-A tail, a signal specific to transcribed genes. Using random hexamer priming, reverse transcriptase (RT) converts all mRNA into cDNA, which is then cloned into bacterial vectors. Bacteria, usually E. coli, are transformed with the cDNA vectors and selected using a marker; the collection of transformed clones is the cDNA library. This generates a snapshot of tissue mRNA that is stable and can be sequenced at a later stage.

Sequencing and quality control

The cDNA library must be sequenced to great depth (i.e., a large number of clones must be sequenced) in order to detect a theoretical rare pathogen sequence (Table 1), especially if the foreign sequence is novel. The Chang-Moore group recommends a sequencing depth of 200,000 transcripts or greater using multiple sequencing platforms.^[1]
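The relationship between sequencing depth and the chance of seeing a rare transcript can be sketched with a simple binomial model. This is an illustration only, not the calculation behind Table 1: it assumes every sequenced transcript is an independent draw, and it borrows the abundance of about 10 viral transcripts per million that the article quotes for viral mRNA in tissue. The function names are hypothetical.

```python
import math


def detection_probability(depth: int, abundance: float) -> float:
    """Probability of sampling at least one pathogen transcript.

    Assumes each sequenced transcript is an independent draw that is of
    pathogen origin with probability `abundance` (a simplifying assumption).
    """
    return 1.0 - (1.0 - abundance) ** depth


def depth_for_probability(target: float, abundance: float) -> int:
    """Smallest depth that gives at least `target` probability of one hit."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - abundance))


# Assumed abundance: 10 pathogen transcripts per million host transcripts.
abundance = 10 / 1_000_000

for depth in (50_000, 200_000, 500_000):
    p = detection_probability(depth, abundance)
    print(f"depth={depth:>7}  expected hits={depth * abundance:4.1f}  "
          f"P(at least one hit)={p:.3f}")

print("depth for 95% detection:", depth_for_probability(0.95, abundance))
```

Under these assumptions, a depth of 200,000 transcripts yields about two expected pathogen reads and a detection probability of roughly 0.86, which is consistent with treating 200,000 as a lower bound rather than a comfortable margin.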

Stringent quality controls are then applied to the raw sequences to minimize false-positive results. The initial quality screen uses several general parameters to exclude ambiguous sequences, leaving behind a dataset of high-fidelity (Hi-Fi) reads.
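The article does not list the screening parameters, so the following is only a sketch of what such a quality screen might look like. The three checks (minimum length, fraction of ambiguous `N` bases, and a crude low-complexity test) and all threshold values are assumptions made for illustration, not the published DTS criteria.

```python
from collections import Counter

# Hypothetical thresholds, chosen only for illustration.
MIN_LENGTH = 50                  # discard very short reads
MAX_N_FRACTION = 0.01            # discard reads with many ambiguous bases
MAX_SINGLE_BASE_FRACTION = 0.8   # crude low-complexity check


def is_high_fidelity(read: str) -> bool:
    """Return True if a read passes all three illustrative checks."""
    seq = read.upper()
    if len(seq) < MIN_LENGTH:
        return False
    counts = Counter(seq)
    if set(counts) - set("ACGTN"):
        return False  # unexpected characters in the read
    if counts["N"] / len(seq) > MAX_N_FRACTION:
        return False
    most_common = max(counts[base] for base in "ACGT")
    if most_common / len(seq) > MAX_SINGLE_BASE_FRACTION:
        return False  # dominated by one base, e.g. a poly-A run
    return True


def filter_reads(reads):
    """Keep only the reads that pass the quality screen."""
    return [read for read in reads if is_high_fidelity(read)]


raw_reads = [
    "ACGT" * 20,            # passes
    "A" * 80,               # fails: low complexity
    "ACGT" * 5,             # fails: too short
    "ACGT" * 19 + "NNNN",   # fails: too many ambiguous bases
]
hifi_reads = filter_reads(raw_reads)
print(f"{len(hifi_reads)} of {len(raw_reads)} reads kept")
```

A real screen would normally also use per-base quality scores and vector trimming, which are omitted here to keep the example self-contained.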

BLAST to host genome

Using MEGABLAST, Hi-Fi reads are matched to sequences in annotated databases, and any positive matches are subtracted from the dataset. The minimum hit length for a positive match to human sequence is typically 30 consecutive identical bases, which equates to a BLAST score of 60. The remaining sequences are generally then searched again with BLAST using less stringent parameters to allow for slight mismatches (1 in 20 nucleotides). The vast majority of sequences (>99%) should be removed from the dataset at this stage.
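The first, exact-match pass of the subtraction can be illustrated with a toy k-mer lookup. This is not MEGABLAST and does not reproduce its scoring; it only mimics the stated criterion that a read sharing 30 consecutive identical bases with the host is treated as host-derived. The second, mismatch-tolerant pass is not covered. All function names and the randomly generated sequences are hypothetical.

```python
import random

K = 30  # run of identical bases treated as a host match (figure from the text)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int = K):
    """Yield every substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_host_index(host_sequences, k: int = K) -> set:
    """Collect all k-mers of the host sequences on both strands."""
    index = set()
    for seq in host_sequences:
        seq = seq.upper()
        index.update(kmers(seq, k))
        index.update(kmers(reverse_complement(seq), k))
    return index


def subtract_host(reads, host_index, k: int = K):
    """Return the reads that share no k-mer with the host index."""
    return [read for read in reads
            if not any(km in host_index for km in kmers(read.upper(), k))]


# Toy data: a random "host" sequence and three reads.
rng = random.Random(0)
host = "".join(rng.choice("ACGT") for _ in range(500))
foreign = "".join(rng.choice("ACGT") for _ in range(60))
reads = [
    host[100:160],                      # host read, forward strand
    reverse_complement(host[200:260]),  # host read, reverse strand
    foreign,                            # unrelated sequence
]

remaining = subtract_host(reads, build_host_index([host]))
print(f"{len(remaining)} of {len(reads)} reads remain after subtraction")
```

An in-memory set of k-mers is workable only for toy inputs; at the scale of a human genome and transcriptome an indexed aligner is needed, which is the role MEGABLAST plays in the method described here.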

Analysis of "non-host" candidates

Alignment to pathogen databases

After stringent rounds of subtraction, the remaining sequences are clustered into non-redundant contigs and aligned to known pathogen sequences using low-stringency parameters. Because pathogen genomes mutate quickly, nucleotide-nucleotide alignment (blastn) is usually uninformative: codon degeneracy allows mutations at certain bases without any change to the encoded amino acid residue. Matching the in silico translated protein sequences from all six reading frames against the amino acid sequences of annotated proteins (blastx) is the preferred alignment method, as it increases the likelihood of identifying a novel pathogen by matching to a related strain or species.^[5] Experimental extension of candidate sequences might also be used at this stage to maximize the chances of a positive match.^[6]
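The effect of codon degeneracy on the two alignment strategies can be shown with a small six-frame translation sketch. It uses the standard genetic code and two made-up sequences; it illustrates the translation step that a translated search performs internally and is not a substitute for blastx. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> dict:
    """Translate the three forward and three reverse-complement frames."""
    forward = seq.upper()
    reverse = forward.translate(_COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def nucleotide_identity(a: str, b: str) -> float:
    """Fraction of identical positions in two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Made-up sequences that differ only at third codon positions.
known = "ATGGCTGAACGTCTGAAAGGT"
variant = "ATGGCAGAGCGCCTCAAGGGC"

print("nucleotide identity:", round(nucleotide_identity(known, variant), 2))
print("same protein in frame +1:",
      six_frame_translations(known)["+1"] == six_frame_translations(variant)["+1"])
```

In this example the two sequences differ at six of 21 nucleotide positions yet encode the same peptide, which is why a protein-level comparison can recover a relationship that a nucleotide-level comparison scores poorly.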

De novo assembly

In cases where alignment to known pathogens is uninformative or ambiguous, contigs of candidate sequence can be used as templates for primer walking in primary infected tissue to generate the complete pathogen genome sequence.^[1]^[5] Because viral transcripts are exceedingly rare relative to tissue mRNA (about 10 transcripts in 1 million),^[1] coverage is low, and a complete viral transcriptome is unlikely to be assembled from the original candidate sequences alone.

Validation of pathogen

Once a putative pathogen has been identified in the high-throughput sequencing data, it is imperative to validate the presence of the pathogen in infected patients using more sensitive techniques.

Applications

The primary application of DTS lies in the identification of pathogenic viruses in cancer.^[1]^[4] It can also be used to identify viral pathogens in non-cancer-related disease.^[5] Future clinical applications could include the routine use of DTS in individuals.

DTS could also be applied to agriculture to identify pathogens that affect output. Computational subtraction has already been used in a metagenomics study that associated infection by Israeli acute paralysis virus (IAPV) with colony collapse disorder in honey bees.^[7]

Advantages

Disadvantages

References

1. Feng H, Shuda M, Chang Y, Moore PS. Clonal integration of a polyomavirus in human Merkel cell carcinoma. Science. Jan 2008;319(5866):1096–1100. PMID 18202256. doi:10.1126/science.1152586. PMC 2740911.
2. Weber G, Shendure J, Tanenbaum DM, Church GM, Meyerson M. Identification of foreign gene sequences by transcript filtering against the human genome. Nat Genet. Feb 2002;30(2):141–142. PMID 11788827. doi:10.1038/ng818.
3. Xu Y, Stange-Thomann N, Weber G, Bo R, Dodge S, David RG, Foley K, Beheshti J, Harris NL, Birren B, Lander ES, Meyerson M. Pathogen discovery from human tissue by sequence-based computational subtraction. Genomics. Mar 2003;81(3):329–335. PMID 12659816. doi:10.1016/S0888-7543(02)00043-5.
4. Feng H, Taylor JL, Benos PV, Newton R, Waddell K, Lucas SB, Chang Y, Moore PS. Human Transcriptome Subtraction by Using Short Sequence Tags To Search for Tumor Viruses in Conjunctival Carcinoma. J Virol. August 2007;81(20):11332–11340. PMID 17686852. doi:10.1128/JVI.00875-07. PMC 2045575.
5. Palacios G, Druce J, Du L, Tran T, Birch C, Briese T, Conlan S, Quan PL, Hui J, Marshall J, Simons JF, Egholm M, Paddock CD, Shieh WJ, Goldsmith CS, Zaki SR, Catton M, Lipkin WI. A new arenavirus in a cluster of fatal transplant-associated diseases. N Engl J Med. Mar 2008;358(10):991–998. PMID 18256387. doi:10.1056/NEJMoa073785. CiteSeerX 10.1.1.453.2859.
6. Chang Y, Moore PS. New Pathogen Discovery: Digital Transcriptome Subtraction. http://www.tumorvirology.pitt.edu/dts.html. Accessed 1 March 2012.
7. Cox-Foster DL, Conlan S, Holmes EC, Palacios G, Evans JD, Moran NA, Quan PL, Briese T, Hornig M, Geiser DM, Martinson V, vanEngelsdorp D, Kalkstein AL, Drysdale A, Hui J, Zhai J, Cui L, Hutchison SK, Simons JF, Egholm M, Pettis JS, Lipkin WI. A metagenomic survey of microbes in honey bee colony collapse disorder. Science. Oct 2007;318(5848):283–287. PMID 17823314. doi:10.1126/science.1146498.
8. MacConaill L, Meyerson M. Adding pathogens by genomic subtraction. Nat Genet. Apr 2008;40(4):380–382. PMID 18368124. doi:10.1038/ng0408-380.
