RNA-Seq

RNA-Seq is used to analyze the continuously changing cellular transcriptome. Specifically, RNA-Seq makes it possible to examine alternatively spliced transcripts, post-transcriptional modifications, gene fusions, mutations/SNPs, and changes in gene expression over time or differences in gene expression between groups or treatments.^[5] In addition to mRNA transcripts, RNA-Seq can examine other RNA populations, including total RNA and small RNAs such as miRNA and tRNA, and it can be adapted for ribosome profiling.^[4] RNA-Seq can also be used to determine exon/intron boundaries and to verify or amend previously annotated 5' and 3' gene boundaries. Recent advances in RNA-Seq include single-cell sequencing and in situ sequencing of fixed tissue.^[5]

Prior to RNA-Seq, gene expression studies were done with hybridization-based microarrays. Issues with microarrays include cross-hybridization artifacts, poor quantification of lowly and highly expressed genes, and the need to know the target sequences a priori.^[6] Because of these technical issues, transcriptomics transitioned to sequencing-based methods. These progressed from Sanger sequencing of expressed sequence tag (EST) libraries, to chemical tag-based methods (e.g., serial analysis of gene expression), and finally to the current technology, next-generation sequencing of cDNA (notably RNA-Seq).

Methods

Library preparation

The general steps to prepare a complementary DNA (cDNA) library for sequencing are described below, but often vary between platforms.^[7]^[8]^[9]

Small RNA/non-coding RNA sequencing

When sequencing RNA other than mRNA, the library preparation is modified. The cellular RNA is selected based on the desired size range. For small RNA targets, such as miRNA, the RNA is isolated through size selection, which can be performed with a size exclusion gel, with size selection magnetic beads, or with a commercially developed kit. Once the RNA is isolated, linkers are added to the 3' and 5' ends and the product is purified. The final step is cDNA generation through reverse transcription.

Direct RNA sequencing

Because converting RNA into cDNA with reverse transcriptase has been shown to introduce biases and artifacts that may interfere with both the proper characterization and the quantification of transcripts,^[12] single-molecule Direct RNA Sequencing (DRS™) technology was under development by Helicos (now bankrupt). DRS™ sequences RNA molecules directly in a massively parallel manner, without converting the RNA to cDNA and without other potentially biasing sample manipulations such as ligation and amplification.

Experimental considerations

A variety of parameters are considered when designing and conducting RNA-Seq experiments:

Analysis

Transcriptome assembly

Two methods are used to assign raw sequence reads to genomic features (i.e., assemble the transcriptome):

Gene expression

Expression is quantified to study cellular changes in response to external stimuli, differences between healthy and diseased states, and other research questions. Gene expression is often used as a proxy for protein abundance, but the two frequently differ because of post-transcriptional events such as RNA interference and nonsense-mediated decay.^[34]

Expression is quantified by counting the number of reads that mapped to each locus in the transcriptome assembly step. Expression can be quantified for exons or genes using contigs or reference transcript annotations.^[7] These observed RNA-Seq read counts have been robustly validated against older technologies, including expression microarrays and qPCR.^[35]^[36] Tools that quantify counts include HTSeq,^[37] featureCounts,^[38] Rcount,^[39] maxcounts,^[40] FIXSEQ,^[41] and Cuffquant. The read counts are then converted into appropriate metrics for hypothesis testing, regressions, and other analyses. Parameters for this conversion are:
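The counting step described above can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm used by HTSeq or featureCounts: it assumes that reads have already been aligned and reduced to hypothetical (chromosome, start, end) intervals, that each gene is a single interval, and that a read overlapping more than one gene is set aside as ambiguous rather than credited to either gene.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical simplified representation: (chromosome, start, end), half-open [start, end).
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals lie on the same chromosome and share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_reads_per_gene(reads: List[Interval], genes: Dict[str, Interval]) -> Counter:
    """Assign each aligned read to the single gene it overlaps.

    Reads overlapping no gene are tallied under '__no_feature'; reads overlapping
    more than one gene are tallied under '__ambiguous' and credited to no gene.
    """
    counts = Counter({gene_id: 0 for gene_id in genes})
    for read in reads:
        hits = [gene_id for gene_id, iv in genes.items() if overlaps(read, iv)]
        if not hits:
            counts["__no_feature"] += 1
        elif len(hits) > 1:
            counts["__ambiguous"] += 1
        else:
            counts[hits[0]] += 1
    return counts


# Toy example: two overlapping genes and four reads.
genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 180, 300)}
reads = [("chr1", 110, 160), ("chr1", 150, 190), ("chr1", 250, 290), ("chr2", 0, 50)]
counts = count_reads_per_gene(reads, genes)
# counts: geneA 1, geneB 1, __ambiguous 1, __no_feature 1
```

A linear scan over all genes is used only for clarity; dedicated counting tools use indexed interval lookups and also deal with exon structure, strandedness and multi-mapping reads.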

Differential expression and absolute quantification of transcripts

RNA-Seq is generally used to compare gene expression between conditions, such as drug-treated vs. untreated samples, and to find out which genes are up- or down-regulated in each condition. In principle, RNA-Seq makes it possible to account for all the transcripts in the cell for each condition. Differentially expressed genes can be identified using tools that count the sequencing reads per gene and compare them between samples. Many packages are available for this type of analysis;^[49] some of the most commonly used tools are DESeq^[50] and edgeR,^[51] packages from Bioconductor.^[52]^[53] Both tools use a model based on the negative binomial distribution.^[50]^[51]
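To make the negative binomial model concrete, the sketch below evaluates its probability mass function in a mean and dispersion parameterization (as in edgeR), where a gene with mean count mu and dispersion alpha has variance mu + alpha * mu^2, larger than the Poisson variance mu. It illustrates the distribution only: the values of mu and alpha are assumptions, and the dispersion estimation and testing procedures of DESeq and edgeR are not reproduced.

```python
import math


def nb_log_pmf(k: int, mu: float, alpha: float) -> float:
    """Log-probability of k reads under a negative binomial with mean mu and
    dispersion alpha, i.e. variance mu + alpha * mu**2.

    Uses size r = 1 / alpha and p = r / (r + mu) in
    P(k) = Gamma(k + r) / (Gamma(r) * k!) * p**r * (1 - p)**k.
    """
    r = 1.0 / alpha
    p = r / (r + mu)
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


# Illustrative values: mean 100 reads, dispersion 0.1.
mu, alpha = 100.0, 0.1
probs = [math.exp(nb_log_pmf(k, mu, alpha)) for k in range(2000)]
total = sum(probs)                                  # close to 1
mean = sum(k * q for k, q in enumerate(probs))      # close to mu = 100
var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
# var is close to mu + alpha * mu**2 = 1100, far above the Poisson variance of 100
```

The extra alpha * mu^2 term is what lets the model absorb biological variability between replicates that a Poisson model would ignore.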

Notably, in a recent comparison of 11 RNA-Seq differential expression packages, TMM normalization and the normalization provided by the DESeq package were the only two count normalization methods that showed satisfactory results with respect to all metrics used in the evaluation. DESeq can therefore be a useful package for normalizing counts.^[54]^[55]
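DESeq's normalization is usually described as the median-of-ratios method: the size factor of a sample is the median, over genes, of that sample's count divided by the gene's geometric mean count across all samples, and each sample's counts are divided by its size factor. A minimal sketch in plain Python follows; the toy count table is invented, and genes with a zero count in any sample are skipped because their geometric mean is zero. TMM normalization is not shown.

```python
import math
from statistics import median
from typing import List


def size_factors(counts: List[List[int]]) -> List[float]:
    """Median-of-ratios size factors for a genes x samples matrix of raw counts."""
    n_samples = len(counts[0])
    # Genes with a zero in any sample have a geometric mean of zero and are skipped.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has a positive count in every sample")
    geo_means = [math.exp(sum(math.log(c) for c in row) / n_samples) for row in usable]
    return [median([row[j] / g for row, g in zip(usable, geo_means)])
            for j in range(n_samples)]


# Toy table: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [50, 100], [300, 600], [0, 5]]
sf = size_factors(counts)                      # about [0.707, 1.414]
normalized = [[c / s for c, s in zip(row, sf)] for row in counts]
# each normalized gene now has equal values in both samples
```

Using the median makes the size factor robust to a minority of genes that really are differentially expressed.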

Absolute quantification is not possible with the common RNA-Seq pipeline, because it only provides RNA levels relative to all transcripts. If the total amount of RNA in the cell changes between conditions, relative normalization will misrepresent the changes for individual transcripts. Absolute quantification of mRNAs is possible by performing RNA-Seq with added spike-ins, RNA samples of known concentration. After sequencing, the read counts of the spike-in sequences are used to establish the correspondence between read count and the number of input RNA molecules.^[10]^[56] In developmental studies, this technique has been used in Xenopus tropicalis embryos at high temporal resolution to determine transcription kinetics.^[57]
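One way to sketch spike-in calibration is to fit a straight line to log read count against log input amount for the spike-ins, then invert the fitted line to estimate the input amount of an endogenous transcript. The log-log linear model, the molecule counts and the read counts below are illustrative assumptions; in practice counts would also need to account for transcript length and for technical variability between spike-ins.

```python
import math
from typing import List, Tuple


def fit_log_linear(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of log(y) = a + b * log(x); returns (a, b)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def estimate_molecules(read_count: float, a: float, b: float) -> float:
    """Invert the calibration curve: input molecules implied by an observed read count."""
    return math.exp((math.log(read_count) - a) / b)


# Hypothetical spike-ins: known input molecules and the reads observed for each.
spike_molecules = [100.0, 1000.0, 10000.0, 100000.0]
spike_reads = [50.0, 500.0, 5000.0, 50000.0]
a, b = fit_log_linear(spike_molecules, spike_reads)
# With this made-up curve, 2,500 reads correspond to about 5,000 input molecules.
estimated = estimate_molecules(2500.0, a, b)
```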

Coexpression networks

Coexpression networks are data-derived representations of genes behaving in a similar way across tissues and experimental conditions.^[58] Their main purpose lies in hypothesis generation and guilt-by-association approaches for inferring the functions of genes whose function was previously unknown.^[58] RNA-Seq data have recently been used to infer genes involved in specific pathways based on Pearson correlation, in both plants^[59] and mammals.^[60] The main advantage of RNA-Seq data over microarray platforms in this kind of analysis is the ability to cover the entire transcriptome, which makes it possible to unravel more complete representations of gene regulatory networks. Differential regulation of the splice isoforms of the same gene can be detected and used to predict their biological functions.^[61]^[62]
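A correlation-based coexpression network can be sketched by computing the Pearson correlation between every pair of genes across samples and keeping pairs whose absolute correlation reaches a cutoff as edges. The 0.9 cutoff, the use of absolute values (so strongly anti-correlated genes are also linked) and the toy expression values are assumptions for illustration; the cited studies may filter, transform and threshold their data differently.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def coexpression_edges(expr: Dict[str, List[float]],
                       threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """Return (gene1, gene2, r) for gene pairs with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical normalized expression of four genes across five samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],   # tracks geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],   # anti-correlated with geneA
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],   # weakly related to the others
}
edges = coexpression_edges(expr)
# edges link geneA-geneB, geneA-geneC and geneB-geneC; geneD stays unconnected
```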

Single nucleotide variation discovery

Transcriptome single nucleotide variation has been analyzed in maize on the Roche 454 sequencing platform.^[64] Directly from the transcriptome analysis, around 7,000 single nucleotide polymorphisms (SNPs) were recognized. Following Sanger sequencing validation, the researchers were able to conservatively obtain almost 5,000 valid SNPs covering more than 2,400 maize genes. RNA-Seq is limited to transcribed regions, however, since it only discovers sequence variations in exonic regions. It therefore misses many subtle but important intronic alleles that affect disease, such as those in transcriptional regulators, leaving the analysis limited to variants with large effects. While some correlation exists between exonic and intronic variation, only whole-genome sequencing would be able to capture the source of all relevant SNPs.^[65]

The only way to be absolutely sure of an individual's mutations is to compare the transcriptome sequences with the germline DNA sequence. This makes it possible to distinguish homozygous genes from skewed expression of one of the alleles, and it can also provide information about genes that were not expressed in the transcriptomic experiment. An R-based statistical package known as CummeRbund^[66] can be used to generate expression comparison charts for visual analysis.
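The comparison between RNA and germline DNA at a single site can be sketched as follows. Where the DNA is heterozygous, both alleles are expected in roughly equal proportions among RNA reads, so an exact two-sided binomial test against a 50% expectation can flag skewed expression of one allele; a heterozygous site with no RNA reads points to a gene that was not expressed. The genotype labels, the significance level and the read counts are hypothetical, and real analyses must also correct for reference-allele mapping bias and sequencing errors.

```python
import math


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    def log_pmf(i: int) -> float:
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log(1 - p))

    pmf = [math.exp(log_pmf(i)) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # small tolerance so equally likely outcomes count
    return min(1.0, sum(q for q in pmf if q <= cutoff))


def classify_site(dna_genotype: str, ref_reads: int, alt_reads: int,
                  alpha: float = 0.01) -> str:
    """Compare a germline genotype ('hom' or 'het', hypothetical labels) with RNA allele counts.

    RNA-DNA mismatches at homozygous sites (possible editing) are not examined here.
    """
    if dna_genotype == "hom":
        return "homozygous"
    n = ref_reads + alt_reads
    if n == 0:
        return "not expressed"  # heterozygous in DNA, but no RNA reads cover the site
    p_value = binom_two_sided_p(alt_reads, n)
    return "allelic imbalance" if p_value < alpha else "balanced expression"


balanced = classify_site("het", 48, 52)   # 'balanced expression'
skewed = classify_site("het", 95, 5)      # 'allelic imbalance'
hom = classify_site("hom", 100, 0)        # 'homozygous'
silent = classify_site("het", 0, 0)       # 'not expressed'
```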

RNA editing (post-transcriptional alterations)

Having the matching genomic and transcriptomic sequences of an individual can also help in detecting post-transcriptional edits:^[8] if the individual is homozygous for a gene but the gene's transcript carries a different allele, a post-transcriptional modification event can be inferred.
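The rule described above can be sketched for one genomic position: given the homozygous germline base and the bases observed at that position in RNA-Seq reads, report a candidate edit when a different base is supported by enough reads. The minimum read count and fraction are hypothetical thresholds; distinguishing true editing from sequencing errors, alignment artifacts and undetected heterozygous sites requires additional filtering not shown here.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def editing_candidate(dna_base: str, rna_bases: Iterable[str],
                      min_reads: int = 3,
                      min_fraction: float = 0.1) -> Optional[Tuple[str, str, float]]:
    """Flag a possible RNA-DNA difference at a site assumed homozygous in the DNA.

    dna_base: the individual's germline base at this position.
    rna_bases: bases observed at this position in aligned RNA-Seq reads.
    Returns (dna_base, rna_base, fraction) for the best-supported non-germline
    base if it passes both thresholds, otherwise None.
    """
    germline = dna_base.upper()
    counts = Counter(b.upper() for b in rna_bases)
    total = sum(counts.values())
    if total == 0:
        return None  # site not covered by RNA reads
    alternatives = [(n, b) for b, n in counts.items() if b != germline]
    if not alternatives:
        return None  # RNA agrees with the DNA
    n_alt, alt_base = max(alternatives)
    fraction = n_alt / total
    if n_alt >= min_reads and fraction >= min_fraction:
        return germline, alt_base, fraction
    return None


site1 = editing_candidate("A", "A" * 14 + "G" * 6)  # ('A', 'G', 0.3)
site2 = editing_candidate("A", "A" * 19 + "G")      # None: only one supporting read
```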

mRNA-centric single nucleotide variants (SNVs) are generally not considered a representative source of functional variation in cells, mainly because these mutations disappear with the mRNA molecule. However, because efficient DNA correction mechanisms do not apply to RNA molecules, such variants can appear more often. This has been proposed as the source of certain prion diseases,^[67] also known as transmissible spongiform encephalopathies (TSEs).

Fusion gene detection

Fusion genes, which arise from various structural rearrangements in the genome, have gained attention because of their relationship with cancer.^[68] The ability of RNA-Seq to analyze a sample's whole transcriptome in an unbiased fashion makes it an attractive tool for finding these common events in cancer.^[69]

The idea follows from the process of aligning the short transcriptomic reads to a reference genome. Most of the short reads will fall within a single complete exon, and a smaller but still large set would be expected to map to known exon-exon junctions. The remaining unmapped short reads would then be further analyzed to determine whether they match an exon-exon junction where the exons come from different genes. This would be evidence of a possible fusion event; however, because the reads are short, this approach could prove to be very noisy. An alternative approach is to use paired-end reads, where a potentially large number of read pairs would have each end mapping to an exon of a different gene, giving better coverage of these events (see figure). Nonetheless, the end result consists of multiple and potentially novel combinations of genes, providing an ideal starting point for further validation.
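The paired-end strategy can be sketched as a tally of read pairs whose two mates were assigned to different genes, with gene pairs supported by several such discordant pairs reported as fusion candidates. The input format (one gene label per mate, or None when a mate does not fall in an annotated gene), the support threshold and the gene names are assumptions; dedicated fusion-detection tools typically combine this evidence with reads that span the fusion junction itself and apply further filters.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical input: the gene assigned to each mate of a read pair (None if unassigned).
MatePair = Tuple[Optional[str], Optional[str]]


def fusion_candidates(pairs: List[MatePair],
                      min_support: int = 3) -> Dict[Tuple[str, str], int]:
    """Count discordant read pairs whose mates fall in two different genes.

    Gene pairs are stored in sorted order so that (X, Y) and (Y, X) pool
    their support; only pairs with at least min_support read pairs are kept.
    """
    support: Counter = Counter()
    for gene1, gene2 in pairs:
        if gene1 is None or gene2 is None or gene1 == gene2:
            continue  # unassigned mate or both mates in one gene: no fusion evidence
        support[tuple(sorted((gene1, gene2)))] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


# Toy data with hypothetical gene names.
pairs = ([("geneX", "geneY")] * 4 + [("geneY", "geneX")] * 2
         + [("geneZ", "geneZ")] * 50 + [("geneW", "geneZ"), ("geneX", None)])
candidates = fusion_candidates(pairs)
# candidates == {('geneX', 'geneY'): 6}; the single geneW-geneZ pair is below threshold
```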

Application to genomic medicine

History

The past five years have seen a flourishing of NGS-based methods for genome analysis, leading to the discovery of a number of new mutations and fusion transcripts in cancer. RNA-Seq data could help researchers interpret the "personalized transcriptome" and understand the transcriptomic changes taking place, ideally identifying gene drivers for a disease. The feasibility of this approach is, however, dictated by its costs in money and time.

A basic PubMed search for RNA-Seq, queried as "RNA Seq" OR "RNA-Seq" OR "RNA sequencing" OR "RNASeq" to capture the most common ways of phrasing the term, returned 5,425 hits, illustrating how widely the technology is used. A few examples are considered here to show that clinical applications of RNA-Seq have the potential to significantly affect patients' lives and, on the other hand, require a team of specialists (bioinformaticians, physicians/clinicians, basic researchers, technicians) to fully interpret the huge amount of data generated by this analysis.

As an example of clinical applications, researchers at the Mayo Clinic used an RNA-Seq approach to identify differentially expressed transcripts between oral cancer and normal tissue samples. They also accurately evaluated allelic imbalance (AI), the ratio of the transcripts produced by each of the two alleles, within a subgroup of genes involved in cell differentiation, adhesion, cell motility and muscle contraction,^[70] identifying a unique transcriptomic and genomic signature in oral cancer patients. Novel insight into skin cancer (melanoma) has also come from RNA-Seq of melanoma patients. This approach led to the identification of eleven novel gene fusion transcripts originating from previously unknown chromosomal rearrangements. Twelve novel chimeric transcripts were also reported, including seven that confirmed previously identified data in multiple melanoma samples.^[71] Furthermore, this approach is not limited to cancer patients. RNA-Seq has been used to study other important chronic diseases such as Alzheimer's disease (AD) and diabetes. In the former case, Twine and colleagues compared the transcriptomes of different lobes of the brains of deceased AD patients with those of healthy individuals, identifying a lower number of splice variants in AD patients and differential promoter usage of the APOE-001 and -002 isoforms in AD brains.^[72] In the latter case, different groups showed the uniqueness of the beta-cell transcriptome in diabetic patients in terms of transcript accumulation and differential promoter usage^[73] and of its long non-coding RNA (lncRNA) signature.^[74]

Compared with microarrays, NGS technology has identified novel and low-frequency RNAs associated with disease processes. This advantage aids in the diagnosis and possible future treatment of diseases, including cancer. For example, NGS technology identified several previously undocumented differentially expressed transcripts in rats treated with aflatoxin B1 (AFB1), a potent hepatocarcinogen. Nearly 50 new differentially expressed transcripts were identified between the controls and AFB1-treated rats. Additionally, potential new exons were identified, including some that are responsive to AFB1. The next-generation sequencing pipeline identified more differentially expressed genes than microarrays did, particularly when DESeq software was used. Cufflinks identified two novel transcripts that were not previously annotated in the Ensembl database; these transcripts were confirmed using PCR cloning.^[75] A follow-up study identified twenty-five unannotated AFB1 transcripts from RNA-Seq as long noncoding RNAs.^[76] Numerous other studies have demonstrated that NGS can detect aberrant mRNA and small non-coding RNA expression in disease processes beyond what microarrays provide. The lower cost and higher throughput offered by NGS confer another advantage to researchers.

The role of small non-coding RNAs in disease processes has also been explored in recent years. For example, Han et al. (2011) examined microRNA expression differences in bladder cancer patients in order to understand how changes and dysregulation in microRNA can influence mRNA expression and function. Several microRNAs were differentially expressed in the bladder cancer patients, and upregulation of the aberrant microRNAs was more common than downregulation. One of the upregulated microRNAs, hsa-miR-96, has been associated with carcinogenesis, and several of the overexpressed microRNAs have also been observed in other cancers, including ovarian and cervical cancer. Some of the microRNAs downregulated in cancer samples were hypothesized to have inhibitory roles.^[77]

ENCODE and TCGA

RNA-Seq data received considerable attention after the Encyclopedia of DNA Elements (ENCODE) and The Cancer Genome Atlas (TCGA) projects used this approach to characterize dozens of cell lines^[78] and thousands of primary tumor samples,^[79] respectively. ENCODE aimed to identify genome-wide regulatory regions in different cohorts of cell lines, and transcriptomic data are paramount for understanding the downstream effects of those epigenetic and genetic regulatory layers. TCGA, in turn, aimed to collect and analyze thousands of patient samples from 30 different tumor types in order to understand the underlying mechanisms of malignant transformation and progression. In this context, RNA-Seq data provide a unique snapshot of the transcriptomic status of the disease and survey an unbiased population of transcripts, allowing the identification of novel transcripts, fusion transcripts and non-coding RNAs that could go undetected with other technologies.


References

1. Shafee T, Lowe R (2017). "Eukaryotic and prokaryotic gene structure". WikiJournal of Medicine. 4 (1). doi:10.15347/wjm/2017.002.
2. Morin R, Bainbridge M, Fejes A, Hirst M, Krzywinski M, Pugh T, McDonald H, Varhol R, Jones S, Marra M (July 2008). "Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing". BioTechniques. 45 (1): 81–94. doi:10.2144/000112900. PMID 18611170. http://www.bcgsc.ca/about/pubann/biotechniques-publication-2008-44-8
3. Chu Y, Corey DR (August 2012). "RNA sequencing: platform selection, experimental design, and data interpretation". Nucleic Acid Therapeutics. 22 (4): 271–4. doi:10.1089/nat.2012.0367. PMC 3426205. PMID 22830413.
4. Ingolia NT, Brar GA, Rouskin S, McGeachy AM, Weissman JS (July 2012). "The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments". Nature Protocols. 7 (8): 1534–50. doi:10.1038/nprot.2012.086. PMC 3535016. PMID 22836135.
5. Lee JH, Daugharthy ER, Scheiman J, Kalhor R, Yang JL, Ferrante TC, et al. (March 2014). "Highly multiplexed subcellular RNA sequencing in situ". Science. 343 (6177): 1360–3. doi:10.1126/science.1250212. PMC 4140943. PMID 24578530.
6. Kukurba KR, Montgomery SB (April 2015). "RNA Sequencing and Analysis". Cold Spring Harbor Protocols. 2015 (11): 951–69. doi:10.1101/pdb.top084970. PMC 4863231. PMID 25870306.
7. Griffith M, Walker JR, Spies NC, Ainscough BJ, Griffith OL (August 2015). "Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud". PLoS Computational Biology. 11 (8): e1004393. doi:10.1371/journal.pcbi.1004393. PMC 4527835. PMID 26248053.
8. Wang Z, Gerstein M, Snyder M (January 2009). "RNA-Seq: a revolutionary tool for transcriptomics". Nature Reviews Genetics. 10 (1): 57–63. doi:10.1038/nrg2484. PMC 2949280. PMID 19015660.
9. "RNA-seqlopedia". rnaseq.uoregon.edu. http://rnaseq.uoregon.edu/ (accessed 2017-02-08).
10. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (July 2008). "Mapping and quantifying mammalian transcriptomes by RNA-Seq". Nature Methods. 5 (7): 621–8. doi:10.1038/nmeth.1226. PMID 18516045.
11. Chen EA, Souaiaia T, Herstein JS, Evgrafov OV, Spitsyna VN, Rebolini DF, Knowles JA (October 2014). "Effect of RNA integrity on uniquely mapped reads in RNA-Seq". BMC Research Notes. 7 (1): 753. doi:10.1186/1756-0500-7-753. PMC 4213542. PMID 25339126.
12. Liu D, Graber JH (February 2006). "Quantitative comparison of EST libraries requires compensation for systematic biases in cDNA generation". BMC Bioinformatics. 7: 77. doi:10.1186/1471-2105-7-77. PMC 1431573. PMID 16503995.
13. Stegle O, Parts L, Piipari M, Winn J, Durbin R (February 2012). "Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses". Nature Protocols. 7 (3): 500–7. doi:10.1038/nprot.2011.457. PMC 3398141. PMID 22343431.
14. Kingsford C, Patro R (June 2015). "Reference-based compression of short-read sequences using path encoding". Bioinformatics. 31 (12): 1920–8. doi:10.1093/bioinformatics/btv071. PMC 4481695. PMID 25649622.
15. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A (May 2011). "Full-length transcriptome assembly from RNA-Seq data without a reference genome". Nature Biotechnology. 29 (7): 644–52. doi:10.1038/nbt.1883. PMC 3571712. PMID 21572440.
16. "De Novo Assembly Using Illumina Reads" (PDF). http://www.illumina.com/Documents/products/technotes/technote_denovo_assembly_ecoli.pdf (accessed 22 October 2016).
17. Zerbino DR, Birney E (May 2008). "Velvet: algorithms for de novo short read assembly using de Bruijn graphs". Genome Research. 18 (5): 821–9. doi:10.1101/gr.074492.107. PMC 2336801. PMID 18349386.
18. Oases: a transcriptome assembler for very short reads
19. Chang Z, Li G, Liu J, Zhang Y, Ashby C, Liu D, Cramer CL, Huang X (February 2015). "Bridger: a new framework for de novo transcriptome assembly using RNA-seq data". Genome Biology. 16 (1): 30. doi:10.1186/s13059-015-0596-2. PMC 4342890. PMID 25723335.
20. Li B, Fillmore N, Bai Y, Collins M, Thomson JA, Stewart R, Dewey CN (December 2014). "Evaluation of de novo transcriptome assemblies from RNA-Seq data". Genome Biology. 15 (12): 553. doi:10.1186/s13059-014-0553-5. PMC 4298084. PMID 25608678.
21. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (January 2013). "STAR: ultrafast universal RNA-seq aligner". Bioinformatics. 29 (1): 15–21. doi:10.1093/bioinformatics/bts635. PMC 3530905. PMID 23104886.
22. Langmead B, Trapnell C, Pop M, Salzberg SL (2009). "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome". Genome Biology. 10 (3): R25. doi:10.1186/gb-2009-10-3-r25. PMC 2690996. PMID 19261174.
23. Trapnell C, Pachter L, Salzberg SL (May 2009). "TopHat: discovering splice junctions with RNA-Seq". Bioinformatics. 25 (9): 1105–11. doi:10.1093/bioinformatics/btp120. PMC 2672628. PMID 19289445.
24. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L (March 2012). "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks". Nature Protocols. 7 (3): 562–78. doi:10.1038/nprot.2012.016. PMC 3334321. PMID 22383036.
25. Liao Y, Smyth GK, Shi W (May 2013). "The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote". Nucleic Acids Research. 41 (10): e108. doi:10.1093/nar/gkt214. PMC 3664803. PMID 23558742.
26. Kim D, Langmead B, Salzberg SL (April 2015). "HISAT: a fast spliced aligner with low memory requirements". Nature Methods. 12 (4): 357–60. doi:10.1038/nmeth.3317. PMC 4655817. PMID 25751142.
27. Patro R, Mount SM, Kingsford C (May 2014). "Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms". Nature Biotechnology. 32 (5): 462–4. arXiv:1308.3700. doi:10.1038/nbt.2862. PMC 4077321. PMID 24752080.
28. Bray NL, Pimentel H, Melsted P, Pachter L (May 2016). "Near-optimal probabilistic RNA-seq quantification". Nature Biotechnology. 34 (5): 525–7. doi:10.1038/nbt.3519. PMID 27043002.
29. Wu TD, Watanabe CK (May 2005). "GMAP: a genomic mapping and alignment program for mRNA and EST sequences". Bioinformatics. 21 (9): 1859–75. doi:10.1093/bioinformatics/bti310. PMID 15728110.
30. Baruzzo G, Hayer KE, Kim EJ, Di Camillo B, FitzGerald GA, Grant GR (February 2017). "Simulation-based comprehensive benchmarking of RNA-seq aligners". Nature Methods. 14 (2): 135–139. doi:10.1038/nmeth.4106. PMC 5792058. PMID 27941783.
31. Engström PG, Steijger T, Sipos B, Grant GR, Kahles A, Rätsch G, et al. (December 2013). "Systematic evaluation of spliced alignment programs for RNA-seq data". Nature Methods. 10 (12): 1185–91. doi:10.1038/nmeth.2722. PMC 4018468. PMID 24185836.
32. Lu B, Zeng Z, Shi T (February 2013). "Comparative study of de novo assembly and genome-guided assembly strategies for transcriptome reconstruction based on RNA-Seq". Science China Life Sciences. 56 (2): 143–55. doi:10.1007/s11427-013-4442-z. PMID 23393030.
33. Bradnam KR, Fass JN, Alexandrov A, Baranay P, Bechner M, Birol I, et al. (July 2013). "Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species". GigaScience. 2 (1): 10. doi:10.1186/2047-217X-2-10. PMC 3844414. PMID 23870653.
34. ^{{cite journal | vauthors = Greenbaum D, Colangelo C, Williams K, Gerstein M | title = Comparing protein abundance and mRNA expression levels on a genomic scale | journal = Genome Biology | volume = 4 | issue = 9 | pages = 117 | year = 2003 | pmid = 12952525 | pmc = 193646 | doi = 10.1186/gb-2003-4-9-117 }}
35. ^¹{{cite journal | vauthors = Li H, Lovci MT, Kwon YS, Rosenfeld MG, Fu XD, Yeo GW | title = Determination of tag density required for digital transcriptome analysis: application to an androgen-sensitive prostate cancer model | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 105 | issue = 51 | pages = 20179–84 | date = December 2008 | pmid = 19088194 | pmc = 2603435 | doi = 10.1073/pnas.0807121105 | bibcode = 2008PNAS..10520179L }}
36. ^{{cite journal | vauthors = Zhang ZH, Jhaveri DJ, Marshall VM, Bauer DC, Edson J, Narayanan RK, Robinson GJ, Lundberg AE, Bartlett PF, Wray NR, Zhao QY | title = A comparative study of techniques for differential expression analysis on RNA-Seq data | journal = PLOS One | volume = 9 | issue = 8 | pages = e103207 | date = August 2014 | pmid = 25119138 | doi = 10.1371/journal.pone.0103207 | pmc=4132098| bibcode = 2014PLoSO...9j3207Z }}
37. ^{{cite journal | vauthors = Anders S, Pyl PT, Huber W | title = HTSeq--a Python framework to work with high-throughput sequencing data | journal = Bioinformatics | volume = 31 | issue = 2 | pages = 166–9 | date = January 2015 | pmid = 25260700 | pmc = 4287950 | doi = 10.1093/bioinformatics/btu638 }}
38. ^{{cite journal | vauthors = Liao Y, Smyth GK, Shi W | title = featureCounts: an efficient general purpose program for assigning sequence reads to genomic features | journal = Bioinformatics | volume = 30 | issue = 7 | pages = 923–30 | date = April 2014 | pmid = 24227677 | doi = 10.1093/bioinformatics/btt656 | arxiv = 1305.3347 }}
39. ^{{cite journal | vauthors = Schmid MW, Grossniklaus U | title = Rcount: simple and flexible RNA-Seq read counting | journal = Bioinformatics | volume = 31 | issue = 3 | pages = 436–7 | date = February 2015 | pmid = 25322836 | doi = 10.1093/bioinformatics/btu680 }}
40. ^{{cite journal | vauthors = Finotello F, Lavezzo E, Bianco L, Barzon L, Mazzon P, Fontana P, Toppo S, Di Camillo B | title = Reducing bias in RNA sequencing data: a novel approach to compute counts | journal = BMC Bioinformatics | volume = 15 Suppl 1 | pages = S7 | date = 2014 | pmid = 24564404 | pmc = 4016203 | doi = 10.1186/1471-2105-15-s1-s7 }}
41. ^{{cite journal | vauthors = Hashimoto TB, Edwards MD, Gifford DK | title = Universal count correction for high-throughput sequencing | journal = PLoS Computational Biology | volume = 10 | issue = 3 | pages = e1003494 | date = March 2014 | pmid = 24603409 | pmc = 3945112 | doi = 10.1371/journal.pcbi.1003494 | bibcode = 2014PLSCB..10E3494H }}
42. ^¹{{cite journal | vauthors = Robinson MD, Oshlack A | title = A scaling normalization method for differential expression analysis of RNA-seq data | journal = Genome Biology | volume = 11 | issue = 3 | pages = R25 | date = 2010 | pmid = 20196867 | pmc = 2864565 | doi = 10.1186/gb-2010-11-3-r25 }}
43. ^{{cite journal | vauthors = Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L | title = Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation | journal = Nature Biotechnology | volume = 28 | issue = 5 | pages = 511–5 | date = May 2010 | pmid = 20436464 | pmc = 3146043 | doi = 10.1038/nbt.1621 | author9-link = Lior Pachter }}
44. ^{{cite arxiv|last1=Pachter|first1=Lior | name-list-format = vanc |title=Models for transcript quantification from RNA-Seq|eprint=1104.3889|date=19 April 2011|class=q-bio.GN}}
45. ^{{cite web|title=What the FPKM? A review of RNA-Seq expression units|url=https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/|website=The farrago|access-date=28 March 2018|date=8 May 2014}}
46. ^{{cite journal | vauthors = Wagner GP, Kin K, Lynch VJ | title = Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples | journal = Theory in Biosciences = Theorie in den Biowissenschaften | volume = 131 | issue = 4 | pages = 281–5 | date = December 2012 | pmid = 22872506 | doi = 10.1007/s12064-012-0162-3 }}
47. ^{{cite journal | vauthors = Law CW, Chen Y, Shi W, Smyth GK | title = voom: Precision weights unlock linear model analysis tools for RNA-seq read counts | journal = Genome Biology | volume = 15 | issue = 2 | pages = R29 | date = February 2014 | pmid = 24485249 | pmc = 4053721 | doi = 10.1186/gb-2014-15-2-r29 }}
48. ^{{cite journal | vauthors = Anders S, Huber W | title = Differential expression analysis for sequence count data | journal = Genome Biology | volume = 11 | issue = 10 | pages = R106 | date = 2010 | pmid = 20979621 | pmc = 3218662 | doi = 10.1186/gb-2010-11-10-r106 }}
49. ^{{cite journal | vauthors = Soneson C, Delorenzi M | title = A comparison of methods for differential expression analysis of RNA-seq data | journal = BMC Bioinformatics | volume = 14 | pages = 91 | date = March 2013 | pmid = 23497356 | pmc = 3608160 | doi = 10.1186/1471-2105-14-91 }}
50. ^{{cite journal | vauthors = Anders S, Huber W | title = Differential expression analysis for sequence count data | journal = Genome Biology | volume = 11 | issue = 10 | pages = R106 | date = 2010 | pmid = 20979621 | pmc = 3218662 | doi = 10.1186/gb-2010-11-10-r106 }}
51. ^{{cite journal | vauthors = Robinson MD, McCarthy DJ, Smyth GK | title = edgeR: a Bioconductor package for differential expression analysis of digital gene expression data | journal = Bioinformatics | volume = 26 | issue = 1 | pages = 139–40 | date = January 2010 | pmid = 19910308 | pmc = 2796818 | doi = 10.1093/bioinformatics/btp616 }}
52. ^{{cite web | url = http://www.bioconductor.org | title = Bioconductor - Open source software for bioinformatics}}
53. ^{{cite journal | vauthors = Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, Bravo HC, Davis S, Gatto L, Girke T, Gottardo R, Hahne F, Hansen KD, Irizarry RA, Lawrence M, Love MI, MacDonald J, Obenchain V, Oleś AK, Pagès H, Reyes A, Shannon P, Smyth GK, Tenenbaum D, Waldron L, Morgan M | display-authors = 6 | title = Orchestrating high-throughput genomic analysis with Bioconductor | journal = Nature Methods | volume = 12 | issue = 2 | pages = 115–21 | date = February 2015 | pmid = 25633503 | pmc = 4509590 | doi = 10.1038/nmeth.3252 }}
54. ^{{cite journal | vauthors = Soneson C, Delorenzi M | title = A comparison of methods for differential expression analysis of RNA-seq data | journal = BMC Bioinformatics | volume = 14 | pages = 91 | date = 2013 | pmid = 23497356 | pmc = 3608160 | doi = 10.1186/1471-2105-14-91 }}
55. ^{{cite journal | vauthors = Soneson C, Delorenzi M | title = A comparison of methods for differential expression analysis of RNA-seq data | journal = BMC Bioinformatics | volume = 14 | pages = 91 | date = 1 March 2013 | pmid = 23497356 | pmc = 3608160 | doi = 10.1186/1471-2105-14-91 }}
56. ^{{cite journal | vauthors = Marguerat S, Schmidt A, Codlin S, Chen W, Aebersold R, Bähler J | title = Quantitative analysis of fission yeast transcriptomes and proteomes in proliferating and quiescent cells | journal = Cell | volume = 151 | issue = 3 | pages = 671–83 | date = October 2012 | pmid = 23101633 | pmc = 3482660 | doi = 10.1016/j.cell.2012.09.019 }}
57. ^{{cite journal | vauthors = Owens ND, Blitz IL, Lane MA, Patrushev I, Overton JD, Gilchrist MJ, Cho KW, Khokha MK | title = Measuring Absolute RNA Copy Numbers at High Temporal Resolution Reveals Transcriptome Kinetics in Development | journal = Cell Reports | volume = 14 | issue = 3 | pages = 632–647 | date = January 2016 | pmid = 26774488 | pmc = 4731879 | doi = 10.1016/j.celrep.2015.12.050 }}
58. ^{{cite journal | vauthors = Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D | title = A combined algorithm for genome-wide prediction of protein function | journal = Nature | volume = 402 | issue = 6757 | pages = 83–6 | date = November 1999 | pmid = 10573421 | doi = 10.1038/47048 | bibcode = 1999Natur.402...83M }}
59. ^{{cite journal | vauthors = Giorgi FM, Del Fabbro C, Licausi F | title = Comparative study of RNA-seq- and microarray-derived coexpression networks in Arabidopsis thaliana | journal = Bioinformatics | volume = 29 | issue = 6 | pages = 717–24 | date = March 2013 | pmid = 23376351 | doi = 10.1093/bioinformatics/btt053 }}
60. ^{{cite journal | vauthors = Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S | title = Utilizing RNA-Seq data for de novo coexpression network inference | journal = Bioinformatics | volume = 28 | issue = 12 | pages = 1592–7 | date = June 2012 | pmid = 22556371 | pmc = 3493127 | doi = 10.1093/bioinformatics/bts245 }}
61. ^{{cite journal | vauthors = Eksi R, Li HD, Menon R, Wen Y, Omenn GS, Kretzler M, Guan Y | title = Systematically differentiating functions for alternatively spliced isoforms through integrating RNA-seq data | journal = PLoS Computational Biology | volume = 9 | issue = 11 | pages = e1003314 | date = November 2013 | pmid = 24244129 | pmc = 3820534 | doi = 10.1371/journal.pcbi.1003314 | bibcode = 2013PLSCB...9E3314E }}
62. ^{{cite journal | vauthors = Li HD, Menon R, Omenn GS, Guan Y | title = The emerging era of genomic data integration for analyzing splice isoform function | journal = Trends in Genetics | volume = 30 | issue = 8 | pages = 340–7 | date = August 2014 | pmid = 24951248 | doi = 10.1016/j.tig.2014.05.005 | pmc=4112133}}
63. ^{{cite journal | vauthors = Foroushani A, Agrahari R, Docking R, Chang L, Duns G, Hudoba M, Karsan A, Zare H | title = Large-scale gene network analysis reveals the significance of extracellular matrix pathway and homeobox genes in acute myeloid leukemia: an introduction to the Pigengene package and its applications | journal = BMC Medical Genomics | volume = 10 | issue = 1 | pages = 16 | date = March 2017 | pmid = 28298217 | pmc = 5353782 | doi = 10.1186/s12920-017-0253-6 }}
64. ^{{cite journal | vauthors = Barbazuk WB, Emrich SJ, Chen HD, Li L, Schnable PS | title = SNP discovery via 454 transcriptome sequencing | journal = The Plant Journal | volume = 51 | issue = 5 | pages = 910–8 | date = September 2007 | pmid = 17662031 | pmc = 2169515 | doi = 10.1111/j.1365-313X.2007.03193.x }}
65. ^{{cite journal | vauthors = Lalonde E, Ha KC, Wang Z, Bemmo A, Kleinman CL, Kwan T, Pastinen T, Majewski J | title = RNA sequencing reveals the role of splicing polymorphisms in regulating human gene expression | journal = Genome Research | volume = 21 | issue = 4 | pages = 545–54 | date = April 2011 | pmid = 21173033 | pmc = 3065702 | doi = 10.1101/gr.111211.110 }}
66. ^{{cite web |url=http://compbio.mit.edu/cummeRbund |title=CummeRbund - An R package for persistent storage, analysis, and visualization of RNA-Seq from Cufflinks output |access-date=2013-07-28}}
67. ^{{cite journal | vauthors = Garcion E, Wallace B, Pelletier L, Wion D | title = RNA mutagenesis and sporadic prion diseases | journal = Journal of Theoretical Biology | volume = 230 | issue = 2 | pages = 271–4 | date = September 2004 | pmid = 15302558 | doi = 10.1016/j.jtbi.2004.05.014 }}
68. ^{{cite journal | vauthors = Teixeira MR | title = Recurrent fusion oncogenes in carcinomas | journal = Critical Reviews in Oncogenesis | volume = 12 | issue = 3–4 | pages = 257–71 | date = December 2006 | pmid = 17425505 | doi = 10.1615/critrevoncog.v12.i3-4.40 }}
69. ^{{cite journal | vauthors = Maher CA, Kumar-Sinha C, Cao X, Kalyana-Sundaram S, Han B, Jing X, Sam L, Barrette T, Palanisamy N, Chinnaiyan AM | title = Transcriptome sequencing to detect gene fusions in cancer | journal = Nature | volume = 458 | issue = 7234 | pages = 97–101 | date = March 2009 | pmid = 19136943 | pmc = 2725402 | doi = 10.1038/nature07638 | bibcode = 2009Natur.458...97M }}
70. ^{{cite journal | vauthors = Tuch BB, Laborde RR, Xu X, Gu J, Chung CB, Monighetti CK, Stanley SJ, Olsen KD, Kasperbauer JL, Moore EJ, Broomer AJ, Tan R, Brzoska PM, Muller MW, Siddiqui AS, Asmann YW, Sun Y, Kuersten S, Barker MA, De La Vega FM, Smith DI | display-authors = 6 | title = Tumor transcriptome sequencing reveals allelic expression imbalances associated with copy number alterations | journal = PLOS One | volume = 5 | issue = 2 | pages = e9317 | date = February 2010 | pmid = 20174472 | pmc = 2824832 | doi = 10.1371/journal.pone.0009317 | bibcode = 2010PLoSO...5.9317T }}
71. ^{{cite journal | vauthors = Berger MF, Levin JZ, Vijayendran K, Sivachenko A, Adiconis X, Maguire J, Johnson LA, Robinson J, Verhaak RG, Sougnez C, Onofrio RC, Ziaugra L, Cibulskis K, Laine E, Barretina J, Winckler W, Fisher DE, Getz G, Meyerson M, Jaffe DB, Gabriel SB, Lander ES, Dummer R, Gnirke A, Nusbaum C, Garraway LA | title = Integrative analysis of the melanoma transcriptome | journal = Genome Research | volume = 20 | issue = 4 | pages = 413–27 | date = April 2010 | pmid = 20179022 | pmc = 2847744 | doi = 10.1101/gr.103697.109 }}
72. ^{{cite journal | vauthors = Twine NA, Janitz K, Wilkins MR, Janitz M | title = Whole transcriptome sequencing reveals gene expression and splicing differences in brain regions affected by Alzheimer's disease | journal = PLOS One | volume = 6 | issue = 1 | pages = e16266 | date = January 2011 | pmid = 21283692 | pmc = 3025006 | doi = 10.1371/journal.pone.0016266 | bibcode = 2011PLoSO...616266T }}
73. ^{{cite journal | vauthors = Ku GM, Kim H, Vaughn IW, Hangauer MJ, Myung Oh C, German MS, McManus MT | title = Research resource: RNA-Seq reveals unique features of the pancreatic β-cell transcriptome | journal = Molecular Endocrinology | volume = 26 | issue = 10 | pages = 1783–92 | date = October 2012 | pmid = 22915829 | pmc = 3458219 | doi = 10.1210/me.2012-1176 }}
74. ^{{cite journal | vauthors = Morán I, Akerman I, van de Bunt M, Xie R, Benazra M, Nammo T, Arnes L, Nakić N, García-Hurtado J, Rodríguez-Seguí S, Pasquali L, Sauty-Colace C, Beucher A, Scharfmann R, van Arensbergen J, Johnson PR, Berry A, Lee C, Harkins T, Gmyr V, Pattou F, Kerr-Conte J, Piemonti L, Berney T, Hanley N, Gloyn AL, Sussel L, Langman L, Brayman KL, Sander M, McCarthy MI, Ravassard P, Ferrer J | title = Human β cell transcriptome analysis uncovers lncRNAs that are tissue-specific, dynamically regulated, and abnormally expressed in type 2 diabetes | journal = Cell Metabolism | volume = 16 | issue = 4 | pages = 435–48 | date = October 2012 | pmid = 23040067 | pmc = 3475176 | doi = 10.1016/j.cmet.2012.08.010 }}
75. ^{{cite journal | vauthors = Merrick BA, Phadke DP, Auerbach SS, Mav D, Stiegelmeyer SM, Shah RR, Tice RR | title = RNA-Seq profiling reveals novel hepatic gene expression pattern in aflatoxin B1 treated rats | journal = PLOS One | volume = 8 | issue = 4 | pages = e61768 | year = 2013 | pmid = 23630614 | pmc = 3632591 | doi = 10.1371/journal.pone.0061768 | bibcode = 2013PLoSO...861768M }}
76. ^{{cite journal | vauthors = Merrick BA, Chang JS, Phadke DP, Bostrom MA, Shah RR, Wang X, Gordon O, Wright GM | title = HAfTs are novel lncRNA transcripts from aflatoxin exposure | journal = PLOS One | volume = 13 | issue = 1 | pages = e0190992 | date = 2018 | pmid = 29351317 | pmc = 5774710 | doi = 10.1371/journal.pone.0190992 | bibcode = 2018PLoSO..1390992M }}
77. ^{{cite journal | vauthors = Han Y, Chen J, Zhao X, Liang C, Wang Y, Sun L, Jiang Z, Zhang Z, Yang R, Chen J, Li Z, Tang A, Li X, Ye J, Guan Z, Gui Y, Cai Z | title = MicroRNA expression signatures of bladder cancer revealed by deep sequencing | journal = PLOS One | volume = 6 | issue = 3 | pages = e18286 | date = March 2011 | pmid = 21464941 | pmc = 3065473 | doi = 10.1371/journal.pone.0018286 | bibcode = 2011PLoSO...618286H }}
78. ^{{cite web |url=http://genome.ucsc.edu/ENCODE/dataMatrix/encodeDataMatrixHuman.html |title=ENCODE Data Matrix |access-date=2013-07-28}}
79. ^{{cite web |url=https://tcga-data.nci.nih.gov/tcga/tcgaHome2.jsp |title=The Cancer Genome Atlas - Data Portal |access-date=2013-07-28}}