Massive parallel sequencing

Massive parallel sequencing or massively parallel sequencing is any of several high-throughput approaches to DNA sequencing using the concept of massively parallel processing; it is also called next-generation sequencing (NGS) or second-generation sequencing. Some of these technologies emerged in 1994–1998^[1]^[2]^[3] and have been commercially available since 2005. These technologies use miniaturized and parallelized platforms for sequencing of 1 million to 43 billion short reads (50–400 bases each) per instrument run.
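The read counts and read lengths quoted above translate into total output per run by simple multiplication. The following is a minimal sketch of that arithmetic; the function name is hypothetical, and pairing the extreme values only gives outer bounds, since no single platform necessarily reaches both extremes at once.

```python
def run_output_bases(num_reads: int, read_length: int) -> int:
    """Total bases produced by one run: number of reads times bases per read."""
    return num_reads * read_length


# Extremes of the ranges quoted in the text (bounds only, not a real platform).
low = run_output_bases(1_000_000, 50)         # 1 million reads of 50 bases
high = run_output_bases(43_000_000_000, 400)  # 43 billion reads of 400 bases

print(f"{low / 1e6:.0f} Mb to {high / 1e12:.1f} Tb per run")
# prints "50 Mb to 17.2 Tb per run"
```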

Many NGS platforms differ in engineering configurations and sequencing chemistry. They share the technical paradigm of massive parallel sequencing via spatially separated, clonally amplified DNA templates or single DNA molecules in a flow cell. This design is very different from that of Sanger sequencing (also known as capillary sequencing or first-generation sequencing), which is based on electrophoretic separation of chain-termination products produced in individual sequencing reactions.^[4]

NGS Platforms

DNA sequencing with commercially available NGS platforms is generally conducted with the following steps. First, DNA sequencing libraries are generated by clonal amplification by PCR in vitro. Second, the DNA is sequenced by synthesis, such that the DNA sequence is determined by the addition of nucleotides to the complementary strand rather than through chain-termination chemistry. Third, the spatially segregated, amplified DNA templates are sequenced simultaneously in a massively parallel fashion without the requirement for a physical separation step. While most NGS platforms follow these steps, each platform uses a different strategy.^[5]

NGS parallelization of the sequencing reactions generates hundreds of megabases to gigabases of nucleotide sequence reads in a single instrument run. This has enabled a drastic increase in available sequence data and fundamentally changed genome sequencing approaches in the biomedical sciences.^[6]

Newly emerging NGS technologies and instruments have further contributed to a significant decrease in the cost of sequencing, which is nearing the mark of $1000 per genome.^[7]^[8]

As of 2014, the commercially available massively parallel sequencing platforms and their features are summarized in the table. Because NGS technologies are advancing rapidly, technical specifications and pricing are in flux.

Run times and gigabase (Gb) output per run for single-end sequencing are noted. Run times and outputs approximately double when performing paired-end sequencing.

‡Average read lengths for the Roche 454 and Helicos Biosciences platforms.^[18]

Template preparation methods for NGS

Two methods are used in preparing templates for NGS reactions: amplified templates originating from single DNA molecules, and single DNA molecule templates.

For imaging systems which cannot detect single fluorescence events, amplification of DNA templates is required. The three most common amplification methods are emulsion PCR (emPCR), rolling circle and solid-phase amplification. The final distribution of templates can be spatially random or on a grid.

Emulsion PCR

In emulsion PCR methods, a DNA library is first generated through random fragmentation of genomic DNA. Single-stranded DNA fragments (templates) are attached to the surface of beads with adaptors or linkers, and one bead is attached to a single DNA fragment from the DNA library. The surface of the beads contains oligonucleotide probes with sequences that are complementary to the adaptors binding the DNA fragments. The beads are then compartmentalized into water-oil emulsion droplets. In the aqueous water-oil emulsion, each of the droplets capturing one bead is a PCR microreactor that produces amplified copies of the single DNA template.^[19]^[20]^[21]
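Compartmentalization only yields clonal copies when a droplet holds a single template. The source does not give a loading model; the sketch below assumes, purely for illustration, that templates are distributed among droplets according to a Poisson distribution with a mean of `lam` templates per droplet. The function names and the value of `lam` are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k templates in a droplet under a Poisson model."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def droplet_occupancy(lam: float) -> dict:
    """Fractions of droplets that are empty, clonal (one template) or mixed."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


# Assumed mean loading of 0.1 templates per droplet (illustrative value only).
occupancy = droplet_occupancy(0.1)
for label, fraction in occupancy.items():
    print(f"{label}: {fraction:.4f}")
```

Under this assumed model, dilute loading leaves most droplets empty but keeps droplets with more than one template rare, which is the trade-off a clonal amplification step has to manage.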

Gridded Rolling Circle Nanoballs

Amplification of a population of single DNA molecules by rolling circle amplification in solution is followed by capture on a grid of spots sized to be smaller than the DNAs to be immobilized.^[22]^[23]^[24]^[25]

DNA colony generation (Bridge amplification)

Forward and reverse primers are covalently attached at high density to the slide in a flow cell. The ratio of the primers to the template on the support defines the surface density of the amplified clusters. The flow cell is exposed to reagents for polymerase-based extension, and priming occurs as the free/distal end of a ligated fragment "bridges" to a complementary oligo on the surface. Repeated denaturation and extension results in localized amplification of DNA fragments in millions of separate locations across the flow cell surface. Solid-phase amplification produces 100–200 million spatially separated template clusters, providing free ends to which a universal sequencing primer is then hybridized to initiate the sequencing reaction.^[19]^[20] A patent application for this technology was filed in 1997 from Glaxo Wellcome's Geneva Biomedical Research Institute (GBRI) by Pascal Mayer, Eric Kawashima, and Laurent Farinelli,^[1]^[2] and the technology was publicly presented for the first time in 1998.^[3] In 1994, Adams and Kron filed a patent on a similar, but non-clonal, surface amplification method named “bridge amplification”,^[26] which was adapted for clonal amplification in 1997 by Church and Mitra.^[22]^[23]

Single-molecule templates

Protocols requiring DNA amplification are often cumbersome to implement and may introduce sequencing errors. The preparation of single-molecule templates is more straightforward and does not require PCR, which can introduce errors in the amplified templates. AT-rich and GC-rich target sequences often show amplification bias, which results in their underrepresentation in genome alignments and assemblies.

Single-molecule templates are usually immobilized on solid supports using one of at least three different approaches. In the first approach, spatially distributed individual primer molecules are covalently attached to the solid support. The template, which is prepared by randomly fragmenting the starting material into small sizes (for example, ~200–250 bp) and adding common adapters to the fragment ends, is then hybridized to the immobilized primer. In the second approach, spatially distributed single-molecule templates are covalently attached to the solid support by priming and extending single-stranded, single-molecule templates from immobilized primers. A common primer is then hybridized to the template.

In either approach, DNA polymerase can bind to the immobilized primed template configuration to initiate the NGS reaction. Both of the above approaches are used by Helicos BioSciences. In a third approach, spatially distributed single polymerase molecules are attached to the solid support, to which a primed template molecule is bound. This approach is used by Pacific Biosciences. Larger DNA molecules (up to tens of thousands of base pairs) can be used with this technique and, unlike the first two approaches, the third approach can be used with real-time methods, resulting in potentially longer read lengths.
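The template preparation described for the first approach (random fragmentation into pieces of roughly 200–250 bp, followed by addition of common adapters to the fragment ends) can be sketched in a few lines. This is an illustrative toy model only: the function names are hypothetical, the adapter sequences are placeholders rather than real adapters, and fragmentation is modelled as cutting a string at random intervals.

```python
import random

# Placeholder adapter sequences (assumptions for illustration, not real adapters).
ADAPTER_LEFT = "GATTACA"
ADAPTER_RIGHT = "TGTAATC"


def fragment(sequence: str, min_len: int = 200, max_len: int = 250, seed: int = 0) -> list:
    """Cut a sequence into consecutive pieces of random length in [min_len, max_len].

    The final piece may be shorter than min_len because it holds the remainder.
    """
    rng = random.Random(seed)
    pieces = []
    pos = 0
    while pos < len(sequence):
        size = rng.randint(min_len, max_len)
        pieces.append(sequence[pos:pos + size])
        pos += size
    return pieces


def add_adapters(piece: str) -> str:
    """Attach the same pair of adapters to both ends of every fragment."""
    return ADAPTER_LEFT + piece + ADAPTER_RIGHT


genome = "ACGT" * 500                      # toy 2000-base input
library = [add_adapters(p) for p in fragment(genome)]
print(len(library), "adapter-flanked fragments")
```

Because every fragment carries the same adapters, a single immobilized primer sequence can capture any fragment in the library, which is the point of the common adapters in the text.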

Sequencing Approaches for NGS

Pyrosequencing

In 1996, Pål Nyrén and his student Mostafa Ronaghi at the Royal Institute of Technology in Stockholm published their method of pyrosequencing.^[27] Pyrosequencing is a non-electrophoretic, bioluminescence method that measures the release of inorganic pyrophosphate by proportionally converting it into visible light using a series of enzymatic reactions. Unlike other sequencing approaches that use modified nucleotides to terminate DNA synthesis, the pyrosequencing method manipulates DNA polymerase by the single addition of a dNTP in limiting amounts.

Upon incorporation of the complementary dNTP, DNA polymerase extends the primer and pauses. DNA synthesis is reinitiated following the addition of the next complementary dNTP in the dispensing cycle.

The order and intensity of the light peaks are recorded as flowgrams, which reveal the underlying DNA sequence.
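The relationship between the dispensing cycle, the light peaks and the recovered sequence can be sketched as follows. This is a simplified, noise-free model under stated assumptions: the dispensing order `TACG` is an assumed example, each peak is taken to be exactly proportional to the number of nucleotides incorporated in that flow, and the function names are hypothetical.

```python
def simulate_flowgram(read: str, flow_order: str = "TACG") -> list:
    """Return (dispensed base, peak intensity) pairs for a noise-free run.

    `read` is the strand being synthesized. A flow whose base does not match
    the next position gives intensity 0; a homopolymer run of n identical
    bases is incorporated in one flow and gives intensity n.
    """
    if any(base not in flow_order for base in read):
        raise ValueError("read contains a base that is never dispensed")
    peaks = []
    pos = 0
    flow = 0
    while pos < len(read):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        peaks.append((base, run))
        flow += 1
    return peaks


def decode_flowgram(peaks: list) -> str:
    """Rebuild the sequence from the order and (rounded) intensity of the peaks."""
    return "".join(base * round(intensity) for base, intensity in peaks)


peaks = simulate_flowgram("TTAGGC")
print(peaks)
print(decode_flowgram(peaks))
```

In this model a homopolymer such as `TT` appears as a single peak of double height rather than as two peaks, so base calling depends on how accurately peak intensity can be measured.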

Sequencing by reversible terminator chemistry

This approach uses reversible terminator-bound dNTPs in a cyclic method that comprises nucleotide incorporation, fluorescence imaging and cleavage.

A fluorescently-labeled terminator is imaged as each dNTP is added and then cleaved to allow incorporation of the next base.

These nucleotides are chemically blocked such that each incorporation is a unique event. An imaging step follows each base incorporation step, then the blocking group is chemically removed to prepare each strand for the next incorporation by DNA polymerase. This series of steps continues for a specific number of cycles, as determined by user-defined instrument settings. Reversal of the 3' blocking groups was originally conceived as either enzymatic^[29] or chemical.^[30]^[31] The chemical method has been the basis for the Solexa and Illumina machines.
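The cycle of incorporation, imaging and cleavage can be sketched as a loop in which exactly one base is read per cycle. This is a toy model under stated assumptions: the template is given in the order in which the polymerase traverses it, base calls are error-free, and the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_reversible_termination(template: str, cycles: int) -> str:
    """Read one base per cycle for a user-defined number of cycles."""
    calls = []
    for cycle in range(min(cycles, len(template))):
        # Incorporation: the blocked nucleotide complementary to the template
        # base is added, and the block prevents any further extension.
        incorporated = COMPLEMENT[template[cycle]]
        # Imaging: the dye on the incorporated nucleotide identifies the base.
        calls.append(incorporated)
        # Cleavage: dye and blocking group are removed; in this model that is
        # represented simply by moving on to the next template position.
    return "".join(calls)


print(sequence_by_reversible_termination("ACGTAC", cycles=4))
# prints "TGCA"
```

Because extension is blocked after every single base, the read length in this model equals the number of cycles, and a homopolymer run is read one base at a time.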

Sequencing by reversible terminator chemistry can be a four-colour cycle such as used by Illumina/Solexa, or a one-colour cycle such as used by Helicos BioSciences.

Helicos BioSciences used “virtual Terminators”, which are unblocked terminators with a second nucleoside analogue that acts as an inhibitor. These terminators have the appropriate modifications for terminating or inhibiting groups so that DNA synthesis is terminated after a single base addition.^[20]^[32]^[33]

Sequencing-by-ligation mediated by ligase enzymes

In this approach, the sequence extension reaction is not carried out by polymerases but rather by DNA ligase and either one-base-encoded probes or two-base-encoded probes. In its simplest form, a fluorescently labelled probe hybridizes to its complementary sequence adjacent to the primed template. DNA ligase is then added to join the dye-labelled probe to the primer. Non-ligated probes are washed away, followed by fluorescence imaging to determine the identity of the ligated probe.

The cycle can be repeated either by using cleavable probes to remove the fluorescent dye and regenerate a 5′-PO4 group for subsequent ligation cycles (chained ligation^[11]^[34]) or by removing and hybridizing a new primer to the template (unchained ligation^[13]^[14]).
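With two-base-encoded probes, each detected colour reports on a pair of adjacent bases rather than on a single base. The source does not spell out the encoding; the sketch below uses one commonly described four-colour scheme, expressed here as an exclusive-or of 2-bit base codes, which should be read as an assumption of this example. The function names are hypothetical.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode_colors(sequence: str) -> list:
    """Encode each adjacent pair of bases as one of four colours (0-3)."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(sequence, sequence[1:])]


def decode_colors(first_base: str, colors: list) -> str:
    """Recover the bases from the colours, given a known first base."""
    bases = [first_base]
    for color in colors:
        previous = BASE_TO_BITS[bases[-1]]
        bases.append(BITS_TO_BASE[previous ^ color])
    return "".join(bases)


colors = encode_colors("ATGGC")
print(colors)                      # prints [3, 1, 0, 3]
print(decode_colors("A", colors))  # prints "ATGGC"
```

In this scheme every base is covered by two colours, and decoding requires a known starting base; a single wrong colour changes every base decoded after it, which is why colour calls are usually checked against a reference before conversion to bases.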

Phospholinked Fluorescent Nucleotides or Real-time sequencing

The method of real-time sequencing involves imaging the continuous incorporation of dye-labelled nucleotides during DNA synthesis: single DNA polymerase molecules are attached to the bottom surface of individual zero-mode waveguide detectors (ZMW detectors) that can obtain sequence information while phospholinked nucleotides are being incorporated into the growing primer strand.

Pacific Biosciences uses a unique DNA polymerase which better incorporates phospholinked nucleotides and enables the resequencing of closed circular templates.

While single-read accuracy is 87%, consensus accuracy has been demonstrated at 99.999% with multi-kilobase read lengths.^[35]^[36] In 2015, Pacific Biosciences released a new sequencing instrument called the Sequel System, which increases capacity approximately 6.5-fold.^[37]^[38]
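The gap between single-read and consensus accuracy can be made intuitive with a small calculation. The source does not describe how the consensus is built; the sketch below assumes, for illustration only, that each position is observed in several independent reads, that each observation is correct with the quoted single-read accuracy, and that the consensus is right whenever a strict majority of observations is correct. The function name is hypothetical.

```python
from math import comb


def majority_correct(accuracy: float, observations: int) -> float:
    """Probability that a strict majority of independent observations is correct."""
    if observations % 2 == 0:
        raise ValueError("use an odd number of observations to avoid ties")
    needed = observations // 2 + 1
    return sum(
        comb(observations, k) * accuracy**k * (1 - accuracy) ** (observations - k)
        for k in range(needed, observations + 1)
    )


for n in (1, 3, 9, 15, 25):
    print(n, f"{majority_correct(0.87, n):.6f}")
```

Under these assumptions the per-position accuracy rises quickly with the number of observations, because random errors in individual reads are outvoted. Real consensus algorithms and error profiles are more complex than this binomial model.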

References

1. Laurent Farinelli, Eric Kawashima, Pascal Mayer. "Method of nucleic acid amplification". Patent application WO1998044151A1. Priority 1997-04-01, filed 1998-04-01, published 1998-10-08.
2. Laurent Farinelli, Eric Kawashima, Pascal Mayer. "Method of nucleic acid sequencing". Patent application WO1998044152A1. Priority 1997-04-01, filed 1998-04-01, published 1998-10-08.
3. P. Mayer et al. "A very large scale, high throughput and low cost DNA sequencing method based on a new 2-dimensional DNA auto-patterning process". Presented at the Fifth International Automation in Mapping and DNA Sequencing Conference, St. Louis, MO, USA, October 7–10, 1998. http://www.slideshare.net/pascalmayer/dna-colony-massively-parrallel-sequencing-ams98-presentation
4. Karl V. Voelkerding, Shale A. Dames, Jacob D. Durtschi (2009). "Next-Generation Sequencing: From Basic Research to Diagnostics". Clinical Chemistry 55 (4): 641–658. doi:10.1373/clinchem.2008.112789. PMID 19246620.
5. Matthew W. Anderson, Iris Schrijver (2010). "Next Generation DNA Sequencing and the Future of Genomic Medicine". Genes 1 (1): 38–69. doi:10.3390/genes1010038. http://www.mdpi.com/2073-4425/1/1/38/pdf
6. Tracy Tucker, Marco Marra, Jan M. Friedman (Aug 2009). "Massively Parallel Sequencing The Next Big Thing in Genetic Medicine". Am J Hum Genet 85 (2): 142–54. doi:10.1016/j.ajhg.2009.06.022. PMID 19679224. PMC 2725244.
7. Andreas Von Bubnoff (2008). "Next-generation sequencing: the race is on". Cell 132 (5): 721–723. doi:10.1016/j.cell.2008.02.028. PMID 18329356.
8. "2008 Release: NHGRI Seeks DNA Sequencing Technologies Fit for Routine Laboratory and Medical Use". Genome.gov. http://www.genome.gov/27527585. Accessed 2012-08-05.
9. http://systems.illumina.com/systems/hiseq_2500_1500/performance_specifications.html
10. "Archived copy". http://genomics.ed.ac.uk/blog/hiseq-v4-here-and-it-delivers. Archived 2014-11-06 at https://web.archive.org/web/20141106114253/http://genomics.ed.ac.uk/blog/hiseq-v4-here-and-it-delivers. Accessed 2014-11-06.
11. McKernan KJ, et al. (Sep 2009). "Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding". Genome Res 19 (9): 1527–41. doi:10.1101/gr.091868.109. PMID 19546169. PMC 2752135.
12. "Ion Torrent". http://www.allseq.com/knowledgebank/sequencing-platforms/life-technologies-ion-torrent/. Accessed 1 Jan 2014.
13. Drmanac R, et al. (2009). "Human Genome Sequencing Using Unchained Base Reads on Self-assembling DNA Nanoarrays". Science 327 (5961): 78–81. doi:10.1126/science.1181498. PMID 19892942.
14. Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, Wang MD, Zhang K, Mitra RD, Church GM (2005). "Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome". Science 309 (5741): 1728–32. doi:10.1126/science.1117389. PMID 16081699.
15. Peters BA, et al. (2012). "Accurate whole genome sequencing and haplotyping from 10-20 human cells". Nature 487 (7406): 190–195. doi:10.1038/nature11236. PMID 22785314. PMC 3397394.
16. "Pacific Biosciences Introduces New Chemistry With Longer Read Lengths to Detect Novel Features in DNA Sequence and Advance Genome Studies of Large Organisms".
17. Lex Nederbragt. "De novo bacterial genome assembly: a solved problem?". http://flxlexblog.wordpress.com/2013/07/05/de-novo-bacterial-genome-assembly-a-solved-problem/
18. Karl V. Voelkerding, Shale Dames, Jacob D. Durtschi (September 2010). "Diagnostic Next Generation Sequencing". J Molec Diagn 12 (5): 539–51. doi:10.2353/jmoldx.2010.100043. PMID 20805560. PMC 2928417.
19. Chee-Seng, Ku; En Yun, Loy; Yudi, Pawitan; and Kee-Seng, Chia (April 2010). "Next Generation Sequencing Technologies and Their Applications". In: Encyclopedia of Life Sciences (ELS). John Wiley & Sons, Ltd: Chichester.
20. Metzker ML (Jan 2010). "Sequencing technologies - the next generation". Nat Rev Genet 11 (1): 31–46. doi:10.1038/nrg2626. PMID 19997069.
21. Dressman D, Yan H, Traverso G, Kinzler KW, Vogelstein B (Jul 22, 2003). "Transforming single DNA molecules into fluorescent magnetic particles for detection and enumeration of genetic variations". Proc Natl Acad Sci U S A 100 (15): 8817–22. doi:10.1073/pnas.1133470100. PMID 12857956. PMC 166396.
22. George M. Church, Rob Mitra. "Replica amplification of nucleic acid arrays". Patent US6485944B1. Priority 1997-10-10, filed 1999-03-12, published 2002-11-26.
23. Mitra R, Church GM (Dec 1999). "In situ localized amplification and contact replication of many individual DNA molecules". Nucleic Acids Res. 27 (24): e34; 1–6. doi:10.1093/nar/27.24.e34.
24. George M Church, Gregory J Porreca, Abraham Rosenbaum, Jay Shendure. "Nanogrid rolling circle dna sequencing". Patent application WO2007120208A3. Priority 2005-11-14, filed 2006-11-14, published 2008-08-28.
25. Radoje Drmanac, Matthew J. Callow, Snezana Drmanac, Brian K. Hauser, George Yeung. "Single molecule arrays for genetic and chemical analysis". Patent US8445194B2. Priority 2005-06-15, filed 2006-06-13, published 2013-05-21.
26. Christopher P. Adams, Stephen Joseph Kron. "Method for performing amplification of nucleic acid with two primers bound to a single solid support". US Patent 5,641,658. https://www.google.com/patents/US5641658
27. M. Ronaghi, S. Karamohamed, B. Pettersson, M. Uhlen, P. Nyren (1996). "Real-time DNA sequencing using detection of pyrophosphate release". Analytical Biochemistry 242 (1): 84–9. doi:10.1006/abio.1996.0432. PMID 8923969.
28. Martin Kircher, Janet Kelso (2010). "High-throughput DNA sequencing – concepts and limitations". Bioessays 32: 524–536. WILEY Periodicals Inc.
29. Shankar Balasubramanian. "Polynucleotide sequencing". Patent application WO2001023610A2. Priority 1999-09-29, filed 2000-09-29, published 2001-04-05.
30. Jingyue Ju, Zengmin Li, John Robert Edwards, Yasuhiro Itagaki. "Massive parallel method for decoding DNA and RNA". Patent US7790869B2. Priority 2000-10-06, filed 2007-06-05, published 2010-09-07.
31. Bentley DR, et al. (Nov 6, 2008). "Accurate whole human genome sequencing using reversible terminator chemistry". Nature 456 (7218): 53–9. doi:10.1038/nature07517. PMID 18987734. PMC 2581791.
32. "Assay Technology". Illumina. http://www.illumina.com/company/assay_technology.ilmn. Accessed 2012-08-05.
33. "True Single Molecule Sequencing (tSMS™): Helicos BioSciences". Helicosbio.com. http://www.helicosbio.com/Technology/tabid/64/Default.aspx. Accessed 2012-08-05.
34. "Fundamentals of 2 Base Encoding and Color Space". Appliedbiosystems.cnpg.com. http://appliedbiosystems.cnpg.com/Video/flatFiles/699/index.aspx. Accessed 2012-08-05.
35. Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE, Turner SW, Korlach J (Jun 2013). "Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data". Nat Methods 10 (6): 563–9. doi:10.1038/nmeth.2474. PMID 23644548.
36. Monica Heger (March 5, 2013). "PacBio Users Report Progress in Long Reads for Plant Genome Assembly, Tricky Regions of Human Genome". http://www.genomeweb.com/sequencing/pacbio-users-report-progress-long-reads-plant-genome-assembly-tricky-regions-hum
37. https://www.genomeweb.com/business-news/pacbio-launches-higher-throughput-lower-cost-single-molecule-sequencing-system
38. http://www.bio-itworld.com/2015/9/30/pacbio-announces-sequel-sequencing-system.aspx