Draft:PyClone
Background

According to the clonal evolution model proposed by Peter Nowell, a mutated cancer cell can accumulate further mutations as it progresses, creating sub-clones. These cells divide and mutate further, giving rise to other sub-populations. In line with the theory of natural selection, some mutations may give cancer cells an advantage, for example by making them resistant to earlier treatment. Heterogeneity within a single tumour can arise from several sources:

- single nucleotide variants (SNVs)
- microsatellite shifts and instability
- loss of heterozygosity (LOH)
- copy number variation
- karyotypic variation, including chromosome structural aberrations and aneuploidy

Current methods of molecular analysis lyse and sequence a mixed population of cancer cells, so heterogeneity within the tumour cell population is under-detected. As a result, information on the clonal composition of tumours is lacking, and more knowledge in this area would aid therapeutic decisions.

PyClone is a hierarchical Bayesian statistical model that uses measurements of allele frequency and allele-specific copy number to estimate the proportion of tumour cells harbouring a mutation. Using deeply sequenced data, PyClone finds putative clonal clusters and estimates their cellular prevalence, the proportion of cancer cells in the input sample that harbour a mutation. Progress has been made in measuring variant allele frequency with deep sequencing data, but statistical approaches for clustering mutations into biologically relevant groups remain underdeveloped. How common a mutation is among cells is difficult to measure, because the proportion of cells that harbour a mutation does not relate simply to its allelic prevalence.
This is because allelic prevalence depends on multiple factors, such as the proportion of contaminating normal cells in the sample, the proportion of tumour cells harbouring the mutation, the number of allelic copies of the mutation in each cell, and sources of technical noise. PyClone is among the first methods to combine variant allele frequencies (VAFs) with allele-specific copy numbers.[3] It also accounts for allelic imbalance, in which the alleles of a gene are expressed at different levels in a given cell,[4] which may occur as a result of segmental copy number variation (CNV).

Workflow

Input

PyClone requires two inputs:

1. A set of deeply sequenced mutations from one or more samples derived from a single patient, giving the number of reads supporting the reference and variant alleles at each mutation.
2. A measure of allele-specific copy number at each mutation location.
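The two inputs pair read counts with allele-specific copy numbers for each mutation. The sketch below parses a small tab-separated table of both and derives each mutation's observed variant allele frequency. The column names (mutation_id, ref_counts, var_counts, normal_cn, minor_cn, major_cn) follow the input format described in the PyClone documentation as we understand it. Treat them as an assumption and check them against the version you use. The helper read_mutations and the mutation IDs are hypothetical.

```python
import csv
import io

# Assumed column layout (verify against your PyClone version):
# mutation_id, ref_counts, var_counts, normal_cn, minor_cn, major_cn
EXAMPLE_TSV = (
    "mutation_id\tref_counts\tvar_counts\tnormal_cn\tminor_cn\tmajor_cn\n"
    "mut_1\t80\t20\t2\t1\t1\n"    # hypothetical site in a copy-neutral region
    "mut_2\t150\t150\t2\t0\t2\n"  # hypothetical site in a copy-neutral LOH region
)


def read_mutations(tsv_text):
    """Parse read counts and allele-specific copy numbers (hypothetical helper)."""
    mutations = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        ref = int(row["ref_counts"])
        var = int(row["var_counts"])
        depth = ref + var
        minor = int(row["minor_cn"])
        major = int(row["major_cn"])
        mutations.append({
            "mutation_id": row["mutation_id"],
            "depth": depth,
            # Observed variant allele frequency; undefined when there are no reads.
            "vaf": var / depth if depth > 0 else float("nan"),
            "normal_cn": int(row["normal_cn"]),
            "minor_cn": minor,
            "major_cn": major,
            "total_cn": minor + major,
        })
    return mutations


if __name__ == "__main__":
    for m in read_mutations(EXAMPLE_TSV):
        print(m["mutation_id"], m["depth"], m["vaf"], m["total_cn"])
```

Both hypothetical sites have a total copy number of 2, but their observed VAFs (0.2 and 0.5) cannot be converted into cellular prevalence on their own. As the text above explains, allelic prevalence also depends on normal-cell contamination and on how many copies carry the variant. The copy-number columns are what let a model relate the two.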
Statistical Modeling

For each mutation, the PyClone model divides the input sample into three sub-populations:

- the normal (non-malignant) population, consisting of normal cells
- the reference cancer population, consisting of cancer cells that are wild type for the mutation
- the variant cancer population, consisting of cancer cells with at least one variant allele of the mutation

PyClone's statistical model implements four novel advances, which were tested on simulated datasets.

Beta-binomial Emission Densities

PyClone uses beta-binomial emission densities, which are more effective than the binomial models used by previous tools. The beta-binomial distribution allows for more variance than the binomial, so it more accurately models input datasets in which allelic prevalence measurements are highly variable. Capturing this variance more accurately translates into higher confidence in the clusterings output by PyClone.

Priors

PyClone takes into account that some properties of the clonal population to be reconstructed, such as copy number, are known. When too little information is available or taken into account, the reconstruction is usually of low confidence and many solutions are possible. PyClone places flexible prior probability distributions (priors) over the possible genotypes of each mutation. These priors link allelic prevalence measurements to zygosity and copy number variants.[6]

Bayesian Nonparametric Clustering

PyClone does not fix the number of clusters before clustering. Instead, it uses Bayesian nonparametric clustering (https://www.stats.ox.ac.uk/~teh/research/npbayes/OrbTeh2010a.pdf) to discover the groupings of mutations and the number of groups simultaneously. This allows the cellular prevalence estimates to reflect uncertainty in the number of clusters.
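The sketch below makes the three-population decomposition and the beta-binomial emission density concrete. It computes the expected variant allele frequency ξ of a mutation from its cellular prevalence φ, the tumour content t of the sample (one minus the fraction of contaminating normal cells) and the copy numbers c_N, c_R, c_V of the three populations:

ξ(φ) = [(1 − t)·c_N·μ_N + t·(1 − φ)·c_R·μ_R + t·φ·c_V·μ_V] / [(1 − t)·c_N + t·(1 − φ)·c_R + t·φ·c_V]

Here μ is the probability that a read from a population carries the variant allele. The sketch then scores observed read counts under a beta-binomial with mean ξ.

It follows the published PyClone formulation as we understand it, but several parts are simplifying assumptions rather than PyClone's own code:

- the mean/precision parameterisation of the beta-binomial
- the sequencing error rate eps
- fixed genotypes instead of a genotype prior
- a grid search in place of PyClone's posterior sampling and nonparametric clustering

All function names are hypothetical.

```python
import math


def variant_read_prob(b, c, eps=1e-3):
    """Probability that a read from a genotype with c copies, b of them
    variant, reports the variant allele (eps = assumed error rate)."""
    if b == 0:
        return eps          # only sequencing errors yield variant reads
    if b == c:
        return 1.0 - eps    # only sequencing errors yield reference reads
    return b / c


def expected_vaf(phi, t, cn_n, cn_r, cn_v, b_v, eps=1e-3):
    """Expected VAF under the three-population model.

    phi: cellular prevalence (fraction of cancer cells with the mutation)
    t: tumour content (fraction of cells in the sample that are cancer cells)
    cn_n, cn_r, cn_v: copy numbers of the normal, reference and variant populations
    b_v: number of variant copies in the variant population
    """
    # Each population contributes reads in proportion to its cell fraction
    # times its copy number at the locus.
    w_n = (1.0 - t) * cn_n
    w_r = t * (1.0 - phi) * cn_r
    w_v = t * phi * cn_v
    numerator = (w_n * variant_read_prob(0, cn_n, eps)
                 + w_r * variant_read_prob(0, cn_r, eps)
                 + w_v * variant_read_prob(b_v, cn_v, eps))
    return numerator / (w_n + w_r + w_v)


def beta_binomial_logpmf(k, n, mean, precision):
    """log P(k variant reads out of n) for a beta-binomial with
    alpha = precision * mean and beta = precision * (1 - mean)."""
    a = precision * mean
    b = precision * (1.0 - mean)
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_beta_ratio = (math.lgamma(k + a) + math.lgamma(n - k + b)
                      - math.lgamma(n + a + b)
                      - math.lgamma(a) - math.lgamma(b) + math.lgamma(a + b))
    return log_choose + log_beta_ratio


def profile_phi(var, ref, t, cn_n, cn_r, cn_v, b_v, precision=1000.0, steps=101):
    """Log-likelihood of one mutation's read counts on a grid of phi values."""
    n = var + ref
    grid = [i / (steps - 1) for i in range(steps)]
    return [(phi, beta_binomial_logpmf(
                var, n, expected_vaf(phi, t, cn_n, cn_r, cn_v, b_v), precision))
            for phi in grid]


if __name__ == "__main__":
    # Hypothetical heterozygous mutation in a diploid region, 80% tumour content.
    scores = profile_phi(var=20, ref=80, t=0.8, cn_n=2, cn_r=2, cn_v=2, b_v=1)
    best_phi, _ = max(scores, key=lambda pair: pair[1])
    print(f"most likely cellular prevalence on the grid: {best_phi:.2f}")
```

With these hypothetical counts (20 variant reads out of 100), the grid maximum falls at φ = 0.50. This matches a heterozygous mutation present in half of the cancer cells at 80% tumour content (0.8 × 0.5 × 1/2 = 0.2). Raising the precision makes the beta-binomial approach a binomial; lowering it widens the likelihood, which is the extra variance the text above refers to.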
Section Sequencing

Multiple samples from the same patient can be analysed together, taking advantage of cases in which clonal populations are shared across samples. When multiple samples are sequenced, subclonal populations can be distinguished from each other if their allelic prevalence is similar in some samples but not in others.

Output

PyClone outputs posterior densities of the cellular prevalences of the mutations in the sample. It also outputs a matrix containing the probability that any two mutations occur in the same cluster. Estimates of clonal populations, distinguished by the differing cellular prevalences of their mutations, are then generated from the posterior densities.

Applications

PyClone is used to analyse deeply sequenced mutations (over 100× coverage) to identify and quantify clonal populations in tumours. Some applications of PyClone include:

Xenograft studies: Xenografting is used as a reasonable model for studying human breast cancer, but the consequences of engraftment and genomic propagation of xenografts had not previously been examined at single-cell resolution. PyClone can be used to follow the clonal dynamics of initial grafts and of the serial propagation of primary and metastatic human breast cancers in immunodeficient mice. It can also predict how clonal dynamics differ after initial engraftment and over serial passage generations.[7]

Circulating tumour DNA analysis: Analysis of circulating tumour DNA (plasma DNA) can be used to track tumour burden and analyse cancer genomes non-invasively, but the extent to which it represents metastatic heterogeneity is unknown. PyClone can be used to compare the clonal population structures present in tumour and plasma samples from amplicon sequencing data. Stem and metastatic-clade mutation clusters can be inferred with PyClone and then compared with the results of clonal ordering.[8]

Serial time point sequencing: PyClone can be used to study the evolution of mutational clusters as cancer progresses.
With samples taken at different time points, PyClone can identify the expansion and decline of initial clones and discover newly acquired subclones that arise during treatment. Understanding clonal dynamics clarifies how related cancers such as myelodysplastic syndromes (MDS), myeloproliferative neoplasms (MPN) and secondary acute myeloid leukaemia (sAML) compare in risk. It also gives insight into the clinical significance of somatic mutations.[9]

Section sequencing: PyClone is most effective for section sequencing of tumour DNA. In section sequencing, samples are taken from different portions of a single tumour, and clonal structure is inferred from differences in cellular prevalence. Section sequencing offers greater statistical power and information on the spatial position and interactions of clones, revealing how tumours evolve in space.

Assumptions

A key assumption of the PyClone model is that all cells within a clonal population have the same genotype. This assumption is likely false, since copy number alterations and loss of heterozygosity events are common in cancer cells. The amount of error it introduces depends on the genotypic variability of cells at the location of interest. For example, in solid tumours the cells of a sample are spatially close together, resulting in a small error rate. In liquid tumours the assumption may introduce more error, because the cancer cells are mobile.

PyClone also assumes that the sample follows a perfect and persistent phylogeny: no site mutates more than once in a clonal population, and each site has at most one mutant genotype. PyClone does not account for mutations that revert to the normal genotype, deletions of DNA segments harbouring mutations, or recurrent mutations. Including them would make the explanation of some observed data unidentifiable.

Limitations

Obtaining input data for PyClone requires cell lysis to prepare bulk samples for sequencing.
This results in the loss of information on the complete set of mutations defining a clonal population. PyClone can distinguish clonal populations and estimate their frequencies, but cannot identify the exact set of mutations that defines each population. Instead of clustering cells by mutational composition, PyClone clusters mutations that have similar cellular frequencies. As a result, subclones with similar cellular frequencies will mistakenly be clustered together. The chance of this error decreases with targeted deep sequencing at high coverage and with joint analysis of multiple samples.

Imprecise input information on the genotype of mutations and on the sequencing depth of the sample is a confounding factor for the PyClone model. Insufficient information on either increases the uncertainty of the posterior densities, so interpreting and clustering the sample relies more heavily on the model's assumptions.

Similar tools

SciClone[10]: SciClone is a Bayesian clustering method for single nucleotide variants (SNVs).

Clomial[11]: Clomial is a Bayesian clustering method with a decomposition process. Both Clomial and SciClone restrict their analysis to SNVs located in copy-number-neutral regions. In Clomial, the tumour is physically divided into subsections, which are deep sequenced to count normal and variant alleles. Inference uses an expectation-maximization algorithm.

GLClone[12]: GLClone uses a hierarchical probabilistic model and Bayesian posteriors to calculate copy number alterations in sub-clones.

Cloe[13]: Cloe uses a phylogenetic latent feature model to analyse sequencing data and distinguish the genotypes and frequencies of clones in a tumour.

PhyC[14]: PhyC uses an unsupervised learning approach to identify subgroups of patients by clustering their cancer evolutionary trees.
Its authors identified patterns of different evolutionary modes in a simulation analysis. Using real datasets, they also detected phenotype-related and cancer-type-related subgroups and characterized the tree structures within those subgroups.

PhyloWGS[15]: PhyloWGS reconstructs tumour phylogenies and characterizes the subclonal populations present in a tumour sample using both simple somatic mutations (SSMs) and copy number variations (CNVs).

References

1. Fearon, Eric R.; Wu, Rong; Greenson, Joel K.; Ulintz, Peter J.; Hardiman, Karin M. (2017-02-01). "Abstract PR07: Complex sub-clonal populations in colorectal cancer lymph node metastasis". Cancer Research. 77 (3 Supplement): PR07. doi:10.1158/1538-7445.CRC16-PR07. ISSN 0008-5472. http://cancerres.aacrjournals.org/content/77/3_Supplement/PR07
2. Shah, Sohrab P.; Bouchard-Côté, Alexandre; Aparicio, Samuel; Ha, Gavin; Biele, Justina; Laks, Emma; Wan, Adrian; Yap, Damian; Khattra, Jaswinder (April 2014). "PyClone: statistical inference of clonal population structure in cancer". Nature Methods. 11 (4): 396–398. doi:10.1038/nmeth.2883. ISSN 1548-7105. PMC 4864026. PMID 24633410.
3. Shah, Sohrab P.; Bouchard-Côté, Alexandre; Aparicio, Samuel; Ha, Gavin; Biele, Justina; Laks, Emma; Wan, Adrian; Yap, Damian; Khattra, Jaswinder (April 2014). "PyClone: statistical inference of clonal population structure in cancer". Nature Methods. 11 (4): 396–398. doi:10.1038/nmeth.2883. ISSN 1548-7105. PMC 4864026. PMID 24633410.
4. Blanchette, Mathieu; Pastinen, Tomi; Gunderson, Kevin L.; Pokholok, Dmitry; Ge, Bing; Wagner, James R. (2010-07-08). "Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human". PLOS Computational Biology. 6 (7): e1000849. doi:10.1371/journal.pcbi.1000849. ISSN 1553-7358. PMC 2900287. PMID 20628616.
5. "Deep Sequencing". www.illumina.com. Retrieved 2019-02-26. https://www.illumina.com/science/technology/next-generation-sequencing/deep-sequencing.html
6. Hanson, K.M. (May 1993). "Bayesian reconstruction based on flexible prior models". J. Opt. Soc. Am. A. 10. https://pdfs.semanticscholar.org/3f87/d3508e09f9a3c792b74b847a287bcc9ac5ec.pdf
7. Aparicio, Samuel; Shah, Sohrab P.; Caldas, Carlos; Marra, Marco A.; Hansen, Carl; Eaves, Connie J.; Huntsman, David; Nguyen, Long; Lorette, Julie (February 2015). "Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution". Nature. 518 (7539): 422–426. doi:10.1038/nature13952. ISSN 1476-4687. PMC 4864027. PMID 25470049.
8. Caldas, Carlos; Rosenfeld, Nitzan; Wallis, Matthew; Shah, Sohrab P.; Bentley, David; Humphray, Sean; Kingsbury, Zoya; Shumansky, Karey; Farahani, Hossein (2015-11-04). "Multifocal clonal evolution characterized using circulating tumour DNA in a case of metastatic breast cancer". Nature Communications. 6: 8760. doi:10.1038/ncomms9760. ISSN 2041-1723. PMC 4659935. PMID 26530965.
9. Maciejewski, Jaroslaw P.; Ogawa, Seishi; Shih, Lee-Yung; Miyano, Satoru; Chiba, Shigeru; Saunthararajah, Yogen; Miyawaki, Shuichi; Nakamaki, Tsuyoshi; Dienes, Brittney (2015-12-03). "Serial Sequencing in Myelodysplastic Syndromes Reveals Dynamic Changes in Clonal Architecture and Allows for a New Prognostic Assessment of Mutations Detected in Cross-Sectional Testing". Blood. 126 (23): 709. ISSN 0006-4971. http://www.bloodjournal.org/content/126/23/709
10. Ding, Li; Wilson, Richard K.; Mardis, Elaine R.; Ley, Timothy J.; DiPersio, John F.; Schierding, William; Ellis, Matthew J.; Walter, Matthew J.; Graubert, Timothy A. (2014-08-07). "SciClone: Inferring Clonal Architecture and Tracking the Spatial and Temporal Patterns of Tumor Evolution". PLOS Computational Biology. 10 (8): e1003665. doi:10.1371/journal.pcbi.1003665. ISSN 1553-7358. PMC 4125065. PMID 25102416.
11. Noble, William Stafford; Blau, C. Anthony; Witten, Daniela; Song, ChaoZhong; Nickerson, Debbie; Smith, Josh; Weber, Kris; Hu, Alex; Wang, Junfeng (2014-07-10). "Inferring Clonal Composition from Multiple Sections of a Breast Cancer". PLOS Computational Biology. 10 (7): e1003703. doi:10.1371/journal.pcbi.1003703. ISSN 1553-7358. PMC 4091710. PMID 25010360.
12. Geng, Yu; Zhao, Zhongmeng; Xu, Jing; Liu, Ruoyu; Huang, Yi; Zhang, Xuanping; Xiao, Xiao; Maomao; Wang, Jiayin (2017). "Identifying Heterogeneity Patterns of Allelic Imbalance on Germline Variants to Infer Clonal Architecture". In Huang, De-Shuang; Jo, Kang-Hyun; Figueroa-García, Juan Carlos (eds.). Intelligent Computing Theories and Application. Lecture Notes in Computer Science. Springer International Publishing. pp. 286–297. doi:10.1007/978-3-319-63312-1_26. ISBN 978-3-319-63312-1.
13. Markowetz, Florian; Rosenfeld, Nitzan; Yuan, Ke; Mouliere, Florent; Marass, Francesco (2016-04-06). "A phylogenetic latent feature model for clonal deconvolution". The Annals of Applied Statistics. 10 (4): 2377–2404. arXiv:1604.01715v1. doi:10.1214/16-AOAS986.
14. Shimamura, Teppei; Miyano, Satoru; Mimori, Koshi; Uchi, Ryutaro; Niida, Atsushi; Matsui, Yusuke (2017-05-01). "phyC: Clustering cancer evolutionary trees". PLOS Computational Biology. 13 (5): e1005509. doi:10.1371/journal.pcbi.1005509. ISSN 1553-7358. PMC 5432190. PMID 28459850.
15. Deshwar, Amit G.; Vembu, Shankar; Yung, Christina K.; Jang, Gun Ho; Stein, Lincoln; Morris, Quaid (2015-02-13). "PhyloWGS: Reconstructing subclonal composition and evolution from whole-genome sequencing of tumors". Genome Biology. 16 (1): 35. doi:10.1186/s13059-015-0602-8. ISSN 1465-6906. PMC 4359439. PMID 25786235.