Draft:PyClone

According to the clonal evolution model proposed by Peter Nowell, a mutated cancer cell can accumulate further mutations as it divides, creating sub-clones. These cells divide and mutate further to give rise to other sub-populations. In keeping with the theory of natural selection, some mutations may be advantageous to the cancer cells and make them resistant to earlier treatment. Heterogeneity within a single tumour can arise from single nucleotide polymorphism/variation (SNP/SNV) events, microsatellite shifts and instability, loss of heterozygosity (LOH), copy number variation, and karyotypic variations including chromosome structural aberrations and aneuploidy. Because current methods of molecular analysis lyse and sequence a mixed population of cancer cells, heterogeneity within the tumour cell population is under-detected. The result is a lack of information on the clonal composition of tumours; more knowledge in this area would aid therapy decisions.

PyClone is a hierarchical Bayesian statistical model that uses measurements of allele frequency and allele-specific copy number to estimate the cellular prevalence of a mutation, that is, the proportion of tumor cells harboring it. It works from deeply sequenced data and groups mutations into putative clonal clusters. Progress has been made in measuring variant allele frequency with deep sequencing data, but statistical approaches for clustering mutations into biologically relevant groups remain underdeveloped. Cellular prevalence is difficult to measure because the proportion of cells that harbour a mutation is not simply related to allelic prevalence. Allelic prevalence depends on several factors: the proportion of 'contaminating' normal cells in the sample, the proportion of tumor cells harboring the mutation, the number of allelic copies of the mutation in each cell, and sources of technical noise. PyClone is among the first methods to combine variant allele frequencies (VAFs) with allele-specific copy numbers.^[3] It also accounts for allelic imbalance, where alleles of a gene are expressed at different levels in a given cell,^[4] which may arise from segmental CNVs and normal cell contamination.

Workflow

Input

Statistical Modeling

For each mutation, the PyClone model divides the input sample into three sub-populations. The three sub-populations are the normal (non-malignant) population consisting of normal cells, the reference cancer population consisting of cancer cells wild type for the mutation, and the variant cancer cell population consisting of the cancer cells with at least one variant allele of the mutation.
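The three-population split can be turned into a small calculation showing why variant allele frequency is not the same thing as cellular prevalence. The sketch below is illustrative only: the function name, its parameters and the default copy numbers are assumptions made for this example, not PyClone's actual code or interface. It weights each sub-population by its share of cells and its copy number at the locus, and assumes only the variant population carries mutant copies.

```python
def expected_vaf(prevalence, tumor_content, variant_copies,
                 cn_normal=2, cn_reference=2, cn_variant=2):
    """Expected variant allele frequency of one mutation in a bulk sample.

    prevalence     -- fraction of cancer cells carrying the mutation (0..1)
    tumor_content  -- fraction of all cells in the sample that are cancer cells
    variant_copies -- mutant allele copies per cell in the variant population
    cn_*           -- total copy number at the locus in each sub-population
    All names and defaults are hypothetical, chosen for illustration.
    """
    if not 0.0 <= prevalence <= 1.0 or not 0.0 <= tumor_content <= 1.0:
        raise ValueError("prevalence and tumor_content must lie in [0, 1]")
    if not 0 <= variant_copies <= cn_variant:
        raise ValueError("variant_copies must lie between 0 and cn_variant")

    # Allele copies contributed by each of the three sub-populations.
    normal = (1.0 - tumor_content) * cn_normal
    reference = tumor_content * (1.0 - prevalence) * cn_reference
    variant = tumor_content * prevalence * cn_variant
    total = normal + reference + variant
    if total == 0:
        raise ValueError("no allele copies at this locus")

    # Only the variant cancer population carries mutant copies.
    mutant = tumor_content * prevalence * variant_copies
    return mutant / total


# A mutation present in every cancer cell, heterozygous and diploid,
# in a sample that is 80% tumor cells.
clonal = expected_vaf(prevalence=1.0, tumor_content=0.8, variant_copies=1)

# A mutation in half of the cancer cells of a pure tumor sample.
subclonal = expected_vaf(prevalence=0.5, tumor_content=1.0, variant_copies=1)
```

Under these assumptions a fully clonal mutation still shows a VAF well below 1, which is why normal-cell contamination and copy number have to be modeled before prevalence can be read off.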

PyClone implements four novel advances in its statistical model, each of which was tested on simulated datasets:

Beta-binomial Emission Densities

PyClone uses beta-binomial emission densities, which are more effective than the binomial models used by previous tools. Beta-binomial emission densities more accurately model input datasets that have more variance in allelic prevalence measurements than a binomial model allows. Modeling that variance more accurately translates to higher confidence in the clusterings PyClone produces.
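The difference between the two emission densities can be seen by comparing their spread at the same read depth and mean. The following is a minimal sketch using only the Python standard library; the mean/precision parameterization and the numbers chosen (depth 100, mean 0.3, precision 50) are assumptions made for illustration and are not taken from PyClone itself.

```python
import math


def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def binomial_pmf(k, n, p):
    """P(k variant reads out of n) when each read is variant with probability p (0 < p < 1)."""
    return math.exp(log_choose(n, k) + k * math.log(p) + (n - k) * math.log(1.0 - p))


def beta_binomial_pmf(k, n, mean, precision):
    """Beta-binomial pmf; a lower precision gives a wider spread around the mean."""
    a = mean * precision
    b = (1.0 - mean) * precision
    return math.exp(log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b))


def variance(pmf_values):
    """Variance of a distribution given as [P(0), P(1), ..., P(n)]."""
    mean = sum(k * p for k, p in enumerate(pmf_values))
    return sum((k - mean) ** 2 * p for k, p in enumerate(pmf_values))


depth, vaf = 100, 0.3  # hypothetical read depth and expected allele frequency
binomial = [binomial_pmf(k, depth, vaf) for k in range(depth + 1)]
beta_binomial = [beta_binomial_pmf(k, depth, vaf, precision=50.0)
                 for k in range(depth + 1)]

binomial_variance = variance(binomial)
beta_binomial_variance = variance(beta_binomial)
```

Both distributions have the same mean, but the beta-binomial variance is larger, so noisy read counts that a binomial model would treat as evidence for a separate cluster can remain plausible under a single cluster.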

Priors

Bayesian Nonparametric Clustering

Instead of fixing the number of clusters before clustering, [https://www.stats.ox.ac.uk/~teh/research/npbayes/OrbTeh2010a.pdf Bayesian nonparametric clustering] is used to discover the groupings of mutations and the number of groups simultaneously. This allows cellular prevalence estimates to reflect uncertainty in the number of clusters.
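One standard way to place a prior over partitions without fixing the number of groups is the Chinese restaurant process. The sketch below draws a partition from such a prior. It is shown only to illustrate the idea of a nonparametric clustering prior: the function name and the concentration value are assumptions for this example, and a full inference procedure would combine a prior like this with the likelihood of the observed read counts.

```python
import random


def sample_crp_partition(n_items, concentration, rng):
    """Draw cluster labels for n_items from a Chinese restaurant process prior.

    Each item joins an existing cluster with probability proportional to that
    cluster's size, or opens a new cluster with probability proportional to
    the concentration parameter (which must be positive).
    """
    if concentration <= 0:
        raise ValueError("concentration must be positive")
    assignments = []
    cluster_sizes = []
    for _ in range(n_items):
        # The last weight corresponds to opening a new cluster.
        weights = cluster_sizes + [concentration]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(cluster_sizes):
            cluster_sizes.append(1)
        else:
            cluster_sizes[choice] += 1
        assignments.append(choice)
    return assignments


rng = random.Random(0)  # fixed seed so the draw is repeatable
labels = sample_crp_partition(n_items=20, concentration=1.0, rng=rng)
n_clusters = len(set(labels))
```

The number of clusters is an outcome of the draw, not an input, which is what lets downstream estimates carry uncertainty about how many clusters there are.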

Section Sequencing

Multiple samples from the same patient can be analyzed at the same time to exploit the fact that clonal populations are often shared across samples. When multiple samples are sequenced, subclonal populations that have similar allelic prevalence in some samples but not in others can be differentiated from each other.
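A toy example shows why joint analysis helps. The greedy threshold grouping below is not PyClone's method; it is a deliberately simple stand-in, with made-up prevalence values and a hypothetical function name, used only to show that mutations which look alike in one sample can separate once a second sample is considered.

```python
def group_by_profile(profiles, tolerance):
    """Greedy grouping of mutations by prevalence profile.

    A mutation joins the first group whose representative profile is within
    `tolerance` in every sample; otherwise it starts a new group.
    """
    groups = []  # list of (representative_profile, member_names)
    for name, profile in profiles.items():
        for representative, members in groups:
            if all(abs(p - r) <= tolerance for p, r in zip(profile, representative)):
                members.append(name)
                break
        else:
            groups.append((profile, [name]))
    return [members for _, members in groups]


# Hypothetical cellular prevalences of three mutations in two samples.
profiles = {
    "m1": (0.60, 0.60),
    "m2": (0.62, 0.20),
    "m3": (0.58, 0.61),
}

# Using only the first sample, all three mutations look alike.
single_sample = group_by_profile({k: v[:1] for k, v in profiles.items()}, 0.05)

# Using both samples, m2 separates from m1 and m3.
joint = group_by_profile(profiles, 0.05)
```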

Output

PyClone outputs posterior densities of cellular prevalence for the mutations in the sample, and a matrix containing the probability that any two mutations occur in the same cluster. Estimates of clonal populations, distinguished by the differing cellular prevalences of their mutations, are then generated from the posterior densities.
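A co-clustering matrix of this kind can be estimated from sampled cluster assignments by counting how often each pair of mutations shares a label. The sketch below assumes the sampler's output is available as a list of label vectors, one per posterior draw; that input format and the function name are assumptions made for illustration, not PyClone's file format or API.

```python
def coclustering_matrix(label_draws):
    """Fraction of draws in which each pair of mutations shares a cluster label.

    label_draws -- list of draws; each draw is a list with one cluster label
                   per mutation, in a fixed mutation order.
    """
    n_draws = len(label_draws)
    n_mutations = len(label_draws[0])
    counts = [[0] * n_mutations for _ in range(n_mutations)]
    for labels in label_draws:
        for i in range(n_mutations):
            for j in range(n_mutations):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[count / n_draws for count in row] for row in counts]


# Four hypothetical draws of cluster labels for three mutations.
draws = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
similarity = coclustering_matrix(draws)
```

Label values only matter within a draw, so the matrix is unaffected by clusters being numbered differently from one draw to the next.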

Applications

PyClone is used to analyze deeply sequenced (over 100× coverage) mutations to identify and quantify clonal populations in tumors. Some applications of PyClone include:

Section sequencing: PyClone is most effective for section sequencing of tumor DNA. In section sequencing, samples are taken from different portions of a single tumour so that clonal structure can be inferred from differences in cellular prevalence between them. Its advantages are greater statistical power and information on the spatial position and interactions of the clones, which reveals how tumors evolve in space.

Assumptions

A key assumption of the PyClone model is that all cells within a clonal population have the same genotype. This assumption is likely false, since copy number alterations and loss of heterozygosity events are common in cancer cells. The amount of error it introduces depends on how variable the genotypes of the cells are in the location of interest. For example, in solid tumors the cells of a sample are spatially close together, resulting in a small error rate, but in liquid tumors the assumption may introduce more error because cancer cells are mobile.

Another assumption made by PyClone is that the sample follows a perfect and persistent phylogeny. This means that no site mutates more than once in a clonal population and each site has at most one mutant genotype. Mutations that revert to the normal genotype, deletions of DNA segments harbouring mutations, and recurrent mutations are not accounted for in PyClone, because modelling them would lead to unidentifiable explanations for some observed data.

Limitations

Obtaining input data for PyClone requires cell lysis to prepare the bulk sample for sequencing. This results in the loss of information on the complete set of mutations defining a clonal population. PyClone can distinguish different clonal populations and estimate their frequencies, but it cannot identify the exact mutations that define them.

Instead of clustering cells by mutational composition, PyClone clusters mutations that have similar cellular frequencies. When distinct sub-clones have similar cellular frequencies, PyClone will mistakenly cluster them together. The chance of making this error decreases when using targeted deep sequencing with high coverage and joint analysis of multiple samples.

A confounding factor in the PyClone model is imprecise input information on the genotype of the sample and the depth of sequencing. Insufficient information on either one adds uncertainty to the posterior densities, so interpreting and clustering the sample relies more heavily on the assumptions made by the PyClone model.
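The benefit of high coverage can be made concrete with the usual binomial standard error of an observed allele frequency. This is a simplified calculation under a plain binomial sampling assumption, which ignores the extra variance that the beta-binomial densities are meant to capture; the function name and the depths compared are chosen for illustration.

```python
import math


def vaf_standard_error(vaf, depth):
    """Binomial standard error of an observed variant allele frequency."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return math.sqrt(vaf * (1.0 - vaf) / depth)


# The same underlying allele frequency measured at two hypothetical depths.
se_shallow = vaf_standard_error(0.25, 100)
se_deep = vaf_standard_error(0.25, 10000)
```

Because the error shrinks with the square root of depth, a hundredfold increase in coverage narrows the uncertainty about tenfold, which makes sub-clones with nearby cellular frequencies easier to tell apart.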

Similar tools

SciClone^[10]: a Bayesian clustering method for single nucleotide variants (SNVs).

Clomial^[11]: a Bayesian clustering method with a decomposition process. Both Clomial and SciClone restrict analysis to SNVs located in copy-number-neutral regions. The tumor is physically divided into subsections, which are deep sequenced to measure normal and variant allele counts. Clomial's inference uses an expectation-maximization algorithm.

GLClone^[12]: uses a hierarchical probabilistic model and Bayesian posteriors to calculate copy number alterations in sub-clones.

Cloe^[13]: uses a phylogenetic latent feature model for analyzing sequencing data to distinguish the genotypes and frequencies of clones in a tumor.

PhyC^[14]: uses an unsupervised learning approach to identify subgroups of patients by clustering their cancer evolutionary trees. Its authors identified patterns of different evolutionary modes in a simulation analysis, and on actual datasets detected phenotype-related and cancer type-related subgroups and characterized the tree structures within them.

PhyloWGS^[15]: reconstructs tumor phylogenies and characterizes the subclonal populations present in a tumor sample using both simple somatic mutations (SSMs) and copy number variations (CNVs).

References

1. {{Cite journal|last=Fearon|first=Eric R.|last2=Wu|first2=Rong|last3=Greenson|first3=Joel K.|last4=Ulintz|first4=Peter J.|last5=Hardiman|first5=Karin M.|date=2017-02-01|title=Abstract PR07: Complex sub-clonal populations in colorectal cancer lymph node metastasis|url=http://cancerres.aacrjournals.org/content/77/3_Supplement/PR07|journal=Cancer Research|volume=77|issue=3 Supplement|pages=PR07|doi=10.1158/1538-7445.CRC16-PR07|issn=0008-5472}}
2. {{Cite journal|last=Shah|first=Sohrab P.|last2=Bouchard-Côté|first2=Alexandre|last3=Aparicio|first3=Samuel|last4=Ha|first4=Gavin|last5=Biele|first5=Justina|last6=Laks|first6=Emma|last7=Wan|first7=Adrian|last8=Yap|first8=Damian|last9=Khattra|first9=Jaswinder|date=April 2014|title=PyClone: statistical inference of clonal population structure in cancer|journal=Nature Methods|volume=11|issue=4|pages=396–398|doi=10.1038/nmeth.2883|pmid=24633410|pmc=4864026|issn=1548-7105}}
3. {{Cite journal|last=Shah|first=Sohrab P.|last2=Bouchard-Côté|first2=Alexandre|last3=Aparicio|first3=Samuel|last4=Ha|first4=Gavin|last5=Biele|first5=Justina|last6=Laks|first6=Emma|last7=Wan|first7=Adrian|last8=Yap|first8=Damian|last9=Khattra|first9=Jaswinder|date=April 2014|title=PyClone: statistical inference of clonal population structure in cancer|journal=Nature Methods|volume=11|issue=4|pages=396–398|doi=10.1038/nmeth.2883|pmid=24633410|pmc=4864026|issn=1548-7105}}
4. {{Cite journal|last=Blanchette|first=Mathieu|last2=Pastinen|first2=Tomi|last3=Gunderson|first3=Kevin L.|last4=Pokholok|first4=Dmitry|last5=Ge|first5=Bing|last6=Wagner|first6=James R.|date=2010-07-08|title=Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human|journal=PLOS Computational Biology|volume=6|issue=7|pages=e1000849|doi=10.1371/journal.pcbi.1000849|issn=1553-7358|pmc=2900287|pmid=20628616}}
5. {{Cite web|url=https://www.illumina.com/science/technology/next-generation-sequencing/deep-sequencing.html|title=Deep Sequencing|website=www.illumina.com|access-date=2019-02-26}}
6. {{Cite journal|last=Hanson|first=K.M.|date=May 1993|title=Bayesian reconstruction based on flexible prior models|url=https://pdfs.semanticscholar.org/3f87/d3508e09f9a3c792b74b847a287bcc9ac5ec.pdf|journal=J. Opt. Soc. Am. A|volume=10}}
7. {{Cite journal|last=Aparicio|first=Samuel|last2=Shah|first2=Sohrab P.|last3=Caldas|first3=Carlos|last4=Marra|first4=Marco A.|last5=Hansen|first5=Carl|last6=Eaves|first6=Connie J.|last7=Huntsman|first7=David|last8=Nguyen|first8=Long|last9=Lorette|first9=Julie|date=February 2015|title=Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution|journal=Nature|volume=518|issue=7539|pages=422–426|doi=10.1038/nature13952|pmid=25470049|pmc=4864027|issn=1476-4687}}
8. {{Cite journal|last=Caldas|first=Carlos|last2=Rosenfeld|first2=Nitzan|last3=Wallis|first3=Matthew|last4=Shah|first4=Sohrab P.|last5=Bentley|first5=David|last6=Humphray|first6=Sean|last7=Kingsbury|first7=Zoya|last8=Shumansky|first8=Karey|last9=Farahani|first9=Hossein|date=2015-11-04|title=Multifocal clonal evolution characterized using circulating tumour DNA in a case of metastatic breast cancer|journal=Nature Communications|volume=6|pages=8760|doi=10.1038/ncomms9760|pmid=26530965|pmc=4659935|issn=2041-1723}}
9. {{Cite journal|last=Maciejewski|first=Jaroslaw P.|last2=Ogawa|first2=Seishi|last3=Shih|first3=Lee-Yung|last4=Miyano|first4=Satoru|last5=Chiba|first5=Shigeru|last6=Saunthararajah|first6=Yogen|last7=Miyawaki|first7=Shuichi|last8=Nakamaki|first8=Tsuyoshi|last9=Dienes|first9=Brittney|date=2015-12-03|title=Serial Sequencing in Myelodysplastic Syndromes Reveals Dynamic Changes in Clonal Architecture and Allows for a New Prognostic Assessment of Mutations Detected in Cross-Sectional Testing|url=http://www.bloodjournal.org/content/126/23/709|journal=Blood|language=en|volume=126|issue=23|pages=709|issn=0006-4971}}
10. {{Cite journal|last=Ding|first=Li|last2=Wilson|first2=Richard K.|last3=Mardis|first3=Elaine R.|last4=Ley|first4=Timothy J.|last5=DiPersio|first5=John F.|last6=Schierding|first6=William|last7=Ellis|first7=Matthew J.|last8=Walter|first8=Matthew J.|last9=Graubert|first9=Timothy A.|date=2014-08-07|title=SciClone: Inferring Clonal Architecture and Tracking the Spatial and Temporal Patterns of Tumor Evolution|journal=PLOS Computational Biology|language=en|volume=10|issue=8|pages=e1003665|doi=10.1371/journal.pcbi.1003665|issn=1553-7358|pmc=4125065|pmid=25102416}}
11. {{Cite journal|last=Noble|first=William Stafford|last2=Blau|first2=C. Anthony|last3=Witten|first3=Daniela|last4=Song|first4=ChaoZhong|last5=Nickerson|first5=Debbie|last6=Smith|first6=Josh|last7=Weber|first7=Kris|last8=Hu|first8=Alex|last9=Wang|first9=Junfeng|date=2014-07-10|title=Inferring Clonal Composition from Multiple Sections of a Breast Cancer|journal=PLOS Computational Biology|language=en|volume=10|issue=7|pages=e1003703|doi=10.1371/journal.pcbi.1003703|issn=1553-7358|pmc=4091710|pmid=25010360}}
12. {{Cite book|last=Geng|first=Yu|last2=Zhao|first2=Zhongmeng|last3=Xu|first3=Jing|last4=Liu|first4=Ruoyu|last5=Huang|first5=Yi|last6=Zhang|first6=Xuanping|last7=Xiao|first7=Xiao|last8=Maomao|last9=Wang|first9=Jiayin|date=2017|editor-last=Huang|editor-first=De-Shuang|editor2-last=Jo|editor2-first=Kang-Hyun|editor3-last=Figueroa-García|editor3-first=Juan Carlos|title=Identifying Heterogeneity Patterns of Allelic Imbalance on Germline Variants to Infer Clonal Architecture|journal=Intelligent Computing Theories and Application|series=Lecture Notes in Computer Science|language=en|publisher=Springer International Publishing|pages=286–297|doi=10.1007/978-3-319-63312-1_26|isbn=978-3-319-63312-1}}
13. {{Cite journal|last=Markowetz|first=Florian|last2=Rosenfeld|first2=Nitzan|last3=Yuan|first3=Ke|last4=Mouliere|first4=Florent|last5=Marass|first5=Francesco|date=2016-04-06|title=A phylogenetic latent feature model for clonal deconvolution|journal=The Annals of Applied Statistics|volume=10|issue=4|pages=2377–2404|language=en|doi=10.1214/16-AOAS986|arxiv=1604.01715v1}}
14. {{Cite journal|last=Shimamura|first=Teppei|last2=Miyano|first2=Satoru|last3=Mimori|first3=Koshi|last4=Uchi|first4=Ryutaro|last5=Niida|first5=Atsushi|last6=Matsui|first6=Yusuke|date=2017-05-01|title=phyC: Clustering cancer evolutionary trees|journal=PLOS Computational Biology|language=en|volume=13|issue=5|pages=e1005509|doi=10.1371/journal.pcbi.1005509|pmid=28459850|pmc=5432190|issn=1553-7358}}
15. {{Cite journal|last=Deshwar|first=Amit G.|last2=Vembu|first2=Shankar|last3=Yung|first3=Christina K.|last4=Jang|first4=Gun Ho|last5=Stein|first5=Lincoln|last6=Morris|first6=Quaid|date=2015-02-13|title=PhyloWGS: Reconstructing subclonal composition and evolution from whole-genome sequencing of tumors|journal=Genome Biology|volume=16|issue=1|pages=35|doi=10.1186/s13059-015-0602-8|issn=1465-6906|pmc=4359439|pmid=25786235}}
