Clonal Architecture of Secondary Acute Myeloid Leukemia Defined by Single-Cell Sequencing

Andrew E. O. Hughes 1, Vincent Magrini 2,3, Ryan Demeter 2, Christopher A. Miller 2, Robert Fulton 2, Lucinda L. Fulton 2, William C. Eades 4, Kevin Elliott 4, Sharon Heath 4, Peter Westervelt 4,5, Li Ding 2,4, Donald F. Conrad 3,6, Brian S. White 2,4, Jin Shao 4,5, Daniel C. Link 4,5, John F. DiPersio 4,5, Elaine R. Mardis 2,3, Richard K. Wilson 2,3,5, Timothy J. Ley 2,3,4,5, Matthew J. Walter 3,4,5, Timothy A. Graubert 4,5,6 *

1 Center for Genome Sciences and Systems Biology, Washington University, St. Louis, Missouri, United States of America
2 The Genome Institute, Washington University, St. Louis, Missouri, United States of America
3 Department of Genetics, Washington University, St. Louis, Missouri, United States of America
4 Department of Internal Medicine, Division of Oncology, Washington University, St. Louis, Missouri, United States of America
5 Siteman Cancer Center, Washington University, St. Louis, Missouri, United States of America
6 Department of Pathology and Immunology, Washington University, St. Louis, Missouri, United States of America

Abstract

Next-generation sequencing has been used to infer the clonality of heterogeneous tumor samples. These analyses yield specific predictions (the population frequency of individual clones, their genetic composition, and their evolutionary relationships), which we set out to test by sequencing individual cells from three subjects diagnosed with secondary acute myeloid leukemia, each of whom had been previously characterized by whole genome sequencing of unfractionated tumor samples. Single-cell mutation profiling strongly supported the clonal architecture implied by the analysis of bulk material. In addition, it resolved the clonal assignment of single nucleotide variants that had been initially ambiguous and identified areas of previously unappreciated complexity.
Accordingly, we find that many of the key assumptions underlying the analysis of tumor clonality by deep sequencing of unfractionated material are valid. Furthermore, we illustrate a single-cell sequencing strategy for interrogating the clonal relationships among known variants that is cost-effective, scalable, and adaptable to the analysis of both hematopoietic and solid tumors, or any heterogeneous population of cells.

Citation: Hughes AEO, Magrini V, Demeter R, Miller CA, Fulton R, et al. (2014) Clonal Architecture of Secondary Acute Myeloid Leukemia Defined by Single-Cell Sequencing. PLoS Genet 10(7): e doi: /journal.pgen
Editor: Marshall S. Horwitz, University of Washington, United States of America
Received November 13, 2013; Accepted May 13, 2014; Published July 10, 2014
Copyright: © 2014 Hughes et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by the following sources: NIH (P01CA101937, RC2HL102927, U01HG006517, U54HG003079, P30CA91842), Barnes-Jewish Hospital Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* Current address: Massachusetts General Hospital, Boston, Massachusetts, United States of America

Introduction

Intratumoral heterogeneity is an emerging hallmark of cancer that can be interrogated genome-wide with next-generation sequencing. Critically, sub-populations of tumor cells are organized into hierarchies through clonal evolution. A powerful strategy for studying this population structure is multi-sampling: independently assaying genetic variation at distinct points in time or space and comparing mutation profiles.
In particular, whole genome sequencing (WGS) of de novo acute myeloid leukemia (AML) has demonstrated genetic evolution between diagnosis and relapse [1,2], and similar results have been obtained from WGS of paired primary-metastasis samples in breast cancer [3]. Furthermore, whole exome sequencing (WES) of multiple regions within primary tumors has revealed extensive regional heterogeneity in pancreatic [4], hepatocellular [5], and renal [6] carcinomas. Thus, clonal heterogeneity within tumors compounds the biological complexity of human cancers, and a detailed understanding of this heterogeneity is important for clinical genomics.

The ultimate resolution of multi-sampling is single-cell analysis, which is rapidly becoming tractable. For example, Anderson et al. have used fluorescence in situ hybridization (FISH) to genotype up to five so-called driver lesions in individual pediatric acute lymphoblastic leukemia (ALL) cells, which demonstrated a range of clonal architectures (from linear to complex) in different subjects [7]. Jan et al., Potter et al., and Klco et al. have reported similar findings using either single-cell allele-specific PCR or amplicon sequencing to assay five to ten clonal markers in de novo AML or pediatric ALL [8–10]. In broader (genome-wide) analyses, Navin et al. and Voet et al. have leveraged WGS to call copy number variants (CNVs) in single cells, which they used to reconstruct the phylogenetic history of breast cancer cell lines and primary tumors [11,12].

In addition to multi-sampling strategies, we and others have reported clonal inference from deep sequencing of individual tumor samples [1,13–15]. Briefly, this approach uses the fraction of sequencing reads calling a specific somatic mutation (i.e., the variant allele fraction, or VAF) to estimate the frequency of that variant in the original sample.
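As an illustration of the VAF calculation just described, the following minimal sketch computes a VAF from read counts at one site. The names `SiteCounts`, `variant_allele_fraction`, and `estimated_cell_fraction` are hypothetical and do not come from the study. The conversion from VAF to the fraction of cells carrying the variant additionally assumes a heterozygous mutation at a copy-neutral diploid locus, so that each mutant cell contributes one variant allele out of two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCounts:
    """Read counts at one targeted position (hypothetical container)."""
    ref_reads: int  # reads supporting the reference allele
    alt_reads: int  # reads supporting the somatic variant


def variant_allele_fraction(site: SiteCounts) -> float:
    """VAF = variant-supporting reads / all reads covering the site."""
    total = site.ref_reads + site.alt_reads
    if total == 0:
        raise ValueError("site has no coverage; VAF is undefined")
    return site.alt_reads / total


def estimated_cell_fraction(vaf: float) -> float:
    """Fraction of cells carrying the variant.

    Assumption (not from the source): heterozygous variant at a
    copy-neutral diploid locus, so cell fraction is about 2 x VAF.
    The result is capped at 1.0 because sampling noise can push the
    observed VAF above 0.5.
    """
    return min(1.0, 2.0 * vaf)


if __name__ == "__main__":
    site = SiteCounts(ref_reads=60, alt_reads=40)  # made-up counts
    vaf = variant_allele_fraction(site)
    print(vaf, estimated_cell_fraction(vaf))
```

The doubling step would not hold at loci affected by copy number changes or loss of heterozygosity, which is one reason such sites are usually treated separately.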
Often, large numbers of single nucleotide variants (SNVs) cluster at a common VAF, suggesting the presence of a clonal population at a defined frequency. Analyzing tumors in this way yields specific predictions about the clonal relationships among variants detected in unfractionated samples: 1) the genetic composition of individual clones (groups of SNVs that arose together), 2) the frequency of each clone (proportional to the mean VAF of the corresponding cluster), and 3) a model for how the clonal architecture evolved (clones at lower frequencies descending from those at higher frequencies).

PLOS Genetics, July 2014, Volume 10, Issue 7, e

Author Summary

Human cancers are genetically diverse populations of cells that evolve over the course of their natural history or in response to the selective pressure of therapy. In theory, it is possible to infer how this variation is structured into related populations of cells based on the frequency of individual mutations in bulk samples, but the accuracy of these models has not been evaluated across a large number of variants in individual cells. Here, we report a strategy for analyzing hundreds of variants within a single cell, and we apply this method to assess models of tumor clonality derived from bulk samples in three cases of leukemia. The data largely support the predicted population structure, though they suggest specific refinements. This type of approach not only illustrates the biological complexity of human cancer, but it also has the potential to inform patient management. That is, precise knowledge of which variants are present in which populations of cells may allow physicians to more effectively target combinations of mutations and predict how patients will respond to therapy.

We set out to test these predictions by sequencing single cells from three subjects with an initial diagnosis of myelodysplastic syndrome (MDS), each of whom progressed to secondary AML (sAML).
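The three predictions enumerated above (clone composition, clone frequency, and nested descent) can be made concrete with a small sketch. It is not the authors' method: the function name `nested_clone_model` and the cluster VAFs are hypothetical, and clone frequency is taken as twice the cluster mean VAF, which assumes heterozygous variants at copy-neutral diploid loci. Under a strictly linear (nested) model, cells carrying a lower-frequency cluster also carry every higher-frequency cluster.

```python
from typing import Dict, List, Tuple


def nested_clone_model(
    cluster_mean_vafs: Dict[str, float],
) -> List[Tuple[str, float, float]]:
    """Order clusters by frequency and derive a linear (nested) model.

    Returns one tuple per cluster, from highest to lowest frequency:
      (cluster name,
       fraction of cells carrying the cluster,
       fraction of cells for which it is the most derived cluster).

    Assumptions (illustrative only): cell fraction = 2 x mean VAF,
    capped at 1.0, and each clone descends from the next larger one.
    """
    freqs = {name: min(1.0, 2.0 * vaf) for name, vaf in cluster_mean_vafs.items()}
    ordered = sorted(freqs.items(), key=lambda item: item[1], reverse=True)

    model = []
    for i, (name, freq) in enumerate(ordered):
        # Cells carrying the next (smaller) cluster are a subset of this clone.
        child_freq = ordered[i + 1][1] if i + 1 < len(ordered) else 0.0
        model.append((name, freq, freq - child_freq))
    return model


if __name__ == "__main__":
    # Made-up cluster means, not values from the study.
    for row in nested_clone_model({"cluster1": 0.45, "cluster2": 0.20, "cluster3": 0.05}):
        print(row)
```

Single-cell genotypes can contradict such a model directly: a cell that carries a lower-frequency cluster but lacks a higher-frequency one is inconsistent with strict nesting, provided the missing calls are not explained by genotyping error.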
We had previously characterized these subjects by WGS of both MDS and sAML bone marrow as well as matched skin samples, resulting in a call set of several thousand validated somatic mutations in addition to specific models for the clonal architecture of each tumor [14]. In the current study, we used targeted sequencing to genotype >1,900 of these positions in a dozen single cells from each subject. We used SNP array data to quantify the accuracy of single-cell variant calling and, as reported by others, we observed frequent genotyping errors due to stochastic biases in whole genome amplification (allelic dropout, or ADO) [11,12,16]. Nevertheless, while ADO inflated our false negative rate, we maintained a relatively low false positive rate. It was therefore possible to evaluate the major clonal relationships among targeted variants using single-cell sequencing. Ultimately, the single-cell data strongly supported the major clonal populations predicted from the analysis of bulk tissue, in addition to resolving the clonality of SNVs that were originally ambiguous and suggesting previously unappreciated complexity among rare subclones. Accordingly, our findings validate many of the critical assumptions underlying the inference of tumor clonality from unfractionated samples, in addition to demonstrating a high-throughput approach to single-cell genotyping that provides insight into the clonal architecture of heterogeneous samples.

Results

Targeted Sequencing of Single-Cell, Two-Cell, and Unfractionated Samples

We prepared a total of 56 sequencing libraries from whole genome amplified (WGA) single-cell and two-cell sAML samples in addition to non-WGA unfractionated MDS, sAML, and normal (skin) samples (Table S1). We used hybridization capture to enrich these libraries for 1,953 somatic SNVs discovered and validated previously in unfractionated samples [14] (Table S2).
Sequencing yielded 4.1 Gb of de-duplicated data that aligned to targeted loci, resulting in an average depth of coverage of 148× per sample (Table 1, Table S3). The subject identity corresponding to each sequencing library was confirmed using variant calls at both germline SNPs and targeted somatic SNVs (Table S4, Table S5). In order to assess the quality of our capture reagent, we compared the VAF distributions of variants in unfractionated MDS and sAML samples to those previously reported [14] (Figure S1), finding a strong correlation between these independently generated datasets (R² = ).

Consistent with previous reports, we observed a number of differences in sequencing performance between WGA libraries and those prepared from unfractionated material [11,12,16]. In particular, single- and two-cell libraries had a lower proportion of the capture target covered at any threshold (Figure 1A). This was attributable in part to 20% fewer reads obtained from libraries prepared from WGA material (Table 1). Furthermore, these libraries had a lower on-target rate (likely driven by locus dropout) and a higher rate of PCR duplicates (i.e., reduced library complexity) (Table 1). In addition, single- and two-cell samples had a significantly less uniform distribution of reads across the capture target (Figure 1B), again reflecting WGA biases. In aggregate, these technical issues limited callable positions (sites with ≥25× coverage) to approximately 55% of targeted SNVs in single- and two-cell samples (41%–63% for single-cell and 46%–53% for two-cell libraries).

Table 1. Sequencing metrics.
Columns: Total (Mb); Aligned (Mb); Aligned (%); On-Target (%); Duplicate (%); On-Target, Unique Coverage (X).
Rows: Sample Average (n = 56); Unsorted Sample Average (n = 14); Single-Cell Sample Average (n = 36); Two-Cell Sample Average (n = 6).

Figure 1. Depth and distribution of coverage for each sequencing library (n = 56). (A) Cumulative coverage represented as the proportion of the capture target (y-axis) with read depth greater than or equal to specific coverage thresholds (x-axis). Coverage values are derived from quality-filtered data (de-duplicated, phred-scaled alignment quality ≥10, phred-scaled base quality ≥13). The intersection of each curve with y = 0.5 identifies the median coverage. Higher coverage was obtained for the unsorted samples (median 228×), compared to the single- or two-cell samples (median 28×). (B) Lorenz curve detailing uniformity of coverage as proportion of targeted bases versus proportion of sequenced bases. Dashed line (y = x) represents a perfectly uniform distribution of read depth across the capture target. Libraries prepared from WGA samples (single- and two-cells) exhibit significantly less uniform representation, compared to libraries derived from unfractionated material. See Table 1 and Table S3 for additional details.

Performance of Variant Calling

To quantify the accuracy of variant calling in single cells, we examined germline (i.e., inherited) SNPs genotyped previously using Affymetrix 6.0 arrays. We evaluated three separate variant callers: SAMtools [17], VarScan2 [18], and the Genome Analysis Toolkit (GATK) Unified Genotyper [19,20]. With SAMtools and VarScan2, we called variants from individual samples, whereas with GATK, we called variants jointly across all single-cell libraries. At homozygous SNPs, all three callers performed similarly (Figure 2A, 2B). However, at heterozygous SNPs (which best approximate targeted SNVs), calling samples jointly yielded a modest benefit in sensitivity, while reducing specificity (Figure 2C).
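The comparison of sequencing calls against array genotypes described above can be sketched as follows. This is an illustrative outline rather than the authors' pipeline: the function names are hypothetical, and each site is reduced to a pair of non-reference allele counts (array genotype, sequencing call). The classification follows the definitions given with Table 2, where a site counts as variant if at least one non-reference allele is present.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def classify_call(array_alt_alleles: int, seq_alt_alleles: int) -> str:
    """Label one site as TP, FN, FP, or TN.

    The array genotype is treated as truth. Both arguments are counts
    of non-reference alleles (0, 1, or 2).
    """
    truth_has_variant = array_alt_alleles >= 1
    call_has_variant = seq_alt_alleles >= 1
    if truth_has_variant and call_has_variant:
        return "TP"
    if truth_has_variant:
        return "FN"
    if call_has_variant:
        return "FP"
    return "TN"


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return None when the rate is undefined (empty denominator)."""
    return numerator / denominator if denominator else None


def calling_metrics(pairs: Iterable[Tuple[int, int]]) -> Dict[str, Optional[float]]:
    """Compute TPR (sensitivity), FPR (1 - specificity), and FNR."""
    counts = Counter(classify_call(truth, call) for truth, call in pairs)
    tp, tn, fp, fn = (counts[key] for key in ("TP", "TN", "FP", "FN"))
    return {
        "TPR": _ratio(tp, tp + fn),
        "FPR": _ratio(fp, fp + tn),
        "FNR": _ratio(fn, tp + fn),
    }


if __name__ == "__main__":
    # Made-up (array, sequencing) pairs at heterozygous sites only.
    het_sites = [(1, 1), (1, 1), (1, 1), (1, 0)]
    print(calling_metrics(het_sites))
```

Restricting the input to heterozygous sites (array count of 1) leaves FPR undefined, because such a subset contains no true negatives; the sketch returns `None` in that case rather than a misleading zero.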
Based on these results, we chose to call variants jointly using GATK at sites with ≥25× coverage, and we estimated our sensitivity and specificity for single-cell variant calling to be 0.88 and 0.98, respectively. As a caveat, benchmarking joint variant calling at germline SNPs (which are present in every cell) potentially overestimates sensitivity to detect subclonal SNVs (which may be present in only a subset of cells). Nevertheless, joint variant calling likely offers a genuine increase in sensitivity, without incurring much cost in specificity, especially when calls are restricted to sites with high coverage.

As shown in Table 2, the majority of genotyping errors (assessed at germline SNPs) were false negatives, i.e., failures to detect true non-reference alleles, which resulted in reduced true positive rates (TPRs). These occurred exclusively at heterozygous positions in libraries prepared from WGA material, implicating ADO as the underlying mechanism (with an ADO rate approximately equal to the false negative rate, or FNR). This assumption is further supported by the observation that the frequency of homozygous reference calls was similar to that of homozygous variant calls at known heterozygous SNPs (Figure S2). ADO is a well-documented limitation of commercial single-cell WGA kits [11,12,16]. Nevertheless, although our analysis of germline SNPs demonstrated that single-cell reference allele calls were enriched for false negatives (at heterozygous positions), it also showed that non-reference allele calls were generally accurate (overall false positive rate, or FPR, approximately equal to 0.02). This asymmetry between FNR and FPR was critical for differentiating genuine clonal relationships among targeted SNVs from genotyping errors.

Finally, we tested whether ADO could be linked to systematic (i.e., locus-specific) effects, or if it was predominantly stochastic.
To do this, we compared the rate at which inherited heterozygous SNPs common to all three subjects were called reference in single-cell libraries (Figure S3). In general, the dropout rate of a specific locus across single-cell libraries from one subject was not predictive of its dropout rate across single-cell libraries in another (R² = ), suggesting that ADO was not attributable to strong positional biases.

Validation of Sample Cellularity

As an additional quality control measure, we asked if the VAF distribution in single cells could be used to infer sample cellularity. In single cells, the true (unobserved) VAF of heterozygous variants is 0.5 (at diploid loci). As shown in Figures S4, S5, and S6, the VAF distributions in single-cell samples exhibited high variance (ranging from 0 to 1) compared to unsorted samples, reflecting stochastic biases in WGA. However, the mean VAF for each cluster, as well as for germline heterozygous SNPs, was fixed at approximately 0.5. In contrast, in intentionally cross-contaminated two-cell samples, the mean VAF of individual clusters (but never germline heterozygotes) dropped to 0.25, the precise dilution expected from the admixture of two cells sharing some, but not all, heterozygous SNVs (Figure S4, Figure S6). To analyze this further, we modeled these distributions computationally and used maximum likelihood analysis integrating a site-specific error model to assess the probability that each dataset was generated from all possible combinations of two cells. This predicted that >90% of single-cell libraries were derived from true single-cell samples (Table S7).

Figure 2. Performance of variant calling. The specificity (A) and sensitivity (B, C) of three separate variant callers (SAMtools, VarScan2, and GATK) were evaluated by analyzing single-cell variant calls at germline SNPs previously ascertained by Affymetrix SNP arrays [14]. As we have defined true positive and true negative, sensitivity is undefined at homozygous reference positions (there are no true positives) and specificity is undefined at heterozygous and homozygous variant positions (there are no true negatives). Sensitivity and specificity were similar among all three callers at homozygous positions, but GATK demonstrated greater sensitivity at heterozygous sites. Variants were called jointly across all single-cell libraries with the GATK Unified Genotyper utility, whereas variants were called independently for each sample using SAMtools and VarScan2. See Table 2 and Table S6 for additional details. TPR: true positive rate. FPR: false positive rate.

Table 2. Performance of variant calling at germline SNPs.
Column headers: # of Positions, TPR, FPR, FNR, grouped under Homozygous Sites and Heterozygous Sites.
Rows (for each of three UPNs): Average: Unsorted Cells; Average: Single Cells; Average: Two Cells.
True Positive (TP): ≥1 non-reference allele called by Affymetrix array, ≥1 non-reference allele called by sequencing.
True Negative (TN): 0 non-reference alleles called by Affymetrix array, 0 non-reference alleles called by sequencing.
False Positive (FP): 0 non-reference alleles called by Affymetrix array, ≥1 non-reference allele called by sequencing.
False Negative (FN): ≥1 non-reference allele called by Affymetrix array, 0 non-reference alleles called by sequencing.
True Positive Rate (TPR): TP/(TP+FN) = sensitivity = power.
False Positive Rate (FPR): FP/(FP+TN) = 1 - specificity.
False Negative Rate (FNR): FN/(TP+FN) = 1 - sensitivity = type II error.

Assessment of Tumor Clonality

Previously, we generated WGS data from MDS, sAML and normal samples for each subject in the current study, and we analyzed the VAF distribution of validated somatic mutations to infer the clonal architecture of each tumor [14]. In the current study, we applied SciClone, a variational Bayesian algorithm, to the original WGS data to refine these models [13,21]. As shown in Figure 3A–C, groups of SNVs cluster at distinct frequencies, and we hypothesized that each cluster represented a clonal population of tumor cells. That is, clustered SNVs were predicted to colocalize within individual cells. Furthermore, we predicted that the population frequency of putative clones was proportional to the mean VAF of the corresponding cluster. Finally, we hypothesized that clones present at successively lower frequencies evolved linearly from clones at higher frequencies, i.e., that these populations were nested. Accordingly, subjects were predicted to be monoclonal (UPN182896) or biclonal (UPN461282, UPN288033) at the time of MDS diagnosis, and harbor two or more clones upon progression t