CCAAT enhancer binding protein alpha (CEBPA) is a single-exon gene that encodes a leucine zipper transcription factor with an important role in myeloid differentiation; disruption of this differentiation is a necessary step in the development of acute myeloid leukemia (AML) (1). CEBPA mutations occur in 8-15% of AML cases and are one of the most common classes of mutations in cytogenetically normal AML (2-4). The majority of CEBPA mutations are biallelic, leaving no wild-type protein to support myeloid differentiation, and are associated with favorable outcomes (5,6). It is believed that these mutations cause myeloid cell differentiation arrest, consistent with loss of the tumor suppressor function of CEBPA (3). As of 2016, the WHO requires characterization of CEBPA mutations in the classification of AML (7). Therefore, reliable methods to detect these mutations are crucial for tumor classification.
CEBPA mutations are highly variable and can occur anywhere in the gene, so it is important to be able to detect mutations across the entire coding region. Sanger sequencing is the current gold standard technique, detecting CEBPA mutations down to 20% allele frequency (AF), but this approach lacks scalability. Next-generation sequencing (NGS) assays can detect multiple mutation types across multiple target regions. However, detecting mutations in CEBPA can be particularly challenging for these assays due to the high GC content of the gene (8).
Anchored Multiplex PCR (AMP™) is a target enrichment strategy for NGS that uses molecular barcode (MBC) adapters and single gene-specific primers (GSPs) for amplification, permitting open-ended capture of DNA fragments from a single end. Based on AMP technology, we developed Archer® VariantPlex® myeloid targeted NGS assays to detect variants in CEBPA, FLT3-ITDs and other important variants in myeloid cancers from clinical-type genomic DNA samples. Because the MBC adapters contain universal primer binding sites, amplification from GSPs is unrestricted by opposing primers and can amplify both large and small fragments without prior knowledge of downstream sequences. This approach enables flexible and strand-specific primer design to provide better coverage of challenging regions. As such, anchored reads originating from bidirectional, yet independent, GSPs contained in the VariantPlex myeloid panels provide excellent coverage across CEBPA, even in GC-rich regions (Figure 1).
As shown in Figure 1, the VariantPlex Myeloid panel achieves full coverage of CEBPA at depths that allow variant calling at relevant allele frequencies. It is important to note that Figure 1 shows unique molecule coverage, not PCR-duplicated reads. Confidence in variant detection over a region is mainly dictated by the number of unique molecules interrogated. Because MBCs are ligated prior to PCR, the Archer Analysis bioinformatics software is able to deduplicate, error-correct, and analyze unique reads at every position.
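The idea of MBC-based deduplication and error correction can be illustrated with a minimal Python sketch. This is not the Archer Analysis implementation; it assumes a simplified, hypothetical read layout of (start position, MBC, sequence) and uses a plain per-base majority vote, which is only one of several possible consensus strategies.

```python
from collections import Counter, defaultdict


def collapse_by_mbc(reads):
    """Collapse PCR duplicates into one consensus sequence per unique molecule.

    `reads` is an iterable of (start, mbc, sequence) tuples. This layout is a
    hypothetical simplification chosen for illustration only.
    """
    # Reads sharing a start position and MBC are assumed to be PCR copies
    # of the same original molecule.
    families = defaultdict(list)
    for start, mbc, seq in reads:
        families[(start, mbc)].append(seq)

    consensus = {}
    for key, seqs in families.items():
        length = min(len(s) for s in seqs)
        # Majority vote per base: an error present in a minority of the
        # copies is outvoted by the correct base.
        consensus[key] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


# Four reads, but only two distinct molecules (two distinct MBCs).
example_reads = [
    (100, "ACGTACGT", "GGCCGGA"),
    (100, "ACGTACGT", "GGCCGGA"),
    (100, "ACGTACGT", "GGCTGGA"),  # one copy carries a PCR/sequencing error
    (100, "TTGACCAA", "GGCCGGA"),
]
unique_molecules = collapse_by_mbc(example_reads)
```

In this toy input, the unique molecule depth at position 100 is the number of keys in `unique_molecules`, not the number of raw reads, which is the distinction drawn for Figure 1.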
Variant sensitivity is a function of both coverage depth and noise at that depth, which is accounted for using normalization datasets to assess noise at each variant position. Using this approach, background noise at each base position is assessed across a cohort of diverse wild-type samples. From this information, the sensitivity of identifying a true positive variant over a range of allele frequencies is calculated for each position. Therefore, one can determine the limit of detectability for a variant at a base position and for a particular base substitution. This metric is referred to as 95 MDAF, the minimum detectable allele frequency by which a variant can be distinguished from the underlying noise at a probability of 0.95.
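To make the relationship between depth, noise, and 95 MDAF concrete, the following Python sketch uses a simple binomial model. The model, the function names, the `alpha` calling cutoff, and the allele-frequency grid are all assumptions made for illustration; the text does not specify the statistical method used by the actual software. The sketch first finds the smallest alternate-molecule count that noise alone is unlikely to produce, then searches for the lowest allele frequency that reaches that count with probability 0.95.

```python
from math import exp, lgamma, log


def binom_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k <= 0:
        return 1.0

    def log_pmf(i):
        # Log-space binomial probability to avoid overflow at high depth.
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))

    return sum(exp(log_pmf(i)) for i in range(k, n + 1))


def call_threshold(depth, noise_rate, alpha=0.001):
    """Smallest alt count whose probability under noise alone is below alpha.

    `alpha` is an assumed calling cutoff, not a value from the text.
    """
    k = 1
    while binom_sf(k, depth, noise_rate) >= alpha:
        k += 1
    return k


def mdaf95(depth, noise_rate, alpha=0.001, power=0.95, step=0.001):
    """Lowest allele frequency (on a grid of `step`) detected with
    probability >= `power` at the given unique-molecule depth and noise rate.
    Returns None if no frequency below 1.0 qualifies."""
    k = call_threshold(depth, noise_rate, alpha)
    for j in range(1, round(1 / step)):
        af = j * step
        if binom_sf(k, depth, af) >= power:
            return af
    return None


# Hypothetical inputs: same noise, different unique-molecule depths.
shallow = mdaf95(depth=200, noise_rate=0.001)
deep = mdaf95(depth=1000, noise_rate=0.001)
```

Under these assumptions, raising the depth or lowering the noise rate lowers the resulting value, which matches the statement that sensitivity depends on both coverage depth and noise at that depth.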
As CEBPA mutations frequently occur in GC-rich regions, it is important to be able to assess the sensitivity and coverage at each position rather than averaging the entire coding region. In Figure 2 below, the 95 MDAF was calculated for the least sensitive of any of the three possible base substitutions for each base position (blue line). This plot reveals the considerable noise variation between base positions and highlights the need for a per-base sensitivity threshold. The plot also enables more confident variant calling in GC-rich regions by identifying where variant calling may be more sensitive than a generic, panel-wide limit-of-detection (Figure 2).
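The per-base summary described for Figure 2 can be sketched in Python as follows. The table layout, the function names, and the numeric values are hypothetical and purely illustrative; they are not measured data from the assay. For each position, the sketch keeps the largest (least sensitive) 95 MDAF among the three possible substitutions, then lists positions where that worst case is still below a generic panel-wide limit of detection.

```python
def worst_case_mdaf(mdaf_table):
    """Map each position to its least sensitive 95 MDAF across substitutions.

    `mdaf_table` maps position -> {alt_base: 95 MDAF}. This layout is an
    assumption made for illustration.
    """
    return {pos: max(by_alt.values()) for pos, by_alt in mdaf_table.items()}


def positions_beating_panel_lod(mdaf_table, panel_lod):
    """Return sorted positions whose worst-case 95 MDAF is below `panel_lod`."""
    worst = worst_case_mdaf(mdaf_table)
    return sorted(pos for pos, value in worst.items() if value < panel_lod)


# Made-up values for three positions with reference base G.
example_table = {
    101: {"A": 0.010, "C": 0.012, "T": 0.030},
    102: {"A": 0.008, "C": 0.009, "T": 0.011},
    103: {"A": 0.060, "C": 0.020, "T": 0.015},
}
per_base = worst_case_mdaf(example_table)
sensitive_positions = positions_beating_panel_lod(example_table, panel_lod=0.05)
```

Taking the maximum rather than the mean is the conservative choice: a position is only reported as sensitive at a given allele frequency if every possible substitution is detectable there.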
CEBPA mutations are important prognostic indicators in AML, as they are associated with favorable outcomes. However, detecting CEBPA mutations by NGS is challenging due to the high GC content of the gene. By combining MBCs and single gene-specific primers, AMP enables strand-specific amplification to provide bidirectional coverage of CEBPA, thus providing better coverage of GC-rich areas. MBCs enable post-sequencing deduplication and error correction, and normalization datasets are used to assess the noise at each base position. Together, this information is used to determine the 95 MDAF, which is the minimum detectable allele frequency by which a variant can be distinguished from the underlying noise at a probability of 0.95. The 95 MDAF essentially reveals the lower limit of detection at each base position in a sample, providing more useful information than a panel-wide sensitivity figure. This is particularly useful for genes like CEBPA that have difficult regions, where every variant call can be compared to the 95 MDAF and assigned a p-value for high-confidence variant detection with optimal sensitivity and specificity.
2477 55th Street, Suite 202
Boulder, CO 80301