The idea of using circulating cell-free DNA as a promising, non-invasive source material for early detection of cancer has been actively investigated for over ten years. The presence of cell-free DNA in plasma was first described in 1948, but it received little attention until 1994, when mutated RAS oncogene fragments were detected. Circulating nucleic acids (including DNA, mRNA and microRNAs) are thought to reflect pathology of different kinds, including malignant and benign tumor lesions, the effects of inflammation, and the effects of other kinds of trauma due to apoptosis and necrosis.
The challenge, however, is formidable: pulling out the variable fraction of mutant DNA from a normal background that may outnumber it by as much as 10,000 to 1 (a minor allele frequency as low as 0.01%). (If you are interested in the distribution of tumor DNA relative to normal background, here’s a 2008 paper that uses a modified digital PCR method to examine it in circulating DNA samples.) Thus the challenge here is not a scientific one but a technical one: how do you pull out a rare somatic mutation from a large amount of background?
All next-generation sequencing methods (or rather second-generation sequencing methods, given the advent of single-molecule or third-generation sequencing) require an intermediate library amplification step (otherwise known as template preparation). Sequencing builds redundancy through coverage by independent reads, each coming from a single independent library molecule. If sequencing were 100% accurate, the ability to detect somatic mutation events down to a fraction of a percent would simply be a function of overall coverage. That is, to detect a mutation at 0.01%, a perfectly accurate NGS platform would need some 10,000-fold coverage to expect one mutant read at that base. (Granted, due to random sampling variation, a coverage of 30,000 or so would be needed to make it likely, at roughly 95%, that the mutation is picked up at least once.)
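To make the sampling argument concrete, here is a minimal Python sketch. It assumes a perfectly accurate sequencer and fully independent reads, so the number of mutant reads at a base follows a binomial distribution; the function names are illustrative and do not come from any library.

```python
import math


def prob_seen_at_least_once(allele_fraction: float, coverage: int) -> float:
    """Chance that at least one of `coverage` independent reads is mutant.

    Assumes perfect sequencing accuracy and unbiased sampling.
    """
    return 1.0 - (1.0 - allele_fraction) ** coverage


def coverage_for_confidence(allele_fraction: float, confidence: float) -> int:
    """Smallest coverage giving at least `confidence` chance of one mutant read."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - allele_fraction))


if __name__ == "__main__":
    maf = 1e-4  # 0.01% minor allele frequency
    for cov in (10_000, 30_000):
        print(f"{cov}x: P(at least one mutant read) = "
              f"{prob_seen_at_least_once(maf, cov):.3f}")
    print("coverage for 95% confidence:", coverage_for_confidence(maf, 0.95))
```

Under these assumptions, 10,000x coverage picks up a 0.01% allele at least once only about 63% of the time, while 30,000x raises that to about 95%.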
Alas, no next-generation sequencing system is 100% accurate; typically the error rate is on the order of 1%, often less. If the error rate is 1% and the sequencing coverage is absolutely uniform (i.e. no bias towards certain regions and against others), one can use Poisson statistics to determine what coverage is needed for a given minor allele threshold, or ‘floor’. A few years ago this was done for the SOLiD® Sequencing System, and an online calculator is available.
The point of all this is that, given a non-zero sequencing error rate, there are limits to what a massive amount of coverage can yield in terms of detecting a low-level minor allele in a heterogeneous sample. The calculator illustrates this interactively, showing a ‘floor’ of 5% allele frequency at 99% accuracy (1% error) in the sequencing.
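As a rough illustration of why error imposes a floor, the sketch below models the error reads at a position as Poisson-distributed and asks how many non-reference reads are needed before noise alone becomes an unlikely explanation. This is a simplified model and not the method behind the SOLiD calculator, so its numbers will differ from that calculator's. The function name, the worst-case assumption that every error produces the same wrong base, and the 0.1% false-positive threshold are all assumptions chosen for illustration.

```python
import math


def min_alt_reads(coverage: int, error_rate: float, alpha: float = 1e-3) -> int:
    """Smallest read count k with P(noise alone gives >= k reads) <= alpha.

    Noise is modelled as Poisson with mean coverage * error_rate
    (an assumed worst case: every error yields the same alternate base).
    """
    lam = coverage * error_rate
    cdf = 0.0  # running P(X <= k - 1)
    k = 0
    while 1.0 - cdf > alpha:
        # Add the Poisson probability of exactly k error reads, in log space.
        cdf += math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        k += 1
    return k


if __name__ == "__main__":
    error_rate = 0.01  # 99% accuracy
    for cov in (1_000, 10_000, 100_000):
        k = min_alt_reads(cov, error_rate)
        print(f"{cov}x: need >= {k} alt reads, "
              f"i.e. an observed frequency of about {k / cov:.2%}")
```

In this model the threshold frequency falls as coverage grows but never drops below the error rate itself, which is the sense in which more coverage alone cannot rescue a variant rarer than the sequencer's errors.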
Given these limitations, two recent papers highlight the use of multiplex PCR combined with the Ion PGM™ System to analyze somatic mutations in circulating-free DNA.
The first, looking at circulating-free DNA in non-small cell lung tumor samples, used a manual 12-plex PCR that covered the most common hotspot mutations. In 68 retrospective samples, each with matched tumor and circulating cell-free DNA, the authors measured an assay sensitivity of 58% and an estimated specificity of 87%.
They used a cell-free DNA coverage model of at least 10,000x and a tumor DNA coverage model of at least 1,000x, and were able to detect somatic mutations in their circulating cell-free DNA at frequencies as low as 0.2%.
This may seem at odds with what statistics would dictate per the above discussion, which assumed a static error rate of 1% and perfectly unbiased sequencing, but the ability to pick up a low-frequency variant in these cases likely reflects better than 99% accuracy.
A second paper examined circulating-free DNA in metastatic breast tumor samples, using an Ion AmpliSeq™ Cancer Hotspot v2 Panel covering 50 cancer genes on 31 matched samples for which both circulating-free DNA and tumor samples were available.
These are among the first papers to use a benchtop next-generation sequencing platform to investigate this exciting area of cancer research.