The hope of our project team was that we could find a way to pick off 10% minor variants using Sanger sequencing. These hopes were based on an insight of Edgar Schreiber's: the difference in minor/major peak-amplitude ratios between a test and a control sample should stand out at base positions where variants are located. He had some preliminary results that looked promising. With the general perception that the limit of detection for Sanger sequencing is around 20%, this seemed a worthy goal, especially considering that people are using NGS to find minor variants associated with cancer and have few alternatives for validating those results. And what if we could do better than 10%? Maybe we could provide a cheaper way to detect minor variants on a tried-and-true, gold-standard sequencing platform.
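The ratio idea can be sketched in a few lines. This is only an illustration of the principle, not the actual pipeline: the function names, the toy four-channel amplitudes, and the flagging margin are all assumptions made for the example.

```python
# Sketch of the peak-amplitude-ratio idea: at each base position, compare the
# minor/major peak amplitude ratio in the test trace against the same ratio in
# the control trace. Positions where the test ratio clearly exceeds the control
# ratio are candidate minor-variant sites.

def minor_major_ratio(peaks):
    """peaks: amplitudes of the four dye channels at one base position."""
    ordered = sorted(peaks, reverse=True)
    major, minor = ordered[0], ordered[1]
    return minor / major if major > 0 else 0.0

def candidate_sites(test_trace, control_trace, margin=0.05):
    """Flag positions where the test ratio exceeds the control ratio by `margin`."""
    sites = []
    for i, (t, c) in enumerate(zip(test_trace, control_trace)):
        if minor_major_ratio(t) - minor_major_ratio(c) > margin:
            sites.append(i)
    return sites

# Toy traces: position 1 of the test carries an elevated minor peak (~10%).
control = [(1000, 30, 20, 10), (950, 25, 15, 10), (980, 40, 20, 15)]
test    = [(1000, 32, 18, 12), (940, 95, 14, 11), (975, 42, 22, 14)]
print(candidate_sites(test, control))  # → [1]
```

The margin would in practice have to be tuned per position, which is exactly where the noise problem described next comes in.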
As I investigated further, I noticed that the noise underlying Sanger sequencing data appeared to be awfully similar between control and test samples. The noise seemed to be largely determined by the primary base sequence that the control and test had in common. This was an exciting find! It suggested that I could remove the noise from the test sample by using the control to model it. After devising a somewhat optimized way to do this, I discovered that we could detect 2.5% variants, even 1.25% variants! This was enticing! All it required was existing standard chemistry to process the samples, methodical application of the protocols to ensure that control and test samples were treated identically, sequencing in both forward and reverse orientations, and running the data through software that implemented the noise-removal algorithm and applied a machine-learning algorithm to find the variants.
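The simplest version of the noise-removal idea can be sketched as follows. This is a minimal illustration under assumed names and data shapes, not the optimized algorithm described above: because the background noise is largely determined by the primary base sequence, which the control shares with the test, the control's minor-peak amplitudes can serve as a per-position noise model to subtract from the test trace.

```python
# Sketch of control-based noise subtraction. Each trace is a list of 4-tuples
# (one amplitude per dye channel per base position). The major peak (the
# called base) is left untouched; minor channels have the control's amplitude
# subtracted, floored at zero.

def subtract_control_noise(test_trace, control_trace):
    cleaned = []
    for t, c in zip(test_trace, control_trace):
        major = max(range(4), key=lambda k: t[k])  # index of the called base
        cleaned.append(tuple(
            t[k] if k == major else max(t[k] - c[k], 0)
            for k in range(4)
        ))
    return cleaned

control = [(1000, 30, 20, 10), (950, 25, 15, 10)]
test    = [(1000, 32, 18, 12), (940, 95, 14, 11)]
print(subtract_control_noise(test, control))
# Position 1's residual minor peak (95 - 25 = 70) now stands out against a
# near-zero background in every other minor channel.
```

After a subtraction like this, a variant peak no longer has to clear the raw noise floor, only the residual left after the control model is removed, which is what pushes the detection limit well below the conventional 20%.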
It turned out that we could not reliably detect variants at 2.5% and below. We were able to reach >95% sensitivity and >99% specificity for 5% variants. Is this limit set by the software algorithm, by the reagent chemistry, or by the instruments involved? We have not yet isolated the critical variables. Though algorithms can offer cheap fixes for issues rooted in instruments or reagents (software manufacturing costs are nearly nil), there are hard physical limits. But, as this story illustrates, even complex systems that have been around for more than a decade may still be operating well short of those physical limits. We just have to be lucky enough to discover the nuggets amenable to algorithmic solutions, and to collaborate with our peers in chemistry and instrumentation to engineer performance breakthroughs for the system as a whole.
Harrison Leong, Ph.D.
Sr. Staff Engineer, Software