Hi – Natalie here.
We received a great question from Philip at Kuwait University about the terms used when talking about Sanger sequencing data. I thought you all might be interested in learning more about this topic, so I talked to my friend Mike in our Technical Support group to get a quick lesson on Sanger sequencing data terms.
Philip’s Q: Greetings! I was wondering if you could help us understand the term “trace” and necessary QC inference from the raw data obtained from capillary electrophoresis? What is the significance of signal strength? What is Trace score, Median PUP, offscale and QV20+?
A: Hello! To help you understand the data generated by capillary electrophoresis, let’s define some of those terms for you:
Trace: The output file from a single lane or capillary on a sequencing instrument that is imported into analysis software such as Sequencing Analysis, Variant Reporter, SeqScape, Variant Analysis module (cloud) or Quality Check module (cloud).
QV20+: Total number of bases in the entire trace that have a basecaller Quality Value >=20.
Quality Value: A measure of certainty of the basecalling and consensus calling algorithms; a high quality value corresponds to a low chance of algorithm error. Trace quality values are the per-base quality values for a trace; consensus quality values are the per-base quality values for a consensus sequence (more applicable to the Variant Analysis module and Variant Reporter).
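To make the link between a quality value and "chance of algorithm error" concrete, here is a minimal Python sketch. It assumes the quality values follow the conventional Phred scale, QV = -10 × log10(probability of error); the function names are purely illustrative and are not part of any analysis software.

```python
import math

def error_probability(qv):
    """Estimated probability that a basecall is wrong, assuming a Phred-scaled QV."""
    return 10 ** (-qv / 10)

def quality_value(p_error):
    """Phred-scaled quality value for a given error probability (assumed convention)."""
    return -10 * math.log10(p_error)

# Show how the estimated error rate falls as the quality value rises.
for qv in (10, 20, 30, 40):
    print(f"QV {qv}: about 1 error in {round(1 / error_probability(qv))} basecalls")
```

Under that assumption, a base with QV 20 has roughly a 1 in 100 chance of being called incorrectly, which is why QV20 is a common cutoff for a "good" base.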
Trace Score: Average of basecall quality values for bases in the clear range.
If you turn trimming on, the clear range is the region of the sequence that remains after the low-quality or error-prone bases at the 5′ and 3′ ends are excluded.
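The QV20+ and Trace Score definitions above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the software's actual implementation: the quality values are made up, and the clear range is assumed to be given as a 0-based start index and an exclusive end index.

```python
def qv20_plus(quality_values, threshold=20):
    """Count the bases in the entire trace whose quality value is >= threshold."""
    return sum(1 for qv in quality_values if qv >= threshold)

def trace_score(quality_values, clear_start, clear_end):
    """Average quality value of the bases inside the clear range.

    clear_start is inclusive and clear_end is exclusive (an assumed convention).
    """
    clear = quality_values[clear_start:clear_end]
    if not clear:
        raise ValueError("clear range contains no bases")
    return sum(clear) / len(clear)

# Hypothetical per-base quality values: low at both ends, high in the middle.
qvs = [8, 12, 25, 40, 52, 48, 33, 19, 9]

print(qv20_plus(qvs))          # counts bases over the whole trace
print(trace_score(qvs, 2, 7))  # averages only the trimmed clear range
```

Note that QV20+ counts bases across the whole trace, while the Trace Score only looks at the bases left after trimming.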
Offscale: Saturated data – at least one data point in the analysis range has saturated the CCD camera.
The fluorescent signal is so bright that the camera cannot measure its true intensity, so the signal cannot be represented accurately as a peak in the trace. The peak might appear split or flat at the top.
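As a rough sketch of the offscale idea, the Python below flags a trace when any raw data point in the analysis range reaches a saturation level. The saturation level used here is a hypothetical number chosen for the example; the real value depends on the instrument, and the function and parameter names are invented for illustration.

```python
def is_offscale(raw_signal, saturation_level, start=0, end=None):
    """Return True if any data point in the analysis range hits the saturation level.

    raw_signal: raw intensity readings for one dye channel.
    saturation_level: hypothetical detector ceiling (instrument specific).
    start, end: analysis range as a slice (end=None means "to the last point").
    """
    return any(point >= saturation_level for point in raw_signal[start:end])

# Hypothetical readings: the middle of the peak is clipped flat at the ceiling.
HYPOTHETICAL_CEILING = 30000
clipped_peak = [150, 4200, 21000, 30000, 30000, 30000, 18000, 900]
normal_peak = [150, 900, 3100, 5200, 3000, 850, 140]

print(is_offscale(clipped_peak, HYPOTHETICAL_CEILING))
print(is_offscale(normal_peak, HYPOTHETICAL_CEILING))
```

The run of identical maximum readings in the first example is what shows up as a flat-topped peak in the trace.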
Median PUP: A measure of noise. It is the median value, within the clear range, of the ratio of the called base’s signal to the signal of the highest secondary peak. A higher value means the called peaks stand out more clearly from the underlying noise.
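Here is a minimal Python sketch of the Median PUP calculation as described above. The data layout (a called base paired with the peak signal in each of the four dye channels) and the example numbers are assumptions made for illustration, not the format the analysis software uses.

```python
from statistics import median

def pup_ratio(called_base, peak_signals):
    """Ratio of the called base's signal to the highest secondary peak at that position."""
    primary = peak_signals[called_base]
    secondary = max(v for base, v in peak_signals.items() if base != called_base)
    if secondary <= 0:
        # No measurable secondary signal: treat the position as perfectly clean.
        return float("inf")
    return primary / secondary

def median_pup(calls, clear_start, clear_end):
    """Median of the per-base ratios inside the clear range (end index exclusive)."""
    ratios = [pup_ratio(base, signals) for base, signals in calls[clear_start:clear_end]]
    return median(ratios)

# Hypothetical peak heights at three called positions.
calls = [
    ("A", {"A": 1000, "C": 50, "G": 20, "T": 10}),
    ("C", {"A": 40, "C": 800, "G": 80, "T": 16}),
    ("G", {"A": 30, "C": 25, "G": 1200, "T": 40}),
]

print(median_pup(calls, 0, 3))
```

Using the median rather than the mean keeps a handful of unusually noisy or unusually clean positions from dominating the result.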
As for signal strength: the video How Does Sanger Sequencing Work explains the relationship between ddNTP incorporation and fluorescence, starting at about 1:30. The more ddNTP incorporation events there are at a particular base position, the more fluorescent signal is detected as the DNA migrates through the system. Instrument sensitivity also plays a part in signal intensity. A more sensitive instrument can detect DNA signal from only a few incorporation events, signal that might otherwise be lost in the baseline noise of the system. However, a more sensitive instrument can also saturate more easily (that is, at least one data point in the analysis range saturates the CCD camera), so careful quantitation of the DNA is strongly recommended as part of your optimization process.
For other definitions of terms and how to modify them, you can click on the Actions menu in the Quality Check module in the Thermo Fisher Cloud and select “Quality Flag Settings”.