The field of genomics has expanded greatly in recent years, thanks in large part to advances in DNA sequencing technology. Methods to determine DNA sequences have been an intense area of study since the role of DNA as the genetic blueprint and the structures of its nucleotides were discovered in the early to mid-1900s. Today, the ability to sequence an organism’s complete genome and its associated RNA transcripts is transforming our understanding of biological concepts such as genetic diseases, cancer mutations, genealogical and evolutionary links, and gene expression patterns.
In 1977, two pioneering methods for DNA sequencing were reported, one by Alan Maxam and Walter Gilbert and the other by Frederick Sanger and colleagues. Until then, sequencing even a short strand of nucleic acid 5–10 bases in length was a challenging and laborious process. Progress in methods for nucleic acid synthesis, the discovery of restriction enzymes, and the development of gel electrophoresis of nucleic acids paved the way for these two groundbreaking sequencing technologies. Although the Maxam-Gilbert and Sanger sequencing methods employ different chemistries, both rely on visualizing DNA fragments by radioactive labeling combined with gel electrophoresis.
In the Maxam-Gilbert method, DNA fragments radiolabeled at their 5′ ends are chemically cleaved at distinct nucleotides (e.g., A and G, G, C and T, or C). The cleaved fragments are then separated by gel electrophoresis and detected by autoradiography (Figure 1).
Figure 1. Maxam-Gilbert DNA sequencing method. Radioactive labeling at the 5′ end of DNA fragments is depicted by a red asterisk.
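The way a sequence is read from the four cleavage reactions can be illustrated with a small Python sketch. This is a toy model, not a description of the laboratory protocol: the function names are hypothetical, every position is assumed to be cleaved cleanly, and the length-0 fragment (cleavage at the very first base) is kept so that the whole sequence is recovered. A band that appears in both the G and A+G lanes is read as G, a band in the A+G lane only as A, and likewise for C versus T.

```python
def cleavage_lanes(seq):
    """Return, per cleavage reaction, the lengths of the 5'-labeled fragments.

    Cleaving at the base in position i (0-based) leaves a labeled
    fragment of i bases, so the fragment length encodes the position.
    """
    lanes = {"A+G": set(), "G": set(), "C+T": set(), "C": set()}
    for length, base in enumerate(seq):
        if base in "AG":
            lanes["A+G"].add(length)
        if base == "G":
            lanes["G"].add(length)
        if base in "CT":
            lanes["C+T"].add(length)
        if base == "C":
            lanes["C"].add(length)
    return lanes


def read_lanes(lanes):
    """Read bands from shortest to longest and infer each cleaved base."""
    calls = []
    for length in sorted(set().union(*lanes.values())):
        if length in lanes["G"]:
            calls.append("G")   # band in both G and A+G lanes
        elif length in lanes["A+G"]:
            calls.append("A")   # band in the A+G lane only
        elif length in lanes["C"]:
            calls.append("C")   # band in both C and C+T lanes
        else:
            calls.append("T")   # band in the C+T lane only
    return "".join(calls)


print(read_lanes(cleavage_lanes("GATTACA")))  # prints GATTACA
```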
In the Sanger method, a modified nucleotide called a 2′,3′-dideoxynucleoside triphosphate (ddNTP) (Figure 2A) is used in the synthesis of DNA that has been primed with a radiolabeled oligonucleotide. The ddNTP, also known as dideoxy, lacks the 3′-hydroxyl group which is needed to form a phosphodiester bond with an incoming regular nucleotide (i.e., deoxynucleoside triphosphate, or dNTP), and thus terminates the growing chain. Because the ddNTP concentration in the reaction mix is low, the ddNTPs are randomly incorporated in place of regular dNTPs and yield newly synthesized DNA fragments of varying lengths. These fragments are then separated by gel electrophoresis and detected by autoradiography, similar to the Maxam-Gilbert method (Figure 2B). Since the Sanger method is technically easier than the Maxam-Gilbert method, it became the gold standard for sequencing.
Figure 2. Sanger DNA sequencing method. (A) The lack of an oxygen atom at the 3′ position of the ribose of 2′,3′-dideoxynucleoside triphosphate (ddNTP) terminates DNA synthesis. (B) In the workflow, DNA is synthesized using a DNA polymerase and a radioactive primer (depicted as a red bar); the labeled synthesized fragments are then separated by electrophoresis.
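The chain-termination principle can be illustrated with a short simulation. The sketch below is only a toy model under stated assumptions: the function names, the termination probability p_term, and the number of molecules per reaction are all hypothetical, fragment lengths ignore the primer, and the four ddNTP reactions are run separately, as in the original radiolabeled workflow. Each simulated molecule stops at a random position where the ddNTP matches, so each reaction yields fragments of varying lengths, and reading the bands from shortest to longest spells out the new strand.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def run_reaction(new_strand, dd_base, rng, n_molecules=2000, p_term=0.1):
    """Simulate one chain-termination reaction for a single ddNTP.

    Returns the set of fragment lengths (bases added after the primer)
    that would show up as bands in this reaction's gel lane.
    """
    lengths = set()
    for _ in range(n_molecules):
        for length, base in enumerate(new_strand, start=1):
            # Occasionally a ddNTP is incorporated instead of the dNTP,
            # which ends synthesis of this molecule at the current length.
            if base == dd_base and rng.random() < p_term:
                lengths.add(length)
                break
    return lengths


def read_gel(template, seed=0):
    """Recover the newly synthesized strand from the four lanes."""
    # The polymerase builds the reverse complement of the template
    # (both strands are written 5'->3' here).
    new_strand = "".join(COMPLEMENT[b] for b in reversed(template))
    rng = random.Random(seed)
    lanes = {b: run_reaction(new_strand, b, rng) for b in "ACGT"}
    # Each fragment length occurs in exactly one lane; reading bands from
    # shortest to longest gives the new strand in the 5'->3' direction.
    band_to_base = {n: b for b, lens in lanes.items() for n in lens}
    return "".join(band_to_base[n] for n in sorted(band_to_base))


print(read_gel("GATTACAGGCTA"))
```

With thousands of molecules per reaction, every fragment length is represented in practice, so the call above should print TAGCCTGTAATC, the reverse complement of the template.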
For their pioneering work in DNA sequencing, Walter Gilbert and Frederick Sanger were awarded the Nobel Prize in Chemistry in 1980.
The use of Sanger sequencing became more prevalent with the development of automated DNA sequencing, which was made possible by nucleotide labeling with fluorescent dyes and DNA fragment separation by capillary electrophoresis.
As an alternative to radiolabeled nucleotides, fluorescently labeled nucleotides and their subsequent detection enable a faster, simpler, and safer workflow for DNA sequencing. Because each of the four ddNTPs is labeled with a unique fluorescent dye, all four chain-termination reactions (i.e., termination with fluorescent ddA, ddT, ddC, and ddG) can be performed in a single reaction. The fluorescent dyes are detected according to their emission spectra during separation of the DNA fragments by capillary electrophoresis (Figure 3). The use of fluorescent nucleotides also eliminates the need to handle radioactive reagents, facilitating automated workflows.
Similarly, capillary electrophoresis enables automation of DNA sequence analysis, resulting in higher sample throughput and faster processing. During capillary electrophoresis, products of the dideoxy reaction enter capillaries that are filled with a denaturing flowable polymer that separates the fluorescently labeled DNA fragments by size. Shortly before reaching the capillary end, the fragments pass through the path of a laser beam, each fluorescing the color that corresponds to a specific chain-terminating dideoxynucleotide (Figure 3).
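The readout step can be sketched in a few lines of Python. This is a simplified illustration, not an instrument's base-calling algorithm: the dye-to-base assignments, the function name, and the (arrival time, dye color) input format are assumptions made for the example. Because shorter fragments migrate faster, sorting detections by arrival time at the laser orders them by fragment length, and the color of each fragment identifies its terminating dideoxynucleotide.

```python
# Hypothetical dye assignments, chosen only for illustration.
DYE_TO_BASE = {"green": "A", "red": "T", "blue": "C", "yellow": "G"}


def call_bases(detections):
    """Turn (arrival_time, dye_color) detections into a base sequence.

    Fragments reach the detector in order of increasing size, so the
    arrival order corresponds to position in the synthesized strand.
    """
    ordered = sorted(detections, key=lambda d: d[0])
    return "".join(DYE_TO_BASE[color] for _, color in ordered)


# Detections may be recorded in any order; sorting restores the sequence.
print(call_bases([(3.1, "red"), (1.2, "yellow"), (2.0, "green")]))  # prints GAT
```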
The first commercial capillary sequencer was introduced in the mid-1980s by Applied Biosystems (now part of Thermo Fisher Scientific) and contributed to advances in Sanger sequencing. With automated sequencers, sequencing throughput increased from tens to thousands of bases per day. In addition, automation of data collection significantly improved sequencing accuracy, because readouts had previously been entered into computers by hand. By the mid-1990s, DNA sequencers could produce as many as a million bases, or one megabase (Mb), of sequence per day, and Applied Biosystems sequencers became the primary workhorses for the NIH- and Celera-led Human Genome Projects, which announced their draft sequences in 2000 [5-6].
Although the Human Genome Projects were notable achievements, they also highlighted the challenges of sequencing large genomes with Sanger technology, chiefly time and cost: the NIH-led project took over 10 years to complete and cost $2.7 billion. Even though Sanger-based DNA sequencers can read over 2 Mb per day, sequencing a single human genome (which contains >3 billion base pairs) on one instrument would take years.
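The arithmetic behind that estimate is straightforward. The sketch below uses the round figures quoted above (3 billion base pairs, 2 Mb per day); the function name and the coverage parameter are assumptions added for illustration, and the default of single-pass coverage understates real projects, which read each base several times.

```python
GENOME_BP = 3_000_000_000   # approximate human genome size, in base pairs
BASES_PER_DAY = 2_000_000   # 2 Mb per day on one Sanger instrument


def days_to_sequence(genome_bp, bases_per_day, coverage=1.0):
    """Days needed on one instrument to read the genome `coverage` times."""
    return genome_bp * coverage / bases_per_day


days = days_to_sequence(GENOME_BP, BASES_PER_DAY)
print(f"{days:.0f} days, or about {days / 365:.1f} years")
# prints: 1500 days, or about 4.1 years
```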
Therefore, researchers developed alternative sequencing technologies with improved speed, affordability, yield, and sensitivity, so that large and complex genomes could be efficiently sequenced to better understand population genetics and their implications. In 2008, the complete genome of James Watson, a co-discoverer of the structure of DNA, was sequenced to 7.4-fold redundancy in just two months for about $1 million. Sequencing of Watson’s genome was achieved by next-generation sequencing (NGS) technology that allows massively parallel sequencing of DNA fragments of 150–400 bases. The template DNA fragments in this technology are clonally amplified and then sequenced in real time (rather than read from gel or capillary electrophoresis) as a DNA polymerase synthesizes their complementary strands. This sequencing approach is also known as sequencing by synthesis, or SBS.
Figure 4. (A) Ion Torrent sequencing technology. Clonally amplified DNA on a bead, which is deposited in each well of a semiconductor chip, is sequenced by monitoring pH changes of the solution as each nucleotide is incorporated and releases a hydrogen ion (H+). (B) Illumina sequencing technology. Clonal clusters are generated on a flow cell, and the sequence is determined from fluorescence emission from each nucleotide incorporation.
By 2010, Ion Torrent and Illumina sequencers had become two of the most popular NGS platforms based on short-read, massively parallel sequencing technologies (Figure 4). These next-generation sequencers can read billions of bases (gigabases; Gb) per day, improving high-throughput efficiency, saving time and costs, and making the $1,000 genome a reality (Figure 5). Their data output has risen steadily, from one Gb per run in 2006 to about 6,000 Gb in 2017 (a 6,000-fold increase over 11 years), markedly outpacing the doubling of computing power every two years predicted by Moore’s Law (Figure 5).
Figure 5. Cost of sequencing a human-sized genome, from 2001 to 2017. Data were collected and reported by the National Human Genome Research Institute (NHGRI), based on DNA sequencing performed at the sequencing centers funded by NHGRI. The graph was adapted from genome.gov/sequencingcosts.
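The comparison with Moore’s Law can be made concrete with a quick calculation. The sketch below takes the per-run output figures quoted in the text (1 Gb in 2006, about 6,000 Gb in 2017) and assumes smooth exponential growth between those two points, which is a simplification; the function names are hypothetical.

```python
import math


def doubling_time_years(fold_increase, years):
    """Doubling time implied by a given fold increase over a period."""
    return years / math.log2(fold_increase)


def fold_increase_from_doubling(years, doubling_time):
    """Fold increase reached when a quantity doubles every `doubling_time` years."""
    return 2 ** (years / doubling_time)


ngs = doubling_time_years(6000, 11)
moore = fold_increase_from_doubling(11, 2)
print(f"Sequencing output doubled about every {ngs * 12:.1f} months")
print(f"Doubling every two years gives about {moore:.0f}-fold over 11 years")
```

Under these assumptions, sequencing output doubled roughly every 10.5 months, whereas a two-year doubling period would yield only about a 45-fold increase over the same 11 years, compared with the 6,000-fold increase observed.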
NGS platforms that rely on technology other than massively parallel sequencing of clonally amplified DNA fragments also exist. Single-molecule sequencing (SMS) is a popular approach, relying on reading long DNA sequences (kilobases in length). This long-read approach is especially helpful in sequencing DNA templates with homopolymer stretches, large structural rearrangements, and/or exons of similar sequences. In single-molecule real-time (SMRT) technology by Pacific Biosciences, extension of a DNA molecule by a DNA polymerase is detected one base at a time by fluorescence (Figure 6A). Another method, nanopore sequencing by Oxford Nanopore Technologies, detects nucleobases of single-stranded DNA as it passes through a nanopore in a membrane and disrupts an ion flow across the pore (Figure 6B).
Figure 6. (A) SMRT sequencing technology. A DNA polymerase–template complex is immobilized at the bottom of a specialized well called a zero-mode waveguide (ZMW). ZMWs allow light to illuminate only the well bottom, creating a very small detection area. Fluorescence emission, as a result of nucleotide incorporation, is detected as the DNA polymerase synthesizes a new complementary strand. (B) Nanopore sequencing technology. A double-stranded DNA is unwound and passed through a protein nanopore located in an electrically resistant membrane. The passage of the DNA strand disrupts ionic flow (current) across the membrane, and these changes are measured to identify the composition of the DNA strand.
Since its inception in the late 1970s, DNA sequencing has been instrumental in analyzing genomes and elucidating the underlying complexity of gene expression and cellular functions at the genomic level. Today, both Sanger and next-generation sequencing are employed extensively in areas including cancer research, precision/personalized medicine, inherited disease detection, new species discovery, phylogenetic analysis, agrigenomics, forensic sciences, genome editing, and recombinant DNA technologies.