In the field of biology, the term “sequencing” refers to the determination of the sequence of units of a linear polymer. DNA sequencing, therefore, is the use of methods and technologies to determine the identity and order of the four nucleotide bases (adenine, guanine, cytosine, and thymine) in a segment of DNA.
The three-dimensional, double-helical structure of DNA was determined in 1953 by James Watson and Francis Crick, based on crystallographic data produced by Rosalind Franklin and Maurice Wilkins. A key feature of the Watson–Crick model was the two strands, each a linear polymer of nucleotides. This model later led to revelations about how the composition and order of the nitrogenous bases of the nucleotides could encode information that was both heritable and had the capacity to determine the structure of proteins.
According to the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH), the human genome comprises approximately 3 billion base pairs of DNA. Because the sequence of bases carries the instructions for making proteins and also regulates gene function, the ability to read genetic sequences is enormously valuable to biological research. Because differences in DNA and RNA sequences can distinguish organisms down to the species and individual level, sequencing is also a powerful tool in forensics, and in the future it may be used to characterize diseases, identify therapeutic targets, and personalize treatments. Although sequencing is often thought of as a relatively recent advancement, it has a rich history.
Initial sequencing efforts focused on RNA, in part because RNA molecules are single-stranded and typically shorter than DNA molecules. The first whole RNA sequence was published in 1965 by Robert Holley and colleagues. In parallel, Frederick Sanger developed a technique, called 2D fractionation, for separating and detecting radiolabeled RNA fragments by combining charge-based separation (electrophoresis) with chromatography.
In the mid-1970s, 2D fractionation was replaced by separation based on size (i.e., polynucleotide length) via electrophoresis through polyacrylamide gels, and the approach was applied to DNA. In 1976, Maxam and Gilbert developed a technique in which DNA is chemically treated to break the chain at specific bases. After electrophoresis of the cleaved DNA, the relative lengths of the fragments, and thus the positions of specific nucleotides, can be determined and the sequence inferred. This is considered the birth of first-generation sequencing. However, the breakthrough that would propel DNA sequencing into the future came a year later, in 1977, with Sanger’s chain-termination method.
The chain-termination method, also known as Sanger sequencing, makes use of chemical analogs of the four nucleotides. These analogs, called dideoxynucleotides (ddNTPs), lack the 3′-hydroxyl group that is required to extend a DNA polynucleotide chain. When dye-labeled ddNTPs are included in a DNA synthesis reaction with template DNA, they are incorporated at random and terminate the growing chain, so strands of every possible length are produced. The products are then separated by size using electrophoresis, and the sequence is read from the terminating base of each fragment. Owing to its relative simplicity, accuracy, and reliability, Sanger sequencing became the gold standard in sequencing technology and remains so today.
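The logic of chain termination can be illustrated with a short simulation. The following Python sketch is a toy model, not a description of any laboratory protocol or instrument software: the function names, the `ddntp_fraction` parameter, and the example template are all hypothetical, and the template is assumed to be written in the direction in which the polymerase reads it, so that the new strand grows from left to right.

```python
import random

# Watson-Crick pairing: the new strand is the complement of the template.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_termination_fragments(template, ddntp_fraction=0.2,
                                n_molecules=5000, seed=0):
    """Simulate many extension reactions on the same template.

    At each position a chain-terminating ddNTP is incorporated with
    probability `ddntp_fraction` (an illustrative value, not a lab
    setting). Returns the set of (fragment_length, terminal_base) pairs.
    """
    rng = random.Random(seed)
    new_strand = [COMPLEMENT[base] for base in template]
    fragments = set()
    for _ in range(n_molecules):
        for length, base in enumerate(new_strand, start=1):
            if rng.random() < ddntp_fraction:
                # A ddNTP lacks the 3'-hydroxyl, so extension stops here.
                # The label on the ddNTP identifies the terminal base.
                fragments.add((length, base))
                break
    return fragments


def read_sequence(fragments):
    """Mimic electrophoresis: order fragments by length, then read
    the labeled terminal base of each one in turn."""
    return "".join(base for _, base in sorted(fragments))


if __name__ == "__main__":
    template = "TACGGATTCA"  # hypothetical example template
    fragments = chain_termination_fragments(template)
    print(read_sequence(fragments))  # sequence of the newly made strand
```

Because each fragment length corresponds to exactly one terminal base, sorting the fragments by length and reading their labels recovers the sequence of the synthesized strand, which is the complement of the template.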
To learn more about Sanger sequencing, see the article “What is Sanger sequencing?”
While a number of improvements have been made to Sanger sequencing over the years, new techniques have also arisen. One of the first next-generation sequencing (NGS) technologies, pyrosequencing, is based on a fundamentally different method that uses luminescence to detect the pyrophosphate released when nucleotides are incorporated.
To find out more about NGS, see the NGS Basics articles.
More than 10 years after it was first described in 1993, pyrosequencing was licensed to 454 Life Sciences, a biotechnology company founded by Jonathan Rothberg, which commercialized it. This initiated a paradigm shift powered by high-throughput DNA sequencing techniques that can generate gigabases of sequence data per day at a fraction of the original cost of sequencing. These advances have inspired researchers to address increasingly bold questions in genome-wide investigations.
To find out more about the types of sequencing, see the article “What are the different types of sequencing technologies?”
The completion of the project to sequence the human genome, which took more than a decade, was a major milestone in the history of DNA sequencing and the start of its modern era. On June 26, 2000, the International Human Genome Sequencing Consortium announced the first draft of a consensus sequence of the human genome, compiled from the sequences of multiple anonymous volunteers. The finished version of the human genome sequence was completed in April 2003. This version, which is available to the public, provides nearly all the information needed to do research on the whole human genome, which in turn helps researchers better understand the genetic basis of an individual’s health and the pathology of disease.