Introduction to NGS

Next-generation sequencing (NGS) is a technology for determining the sequence of DNA or RNA to study genetic variation associated with diseases or other biological phenomena. Introduced for commercial use in 2005, this method was initially called “massively parallel sequencing” because it enabled the sequencing of many DNA strands at the same time, instead of one at a time as with traditional Sanger sequencing by capillary electrophoresis (CE).

Both technologies have a place in today’s genetic analysis environment. Sanger sequencing is best suited to analyzing small numbers of gene targets and samples, and it can be completed in a single day. It is also considered the gold-standard sequencing technology, so NGS results are often verified using Sanger sequencing. NGS enables the interrogation of hundreds to thousands of genes at one time in multiple samples, as well as the discovery and analysis of different types of genomic features in a single sequencing run, from single nucleotide variants (SNVs) to copy number and structural variants, and even RNA fusions. NGS provides much higher throughput per run, so large studies can be performed quickly and cost-effectively. Additional advantages of NGS include lower sample input requirements, higher accuracy, and the ability to detect variants at lower allele frequencies than Sanger sequencing.
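One reason NGS can detect low-frequency variants is that each position is typically covered by many independent reads. The minimal sketch below assumes each read carries the variant independently with probability equal to the variant allele frequency (VAF) and ignores sequencing errors; the 5% VAF and the requirement of at least 5 supporting reads are hypothetical values chosen for illustration, not a published calling threshold.

```python
from math import comb


def variant_allele_frequency(alt_reads: int, total_depth: int) -> float:
    """Fraction of reads at a position that support the alternate allele."""
    if total_depth <= 0:
        raise ValueError("total_depth must be positive")
    return alt_reads / total_depth


def prob_at_least_k_alt_reads(depth: int, vaf: float, k: int) -> float:
    """Binomial probability of observing >= k alternate reads at this depth.

    Assumes reads are independent draws and that sequencing errors can be
    ignored (a simplification; real callers model error rates).
    """
    p_fewer = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i) for i in range(k)
    )
    return 1.0 - p_fewer


# Hypothetical example: a variant present in 5% of molecules.
for depth in (20, 500):
    p = prob_at_least_k_alt_reads(depth, vaf=0.05, k=5)
    print(f"depth {depth}: P(>= 5 alt reads) = {p:.4f}")
```

At a depth of 20 reads the variant is expected in only about one read, so it is rarely seen five times; at 500 reads it almost always is.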

The speed, throughput, and accuracy of NGS have revolutionized genetic analysis and enabled new applications in genomic and clinical research, reproductive health, and environmental, agricultural, and forensic science.

NGS workflow

A typical NGS experiment follows similar steps regardless of the instrument technology used (Figure 1).

Figure 1. NGS workflow steps

1. Construct library

A sequencing “library” must be created from the sample. The DNA (or cDNA) sample is processed into relatively short double-stranded fragments (100–800 bp). Depending on the specific application, DNA fragmentation can be performed in a variety of ways, including physical shearing, enzyme digestion, and PCR-based amplification of specific genetic regions. The resulting DNA fragments are then ligated to technology-specific adaptor sequences, forming a fragment library. These adaptors may also contain a short “barcode” sequence, so that each sample can be tagged with a unique DNA sequence. This allows multiple samples to be mixed together and sequenced at the same time. For example, barcodes 1–20 can be used to individually label 20 samples, which can then be analyzed in a single sequencing run. This approach, called “pooling” or “multiplexing”, saves time and money during sequencing experiments and controls for workflow variation, because pooled samples are processed together.
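After a pooled run, reads must be sorted back to their samples by barcode (“demultiplexing”). The minimal sketch below assumes an inline 6-base barcode at the start of each read, a hypothetical three-sample barcode sheet, and tolerance of one mismatch; real platforms may read barcodes differently (for example, in a separate index read) and ship their own demultiplexing software.

```python
from collections import defaultdict

# Hypothetical sample sheet: barcode sequence -> sample name.
# These barcodes differ at every position, so one mismatch is unambiguous.
BARCODES = {
    "ACGTAC": "sample_01",
    "TGCATG": "sample_02",
    "GATCGA": "sample_03",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, barcode_len=6, max_mismatches=1):
    """Bin reads by an inline 5' barcode and strip the barcode.

    Reads whose barcode matches no sample, or more than one sample, within
    max_mismatches go to an "undetermined" bin.
    """
    bins = defaultdict(list)
    for read in reads:
        if len(read) < barcode_len:
            bins["undetermined"].append(read)
            continue
        tag, insert = read[:barcode_len], read[barcode_len:]
        hits = [name for bc, name in barcodes.items()
                if hamming(tag, bc) <= max_mismatches]
        if len(hits) == 1:
            bins[hits[0]].append(insert)
        else:
            bins["undetermined"].append(insert)
    return dict(bins)


reads = ["ACGTACTTTTGGGG", "TGCATCCCCCAAAA", "NNNNNNACGT"]
print(demultiplex(reads, BARCODES))
# {'sample_01': ['TTTTGGGG'], 'sample_02': ['CCCCAAAA'], 'undetermined': ['ACGT']}
```

The second read is still assigned to sample_02 despite one sequencing error in its barcode, which is why barcode sets are usually designed to differ at several positions.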

In addition to fragment libraries, there are two other specialized methods of library preparation: paired-end libraries and mate-pair libraries. Paired-end libraries allow users to sequence each DNA fragment from both ends, rather than in a single direction only, as in typical sequencing. Paired-end libraries are created like regular fragment libraries, but they have adaptor tags on both ends of the DNA insert that enable sequencing from two directions. This approach makes it easier to map reads and can improve detection of genomic rearrangements, repetitive sequence elements, and RNA gene fusions or splice variants. However, improvements in modern library preparation methods and analysis tools have made it possible to detect these features with single-direction sequencing as well.
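Because the two reads of a pair come from opposite ends of the same fragment, they can sometimes be combined into a single longer sequence. The sketch below is a simplified illustration, not any platform’s algorithm: it assumes read 2 is reported on the opposite strand (so it is reverse-complemented first) and merges the pair only when the reads overlap exactly by at least a hypothetical minimum of 5 bases. Production read mergers also tolerate mismatches and weigh base quality.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A<->T, C<->G; N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 5):
    """Merge an overlapping read pair into one fragment sequence.

    read2 is reverse-complemented, then the longest exact overlap between
    the end of read1 and the start of read2 is searched for. Returns None
    when no overlap of at least min_overlap bases exists (for example, when
    the insert is longer than the two reads combined).
    """
    r2 = reverse_complement(read2)
    for olen in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        if read1[-olen:] == r2[:olen]:
            return read1 + r2[olen:]
    return None


fragment = "AAGGTTCCAGTCAGTTGACC"
read1 = fragment[:14]                     # sequenced from the left end
read2 = reverse_complement(fragment[6:])  # sequenced from the right end
print(merge_pair(read1, read2) == fragment)  # True
```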

Mate-pair libraries are more complex to create than fragment or paired-end libraries and involve much larger DNA inserts (from over 2 kb up to 30 kb). Sequencing a mate-pair library generates two reads that are distal to each other and in opposite orientations. Because the approximate physical distance between the two reads is known, mate-pair sequencing is useful for de novo assembly, large structural variant detection, and identification of complex genomic rearrangements.
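The known insert size is what makes mate pairs informative: when two mates map much farther apart or closer together than the library was built for, or to different chromosomes, the sample may differ structurally from the reference. The sketch below is a simplified classifier under stated assumptions: the expected insert range (2,000–5,000 bp here) is hypothetical, the distance is approximated by the gap between the reads’ mapped start positions, and read orientation, which real structural variant callers also check, is ignored.

```python
from dataclasses import dataclass


@dataclass
class MappedPair:
    chrom1: str
    pos1: int  # mapped start coordinate of read 1
    chrom2: str
    pos2: int  # mapped start coordinate of read 2


def classify_pair(pair: MappedPair, min_insert: int, max_insert: int) -> str:
    """Label a mapped mate pair relative to the expected insert-size range."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"  # possible translocation
    span = abs(pair.pos2 - pair.pos1)
    if span > max_insert:
        return "too_far"    # possible deletion in the sample vs. the reference
    if span < min_insert:
        return "too_close"  # possible insertion in the sample vs. the reference
    return "concordant"


pairs = [
    MappedPair("chr1", 10_000, "chr1", 13_000),
    MappedPair("chr1", 10_000, "chr1", 25_000),
    MappedPair("chr1", 10_000, "chr2", 50_000),
]
for p in pairs:
    print(classify_pair(p, min_insert=2_000, max_insert=5_000))
```

A single discordant pair is weak evidence; in practice, clusters of discordant pairs supporting the same event are what indicate a rearrangement.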

2. Clonal amplification

Prior to sequencing, the DNA library must be attached to a solid surface and clonally amplified to increase the signal that can be detected from each target during sequencing. During this process, each unique DNA molecule in the library is bound to the surface of a bead or a flow cell and PCR amplified to create a set of identical clones. In the case of Ion Torrent technology, a process called “templating” is used to add library molecules to beads. To learn more, see How Ion Torrent Technology Works.
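The reason amplification boosts the signal is exponential growth: under the idealized PCR model, each cycle multiplies the number of copies by (1 + E), where E is the per-cycle efficiency (E = 1 means perfect doubling). The sketch below shows only this textbook arithmetic; the cycle counts and the 90% efficiency are hypothetical, and real on-bead or on-surface amplification eventually plateaus rather than growing indefinitely.

```python
def expected_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: N = N0 * (1 + E) ** n.

    efficiency (E) is the fraction of templates copied per cycle, from 0 to 1.
    The model ignores the plateau reached when reagents run out.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1 + efficiency) ** cycles


print(expected_copies(1, 10))       # 1024.0 with perfect doubling
print(expected_copies(1, 10, 0.9))  # fewer copies at 90% efficiency
```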

3. Sequence library

All of the DNA in the library is sequenced at the same time using a sequencing instrument. Although each NGS technology is unique, they all use a version of the “sequencing by synthesis” method, detecting individual bases as they are incorporated into a growing DNA strand. Each cycle follows common steps: a nucleotide is incorporated into the strand being synthesized on a single-stranded DNA template, the incorporated base is detected, and the remaining reactants are removed before the next cycle begins.
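The repeating synthesize, detect, and wash cycle can be sketched as a simple loop. This toy model assumes a chemistry in which exactly one base is added per cycle and detection is error-free; it ignores strand direction and signal noise, and it is not a model of any specific instrument.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Toy sequencing-by-synthesis: one incorporated base is read per cycle."""
    synthesized = []
    for cycle in range(min(cycles, len(template))):
        # 1. Synthesis: polymerase adds the base complementary to the template.
        incorporated = COMPLEMENT[template[cycle]]
        # 2. Detection: the instrument records which base was incorporated.
        synthesized.append(incorporated)
        # 3. Removal of reactants before the next cycle (nothing to do in this model).
    return "".join(synthesized)


print(sequence_by_synthesis("TACGGT", 4))  # ATGC
```

Read length is limited by the number of cycles run, which is why the loop stops at whichever is shorter, the template or the cycle count.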

Most sequencing instruments use optical detection to determine nucleotide incorporation during DNA synthesis. Ion Torrent instruments instead use electrical detection to sense the hydrogen ions that are naturally released when nucleotides are incorporated during DNA synthesis. To learn more, see How Ion Torrent Technology Works.
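In flow-based detection of the kind Ion Torrent uses, one nucleotide type is supplied at a time; if the next template positions call for that base, every base of a homopolymer run is incorporated in the same flow, and the released hydrogen ions give a signal roughly proportional to the number incorporated. The sketch below idealizes this into integer counts per flow. The repeating flow order “TACG” is a hypothetical choice, the input is the strand being synthesized (complementarity is skipped), and real signals are analog and noisy.

```python
from itertools import cycle


def flow_signals(synthesized: str, flow_order: str = "TACG", max_flows: int = 100):
    """Return (nucleotide, incorporation count) for each flow.

    A count of 0 means the flowed nucleotide was not incorporated; a count
    above 1 means a homopolymer run was incorporated in a single flow.
    """
    signals = []
    pos = 0
    flows = cycle(flow_order)
    for _ in range(max_flows):
        if pos >= len(synthesized):
            break
        nuc = next(flows)
        count = 0
        while pos < len(synthesized) and synthesized[pos] == nuc:
            count += 1
            pos += 1
        signals.append((nuc, count))
    return signals


def call_bases(signals) -> str:
    """Convert idealized flow counts back into a base sequence."""
    return "".join(nuc * count for nuc, count in signals)


signals = flow_signals("TTAGC")
print(signals)
# [('T', 2), ('A', 1), ('C', 0), ('G', 1), ('T', 0), ('A', 0), ('C', 1)]
print(call_bases(signals))  # TTAGC
```

Because a homopolymer is read as the height of a single signal rather than as separate cycles, distinguishing, for example, five from six identical bases depends on how precisely that signal can be measured.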

4. Analyze data

Each NGS experiment generates large quantities of complex data consisting of short DNA reads. Although each technology platform has its own algorithms and data analysis tools, they share a similar analysis “pipeline” and use common metrics to evaluate the quality of NGS data sets.
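Coverage (depth) is one of the most widely used of these metrics. A common back-of-the-envelope estimate is mean coverage = (number of reads × read length) / target size, and the Lander-Waterman model approximates per-base depth as Poisson-distributed around that mean. The sketch below applies both; the read count, read length, and target size are hypothetical, and real data are less uniform than the Poisson model assumes.

```python
import math


def mean_coverage(num_reads: int, read_length: int, target_size: int) -> float:
    """Average number of reads covering each base of the target."""
    return num_reads * read_length / target_size


def fraction_covered(mean_cov: float, min_depth: int = 1) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of bases with depth >= min_depth."""
    p_below = sum(
        math.exp(-mean_cov) * mean_cov**k / math.factorial(k) for k in range(min_depth)
    )
    return 1.0 - p_below


# Hypothetical run: 1,000,000 reads of 150 bp over a 3 Mb target.
cov = mean_coverage(1_000_000, 150, 3_000_000)
print(cov)                          # 50.0
print(fraction_covered(cov, 10))    # fraction of bases covered at least 10 times
```

At a mean coverage of only 1×, the same model predicts that about 63% of bases (1 − e⁻¹) are covered at least once, which is why experiments aim well above 1×.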

Analysis can be divided into three steps: primary, secondary, and tertiary analysis (Figure 2). Primary analysis is the processing of raw signals from instrument detectors into digitized data, or base calls. These raw data are collected during each sequencing cycle. The output of primary analysis is a set of FASTQ files containing base calls assembled into sequencing reads, along with a Phred quality score for each base. Secondary analysis involves filtering and trimming reads based on quality, aligning them to a reference genome (or assembling them for novel genomes), and finally calling variants. The main output is a BAM file containing aligned reads. Tertiary analysis is the most challenging step, as it involves interpreting the results and extracting meaningful information from the data.
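To make the primary and secondary outputs concrete, the sketch below reads a FASTQ record, decodes its quality string into Phred scores, and trims low-quality bases from the 3′ end. It relies on the standard definition Q = −10 · log10(P_error), so Q20 means a 1% chance that the base call is wrong, and on the common Phred+33 ASCII encoding. It assumes simple 4-line FASTQ records, and the fixed Q20 trimming rule is a hypothetical illustration; dedicated trimming tools use more elaborate rules, such as sliding windows.

```python
def parse_fastq(text: str):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    lines = text.strip().splitlines()
    if len(lines) % 4:
        raise ValueError("expected 4 lines per FASTQ record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header[1:], seq, qual


def phred_scores(qual: str, offset: int = 33) -> list:
    """Decode an ASCII quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P) to get the probability of a base-call error."""
    return 10 ** (-q / 10)


def trim_3prime(seq: str, qual: str, min_q: int = 20):
    """Remove bases from the 3' end while their quality is below min_q."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


record = "@read1\nACGTACGT\n+\nIIIIII##\n"
for read_id, seq, qual in parse_fastq(record):
    print(read_id, phred_scores(qual))  # read1 [40, 40, 40, 40, 40, 40, 2, 2]
    print(trim_3prime(seq, qual))       # ('ACGTAC', 'IIIIII')
```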

Figure 2. NGS analysis pipeline overview

Due to the complexity of NGS data and associated algorithms, NGS analysis is typically performed by bioinformatics specialists. To empower users who don’t have specialized bioinformatics training, platforms such as Ion Torrent offer user-friendly, intuitive software that simplifies analysis and requires no programming skills to get results. For a primer on the basic metrics used to analyze NGS data, please see the article The Importance of Throughput and Coverage.