Overview of methods

Genomics researchers have multiple next-generation sequencing (NGS) methods to choose from when designing and implementing their studies. General NGS methods include whole genome sequencing and targeted sequencing, which is further subdivided into exome sequencing and gene or region-specific panels (see table below). Although they can all yield gene sequences of interest to a researcher, they are far from interchangeable. While whole genome and whole exome sequencing are suited to discovery-based questions, targeted next-generation sequencing is preferred when the genes or regions of interest are already known. Additionally, within each of these approaches there are specialized techniques tailored to specific sample types, organisms, diseases, or regions of the genome. The table below summarizes the pros and cons of each method and includes examples and applications.

Comparison of DNA sequencing methods

Description
  • Whole genome sequencing (WGS): sequencing the entire genome
  • Targeted sequencing, exome sequencing: sequencing only exons (protein-coding regions)
  • Targeted sequencing, gene or region-specific panels: sequencing regions of interest such as disease-associated genes or genomic hotspots

Pros
  • Whole genome sequencing (WGS)
      • Most comprehensive genome coverage
      • Detects the widest range of features: SNVs, indels, structural and copy number variants, regulatory elements
      • No bias from PCR amplification or probe hybridization
      • Best for discovery research
  • Exome sequencing
      • 1% of human genome, much less data to analyze than WGS
      • Faster workflow than WGS
      • Multiplexing of a small number of samples
      • Medium sample input (50 ng–1 μg depending on library prep method)
  • Gene or region-specific panels
      • Highly flexible, customizable designs
      • Data is focused specifically on regions/genes of interest
      • Lowest sample input (10 ng)
      • Compatible with FFPE tissue
      • Multiplexing of large numbers of samples
      • Better for detecting rare alleles

Cons
  • Whole genome sequencing (WGS)
      • A lot of potentially unnecessary data from non-coding/non-functional regions
      • Data is very complicated
      • Multiplexing usually not possible
  • Exome sequencing
      • Only get data in exons (may miss functionally relevant variants)
      • May be too much extra data if one only needs to study a small number of genes
  • Gene or region-specific panels
      • Only get data on targeted regions (may miss relevant variants if not in design)

Speed/return of results
  • Whole genome sequencing (WGS): slowest
  • Exome sequencing: medium
  • Gene or region-specific panels: fastest

Cost
  • Whole genome sequencing (WGS): $$$
  • Exome sequencing: $$
  • Gene or region-specific panels: $

Data volume
  • Whole genome sequencing (WGS): largest
  • Exome sequencing: medium
  • Gene or region-specific panels: smallest

When to use
  • Whole genome sequencing (WGS)
      • Complete coverage of genome needed
      • De novo assembly
      • Discovery of unknown genomic variants causing a disease
      • Aneuploidy detection (preimplantation genetic testing)
  • Exome sequencing
      • Disease-specific research projects
      • Clinical sequencing
  • Gene or region-specific panels
      • Clinical sequencing
      • Disease-specific research projects
      • IVD testing
      • LDT development
      • Inherited disease
      • Oncology
      • Immune repertoire
      • Liquid biopsy

Whole genome sequencing

Whole genome sequencing (WGS) is the most comprehensive method that enables an in-depth analysis of entire genomes, including exons, non-coding regions, and structural variants. WGS libraries are typically prepared using fragmentation or enzymatic digestion of genomic DNA. WGS does not require prior knowledge of the genome sequence being analyzed, so it is the best method for discovery of new genetic variations associated with disease, de novo genome assembly, microbial sequencing, and low-pass genome sequencing for copy number/aneuploidy determination.

While whole human genome sequencing has advanced discovery and human health, certain complex regions of the genome are difficult to analyze using this approach, which results in a population sequencing bias. Existing databases are also noted to be neither complete nor accurate. For many research applications, the cost of whole genome sequencing can be a burden because of the computational processing, informatics, and data storage needed for whole genome analysis. This extra cost is of little benefit when studying a specific region of interest associated with a disease or in translational research applications. To address this issue, many researchers use targeted sequencing approaches, such as exome sequencing or smaller gene or region-specific panels, to improve sequence coverage and reduce their total sequencing workflow costs.

Targeted sequencing

By leveraging current genomic knowledge, a targeted NGS approach uses molecular biology methods to enrich for specific genetic sequences. This enables researchers to focus their studies on the individual genes or genomic regions most relevant to their research. Obtaining sequence coverage of challenging genomic regions is now possible, including regions from difficult-to-sequence or limited samples such as degraded DNA or RNA from clinical samples or circulating DNA from blood.

By sequencing only what you need, cost benefits are realized from less intensive computational processing, informatics, and data storage, shorter workflows and sequencing time, and higher depth of coverage for rare variants occurring at low allelic frequency. More samples can be processed simultaneously in a single sequencing run for a faster time to result, whether for an individual sample or a cohort. Exome sequencing and target-specific panels are the two most common targeted sequencing approaches.

  • Exome sequencing enables analysis of all the exons, the protein-coding regions of the genome. About 20,000 genes comprise the human exome, which accounts for approximately 1.5% of the entire genome. This enables researchers to focus on content that may be more relevant to disease compared to whole genome sequencing.
  • Target-specific NGS panels are the most flexible option because they can be designed to sequence any gene or region of interest in a genome and can include structural and copy number variation, as well as RNA transcript analysis. Unlike broader approaches such as exome or whole genome sequencing, targeted panels generate smaller and more manageable data sets, which reduces the data analysis burden for researchers. Using target-specific panels is the fastest and most cost-effective NGS method.
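The depth advantage of sequencing a smaller target is simple arithmetic: for a fixed amount of sequence data, mean depth scales inversely with target size. The following is a minimal sketch in Python, assuming an approximate 3.1 Gb human genome, the roughly 1.5% exome fraction cited above, and a hypothetical 500 kb panel. The 10 Gb per-sample output and the `on_target_fraction` parameter are illustrative assumptions, not values from any specific platform or kit.

```python
# Rough mean-depth estimate: depth = usable sequenced bases / target size.
# All sizes below are assumptions for illustration only.
GENOME_BP = 3_100_000_000  # assumed approximate human genome size

TARGET_SIZES_BP = {
    "whole genome": GENOME_BP,
    "exome (~1.5% of genome)": GENOME_BP * 15 // 1000,
    "hypothetical 500 kb panel": 500_000,
}


def mean_depth(total_bases: float, target_bp: int,
               on_target_fraction: float = 1.0) -> float:
    """Return the expected mean depth of coverage over a target.

    on_target_fraction is the (assumed) share of sequenced bases that
    actually fall inside the target after enrichment.
    """
    if target_bp <= 0:
        raise ValueError("target_bp must be positive")
    if not 0.0 <= on_target_fraction <= 1.0:
        raise ValueError("on_target_fraction must be between 0 and 1")
    return total_bases * on_target_fraction / target_bp


if __name__ == "__main__":
    run_output = 10e9  # hypothetical 10 Gb of sequence for one sample
    for name, size in TARGET_SIZES_BP.items():
        print(f"{name}: ~{mean_depth(run_output, size):,.0f}x mean depth")
```

Real enrichment is never perfectly on target and coverage is never perfectly uniform, so actual depth will be lower than this idealized estimate.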

Targeted NGS libraries enriched for specific genomic regions can be made using two techniques: hybridization capture or amplicon-based enrichment.

Hybridization capture uses molecules complementary to the target regions as probes to select the molecules of interest from the sample. These capture probes are either immobilized on a solid substrate in an array-based format or used directly in solution. In the array-based format, the DNA sample is applied to the solid surface and the targeted DNA fragments hybridize to the immobilized capture probes. Any unbound molecules are washed away, leaving the desired targets on the surface. The isolated genetic material is then eluted and amplified for sequencing. With solution-based hybridization, the probes are biotinylated, and the hybridized targets are isolated using streptavidin magnetic beads and purified for subsequent amplification.
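The idea of probes that are complementary to a target region can be sketched in a few lines of Python. This is a simplified illustration, not a probe design method from any vendor: the function names, the 120-base probe length, and the 60-base step are all assumptions, and real designs also account for GC content, repeats, and off-target matches.

```python
# Sketch: tile overlapping capture probes across one strand of a target region.
# Each probe is the reverse complement of a window, so it can base-pair with it.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tile_probes(target: str, probe_len: int = 120, step: int = 60) -> list[str]:
    """Return probes complementary to overlapping windows of the target.

    probe_len and step are hypothetical defaults chosen for illustration.
    """
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    target = target.upper()
    if len(target) <= probe_len:
        # Target shorter than one probe: a single probe covers all of it.
        return [reverse_complement(target)]
    last_start = len(target) - probe_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        # Add a final window so the end of the target is also covered.
        starts.append(last_start)
    return [reverse_complement(target[s:s + probe_len]) for s in starts]
```

For example, `tile_probes(region_sequence)` returns a list of probe sequences for a hypothetical `region_sequence` string; overlapping windows mean most bases are covered by more than one probe.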

Amplicon-based enrichment uses carefully designed, highly multiplexed PCR to amplify regions of interest from the DNA or cDNA sample. This workflow is much shorter than hybridization capture. After multiplex PCR, the resulting amplicon library is purified from the sample material, ligated to sequencing adaptors with barcodes, and used for sequencing. Amplicon-based enrichment techniques have several advantages compared to hybridization capture methods, which are detailed in the next section.
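Barcoded adaptors are what allow several samples to share one sequencing run: after sequencing, each read is assigned back to its sample by its barcode. The sketch below is a toy Python illustration that assumes an inline barcode at the start of each read and exact matching only. The function name and the example barcodes are hypothetical, and production demultiplexing tools handle sequencing errors and platform-specific read layouts.

```python
from collections import defaultdict


def demultiplex(reads: list[str], barcodes: dict[str, str]) -> dict[str, list[str]]:
    """Group reads by sample using an exact-match barcode at the read start.

    barcodes maps barcode sequence -> sample name (hypothetical values).
    The barcode is trimmed from each assigned read; reads with no matching
    barcode are collected under "unassigned".
    """
    assigned = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                assigned[sample].append(read[len(barcode):])
                break
        else:
            assigned["unassigned"].append(read)
    return dict(assigned)


if __name__ == "__main__":
    example_barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
    example_reads = ["ACGTGGGG", "TGCACCCC", "NNNNAAAA"]
    print(demultiplex(example_reads, example_barcodes))
```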

Advantages of amplicon-based approaches

First, PCR specificity allows researchers to enrich for target gene regions from low sample input amounts. Limited sample sources with trace amounts of DNA and/or RNA, such as formalin-fixed paraffin embedded tissue (FFPE), fine needle aspirates, or circulating tumor DNA, can now be sequenced for biomarker discovery and retrospective clinical trials.

Second, amplicon-based enrichment can discriminate between two highly homologous regions in the genome because PCR primers can be uniquely designed to target the desired region. In contrast, hybridization capture would have difficulty distinguishing between the two homologous regions, resulting in non-specific enrichment. For example, the PTEN gene is a known tumor suppressor gene that controls cell growth and division and is one of the most commonly mutated tumor suppressor genes in cancer. PTENP1 is a processed pseudogene that is very similar in sequence, but a missense mutation in the initiation methionine codon prevents translation of the normal PTEN protein. The ability to distinguish and target the right gene clearly plays an important role in cancer research. The same concept applies when trying to target low-complexity regions that are prevalent in whole genomes, such as di- and tri-nucleotide repeats.
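One way to picture primer-based discrimination is to place the 3' end of a primer on a base that differs between the gene and its homolog. The Python sketch below illustrates that idea under strong simplifying assumptions: the two sequences are made-up toy strings (not real PTEN or PTENP1 sequence), they are assumed to be pre-aligned and of equal length, and the function names are hypothetical. In practice a single 3' mismatch reduces but does not always prevent amplification, so real primer design relies on dedicated tools.

```python
# Toy, pre-aligned sequences that differ at a single position (made up).
TARGET = "GATTACAGGCTTACCGATAGCTAGGCTA"
HOMOLOG = "GATTACAGGCTTACCGATAGCTTGGCTA"


def differing_positions(target: str, homolog: str) -> list[int]:
    """Return 0-based positions where two equal-length sequences differ."""
    if len(target) != len(homolog):
        raise ValueError("sketch assumes pre-aligned, equal-length sequences")
    return [i for i, (a, b) in enumerate(zip(target, homolog)) if a != b]


def discriminating_primers(target: str, homolog: str,
                           primer_len: int = 20) -> list[tuple[int, str]]:
    """Return (start, primer) pairs for forward primers whose 3'-terminal
    base sits on a position where the target and homolog differ."""
    primers = []
    for pos in differing_positions(target, homolog):
        start = pos - primer_len + 1
        if start >= 0:  # skip differences too close to the sequence start
            primers.append((start, target[start:pos + 1]))
    return primers


if __name__ == "__main__":
    for start, primer in discriminating_primers(TARGET, HOMOLOG, primer_len=10):
        print(start, primer, "matches homolog:", primer in HOMOLOG)
```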

Third, amplicon-based enrichment can better detect known insertions and fusion events than hybridization capture. Because hybridization capture requires developing complementary capture probes against a known reference genome, unknown genetic mutations could disrupt the hybridization process and result in a failure to enrich for a target region of interest. This issue is particularly relevant for genetic regions that have many variants near one another. Hypervariable regions such as the T-cell immune repertoire can be sequenced more effectively using amplicon-based enrichment, providing translational researchers with a tool to discover predictive biomarkers for immunotherapy.

Please see the article Targeted Sequencing Approaches for NGS to learn more.
