Introduction

Metagenomics is the study of genetic material recovered directly from environmental samples, such as sites on the human body. Early genetic studies of the microbiome used traditional microbiology techniques to cultivate the microbiota, followed by molecular cloning and gene sequencing to produce a metagenomic profile of the sample. However, many microbes are not culturable, so traditional microbiology techniques cannot provide a full genetic profile of the microbiome.

Next-generation sequencing (NGS), with its high-throughput nature and parallel advancements in bioinformatics for data analysis, has enabled a culture-independent analysis of the microbiome that was previously not thought possible. Because of NGS, many research studies now associate the microbiome with disease, from obesity and autism (1) to cancer, and with the response to cancer therapy (2).

Microbiome research continues to grow rapidly as microbial genome databases expand and further insights into the microbiome's impact on health are gleaned from basic research and clinical investigations. In this article, learn about some of the general NGS considerations and the different NGS methods that can empower your microbiome research.

Considerations for microbiome research using NGS

Metagenomic sequencing using NGS is a powerful method to quantitatively characterize microbiomes, providing insights into microbial populations and helping discover microbes that are not culturable or may be in relatively low abundance and not detectable using traditional methods. There are many factors to consider for a successful microbiome study (3). 

For NGS studies in particular, sequencing coverage and throughput are important concepts for ensuring that the data obtained can address your scientific questions and objectives. Appropriate coverage and throughput ensure that you have enough data to genetically characterize the microbiome, including determining the number of operational taxonomic units (OTUs) in your samples. In the context of NGS, OTUs are clusters of organisms that are grouped together based on the DNA sequence similarity of specific taxonomic markers.
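To make the idea of grouping reads into OTUs by sequence similarity concrete, here is a minimal sketch of greedy clustering. It assumes pre-aligned, equal-length marker reads and a 97% identity cutoff, which is a commonly used convention but an assumption here rather than something this article specifies. The function names and the toy sequences are hypothetical, and real OTU-picking tools use proper alignment and more careful seed selection.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy example expects pre-aligned, equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def cluster_otus(reads: list[str], threshold: float = 0.97) -> list[list[str]]:
    """Greedy clustering: each read joins the first OTU whose seed it matches."""
    otus: list[list[str]] = []  # otus[i][0] is the seed read of OTU i
    for read in reads:
        for otu in otus:
            if identity(read, otu[0]) >= threshold:
                otu.append(read)
                break
        else:
            otus.append([read])  # no seed was close enough, so start a new OTU
    return otus


# Toy marker reads, 40 bases each (hypothetical, not real 16S sequence)
seed_a = "ACGT" * 10
seed_b = "TTGCA" * 8
reads = [
    seed_a,
    seed_a[:39] + "A",   # 1 mismatch vs seed_a: 97.5% identity, same OTU
    seed_b,
    seed_b[:38] + "GG",  # 2 mismatches vs seed_b: 95% identity, new OTU
]
otus = cluster_otus(reads)
print(len(otus), [len(otu) for otu in otus])
```

Note that the last read differs from its closest seed by only two bases yet forms its own OTU, which is why the choice of threshold and the quality of the reads both influence the OTU count.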

Long NGS reads can help address coverage considerations by enabling researchers to sequence difficult regions of the genome. Longer reads are also more likely to be unique, so they can be assigned more accurately to an appropriate reference genome or grouped as an unknown species that is not in the genomic database. Thus, a more accurate classification of microbes in the microbiome can be achieved.

Sequencing accuracy is an important consideration to ensure that sequence variants are called correctly. Poor accuracy can inflate the number of defined OTUs and thus lead to an overestimate of microbiome diversity (4).
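A rough way to see why accuracy matters is to estimate how often errors alone would push a read outside an identity cutoff. The sketch below assumes independent, uniform per-base errors (a simplification; real error profiles vary by platform and position) and a 97% identity threshold, which is an illustrative assumption rather than a value from this article. The function names are hypothetical.

```python
from math import comb


def p_error_free(per_base_error: float, read_length: int) -> float:
    """Probability that a read of the given length contains no errors."""
    return (1.0 - per_base_error) ** read_length


def p_below_identity(per_base_error: float, read_length: int,
                     threshold: float = 0.97) -> float:
    """Probability that errors alone drop a read below the identity threshold."""
    # Largest number of errors that still keeps identity >= threshold
    max_ok = int(read_length * (1.0 - threshold) + 1e-9)
    e = per_base_error
    # Binomial probability of having at most max_ok errors in the read
    p_ok = sum(
        comb(read_length, k) * e**k * (1.0 - e) ** (read_length - k)
        for k in range(max_ok + 1)
    )
    return 1.0 - p_ok


# Compare two hypothetical per-base error rates for a 250-base read
for rate in (0.001, 0.01, 0.03):
    print(f"error rate {rate}: "
          f"P(error-free) = {p_error_free(rate, 250):.3f}, "
          f"P(below 97% identity) = {p_below_identity(rate, 250):.3g}")
```

Reads that fall below the cutoff purely through error would be candidates for spurious OTUs, which is one mechanism behind the inflation described above.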

Because the microbiome is a complex ecosystem, data containing sequence information from thousands of microbial species and strains can be difficult to analyze. This is further compounded when numerous samples are sequenced from different sites on a single individual, across a human population, or over time. Bioinformatics analysis and computational requirements therefore need to be considered during study design.

A final consideration for microbiome research using NGS is the choice of method used to generate the data.

Shotgun metagenomic sequencing

Shotgun sequencing is a method used to randomly sequence DNA strands within a given sample. The DNA within the sample is sheared into smaller fragments and subsequently sequenced using NGS. Given its untargeted nature, shotgun metagenomic sequencing allows researchers to study all microbial genomes without prior knowledge of the community. It provides a way to discover unculturable microorganisms. With sufficient sequencing throughput, rare microbial species and ones of low abundance within the microbiome can also be detected. 
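The link between throughput and detecting low-abundance members can be sketched with simple arithmetic. The example below assumes reads are drawn at random in proportion to each organism's share of the DNA in the sample, ignoring host DNA, extraction bias, and genome-size effects. The function names, the 150-base read length, the 5 Mb genome size, and the abundance values are all hypothetical.

```python
def expected_coverage(total_reads: int, read_length: int,
                      dna_fraction: float, genome_size: int) -> float:
    """Mean depth for one organism: bases drawn from its genome / genome size."""
    return total_reads * read_length * dna_fraction / genome_size


def p_at_least_one_read(total_reads: int, dna_fraction: float) -> float:
    """Chance that at least one read comes from the organism."""
    return 1.0 - (1.0 - dna_fraction) ** total_reads


# Hypothetical organism making up 0.01% of the sample DNA
for reads in (1_000, 1_000_000, 100_000_000):
    depth = expected_coverage(reads, 150, 0.0001, 5_000_000)
    detect = p_at_least_one_read(reads, 0.0001)
    print(f"{reads:>11,} reads: mean depth {depth:.4f}x, P(>=1 read) = {detect:.4f}")
```

Seeing a single read is a much weaker requirement than covering enough of a genome to characterize it, so the depth figure is usually the more demanding constraint when planning a run.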

Although shotgun sequencing is well suited to microbiome discovery, some members of the microbiome may still go uncharacterized for various reasons, such as contaminant DNA or the size and complexity of the microbiome. Shotgun metagenomic sequencing can also be cost prohibitive and computationally expensive. Reconstructing the microbial composition of a community from a random pool of DNA sequences is difficult and conceptually similar to whole-genome assembly (5). The analysis relies on the >50,000 microbial genomes that are available within current databases and is therefore affected by biases in the references used. Strain-level profiling is more difficult because shotgun sequencing of the entire microbiome provides lower genomic resolution than a more focused approach, such as sequencing single isolates.

Targeted metagenomic sequencing

Targeted sequencing approaches allow researchers to focus their analysis on individual genes or genomic regions. By leveraging current genomic knowledge, targeted NGS can be used to improve coverage, simplify analysis and interpretation, and lower the total sequencing workflow costs. 

16S ribosomal RNA (rRNA) sequencing is the most widely used method for characterizing bacterial populations, taxonomic analysis, and species identification. The 16S rRNA gene is used for phylogenetic studies because it is highly conserved across different species of bacteria and archaea. The bacterial 16S gene contains nine hypervariable regions (V1-V9) involved in the secondary structure of the small ribosomal subunit (Fig 1).

Figure 1. 16S rRNA secondary structure. Marked as public domain; more details at Wikimedia Commons.

The degree of sequence conservation within the hypervariable V regions is used for taxonomic assignment. Initial 16S analyses used PCR and focused on only a few V regions, or even a single one, to determine the taxonomic level. With the high throughput of NGS, researchers can increase the number of V regions analyzed to provide more discriminatory profiling, which is beneficial in a variety of settings. For example, the resolution gained from sequencing multiple V regions could help identify pathogenic bacteria in a healthcare setting (6).
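The benefit of analyzing more than one V region can be illustrated with a toy lookup. In the sketch below, the taxa, region labels, and sequences are invented for illustration only, and exact string matching stands in for the alignment-based or probabilistic classification that real pipelines perform.

```python
# Hypothetical reference signatures: taxon -> {region: sequence}
reference = {
    "taxon_A": {"V3": "ACGTAC", "V4": "GGATCC"},
    "taxon_B": {"V3": "ACGTAC", "V4": "GGTTCC"},
    "taxon_C": {"V3": "TCGTAA", "V4": "GGATCC"},
}


def candidates(observed: dict[str, str]) -> list[str]:
    """Return taxa whose reference matches every observed region exactly."""
    return sorted(
        taxon
        for taxon, regions in reference.items()
        if all(regions.get(region) == seq for region, seq in observed.items())
    )


# One region alone leaves two taxa indistinguishable
print(candidates({"V3": "ACGTAC"}))
# Adding a second region narrows the assignment to a single taxon
print(candidates({"V3": "ACGTAC", "V4": "GGATCC"}))
```

In this toy data, neither V3 nor V4 separates all three taxa on its own, but the pair does, which mirrors the argument for sequencing multiple regions.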

The high-throughput and multiplexing capabilities of NGS mean that researchers are not limited to sequencing just the V regions. Targeted sequencing can be used to distinguish between closely related species and detect antimicrobial resistance. As our knowledge of the microbiome and its influence on health grows, targeted microbiome sequencing can be used to focus on biomarkers that may correlate with disease and predict therapeutic outcome.

References

  1. Gilbert JA, et al. Nat Med. 24:392 (2018)
  2. Helmink BA, et al. Nat Med. 25:377 (2019)
  3. Poussin C, et al. Drug Disc Today. 23:1644 (2018)
  4. Xue Z, et al. mSphere. 3:e00410 (2018)
  5. Quince C, et al. Nat Biotechnol. 35:833 (2017)
  6. Chakravorty S, et al. J Microbiol Methods. 69:330 (2007)