Whole-genome, whole-exome, targeted next-generation, and Sanger sequencing methods have come a long way in recent years. Genetic disease researchers now have many advanced techniques to choose from when designing and implementing their studies, and the choice is not trivial. Although all four methods can yield gene sequences of interest to a researcher, they are far from interchangeable. Whole-genome and whole-exome sequencing are suited to discovery-based questions, whereas targeted next-generation or Sanger sequencing is preferred for a hypothesis-driven approach. In just about every kind of disease research, what is already known about a specific disease goes a long way toward informing which tools can best lead to more information, and research into genetic diseases is no exception.
A common genetic research situation is the need to identify the genetic cause of a heritable condition. In many cases, the cause of a disease is a specific mutation in a known locus. This was the case in Sakuma et al.’s 2015 and Kitano et al.’s 2017 studies of hereditary hearing loss. Rather than surveying entire genomes using whole-genome sequencing, both groups used custom next-generation gene panels aimed at loci already implicated in hereditary hearing loss by other studies. With this approach, they were able to discover many new pathogenic gene variants, which were verified by Sanger resequencing. In fact, Sakuma et al. were able to correlate the severity of the mutation with the degree of hearing loss in individuals. Focusing from the start on genes of known significance avoided spending time assessing a large number of possibly inconsequential variants.
Figure 1: Pedigrees and auditory analysis identified individuals with hearing loss. Variants implicated in the hearing loss were identified by NGS and confirmed by Sanger sequencing. (Kitano et al.)
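The panel-first strategy described above amounts to restricting attention to genes already implicated in the condition. The following is a minimal Python sketch of that idea applied to a list of variant calls. The record layout, the `filter_to_panel` helper, the variant IDs, and the off-panel gene name are hypothetical; the gene symbols POU4F3 and PTPRQ are taken only from the titles of the cited studies and do not represent the actual panels those groups used.

```python
from typing import Dict, Iterable, List, Set


def filter_to_panel(variants: Iterable[Dict[str, str]],
                    panel_genes: Set[str]) -> List[Dict[str, str]]:
    """Keep only the variants that fall in genes on the targeted panel."""
    return [v for v in variants if v["gene"] in panel_genes]


# Hypothetical panel and variant calls, for illustration only.
hearing_loss_panel = {"POU4F3", "PTPRQ"}
calls = [
    {"id": "var1", "gene": "POU4F3"},
    {"id": "var2", "gene": "GENE_X"},  # off-panel gene, dropped by the filter
    {"id": "var3", "gene": "PTPRQ"},
]

on_panel = filter_to_panel(calls, hearing_loss_panel)
print([v["id"] for v in on_panel])
```

In a real targeted assay the restriction happens at the bench, because only the panel loci are captured or amplified. The filter above is simply the analysis-side analogue of that design choice.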
Some conditions studied by genetic research have less clear-cut relationships between gene variants and symptoms. Multi-genic conditions often follow from many simultaneous variations across numerous loci, with different patients having different variants. Studying these conditions is a much more exploratory process to start with, focused on the identification of new disease loci. An open-ended, discovery-based approach is often best in these studies.
For example, in an effort to identify loci involved in autism spectrum disorder (ASD) in Saudi families, Al-Mubarek et al. collected samples from 19 families that had members with ASD. They then used Ion AmpliSeq whole-exome sequencing to identify variants present in probands and other family members, and confirmed those variants using Sanger sequencing. Together with their data analysis pipeline, this combined approach found 47 novel variants that are potential contributors to ASD. Some of these variants were in genes previously suspected to be involved in ASD, but many others were in genes not previously linked to it and may uncover new pathways to the disorder.
Figure 3. Combined approach of Ion AmpliSeq whole-exome sequencing and Sanger sequencing found 47 novel variants that are potential contributors to autism spectrum disorder (ASD) (Al-Mubarek et al.)
Some genetic diseases arise “de novo” from mutations within the gametes. These can be difficult to examine and benefit from a variety of research approaches. In Choubtom et al.’s study of progressive cerebellar degeneration, standard genetic tests on the capillary electrophoresis (CE) platform showed that the patients’ conditions did not arise from anomalies in well-defined genes. The researchers then analyzed triplet repeat expansions in other candidate neurodegenerative disease genes and found pathogenic expansions of CAG repeats in a rare candidate gene (TBP) in some of the patients. These expansions were confirmed by Sanger sequencing. Importantly, the pathogenic expansion did not appear in the parental generation, indicating that the pathogenic alleles arose de novo in the parents’ gametes.
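The reasoning in this example, an expanded repeat in the patient that is absent from both parents, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it counts repeats directly in a sequence string (as one might from a Sanger read) rather than sizing fragments by CE, the helper names are hypothetical, and the threshold of 10 repeats is a toy value chosen for the example, not a clinical cutoff for TBP or any other gene.

```python
import re
from typing import Iterable


def longest_repeat_run(sequence: str, unit: str = "CAG") -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit)
            for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


def is_de_novo_expansion(child_repeats: int,
                         parent_repeats: Iterable[int],
                         threshold: int) -> bool:
    """True when the child exceeds the threshold and neither parent does."""
    return child_repeats > threshold and all(p <= threshold
                                             for p in parent_repeats)


# Toy sequences; the threshold is a hypothetical value for illustration.
TOY_THRESHOLD = 10
child = longest_repeat_run("GG" + "CAG" * 12 + "TT")
mother = longest_repeat_run("GG" + "CAG" * 4 + "TT")
father = longest_repeat_run("GG" + "CAG" * 5 + "TT")
print(child, mother, father,
      is_de_novo_expansion(child, [mother, father], TOY_THRESHOLD))
```

Real repeat tracts are often interrupted by other codons, so a production analysis would need a more tolerant definition of a repeat run than the strict one used here.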
Kano et al. used a similar hybrid approach in their 2016 genomic analysis of a single patient with primary ciliary dyskinesia. Patients with this condition have cilia that do not develop or function correctly, leading to a variety of serious health consequences, most typically compromised breathing. Although the condition can be easily diagnosed by physically observing the affected cilia, understanding of its genetic basis is still unfolding. The fact that a patient's unaffected parents provide little genetic insight into the condition means that each patient must be regarded as unique, but the fact that the patients all have a similar set of conditions means that they may have genetic defects in common. In particular, previous research has implicated a number of genes related to ciliary development. Kano et al. started by Sanger sequencing candidate ciliary development genes for variants that might cause the disorder. Not finding any candidates with this targeted approach, they moved on to hypothesis-free whole-exome sequencing and found two different, transheterozygous mutations in the affected individual. They confirmed the identity of the alleles by Sanger sequencing and, interestingly, showed that each parent was heterozygous for one of the two variants, explaining their patient's condition. These mutations, in turn, can now be added to future targeted panels.
Figure 4. Demonstration of a de novo mutation confirmed using Sanger sequencing.
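The family-based reasoning in the two studies above (a variant absent from both parents is de novo; two different variants in the same gene, one from each parent, are transheterozygous) can be expressed as a small classification routine. The sketch below is a simplified Python illustration, assuming genotypes are encoded as the number of alternate alleles (0, 1, or 2). The `TrioVariant` record, the function names, and the gene and variant labels are all hypothetical and are not taken from the cited papers.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class TrioVariant(NamedTuple):
    gene: str
    variant_id: str
    child: int   # alternate-allele count in the child: 0, 1, or 2
    mother: int  # same encoding for the mother
    father: int  # same encoding for the father


def classify_origin(v: TrioVariant) -> str:
    """Label where the child's variant allele most plausibly came from."""
    if v.child == 0:
        return "absent"
    if v.mother == 0 and v.father == 0:
        return "de_novo"
    if v.mother > 0 and v.father == 0:
        return "maternal"
    if v.father > 0 and v.mother == 0:
        return "paternal"
    return "either_parent"


def compound_het_genes(variants: List[TrioVariant]) -> List[str]:
    """Genes where the child is heterozygous for one maternal and one paternal variant."""
    origins: Dict[str, Set[str]] = defaultdict(set)
    for v in variants:
        if v.child == 1:
            origins[v.gene].add(classify_origin(v))
    return sorted(g for g, o in origins.items()
                  if {"maternal", "paternal"} <= o)


# Hypothetical trio calls, for illustration only.
trio = [
    TrioVariant("GENE_A", "var1", child=1, mother=1, father=0),
    TrioVariant("GENE_A", "var2", child=1, mother=0, father=1),
    TrioVariant("GENE_B", "var3", child=1, mother=0, father=0),
]
print([classify_origin(v) for v in trio], compound_het_genes(trio))
```

A real pipeline would also have to account for genotype quality and coverage in the parents, since a variant missed in a parent through low coverage would otherwise be mislabeled as de novo. That is one reason Sanger confirmation remains a common follow-up step.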
One of the most common clinical research applications of genome sequencing is family planning. Aspiring parents often wish to prevent genetic illness in their future children and express interest in having their genomes tested. Particularly in the context of in vitro fertilization, they might also want the genomes of their embryos tested, to determine which embryo is likely to have the healthiest life. This is a complex research area that benefits from various approaches. Many genetic diseases, especially severe conditions such as Tay-Sachs disease that show symptoms very early and greatly shorten lifespans, have reasonably well-understood genetic bases and inheritance patterns. Others are more mysterious, and mutations that arise de novo in the germline can only be detected in the embryo, not in the parents.
Wells et al.’s 2014 study is one example. The authors sought to develop a test for embryonic aneuploidy that could be used early enough to detect this severe condition before embryo implantation. They devised a method based on next-generation sequencing that can detect chromosomal gains and losses (trisomies and monosomies) in DNA from a single cell. This ingenious approach takes advantage of the fact that NGS is designed to read the same sequence multiple times for higher accuracy. Wells et al. showed that sequences yielding many more, or many fewer, reads than in a fully euploid test set indicate an extra or a missing chromosome, respectively, with a high degree of accuracy.
Figure 5: Next-generation sequencing detects embryonic aneuploidy in a single cell (Wells et al.)
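The read-counting idea behind this kind of aneuploidy screen can be sketched as follows. This is a simplified Python illustration of the general principle, not the method published by Wells et al. It assumes each chromosome's share of reads is compared with its share in a euploid reference set. The function name, the three-chromosome toy data, and the ratio cutoffs of 1.25 and 0.75 are assumptions, chosen as midpoints between the ideal diploid ratio of 1.0 and the ideal trisomy (1.5) and monosomy (0.5) ratios.

```python
from typing import Dict


def call_copy_number(sample_counts: Dict[str, int],
                     reference_fractions: Dict[str, float],
                     gain_threshold: float = 1.25,
                     loss_threshold: float = 0.75) -> Dict[str, str]:
    """Compare each chromosome's share of reads with a euploid reference."""
    total = sum(sample_counts.values())
    calls = {}
    for chrom, expected in reference_fractions.items():
        observed = sample_counts.get(chrom, 0) / total
        ratio = observed / expected
        if ratio >= gain_threshold:
            calls[chrom] = "gain"
        elif ratio <= loss_threshold:
            calls[chrom] = "loss"
        else:
            calls[chrom] = "euploid"
    return calls


# Toy reference: fraction of reads per chromosome in euploid samples.
reference = {"chr1": 0.5, "chr2": 0.3, "chr21": 0.2}

# Toy sample with extra reads on chr21, as expected for a trisomy.
trisomy_sample = {"chr1": 500, "chr2": 300, "chr21": 300}
print(call_copy_number(trisomy_sample, reference))
```

Because an extra chromosome also increases the total read count, the ratios for the unaffected chromosomes drift slightly below 1.0. Practical methods typically normalize more carefully and correct for factors such as GC content and amplification bias, which matter a great deal when the starting material is a single cell.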
As we have seen above, no single technology is always the solution. The choice of sequencing method always goes back to the question one is trying to answer. Using a method optimized for a different kind of question is at best wasteful and at worst useless, but using the right method for one’s problem makes for a smooth and efficient research project.
Whole-genome sequencing is the genetic explorer’s best friend. When dealing with a rare or poorly understood disease, for example, a whole-genome sequence might be the only way to gain any understanding of how the condition works. The leads generated in this way can inform hypotheses for future research or suggest treatment avenues. Additionally, whole-genome sequencing can build a knowledge base for future researchers, particularly if the genome being sequenced belongs to a new or under-sampled species.
These strengths come at a price, however. Whole-genome sequencing can generate a monumental amount of information, making the results difficult to process and interpret. An analysis involving whole-genome sequences can involve millions of comparisons between gene loci or base pairs. Necessarily, many of the differences discovered will be false positives: incidental rather than important. Because whole-genome sequencing samples exons, regulatory elements, and miscellaneous non-coding regions alike, a large proportion of the differences discovered are likely to be irrelevant or silent, and a large proportion of the analysis effort will be spent weeding out this noise. This effort is necessary when working with unknown or poorly understood diseases, and it pays off with the ability to devise and use more targeted approaches in future studies.
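A short calculation shows why the number of comparisons matters so much. The Python sketch below assumes, purely for illustration, that each comparison is treated as an independent statistical test at a significance level alpha. That framing, the function names, and the figures of one million comparisons and alpha = 0.05 are assumptions rather than details taken from any study discussed here.

```python
def expected_false_positives(n_comparisons: int, alpha: float) -> float:
    """Expected number of chance hits if every comparison were truly null."""
    return n_comparisons * alpha


def bonferroni_threshold(n_comparisons: int, alpha: float) -> float:
    """Per-comparison cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_comparisons


# Hypothetical scale: one million comparisons at a conventional alpha of 0.05.
N_COMPARISONS = 1_000_000
ALPHA = 0.05
print(expected_false_positives(N_COMPARISONS, ALPHA))
print(bonferroni_threshold(N_COMPARISONS, ALPHA))
```

Under these assumptions, tens of thousands of differences would pass an uncorrected threshold by chance alone, while a Bonferroni-style correction demands far stronger evidence from each individual comparison. Narrowing the search space to an exome or a targeted panel reduces the number of comparisons and eases this burden.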
With a better-defined question, exome sequencing can become the tool of choice. Consisting of the protein-coding regions of the genome (the exons), the exome excludes non-coding sequences such as introns and most regulatory regions. Differences in the exome often represent places where proteins are structurally different from their healthy or wild-type versions, and such sites are much more likely to be functionally significant than differences in introns. Exome sequencing is especially well-suited to determining a specific patient’s genetic irregularities, and to zeroing in on the cause of pre-identified issues in a particular protein. Importantly, since the exome is much smaller than the genome, sequencing it is much less expensive, a critical consideration in high-throughput laboratory and clinical settings.
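The size difference can be made concrete with a back-of-the-envelope estimate of how many bases must be sequenced. The Python sketch below uses round, illustrative numbers that are assumptions rather than figures from this article: a human genome of about 3.1 billion base pairs, an exome target of about 50 million base pairs (between 1% and 2% of the genome), and mean coverage depths of 30x for a genome and 100x for an exome.

```python
def bases_required(target_size_bp: float, mean_coverage: float) -> float:
    """Approximate sequenced bases needed to reach a mean coverage depth."""
    return target_size_bp * mean_coverage


# Illustrative, assumed values; real targets and depths vary by kit and study.
GENOME_BP = 3.1e9      # approximate human genome size
EXOME_BP = 5.0e7       # approximate exome target size
GENOME_COVERAGE = 30   # assumed mean depth for whole-genome sequencing
EXOME_COVERAGE = 100   # assumed mean depth for whole-exome sequencing

wgs_bases = bases_required(GENOME_BP, GENOME_COVERAGE)
wes_bases = bases_required(EXOME_BP, EXOME_COVERAGE)
print(f"Genome run needs about {wgs_bases / wes_bases:.1f} times as many bases")
```

The estimate ignores off-target reads, duplicates, and library preparation, and cost does not scale strictly with bases sequenced, so it should be read only as an indication of the relative scale of the two approaches.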
Although smaller than a genome, a whole exome may still contain a lot of information that is not needed. Researchers interested in specific genes, proteins, or well-characterized conditions most likely gain no benefit from extremely broad assays such as whole-exome or whole-genome sequencing. Clinicians hoping to identify what ails a specific patient most likely need information on only a few genes. Targeted next-generation and Sanger sequencing allow researchers and diagnosticians to examine genes of known significance much more quickly than a whole genome or exome can be sequenced, and these technologies can be designed to fit a wide variety of research and medical needs. Targeted panels are also much less expensive than broader techniques. In many cases, NGS aimed at obvious targets is an ideal first pass, to be followed by other sequencing techniques if it does not yield usable information. Targeted next-generation and Sanger sequencing have become the gold-standard methods for looking at a small number of genes at a time, due to their accuracy and efficiency.
These examples show that no genomics approach is always the right approach. Each generates a certain kind and amount of data, and whether that data is useful depends on the question being asked. Exploratory studies benefit from casting a wide net, represented by whole-genome and whole-exome sequencing. These studies can generate numerous leads for future work and identify mutations for which there are as yet no specific tests. Smaller inquiries with more specific goals, however, can get swamped in the noise of such large datasets. Such studies benefit from more focused, less expensive laboratory procedures such as targeted NGS panels and Sanger sequencing. These approaches are not antagonistic but complementary, serving different roles in a well-rounded genomics laboratory. Knowing the best use cases of each technique will lead to cleaner results, more efficient studies, and better science, to the world’s benefit.
Al-Mubarek et al. (2017) “Whole exome sequencing reveals inherited and de novo variants in autism spectrum disorder: a trio study from Saudi families”, Scientific Reports 7: 5679. doi: 10.1038/s41598-017-06033-1.
Chobutum et al. (2015) “Analysis of SCA8, SCA10, SCA12, SCA17 and SCA19 in patients with unknown spinocerebellar ataxia: a Thai multicentre study”, BMC Neurology 15: 166.
Kano G, Tsujii H, Takeuchi K, Nakatani K, Ikejiri M, Ogawa S, Kubo H, Nagao M, Fujisawa T. Whole-exome sequencing identification of novel DNAH5 mutations in a young patient with primary ciliary dyskinesia. Mol Med Rep. 2016 Dec;14(6):5077-5083. doi: 10.3892/mmr.2016.5871. Epub 2016 Oct 21. PubMed PMID: 27779714; PubMed Central PMCID: PMC5355724.
Kitano T, Miyagawa M, Nishio SY, Moteki H, Oda K, Ohyama K, Miyazaki H, Hidaka H, Nakamura KI, Murata T, Matsuoka R, Ohta Y, Nishiyama N, Kumakawa K, Furutate S, Iwasaki S, Yamada T, Ohta Y, Uehara N, Noguchi Y, Usami SI. POU4F3 mutation screening in Japanese hearing loss patients: Massively parallel DNA sequencing-based analysis identified novel variants associated with autosomal dominant hearing loss. PLoS One. 2017 May 17;12(5):e0177636. doi: 10.1371/journal.pone.0177636. eCollection 2017. PubMed PMID: 28545070; PubMed Central PMCID: PMC5435223.
Sakuma N, Moteki H, Azaiez H, Booth KT, Takahashi M, Arai Y, Shearer AE, Sloan CM, Nishio SY, Kolbe DL, Iwasaki S, Oridate N, Smith RJ, Usami S. Novel PTPRQ mutations identified in three congenital hearing loss patients with various types of hearing loss. Ann Otol Rhinol Laryngol. 2015 May;124 Suppl 1:184S-92S. doi: 10.1177/0003489415575041. Epub 2015 Mar 18. PubMed PMID: 25788564; PubMed Central PMCID: PMC4441868.
Wells D, Kaur K, Grifo J, Glassner M, Taylor JC, Fragouli E, Munne S. Clinical utilisation of a rapid low-pass whole genome sequencing technique for the diagnosis of aneuploidy in human embryos prior to implantation. J Med Genet. 2014 Aug;51(8):553-62. doi: 10.1136/jmedgenet-2014-102497. PubMed PMID: 25031024; PubMed Central PMCID: PMC4112454.
For Research Use Only. Not for use in diagnostic procedures.