The evidence for more complete human exomes mounts
Five years ago, Dr. Richard Gibbs of the Baylor College of Medicine gave a memorable plenary presentation at the annual Advances in Genome Biology and Technology conference in Marco Island, Florida.
In the introduction to his talk, entitled “Genome Sequencing to Health and Biological Insight”, he illustrated the seesaw battle between whole-genome or whole-exome sequencing and targeted sequencing with a cartoon of two aluminum cans of popular cola soft drinks wearing boxing gloves. Framing the choice of approach as a fight, he said that while he favored whole-genome / whole-exome sequencing, he didn’t see any end to this struggle for the foreseeable future.
Fast-forward five years: the number of whole genomes and whole exomes sequenced worldwide has roughly doubled every year since, and estimates are that this trend will continue. Yet even with whole-genome sequencing nominally defined as an average of 30x haploid coverage (remember, in a diploid genome each allele of a heterozygous variant is then covered at an average of only 15x, since the 30x average is spread across two different alleles), the problem of missing data remains. Similarly, whole-exome sequencing is nominally defined as an average read depth of 100x haploid coverage (each allele of a heterozygous variant at an average of 50x).
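To make the coverage arithmetic concrete, here is a minimal Python sketch. It assumes that read depth at a site follows a Poisson distribution, which is a textbook idealization and not something the studies cited below rely on; real coverage is usually more uneven, so these numbers should be read as optimistic. The function names and the example threshold of 8 reads are illustrative only.

```python
import math


def per_allele_depth(total_depth, ploidy=2):
    """Average depth on each allele when total depth is split evenly across alleles."""
    return total_depth / ploidy


def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean)."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))


def prob_allele_below(min_reads, total_depth, ploidy=2):
    """Probability that one allele is covered by fewer than min_reads reads.

    Assumes Poisson-distributed depth (an idealization; real data is more variable).
    """
    if min_reads <= 0:
        return 0.0
    return poisson_cdf(min_reads - 1, per_allele_depth(total_depth, ploidy))


if __name__ == "__main__":
    for label, depth in (("whole genome", 30), ("whole exome", 100)):
        print(label, "per-allele depth:", per_allele_depth(depth))
        # 8 reads is an arbitrary example threshold, not a recommended value.
        print(label, "P(allele has < 8 reads):", prob_allele_below(8, depth))
```

Even under this idealized model a small fraction of sites falls below any chosen threshold at 30x, and that fraction grows once G-C content and other biases widen the spread of coverage.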
The Missing Data Problem
The ‘missing data problem’ manifests itself as a false negative: an underlying variant that may contribute significantly to a particular genetic disorder is simply missed and goes undetected. The reasons for inadequate coverage in a particular region of the genome are manifold: very high (or very low) G-C content is a primary culprit; library-preparation and shearing methods all have their biases; and each sequencing platform has platform-specific biases that commercial vendors (including development scientists here at Thermo Fisher Scientific) work hard to minimize.
On top of all these underlying potential causes are particular decisions and judgments made in the data analysis. For example, it has been pointed out that the choice of variant-calling algorithm makes a significant difference in the observed accuracy. Other choices, such as the minimum number of reads required to call a particular variant (remember that an average of 15x coverage per allele can vary widely from base to base), also affect sensitivity, that is, how many of the true variants are caught.
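The effect of a minimum-read threshold can be sketched with a simple binomial model. This is an illustration under stated assumptions, not how any particular variant caller works: it assumes each read at a heterozygous site shows the variant allele independently with probability 0.5, and it ignores sequencing error, mapping quality, and allele bias. The function name and parameters are hypothetical.

```python
from math import comb


def detection_probability(depth, min_alt_reads, alt_fraction=0.5):
    """Probability of seeing at least min_alt_reads variant-supporting reads.

    depth         -- total reads covering the site
    min_alt_reads -- caller threshold (illustrative parameter)
    alt_fraction  -- expected share of reads carrying the variant (0.5 for a het site)
    """
    if min_alt_reads <= 0:
        return 1.0
    if depth < min_alt_reads:
        return 0.0
    # Sum the binomial probabilities of falling short of the threshold.
    miss = sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - miss


if __name__ == "__main__":
    # Same threshold, different local depths around a 30x average.
    for depth in (6, 10, 15, 30):
        print(depth, detection_probability(depth, min_alt_reads=3))
```

Raising the threshold guards against false positives, but at sites where local depth dips well below the average it also lowers the chance that a real heterozygous variant is called at all.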
Naturally, the balance between lowering the false-negative rate (minimizing the missed variants) and lowering the false-positive rate (minimizing the apparent variants that turn out to be false) is a tug-of-war between opposites: you can get very high sensitivity (few false negatives), but at the cost of low specificity (many false positives). As one researcher once told me, “False negatives I don’t worry too much about, because I don’t know what I don’t see; however, false positives are bad because they are expensive to pursue.” In this context ‘expensive’ means a waste of limited funds (spent on needless verification of variant findings) as well as of effort and time.
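For readers who want the definitions behind this trade-off, the short Python sketch below computes sensitivity and specificity from the four outcome counts. The counts in the usage example are invented purely for illustration and do not come from any dataset discussed here.

```python
def sensitivity(true_pos, false_neg):
    """Share of real variants that were called: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of non-variant sites correctly left uncalled: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Hypothetical counts for a lenient and a strict calling threshold.
    lenient = {"tp": 98, "fn": 2, "tn": 900, "fp": 100}
    strict = {"tp": 85, "fn": 15, "tn": 995, "fp": 5}
    for name, c in (("lenient", lenient), ("strict", strict)):
        print(
            name,
            "sensitivity:", sensitivity(c["tp"], c["fn"]),
            "specificity:", specificity(c["tn"], c["fp"]),
            "false positives to verify:", c["fp"],
        )
```

In this made-up example the lenient setting misses fewer variants but leaves far more false positives to chase down, which is exactly the expense the researcher quoted above was worried about.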
A Topic of Recent Publications
Over the past few years there have been several publications highlighting the nuances of whole-genome sequencing coverage and targeted whole-exome sequencing. For example, the National Human Genome Research Institute of the US National Institutes of Health published a paper in 2011 pointing out that, in their analysis of a single sample, almost 30% of the variants in the exome were missed when that sample was sequenced as a whole genome at 30x coverage. (1)
Another publication in 2014 analyzed the whole-genome sequences of 12 volunteers (9 of whom were also sequenced using an alternative technology) and found that a full 10 to 19 percent of variants in genes known to play important roles in inherited diseases were not adequately covered. (2)
A third study analyzed 57 whole-exome datasets (with between 74x and 120x coverage), looking at the 56 genes that the American College of Medical Genetics and Genomics deems ‘relevant’. In 7 of these genes, more than 50% of the Human Gene Mutation Database variant locations had inadequate coverage. The results of this study add to the growing evidence that, due to technical factors (such as G-C content and the presence of pseudogenes), gene-panel sequencing would be preferable to whole-exome sequencing for determining disease-causative mutations in research. (3)
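The kind of per-gene tally described in that last study can be sketched in a few lines of Python. This is not the authors' pipeline: the depth cutoff that counts as ‘adequate’, the gene names, and the depth values below are all hypothetical placeholders chosen only to show the bookkeeping.

```python
def fraction_inadequate(depths, min_depth):
    """Fraction of known variant locations whose read depth is below min_depth."""
    if not depths:
        raise ValueError("no variant locations supplied")
    return sum(1 for d in depths if d < min_depth) / len(depths)


def poorly_covered_genes(depth_by_gene, min_depth, max_fraction=0.5):
    """Genes in which more than max_fraction of locations are inadequately covered."""
    return sorted(
        gene
        for gene, depths in depth_by_gene.items()
        if fraction_inadequate(depths, min_depth) > max_fraction
    )


if __name__ == "__main__":
    # Hypothetical read depths at known variant locations, keyed by made-up gene names.
    example = {
        "GENE_A": [5, 8, 40, 3],
        "GENE_B": [60, 70, 12, 90],
        "GENE_C": [10, 30],
    }
    # A cutoff of 20 reads is an assumption for illustration only.
    print(poorly_covered_genes(example, min_depth=20))
```

The point of a tally like this is that a healthy average coverage across the exome says little about whether the specific positions of interest in a given gene are actually covered.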
No ‘One Right Answer’
There is no one right answer for which approach to use; it all depends upon the kind of question that needs to be answered. A targeted approach is certainly justified when there is a need to determine every underlying variant in a subset of genes. A whole-exome (or whole-genome) approach is justified when the list of potential causative genes is not completely known and the risk of false-negative results is worth taking in support of research. And so we offer both.
The Ion AmpliSeq™ RDY Exome Kit coupled with the Ion Proton™ Sequencing System allows for rapid sequencing, from DNA sample to variants in 2 days; the Ion AmpliSeq™ Panels coupled with the Ion PGM™ Sequencing System enable up to 6144-plex amplification in a single tube from 10 ng of sample, and can now be designed for any genome. (We have recently written about the Ion AmpliSeq™ panels for Any Genome on Behind the Bench.)
If you are looking to outsource your next-generation sequencing needs, do consider one of our Ion AmpliSeq Exome Certified Service Providers.
For Research Use Only. Not for use in diagnostic procedures.
1. Ajay SS, Parker SC, Abaan HO, Fajardo KV, Margulies EH. Accurate and comprehensive sequencing of personal genomes. Genome Res. 2011 Sep;21(9):1498-505. doi: 10.1101/gr.123638.111. Epub 2011 Jul 19.
2. Dewey FE, Grove ME, Pan C, Goldstein BA, Bernstein JA, Chaib H, Merker JD, Goldfeder RL, Enns GM, David SP, Pakdaman N, Ormond KE, Caleshu C, Kingham K, Klein TE, Whirl-Carrillo M, Sakamoto K, Wheeler MT, Butte AJ, Ford JM, Boxer L, Ioannidis JP, Yeung AC, Altman RB, Assimes TL, Snyder M, Ashley EA, Quertermous T. JAMA. 2014 Mar 12;311(10):1035-45. doi: 10.1001/jama.2014.1717.
3. Park JY, Clark P, Londin E, Sponziello M, Kricka LJ, Fortina P. Clin Chem. 2015 Jan;61(1):213-20. doi: 10.1373/clinchem.2014.231456.