It is not uncommon for a whole-genome or whole-exome sequencing analysis to yield a list of some 15,000 potential variants, which then need to be thrown against dbSNP (to remove common variation first) and then SIFT’ed and PolyPhen’ed (SIFT and PolyPhen being the commonly used tools to identify known or likely damaging mutations). Even then, researchers are left with an unmanageable number of variants, and the functional work needed to pin down the exact causative mutation can lead to many fruitless searches.
With this in mind, Damian Smedley (at the European Bioinformatics Institute in Cambridge, U.K.) presented his research at the recent 2014 European Society of Human Genetics conference in Milan, Italy, in a talk entitled “Strategies for Exome Prioritization of Human Disease Genes”. He began by laying out the current informatics methods used to winnow down a large list of SNVs from a given WES or WGS experiment (removal of low-quality SNVs, then use of tools like PolyPhen and SIFT). Additional methods, including linkage analysis and trio analysis, are then recruited to further refine the list.
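The winnowing steps described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline from the talk: the `Variant` record, the `winnow` function and all four cutoff values are assumptions chosen for the example (the SIFT and PolyPhen cutoffs follow commonly quoted conventions, but real pipelines tune them).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    """Hypothetical, simplified record for one annotated SNV."""
    gene: str
    quality: float             # call quality, e.g. the VCF QUAL value
    population_freq: float     # allele frequency in a reference database, 0.0 if unseen
    sift: Optional[float]      # SIFT score: lower means more likely damaging
    polyphen: Optional[float]  # PolyPhen score: higher means more likely damaging


# Illustrative cutoffs only (assumptions, not values from the talk).
MIN_QUALITY = 30.0
MAX_POPULATION_FREQ = 0.01
SIFT_DAMAGING_BELOW = 0.05
POLYPHEN_DAMAGING_ABOVE = 0.85


def is_predicted_damaging(v: Variant) -> bool:
    """True if either predictor flags the variant as likely damaging."""
    sift_hit = v.sift is not None and v.sift < SIFT_DAMAGING_BELOW
    polyphen_hit = v.polyphen is not None and v.polyphen > POLYPHEN_DAMAGING_ABOVE
    return sift_hit or polyphen_hit


def winnow(variants: List[Variant]) -> List[Variant]:
    """Drop low-quality calls, then common variation, then benign predictions."""
    kept = [v for v in variants if v.quality >= MIN_QUALITY]
    kept = [v for v in kept if v.population_freq <= MAX_POPULATION_FREQ]
    return [v for v in kept if is_predicted_damaging(v)]
```

Even with every filter applied, a list like this typically still holds hundreds of candidates, which is the gap the rest of the talk addresses.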
It has been observed that a ‘normal’ individual human has on average 13,595 SNVs, of which ~313 are predicted to impact protein function. (Reference: Tennessen et al. Science, May 17, 2012, “Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes”, doi: 10.1126/science.1219240) Now that’s something to ponder, or at least use to impress someone at your next social event.
Given how limited our understanding of the link between genotype and phenotype still is, a new approach is needed.
Enter the mighty mouse. (Not that one – more like ‘really useful model organism’ Mus musculus.) And the mighty zebrafish, Danio rerio. These model organisms have already produced a wealth of wide-ranging phenotypic data from over 100 years’ worth of genetic study (in the case of the mouse).
His group developed an algorithm called PHIVE (PHenotypic Interpretation of Variants in Exomes), which takes into account the phenotypic similarity between model-organism mutants and human diseases, and integrated it into a web-based tool called Exomiser. Variants are evaluated according to allele frequency, pathogenicity and mode of inheritance.
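To make the idea concrete, here is a rough sketch of how a variant-level score and a phenotype-similarity score might be combined into one ranking. It does not reproduce the actual PHIVE or Exomiser scoring: the `Candidate` fields, the rarity formula, the 1% frequency ceiling and the equal weighting are all assumptions made for illustration, and the phenotype similarity is taken as a precomputed number between 0 and 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical record for one candidate variant and its gene."""
    gene: str
    allele_freq: float           # population allele frequency (0.0 if never seen)
    pathogenicity: float         # 0..1, higher means more likely damaging
    phenotype_similarity: float  # 0..1, patient phenotype vs. model-organism phenotype
    fits_inheritance: bool       # compatible with the expected mode of inheritance


FREQ_CEILING = 0.01  # assumed: variants at or above 1% get no rarity credit


def variant_score(c: Candidate) -> float:
    """Rarer and more damaging variants score higher (illustrative formula)."""
    rarity = max(0.0, 1.0 - c.allele_freq / FREQ_CEILING)
    return rarity * c.pathogenicity


def combined_score(c: Candidate) -> float:
    """Equal-weight average of variant and phenotype evidence (an assumption)."""
    return 0.5 * variant_score(c) + 0.5 * c.phenotype_similarity


def prioritize(candidates: List[Candidate]) -> List[Candidate]:
    """Drop candidates that break the inheritance model, then rank the rest."""
    compatible = [c for c in candidates if c.fits_inheritance]
    return sorted(compatible, key=combined_score, reverse=True)
```

The point of the sketch is the second term: a gene whose mouse or zebrafish mutant resembles the patient's phenotype can outrank a gene carrying a slightly more damaging variant but no phenotypic support.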
Of course, the proof of any tool is in the testing, and in their recent Genome Research paper they evaluated it on over 100,000 exomes containing known mutations. They reported a roughly 54-fold improvement over purely variant-based methods (i.e. those that depend solely on frequency and predicted pathogenicity).
He also presented an analysis of exomes with potentially causative mutations from the NIH Undiagnosed Diseases Program, showing that the Exomiser tool placed the causative mutation in the top ten ‘hits’ a full 100% of the time, and ranked it as the top variant 40% of the time.
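Figures like "in the top ten 100% of the time" are simple to compute once you know where the true causative variant landed in each ranked list. The sketch below shows the arithmetic; the `top_k_rate` helper and the example ranks are made up for illustration and are not data from the talk.

```python
from typing import Sequence


def top_k_rate(ranks: Sequence[int], k: int) -> float:
    """Fraction of cases whose causative variant was ranked k or better.

    `ranks` holds the 1-based rank of the known causative variant
    in each analyzed exome.
    """
    if not ranks:
        raise ValueError("need at least one case")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Made-up ranks for five hypothetical cases (not the study's data).
example_ranks = [1, 3, 1, 7, 2]
top_hit_rate = top_k_rate(example_ranks, 1)   # 2 of 5 cases ranked first
top_ten_rate = top_k_rate(example_ranks, 10)  # all 5 cases within the top ten
```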
So take a look at this interesting approach, and then take the Exomiser tool for a spin with your VCF file of choice.