Personalizing health care by tailoring treatments and predicting disease risk depends on understanding how individuals vary in disease and in behavioral and environmental factors. Global initiatives have set out to collect comprehensive genetic, biological and clinical data. Unfortunately, most of these initiatives primarily include people of European descent, which creates inequities for non-European populations. If health systems, governments and academic research institutes want to better serve their communities, citizens and research efforts, population-based biobanks need to represent people with different genetic backgrounds, including marginalized and hard-to-reach groups. Failure to do this will lead to further inequities in health care.
Resolving this problem starts with understanding why we need genetic databases that represent population variability. Fundamentally, we cannot understand disease variants and outcomes if we use one genetic background as a baseline and compare it to another that has its own history of environmental, migratory and social changes. To truly understand disease variants and outcomes, we have to create population-specific biobanks. A population-based biobank is a repository consisting of a large collection of biological tissue donated by individuals from the general population who might or might not have a specific disease. Phenotype information and clinical, social and environmental data points can also be collected. Analyzing the DNA in these genotype-phenotype population-based databases helps physicians and researchers better understand the genetic determinants of diseases. Ideally, the database can link biomarkers with phenotypic data, such as medical records and lifestyle information about diet, exercise and smoking. This link provides a powerful tool that can contribute to our understanding of the genetic and environmental determinants leading to conditions such as diabetes, depression, Alzheimer’s disease, cancer and congenital birth defects.
In fact, certain genetic diseases occur primarily in certain ethnic populations. Tay-Sachs disease and Canavan disease occur primarily in Ashkenazi Jews. Thalassemias occur primarily in Asian and Middle Eastern individuals. To better understand the pathophysiology of these ethno-specific diseases, biobanks need to include a larger proportion of individuals from each affected ethnic group.
In addition, risk predictors are more precise when they are built from genetic data of similar ancestry. Polygenic risk scores developed by studying Europeans do a better job of predicting disease risk for people of European ancestry than for those of other ancestries, so diversifying population-level genetic data beyond Europeans will expand the power of these scores. Hundreds of genome-wide association studies (GWASs) have been performed to better understand disease risk in European populations, but their findings do not necessarily extend to other ethnic groups. These GWASs need to be expanded to include better representation of global populations.
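To make the idea of a polygenic risk score concrete, here is a minimal sketch in Python. It assumes the common additive form, in which each variant's effect-size weight from a GWAS is multiplied by the number of risk alleles a person carries (0, 1 or 2) and the products are summed. The variant names, weights and genotypes are hypothetical and chosen only for illustration; real scores involve many more variants and additional modeling steps.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Additive score: sum of (effect-size weight x risk-allele count)."""
    score = 0.0
    for variant, weight in weights.items():
        # A variant absent from the person's genotype counts as 0 risk alleles.
        score += weight * dosages.get(variant, 0)
    return score


# Hypothetical effect sizes, as if estimated in a single study population.
weights = {"variant_a": 0.5, "variant_b": 0.25, "variant_c": -0.125}

# Hypothetical person who carries the variants that population's GWAS found.
person_one = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

# Hypothetical person whose risk is driven by a variant the GWAS never measured.
person_two = {"variant_a": 0, "variant_d": 2}

score_one = polygenic_risk_score(person_one, weights)
score_two = polygenic_risk_score(person_two, weights)
print(score_one, score_two)
```

In this toy setup the second person's score is zero, not because their risk is low but because the weight table has no entry for the variant they carry. This is a simplified picture of one reason scores built from a single ancestry can transfer poorly to others.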
Lastly, large genotype-phenotype databases enable researchers to identify genetic outliers. Researchers using the Icelandic database identified more than 8,000 people in the collection of over 100,000 who were genetic “knockouts,” which means that these individuals had no functional copies of a particular gene. The genes missing in knockouts are very attractive drug targets, since these individuals allow researchers to see the potential side effects of disabling a gene, or more importantly the lack of side effects, in humans before they invest in the expensive drug development process.
In the last decade, several entities have invested in these large databanks. While they have yielded important discoveries, 80 percent of their participants were of European descent, even though Europeans constitute only 16 percent of the world population. Iceland was the first country with a national genotype-phenotype database initiative. Known as the Icelandic Health Database (IHD), the program sought to compile the country’s medical records and obtain the genetic profile of its citizens to improve health and health services. The Icelandic Parliament partnered with a private company, deCODE, which received exclusive commercial rights to the IHD (e.g., for developing genetic tests and mining the data for drug targets). Iceland has a remarkably homogeneous population that can trace its lineage to just a few common Northern European ancestors. The UK Biobank is one of the largest publicly available genetic data sets. It contains information for half a million people, about 94 percent of whom are of European ancestry. Fewer than 10 percent are of African, South Asian, East Asian, and Hispanic or Latino ancestry (U.K. Biobank).
The future is promising, though. According to the Global Alliance for Genomics and Health, an estimated 60 million people will have their genomes sequenced in a health care setting within the next five years. Funding and a focus on underrepresented populations have sparked the launch of more than 15 precision medicine initiatives around the world that are country-level, government-funded programs. With coordination and knowledge sharing as key aims for these initiatives, researchers will align protocols for easier data sharing, evaluate technology for clinical use, agree on the types of data to be collected, develop common data models and analytics, expand the diversity of our databases to capture population-specific variants, and collaborate on education and participant consent processes.
An innovative example is the Taiwan Precision Medicine Initiative, a study that will collect the genetic profiles and comprehensive clinical records of 1 million Taiwanese people, or roughly 4 percent of the population, by 2022. Other large research initiatives point to the same promising future. All of Us aims to enroll 1 million people across the United States in a long-term study on the intersection of genetics, lifestyle, environment and health. Its explicit goal is to recruit 75 percent of the participants from groups who are typically underrepresented in biomedical research, with at least 500,000 participants from racial and ethnic minorities.
Another example is the Healthy Nevada Project, which has deliberately included underrepresented groups in its programs and actively educates and recruits participants in rural parts of Nevada. And very recently, a Nigerian company, 54gene, launched the African Centre for Translational Genomics (ACTG). Its first funded study, Non-Communicable Diseases — Genetic Heritage Study (NCD-GHS) Consortium, will enroll over 100,000 Nigerians and focus on the research of conditions like cancer, diabetes, Alzheimer’s disease, chronic kidney disease and sickle cell disease.
While the trend is positive, disparity matters. We are developing genetic risk tools that people from underrepresented ethnic groups will not benefit from unless we continue to do genetic risk studies that include people from all ethnic backgrounds. This gap adds to the long-standing problem of people of color being excluded from medical research, and it could also end up increasing health care disparities. Investing in a large, ethnically diverse genotype-phenotype database can enable better population risk stratification and health resource planning, and it can provide a research resource that keeps scientists relevant in the era of “big data.”