Let’s start at the end. Genotyping outperforms sequencing in high-throughput, large-scale data analyses like genetic ancestry testing.
Genotyping is the cheapest and fastest genetic technology available to wrangle the beast of big data that comes with the territory of ancestry genetic testing. Genotyping also provides broad coverage across the genome and the best alignment with existing reference population databases, so your company doesn’t have to start from scratch.
But let’s back up a bit and cover the basics.
The ins and outs of ancestry genetic tests
As genetic testing continues to spread from the academic and research sector into the public sphere, customers are clamoring for answers about their ancestry. Direct-to-consumer (DTC) genetic tests are winning the hearts and dollars of consumers in a way that no other life science-based market has ever seen. Ancestry tests are the clear consumer favorite in the DTC genetic testing market, with a larger customer base than other types of genetic testing, such as those for cancer or heart disease. The DTC ancestry customer base now exceeds 20 million people.
What is it about ancestry genetic testing that mesmerizes consumers? Seeing one’s own A’s, T’s, C’s and G’s parsed into a story of their origins is tantalizing to many people; they get to interact with scientific data in an accessible way, and that data can provide personalized entertainment, solve family questions and even shape a person’s perception of their own identity. Ancestry genetic testing is an exciting scientific tool that can have powerful effects on those who purchase it. Here is what you need to know before you decide to offer ancestry testing.
DNA is DNA is DNA, right? Wrong
Before venturing into this promising market, it’s important to understand exactly what ancestry tests are and how they work. There are three main ways to use genetics to learn about one’s ancestry: mtDNA, Y-chromosomal DNA and autosomal DNA testing.
- Mitochondrial DNA (mtDNA) is the DNA inside the mitochondria, organelles found in each cell. Because mothers pass their mtDNA to all of their children, it tells the story of a person’s maternal (mother’s) lineage.
- Y-chromosomal DNA is present only in men and is passed from father to son, but not from father to daughter. Because only males carry and pass on the Y chromosome, it tells the story of a man’s paternal (father’s) lineage.
- Autosomal DNA is the remaining, non-sex-specific DNA, carried on the other 22 pairs of chromosomes that men and women both have. Because autosomal DNA is inherited from both parents, it tells the story of ancestry on both sides of a person’s family.
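To make the inheritance rules above concrete, here is a minimal Python sketch. It is a toy model, not a real genetics library: the `Person` class, the `child_of` function and the lineage labels are all hypothetical, and it ignores recombination (real autosomes are reshuffled segment by segment rather than passed on whole).

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Person:
    name: str
    sex: str                          # "M" or "F"
    mtdna: str                        # hypothetical maternal-lineage label
    ydna: Optional[str]               # hypothetical paternal-lineage label; None for females
    autosomes: List[Tuple[str, str]]  # 22 (copy_1, copy_2) pairs of labels


def child_of(mother: Person, father: Person, name: str, sex: str,
             rng: random.Random) -> Person:
    # mtDNA comes only from the mother, for sons and daughters alike
    mtdna = mother.mtdna
    # The Y chromosome passes only from father to son
    ydna = father.ydna if sex == "M" else None
    # Each autosome pair gets one randomly chosen copy from each parent
    # (whole-chromosome toy model: no recombination)
    autosomes = [
        (rng.choice(m_pair), rng.choice(f_pair))
        for m_pair, f_pair in zip(mother.autosomes, father.autosomes)
    ]
    return Person(name, sex, mtdna, ydna, autosomes)


rng = random.Random(0)
mom = Person("mom", "F", "mt-H", None, [("mom-a", "mom-b")] * 22)
dad = Person("dad", "M", "mt-L3", "Y-R1b", [("dad-a", "dad-b")] * 22)
son = child_of(mom, dad, "son", "M", rng)
daughter = child_of(mom, dad, "daughter", "F", rng)
print(son.mtdna, son.ydna)            # mt-H Y-R1b
print(daughter.mtdna, daughter.ydna)  # mt-H None
```

Both children carry their mother’s mtDNA label, only the son carries the father’s Y label, and every autosome pair mixes one copy from each parent, which is why autosomal DNA speaks to both sides of the family.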
The ideal ancestry test combines autosomal DNA, mtDNA and Y-DNA, because on its own each of these DNA types tells only part of a customer’s ancestry story. Autosomal DNA testing is the most common type of test offered and, in many cases, the only one. However, Y-DNA and mtDNA give a customer a paternal or maternal ancestral lineage, respectively, that can be traced further back in time than autosomal DNA allows (though these lineages do not yield overall regional ancestry percentage estimates). Autosomal DNA, by contrast, provides ancestry percentage estimates based on relatively recent ancestry and can identify people alive today with whom you share common ancestors.
The most well-rounded approach to garnering the largest customer base is to test all three types of DNA. But how can this be accomplished in a cost-effective and time-efficient manner? (Hint: I told you at the beginning of this post.)
Pick the right technology
The fastest, most cost-effective way to run a genetic test on many individuals at once is microarray technology. For ancestry genetics, the most widely used genomic technology is the genotyping array (sometimes called a SNP array), because it captures autosomal DNA, Y-DNA and mtDNA all at the same time. And because genotyping arrays were the foundation on which ancestral reference populations were built in the early 2000s, genotyping remains the gold standard for ancestry testing today.
What is genotyping?
Genotyping is a type of genetic testing that looks at select genetic markers in the human genome, usually covering less than 0.5% of the entire genome. The markers targeted by genotyping are called single nucleotide polymorphisms (SNPs). To provide good ancestry estimates, more than 600,000 markers are typically genotyped for each individual. At least half of these markers, and usually most, are autosomal, while the remainder are located on the sex chromosomes (including the Y chromosome) or in the mtDNA.
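A quick back-of-the-envelope check of the "less than 0.5%" figure, as a Python sketch. The genome length of about 3.1 billion base pairs is an assumption (a commonly cited approximate size of the haploid human genome), not a number from this post; the function names are illustrative.

```python
# Assumed approximate length of the haploid human genome, in base pairs
GENOME_LENGTH_BP = 3_100_000_000


def genotyped_fraction(n_markers: int, genome_bp: int = GENOME_LENGTH_BP) -> float:
    """Fraction of genome positions read when genotyping n_markers SNPs."""
    return n_markers / genome_bp


def mean_spacing_bp(n_markers: int, genome_bp: int = GENOME_LENGTH_BP) -> float:
    """Average distance, in letters of DNA, between genotyped positions."""
    return genome_bp / n_markers


markers = 600_000
print(f"{genotyped_fraction(markers):.3%}")  # 0.019%
print(round(mean_spacing_bp(markers)))        # 5167
```

Under that assumption, 600,000 markers cover roughly 0.02% of the genome, on average about one genotyped position every 5,000 letters, comfortably under the 0.5% ceiling.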
Genotyping vs. gene sequencing
Although laypeople often use the terms interchangeably, genotyping and gene sequencing are two different techniques. Genotyping looks only at selected markers spread across the genome (SNPs occur roughly once every 1,000 letters of DNA, and an array reads a subset of them), whereas sequencing reads every letter in a continuous stretch of DNA.
Whole-exome sequencing (WES) reads only the protein-coding regions of the genome, while whole-genome sequencing (WGS) reads nearly all of it; both are superior for finding novel, rare, clinically relevant genetic variants. In ancestry testing, however, novel or rare markers are of little use; what matters are the population-specific patterns of common genetic markers. And unlike WES, genotyping is not limited to the coding regions, so it can capture more information from regions such as the Y chromosome and the mtDNA.
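One common way to act on this idea is to favor so-called ancestry-informative markers: SNPs that are common overall but whose allele frequencies differ sharply between populations. The Python sketch below is hypothetical; the `Snp` record, the population names and the thresholds (`min_maf=0.05`, `min_spread=0.3`) are illustrative assumptions, not values from any vendor’s chip design.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Snp:
    snp_id: str
    freqs: Dict[str, float]  # allele frequency in each reference population


def minor_allele_frequency(p: float) -> float:
    return min(p, 1.0 - p)


def frequency_spread(snp: Snp) -> float:
    """Largest allele-frequency difference between any two populations."""
    values = list(snp.freqs.values())
    return max(values) - min(values)


def select_informative(snps: Sequence[Snp], min_maf: float = 0.05,
                       min_spread: float = 0.3) -> List[str]:
    keep = []
    for snp in snps:
        # Unweighted mean across populations as a rough pooled frequency
        pooled = sum(snp.freqs.values()) / len(snp.freqs)
        common = minor_allele_frequency(pooled) >= min_maf
        if common and frequency_spread(snp) >= min_spread:
            keep.append(snp.snp_id)
    return keep


snps = [
    Snp("snp1", {"popA": 0.90, "popB": 0.20}),  # common and differentiated
    Snp("snp2", {"popA": 0.50, "popB": 0.50}),  # common but identical everywhere
    Snp("snp3", {"popA": 0.01, "popB": 0.00}),  # rare
]
print(select_informative(snps))  # ['snp1']
```

The rare marker is dropped even though it differs slightly between populations, and the common-but-uniform marker is dropped because it cannot tell the populations apart.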
Genotyping is hands-down the best tool for ancestry genetic tests
One might think that capturing only a tiny portion of the genome would be a disadvantage, but it is actually the main advantage of using genotyping microarrays for ancestry products: genotyping is much cheaper, quicker and more efficient than sequencing. Those are all good things when you are processing thousands of customer DNA samples and need a price point that is attractive to consumers. Genotyping arrays cost several hundred dollars less per sample than WES, and more than $1,000 less per sample than WGS. Plus, genotyping by microarray lets you process 1,000-2,000 samples per week, each with hundreds of thousands of SNPs of interest. Large-scale genetic testing requires a fast and cost-effective method, and genotyping arrays are the perfect solution, which is why half of all DTC genetic tests use genotyping microarrays.
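The throughput and cost figures above scale up quickly. This Python sketch only multiplies the numbers already quoted in this post; the 52-week year and the function names are assumptions for illustration.

```python
WEEKS_PER_YEAR = 52  # assumes steady, year-round operation


def annual_capacity(samples_per_week: int, weeks: int = WEEKS_PER_YEAR) -> int:
    """Samples processed per year at a steady weekly throughput."""
    return samples_per_week * weeks


def minimum_savings(n_samples: int, savings_per_sample: float) -> float:
    """Lower bound on total savings, given a per-sample lower bound."""
    return n_samples * savings_per_sample


low, high = annual_capacity(1_000), annual_capacity(2_000)
print(low, high)  # 52000 104000
# Using the "more than $1,000 less per sample than WGS" figure as a floor
print(minimum_savings(low, 1_000))  # 52000000
```

At the low end of the quoted throughput, the savings versus WGS alone would exceed $52 million a year.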
Worried about using the same SNP microarray as your competitors? Don’t fret. You can always customize your SNP chip. Genotyping arrays can be bought with off-the-shelf SNPs (most chips have at least 700,000 pre-selected SNPs), but you can also add custom-selected SNPs. And because genotyping arrays outperform whole-exome sequencing in the non-coding regions of the genome (such as intronic and intergenic regions), you can select SNPs from regions that exome sequencing does not capture, thereby differentiating your ancestry test from others on the market.
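Customizing a chip amounts to merging a base panel with your own picks. The Python sketch below assumes a hypothetical candidate list annotated by region and a hypothetical `max_custom` slot limit; real array vendors have their own design tools and file formats, which this does not model.

```python
from dataclasses import dataclass
from typing import List

NON_CODING = {"intronic", "intergenic"}  # hypothetical annotation labels


@dataclass(frozen=True)
class CandidateSnp:
    snp_id: str
    region: str  # e.g. "exonic", "intronic", "intergenic"


def build_custom_manifest(base_panel: List[str], candidates: List[CandidateSnp],
                          max_custom: int) -> List[str]:
    on_chip = set(base_panel)
    added: List[str] = []
    for snp in candidates:
        if len(added) >= max_custom:
            break  # no custom slots left
        if snp.snp_id in on_chip:
            continue  # already on the off-the-shelf content (or added earlier)
        # Keep non-coding markers, the ones exome sequencing would miss
        if snp.region in NON_CODING:
            added.append(snp.snp_id)
            on_chip.add(snp.snp_id)
    return base_panel + added


base = ["a", "b"]
candidates = [
    CandidateSnp("b", "intronic"),    # duplicate of base content
    CandidateSnp("c", "exonic"),      # coding, skipped
    CandidateSnp("d", "intronic"),
    CandidateSnp("e", "intergenic"),
    CandidateSnp("f", "intronic"),    # over the slot limit
]
print(build_custom_manifest(base, candidates, max_custom=2))  # ['a', 'b', 'd', 'e']
```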
Now you are well on your way to creating a consumer-loved genetic ancestry test.
Next, you need an algorithm
Once you’ve chosen the type of DNA to be tested, it’s important to also consider how the genetic data will be analyzed. You’ll need a robust ancestry algorithm (see, for example, the statistical methods of 23andMe or Ancestry), and for that you’ll need to hire a bioinformatician, or two. And to create your algorithm, you’ll also need…
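To give a feel for what such an algorithm does, here is a minimal Python sketch of one textbook approach: a maximum-likelihood estimate of a customer’s ancestry proportions, given fixed allele frequencies for each reference population, fitted with expectation-maximization (EM). This is not 23andMe’s or Ancestry’s method; their published pipelines are considerably more elaborate. The sketch assumes SNPs are independent and that each of a person’s two allele copies at a SNP comes from population k with probability equal to that population’s ancestry proportion. The function name and parameters are hypothetical.

```python
from typing import List, Sequence


def estimate_admixture(genotypes: Sequence[int],
                       freqs: Sequence[Sequence[float]],
                       n_iter: int = 500,
                       tol: float = 1e-9) -> List[float]:
    """Maximum-likelihood ancestry proportions via EM, reference frequencies held fixed.

    genotypes: count (0, 1 or 2) of the tracked allele at each SNP for one customer
    freqs: freqs[k][j] = frequency of the tracked allele at SNP j in reference population k
    Returns one proportion per reference population; the proportions sum to 1.
    """
    k_pops = len(freqs)
    n_snps = len(genotypes)
    eps = 1e-6
    # Clamp frequencies away from 0 and 1 so no SNP has zero likelihood
    f = [[min(max(x, eps), 1 - eps) for x in row] for row in freqs]
    q = [1.0 / k_pops] * k_pops  # start from equal proportions
    for _ in range(n_iter):
        expected = [0.0] * k_pops
        for j, g in enumerate(genotypes):
            # Probability that one allele copy is the tracked allele
            p = sum(q[k] * f[k][j] for k in range(k_pops))
            for k in range(k_pops):
                # E-step: expected number of this SNP's two copies drawn from population k
                expected[k] += (g * q[k] * f[k][j] / p
                                + (2 - g) * q[k] * (1 - f[k][j]) / (1 - p))
        # M-step: new proportion = share of all 2 * n_snps allele copies
        q_new = [e / (2 * n_snps) for e in expected]
        converged = max(abs(a - b) for a, b in zip(q_new, q)) < tol
        q = q_new
        if converged:
            break
    return q


# Hypothetical reference frequencies for two populations at five SNPs
ref = [[0.9, 0.8, 0.1, 0.7, 0.2],
       [0.1, 0.3, 0.8, 0.2, 0.9]]
customer = [2, 1, 0, 2, 1]
print([round(x, 3) for x in estimate_admixture(customer, ref)])
```

With only five SNPs the estimate is very noisy; in practice this kind of estimate is run over hundreds of thousands of genotyped markers, which is one reason array density matters.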
A reference dataset
Of course, your algorithm will only be as good as the database it draws on, so you’ll also need a large reference dataset of people to compare against your customers’ genetic data. There are publicly available databases, such as the 1000 Genomes Project, UK Biobank and the Human Genome Diversity Project. However, these alone likely won’t be large enough!
Reference datasets are typically drawn from both public databases and an additional 3,000+ individuals from the customer base. This usually means you must have some existing customers who are willing to report their ancestors’ nationalities, which serve as a proxy for geographic ancestry. Because each company’s algorithm works from a different reference database, each company’s genetic test will give slightly different ancestry estimates. The best way to make your results as robust as possible is to have a large and diverse reference database, with individuals from all over the world. The more customers you have in your database, and the more genetic diversity they carry, the better the ancestry estimates will be.
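Turning a labeled reference panel into something an algorithm can compare against often starts with per-population allele frequencies. The Python sketch below assumes a hypothetical panel of (label, genotypes) pairs, where each label is a proxy population derived from reported ancestral nationality; the small `pseudocount` smoothing is an assumption to keep thinly sampled populations from producing frequencies of exactly 0 or 1.

```python
from typing import Dict, List, Sequence, Tuple


def reference_frequencies(panel: Sequence[Tuple[str, Sequence[int]]],
                          pseudocount: float = 0.5) -> Dict[str, List[float]]:
    """Per-population frequency of the tracked allele at each SNP.

    panel: (population_label, genotypes) pairs, genotypes being 0/1/2 allele counts
    pseudocount: added to both allele counts so no frequency is exactly 0 or 1
    """
    allele_counts: Dict[str, List[int]] = {}
    n_people: Dict[str, int] = {}
    for label, genotypes in panel:
        if label not in allele_counts:
            allele_counts[label] = [0] * len(genotypes)
            n_people[label] = 0
        allele_counts[label] = [c + g for c, g in zip(allele_counts[label], genotypes)]
        n_people[label] += 1
    freqs: Dict[str, List[float]] = {}
    for label, counts in allele_counts.items():
        n_alleles = 2 * n_people[label]  # each person carries two copies per SNP
        freqs[label] = [(c + pseudocount) / (n_alleles + 2 * pseudocount) for c in counts]
    return freqs


# Hypothetical panel: labels stand in for customers' reported ancestral nationalities
panel = [("popA", [2, 0, 1]), ("popA", [2, 1, 1]), ("popB", [0, 0, 2])]
print(reference_frequencies(panel, pseudocount=0.0))
# {'popA': [1.0, 0.25, 0.5], 'popB': [0.0, 0.0, 1.0]}
```

The unsmoothed output shows the problem with tiny reference groups: with a single popB individual, every frequency is 0 or 1. This is one concrete reason larger, more diverse reference databases yield better estimates.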
The database sizes for the leading companies in ancestry genetics range from 230,000 to 15 million customers. But there is still room for improvement. For example, many niche populations are not well represented in even the largest databases, so there remains an opportunity for new ancestry tests to grab a share of the market by serving underrepresented ancestral populations.
As customers yearn to learn more about themselves through genetic testing, ancestry tests that are fast, affordable and cater to unique and diverse populations will earn consumers’ attention. Genotyping arrays enable all of these factors. The only question that remains is: what will you name your test?