Debunking myths on genetics and DNA

Monday, June 11, 2012

Do rare variants hold the missing answers?

Most DNA is identical across individuals. However, some genes are polymorphic, which means different alleles of the same gene are present across individuals. Since we all have two copies of each gene, individuals who carry two identical copies are called homozygous, and those who carry different copies are called heterozygous. Typically, one allele is most common in the population, the "wild type," and the other ones, present at lower frequencies, are called "mutants." Single-base differences are called single nucleotide polymorphisms, or SNPs (pronounced "snips"), and, on average, they occur about once every thousand base pairs.

For the past 20 years, genetic research has focused on finding associations between SNPs and major diseases like cancer, Alzheimer's, diabetes, etc. Back when I was doing this type of research, from 2004 until 2006, we used to exclude SNPs whose minor allele frequency (MAF) was lower than 0.5% in a given ethnic group. The logic was that such variants were too rare to make any significant contribution. Back then we were sampling a few hundred people and we simply didn't have enough statistical power to detect an effect when the frequency was that low.
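To make the MAF filter concrete, here is a minimal Python sketch of how such an exclusion could be computed from genotype counts at a biallelic SNP. The function name, the SNP labels, and the genotype counts are all made up for illustration; this is not the pipeline we actually used, just the arithmetic behind the 0.5% cutoff.

```python
def minor_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """Minor allele frequency at one biallelic SNP, from genotype counts.

    Each person carries two alleles, so heterozygotes contribute one copy
    of the alternate allele and alternate homozygotes contribute two.
    """
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    alt_freq = (n_het + 2 * n_hom_alt) / total_alleles
    # The "minor" allele is whichever of the two is less frequent.
    return min(alt_freq, 1 - alt_freq)


MAF_THRESHOLD = 0.005  # the 0.5% cutoff mentioned above

# Hypothetical genotype counts for 500 people:
# (homozygous wild type, heterozygous, homozygous mutant)
genotype_counts = {
    "snp_common": (320, 160, 20),
    "snp_rare": (497, 3, 0),
}

mafs = {name: minor_allele_frequency(*counts)
        for name, counts in genotype_counts.items()}

# Keep only the SNPs at or above the threshold, as the old filter did.
kept = {name: maf for name, maf in mafs.items() if maf >= MAF_THRESHOLD}

for name, maf in mafs.items():
    status = "kept" if name in kept else "excluded"
    print(f"{name}: MAF = {maf:.4f} ({status})")
```

With a few hundred people, a SNP below the cutoff is carried by only a handful of individuals, which is why it used to be dropped.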

A note from the statistician: SNP association studies ask the question, "Does mutant allele X raise the risk of developing disease Y?" As with all statistical tests, the answer comes with a p-value: the probability of observing an association at least as strong as the one in the data if the allele actually had no effect on the disease. P-values of 0.05 or lower are conventionally considered "good" because they mean that data like ours would arise less than 5% of the time by chance alone. On the other hand, we could make the opposite mistake: we could miss something real. The probability of not missing a true association is the "power" of the test. In general, the larger the dataset, the higher the power of the test; however, the smaller the effect one is trying to detect, or the rarer the variant, the lower the power. Therefore, if a rare variant does affect the risk of a certain disease, a very large dataset is needed in order to have enough power to detect the association.
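As a rough illustration of the point about power, the sketch below approximates the power of a two-sided test comparing how often a variant is carried in cases versus controls. It uses the textbook normal approximation for two proportions, which is known to be crude when the expected counts are very small, so treat the numbers as indicative only. The function name and the frequencies are assumptions chosen for the example, not values from any study.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_controls, p_cases, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    p_controls, p_cases: true carrier frequencies in each group.
    n_per_group: number of individuals in each group (equal sizes assumed).
    Normal approximation; the negligible opposite tail is ignored.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    p_pooled = (p_controls + p_cases) / 2
    # Standard error under the null hypothesis (no difference).
    se_null = sqrt(2 * p_pooled * (1 - p_pooled) / n_per_group)
    # Standard error under the alternative (the true frequencies).
    se_alt = sqrt(p_controls * (1 - p_controls) / n_per_group
                  + p_cases * (1 - p_cases) / n_per_group)
    diff = abs(p_cases - p_controls)
    return norm.cdf((diff - z_crit * se_null) / se_alt)


# Hypothetical rare variant: carried by 0.5% of controls and 1% of cases.
for n in (300, 2000, 20000):
    print(f"n per group = {n:>6}: power ~ {approx_power(0.005, 0.01, n):.2f}")
```

Under these made-up frequencies, a few hundred people per group leave the test badly underpowered, while tens of thousands bring the power close to 1. Shrinking the difference between cases and controls lowers the power again at any fixed sample size.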

In less than ten years, sequencing technology has improved steadily and genotyping costs have decreased, allowing researchers to genotype many more people. Furthermore, though SNP association studies have been very informative, they still haven't answered the question of the missing heritability: a large portion of the heritability of many traits (including diseases) is not explained by known associations.

Bottom line: this has shifted the interest back to the "rare" variants, SNPs whose MAF is less than 0.5%.
"Rare and low frequency (MAF between 0.5%-1%) variants have been hypothesized to explain a substantial fraction of the heritability of common, complex diseases. [...] Common variants explain only a modest fraction of the heritability of most traits [1]."
Tennessen et al. sequenced 15,585 human protein-coding genes from over 2,000 individuals of either European or African ancestry, and identified more than 500,000 single nucleotide variants, 86% of which were rare.
"This excess of rare functional variants is due to the combined effects of explosive, recent accelerated population growth and weak purifying selection. Furthermore, we show that large sample sizes will be required to associate rare variants with complex traits."
In the last few thousand years, human populations have grown rapidly, and this growth had likely gone undetected in previous studies due to small sample sizes. Most rare variants (58%) found by Tennessen et al. were population specific and nonsynonymous, meaning that they yielded different amino acids. Surprisingly, this study found that "the vast majority of protein-coding variation is evolutionarily recent, rare, and enriched for deleterious alleles. Thus, rare variation likely makes an important contribution to human phenotypic variation and disease susceptibility."

In the next couple of years we will see more and more studies looking at associations between rare variants and diseases using 454 and deep sequencing technology. Many more rare variants will be discovered and the question will be to find the meaningful ones that rise above the background noise.

[1] Tennessen, J., Bigham, A., O'Connor, T., Fu, W., Kenny, E., Gravel, S., McGee, S., Do, R., Liu, X., Jun, G., Kang, H., Jordan, D., Leal, S., Gabriel, S., Rieder, M., Abecasis, G., Altshuler, D., Nickerson, D., Boerwinkle, E., Sunyaev, S., Bustamante, C., Bamshad, M., Akey, J., et al. (2012). Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes. Science. DOI: 10.1126/science.1219240
