Debunking myths on genetics and DNA

Monday, September 24, 2012

ENCODE sheds light on non-coding variants

Back when I started studying human genetics, we were still doing single-gene association studies. Namely, we would genotype a bunch of variants in a single gene and then run a case-control association study to see which, if any, of those variants marked an increase in disease risk. That was also the gene-by-gene era in which the breast cancer genes BRCA1 and BRCA2 were found, although those two were mapped by linkage analysis in high-risk families rather than by case-control association.
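
For readers who haven't run one, here is a minimal sketch of the kind of test behind a single-variant case-control study. It assumes allele counts have already been tabulated in a 2×2 table (cases vs. controls, risk allele vs. other allele). The function name `allelic_association` and the counts are hypothetical, and real studies typically use dedicated software and adjust for covariates such as ancestry.

```python
import math


def allelic_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic chi-square test (1 degree of freedom) and odds ratio.

    Inputs are allele counts, not individuals: each person contributes two alleles.
    """
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]

    # Pearson chi-square: compare observed counts with those expected under no association.
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 500 cases and 500 controls, i.e. 1000 alleles per group.
chi2, p, odds = allelic_association(600, 400, 500, 500)
print(f"chi2={chi2:.2f}  p={p:.2e}  OR={odds:.2f}")
```

With these made-up counts the odds ratio is 1.5: the odds of carrying the risk allele are 50% higher among cases than among controls.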

When the Human Genome Project was completed in 2003, scientists started looking for disease risk alleles across the whole genome. The findings were puzzling: more than 90% of the disease-associated variants fell in non-coding regions. Why? One issue I've discussed before is that when you test tens of thousands of loci you need huge sample sizes, and when huge sample sizes aren't feasible, these studies are often underpowered. Another possible explanation is epistasis: the detected signal may be the effect of some unknown correlation with other loci.
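
One way to see why testing many loci demands huge samples is a back-of-the-envelope calculation. The sketch below assumes a Bonferroni correction (dividing the usual 0.05 threshold by the number of tests) and a simple two-sided z-test approximation, under which the sample size needed to detect a fixed effect scales with (z_(1-alpha/2) + z_power)^2. The 50,000-test figure and the 80% power are illustrative choices, not numbers from any particular study.

```python
from statistics import NormalDist


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def relative_sample_size(alpha_new, alpha_ref=0.05, power=0.8):
    """How many times more samples are needed at alpha_new than at alpha_ref.

    Uses the normal approximation n ~ (z_{1-alpha/2} + z_{power})^2 / effect^2,
    so for the same effect size the ratio depends only on the two thresholds.
    """
    z = NormalDist().inv_cdf

    def factor(alpha):
        return (z(1 - alpha / 2) + z(power)) ** 2

    return factor(alpha_new) / factor(alpha_ref)


threshold = bonferroni_threshold(0.05, 50_000)  # = 0.05 / 50,000
ratio = relative_sample_size(threshold)
print(f"per-test threshold: {threshold:.0e}, sample size x{ratio:.1f}")
```

Under these assumptions, the same effect needs roughly four times as many samples once the threshold is corrected for 50,000 tests.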

However. You knew there was going to be a "however", right? Thanks to the ENCODE project, we now know that a genetic variant falling in a non-coding region is not necessarily without effect. ENCODE is bound to shed new light on the many non-coding risk alleles that genome-wide association studies (GWAS) have found.

Last time I discussed DHSs, or DNase I hypersensitive sites: regions of accessible chromatin where many regulatory elements have been found. In [1], Maurano et al. show that many of the non-coding variants associated with common diseases are concentrated in regulatory DNA marked by DHSs. The researchers performed genome-wide DNase I mapping across 349 cell and tissue types and examined the distribution of 5654 non-coding SNPs (single-nucleotide polymorphisms, i.e. single-base variants) that had been significantly associated with a disease or trait in genome-wide studies.
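
The core computation, asking what fraction of GWAS SNPs fall inside DHSs, is an interval-overlap query. Here is a minimal sketch, assuming DHSs are given as merged (non-overlapping), half-open [start, end) intervals in the BED style and SNP positions use the same 0-based coordinates. The helper names and the toy coordinates are hypothetical; this is not the pipeline Maurano et al. used, and it leaves out the linkage-disequilibrium part of their analysis.

```python
from bisect import bisect_right


def build_index(intervals):
    """Map chromosome -> (sorted starts, matching ends) for merged [start, end) intervals."""
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        index[chrom] = ([s for s, _ in ivs], [e for _, e in ivs])
    return index


def in_interval(index, chrom, pos):
    """True if pos lies inside an interval; assumes intervals do not overlap."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


def fraction_in_dhs(snps, dhs_intervals):
    """Fraction of (chrom, pos) SNPs that fall inside any DHS interval."""
    index = build_index(dhs_intervals)
    hits = sum(in_interval(index, chrom, pos) for chrom, pos in snps)
    return hits / len(snps)


# Toy data, not real coordinates.
dhs = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 1000, 1150)]
snps = [("chr1", 150), ("chr1", 300), ("chr1", 649), ("chr2", 1150), ("chr3", 10)]
print(fraction_in_dhs(snps, dhs))  # 0.4
```

Sorting once and using binary search keeps each lookup logarithmic in the number of DHSs per chromosome, which matters when there are millions of intervals.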

These are the main findings:
"Fully 76.6% of all noncoding GWAS SNPs either lie within a DHS (57.1%, 2931 SNPs) or are in complete linkage disequilibrium (LD) with SNPs in a near-by DHS (19.5%, 999 SNPs)."
Two variants are in linkage disequilibrium when they are inherited together more often than you would expect by chance. Suppose the true causal variant is at locus A, but you haven't genotyped locus A; you've genotyped locus B, and A and B tend to be inherited together. Then B is going to light up as a strong signal in your statistical analysis. So what Maurano et al. are saying in the above paragraph is that the non-coding SNPs either lay within a DHS, or were strongly correlated with SNPs in a nearby DHS.
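
"Inherited together" can be quantified. A common measure is r^2, computed from the frequencies of the four two-locus haplotypes: D = p_AB - p_A * p_B and r^2 = D^2 / (p_A(1 - p_A) p_B(1 - p_B)). The sketch below assumes phased haplotype counts are already available; the function name and the counts are illustrative. An r^2 of 1 means B is a perfect proxy for A, which is why an untyped causal variant can show up through a typed neighbor.

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """Linkage disequilibrium r^2 between two biallelic loci from haplotype counts.

    A/a are the alleles at the first locus, B/b at the second.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A
    p_B = (n_AB + n_aB) / n  # frequency of allele B
    D = p_AB - p_A * p_B     # deviation from independent inheritance
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return D * D / denom


print(ld_r2(50, 0, 0, 50))    # 1.0: the loci always travel together
print(ld_r2(25, 25, 25, 25))  # 0.0: no association
print(ld_r2(40, 10, 10, 40))  # about 0.36: partial LD
```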
"Many common disorders have been linked with early gestational exposures or environmental insults. Because of the known role of the chromatin accessibility landscape in mediating responses to cellular exposures such as hormones, we examined if DHSs harboring GWAS variants were active during fetal developmental stages. Of 2931 noncoding disease- and trait-associated SNPs within DHSs globally, 88.1% (2583) lie within DHSs active in fetal cells and tissues. Of DHSs containing disease-associated variation, 57.8% are first detected in fetal cells and tissues and persist in adult cells (“fetal origin” DHSs), whereas 30.3% are fetal stage–specific DHSs."
And finally:
"Enhancers may lie at great distances from the gene(s) they control and function through long-range regulatory interactions, complicating the identification of target genes of regulatory GWAS variants."
In other words, GWAS variants may control distant genes, which need not even be on the same chromosome. Furthermore, variants within DHSs tend to alter allelic chromatin state, modulating how accessible the surrounding regulatory DNA is to transcription factors. Disease-linked variants were found to alter such accessibility, resulting in allelic imbalance (the chromatin around one allele is more accessible than around the other), possibly explaining their role in altering disease risk or a quantitative trait.
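
Allelic imbalance in chromatin accessibility is usually looked for at heterozygous sites: count the sequencing reads carrying each allele and ask whether the split departs from the 50:50 expected if both copies were equally accessible. A minimal sketch using an exact two-sided binomial test follows. The function name and read counts are made up, real analyses also have to deal with issues such as mapping bias toward the reference allele, and this illustrates the idea rather than the specific statistics used in [1].

```python
from math import comb


def allelic_imbalance_p(ref_reads, alt_reads):
    """Exact two-sided binomial test of ref vs. alt read counts against a 50:50 split."""
    n = ref_reads + alt_reads
    k = ref_reads
    total = 2 ** n
    lower = sum(comb(n, i) for i in range(0, k + 1)) / total  # P(X <= k)
    upper = sum(comb(n, i) for i in range(k, n + 1)) / total  # P(X >= k)
    # Under p = 0.5 the distribution is symmetric, so doubling the smaller
    # tail gives the two-sided p-value (capped at 1 for balanced counts).
    return min(1.0, 2 * min(lower, upper))


print(allelic_imbalance_p(30, 10))  # small p-value: reads favor one allele
print(allelic_imbalance_p(20, 20))  # 1.0: perfectly balanced
```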

[1] Maurano, M. T., Humbert, R., Rynes, E., Thurman, R. E., Haugen, E., Wang, H., Reynolds, A. P., Sandstrom, R., Qu, H., Brody, J., Shafer, A., Neri, F., Lee, K., Kutyavin, T., Stehling-Sun, S., et al. (2012). Systematic localization of common disease-associated variation in regulatory DNA. Science. DOI: 10.1126/science.1222794
