Debunking myths on genetics and DNA

Showing posts with label SNP associations. Show all posts

Monday, September 24, 2012

ENCODE sheds light on non-coding variants


Back when I started studying human genetics, we were still doing single-gene associations. Namely, we would type a bunch of variants in a single gene and then do a case-control association study to see which, if any, of those variants marked an increase in disease risk. That's how breast cancer markers such as BRCA1 and BRCA2 have been found.
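A case-control comparison of this kind boils down to a 2x2 table of allele counts. The sketch below is only an illustration of that idea, not the analysis behind any of the studies mentioned here: the function name and the counts are made up, and it uses a plain Pearson chi-square test with one degree of freedom.

```python
import math

def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Odds ratio and 1-df chi-square test for a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    # Pearson chi-square: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return odds_ratio, chi2, p_value

# Hypothetical allele counts: 500 cases and 500 controls, two alleles each.
odds_ratio, chi2, p_value = allelic_association(240, 760, 180, 820)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

An odds ratio above 1 with a small p-value is what "marks an increase in disease risk" looks like in practice. Typing several variants in the same gene means running this test several times, so the p-values would need a multiple-testing correction.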

When the Human Genome Project was completed in 2003, scientists started looking for disease risk alleles across the whole genome. The findings were puzzling: more than 90% of the disease-associated variants fell in non-coding regions. Why? One issue I've previously discussed is that when looking at tens of thousands of loci, you need huge sample sizes and often, when huge sample sizes aren't feasible, these studies are underpowered. Another possible explanation lies in epistasis: the detected signal may be the effect of some unknown correlation.

However. You knew there was going to be a "however", right? Because thanks to the ENCODE project we now know that if a genetic variant falls in a non-coding region, it doesn't mean it has no effect whatsoever. ENCODE is bound to shed new light on these numerous non-coding risk alleles that genome-wide association studies (GWAS) have found.

Last time I discussed DHSs, or DNase I hypersensitive sites. These are chromatin regions where many regulatory elements have been found. In [1], Maurano et al. show that many of the non-coding variants associated with common diseases are concentrated in regulatory DNA marked by DHSs. The researchers performed genome-wide DNase I mapping across 349 cell and tissue types. As discussed last week, regions of DNase I accessibility harbor regulatory elements. The researchers also examined the distribution of 5654 non-coding SNPs (single base variants) that had been significantly associated with some disease or trait in genome-wide studies.

These are the main findings:
"Fully 76.6% of all noncoding GWAS SNPs either lie within a DHS (57.1%, 2931 SNPs) or are in complete linkage disequilibrium (LD) with SNPs in a near-by DHS (19.5%, 999 SNPs)."
To be in linkage disequilibrium means that the variant is typically inherited together with a variant in a DHS. Suppose the true causal variant is at locus A, but you haven't typed locus A, you've typed locus B, and A and B are inherited together. Then B is going to light up as a strong signal in your statistical analysis. So, what Maurano et al. are saying in the above paragraph is that the non-coding SNPs either turned up in a DHS, or were strongly correlated with a SNP in one.
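One common way to put a number on "inherited together" is the r² statistic between two loci. The sketch below assumes two alleles at each locus (A/a at the first, B/b at the second) and made-up haplotype counts; it is not taken from the paper.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic loci from haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at the first locus
    p_B = (n_AB + n_aB) / total   # frequency of allele B at the second locus

    # D measures how far the AB haplotype frequency is from independence.
    d = p_AB - p_A * p_B
    r_squared = d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, r_squared

# Hypothetical haplotype counts: A and B almost always travel together.
d_linked, r2_linked = linkage_disequilibrium(480, 20, 20, 480)
# Hypothetical counts for two loci inherited independently.
d_free, r2_free = linkage_disequilibrium(250, 250, 250, 250)
print(f"linked loci: r^2 = {r2_linked:.3f}; independent loci: r^2 = {r2_free:.3f}")
```

An r² of 1 corresponds to the "complete" linkage disequilibrium of the quote: knowing the allele at one locus tells you the allele at the other, so typing B is as good as typing A.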
"Many common disorders have been linked with early gestational exposures or environmental insults. Because of the known role of the chromatin accessibility landscape in mediating responses to cellular exposures such as hormones, we examined if DHSs harboring GWAS variants were active during fetal developmental stages. Of 2931 noncoding disease- and trait-associated SNPs within DHSs globally, 88.1% (2583) lie within DHSs active in fetal cells and tissues. Of DHSs containing disease-associated variation, 57.8% are first detected in fetal cells and tissues and persist in adult cells (“fetal origin” DHSs), whereas 30.3% are fetal stage–specific DHSs.
And finally:
"Enhancers may lie at great distances from the gene(s) they control and function through long-range regulatory interactions, complicating the identification of target genes of regulatory GWAS variants."
GWAS variants can control distant genes that need not even be on the same chromosome. Furthermore, variants in DHSs tend to alter allelic chromatin state, thus modulating the accessibility of genes to transcription factors. Disease-linked variants were found to alter such accessibility, resulting in allelic imbalance (one allele gets transcribed more than the other one), possibly explaining their role in altering disease risk or a quantitative trait.

[1] Matthew T. Maurano, Richard Humbert, Eric Rynes, Robert E. Thurman, Eric Haugen, Hao Wang, Alex P. Reynolds, Richard Sandstrom, Hongzhu Qu, Jennifer Brody, Anthony Shafer, Fidencio Neri, Kristen Lee, Tanya Kutyavin, & Sandra Stehling-Sun (2012). Systematic Localization of Common Disease-Associated Variation in Regulatory DNA Science DOI: 10.1126/science.1222794

ResearchBlogging.org






Monday, September 10, 2012

The encyclopedia of DNA - Part I


The raw numbers of the human genome: three billion base pairs, of which roughly 1% fall into the 20,000 genes in our genome. So, what's all the extra stuff for?

Typing the whole human genome, in 2001, was only the beginning. The next step in disentangling the puzzle was to assign biochemical functions to those three billion base pairs.
"The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions" [1].
Let's start with a bit of a refresher.

Regulatory regions: these are regions in the genome that regulate gene transcription. Thanks to these regulatory sequences, skin cells only express "skin" genes, brain cells express "brain" genes, and so on. Promoters, for example, are regulatory sequences found immediately before the start of the gene, on the same strand, and they initiate the transcription of the gene. There are other regions, called enhancers, which also promote transcription. However, contrary to promoters, enhancers need not be near the gene. They don't even need to be on the same chromosome, and some enhancers have been found in introns, regions of a gene that are removed prior to making mRNA.

Transcription factors: I talked a little bit about them last week. These are proteins that can either promote or block the recruitment of RNA polymerase, and therefore either activate or silence a gene.

And finally, you can review the concepts of chromatin structure and histone modification in a few previous posts.

All these concepts are useful to understand that there's a lot, and I mean A LOT going on, between genes and phenotype. Genes are only the starting point. You can't just look at genes alone in order to try and infer a phenotype.

Started in 2003, ENCODE aims to annotate all functional regions of the genome, where by "functional" they don't just mean encoding proteins, but also presenting some biochemical signature such as protein binding or a specific chromatin structure. The latest findings published in Nature: over 70,000 promoter regions and nearly 400,000 enhancer regions that regulate gene expression.

You can see the complications and layers to this: while we have one unique genome, which is identical in all nucleated cells, once you start looking for function, you have to look at the whole genome and chromatin structure and RNA transcripts of all cell lines, as each cell line will have its own activated and silenced genes, its own chromatin signatures, and so on ... whew, that's A LOT!

So far the ENCODE Project Consortium has integrated the data from 1,640 experiments involving 147 different cell types. They saw that
"The vast majority (80.4%) of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type."
Many more cell lines are yet to be explored, and yet these initial results already shed light on puzzling questions like, for example: why do nearly 90% of SNPs found in whole genome disease association studies fall outside genes?
"Single nucleotide polymorphisms (SNPs) associated with disease by GWAS are enriched within non-coding functional elements, with a majority residing in or near ENCODE-defined regions that are outside of protein-coding genes. In many cases, the disease phenotypes can be associated with a specific cell type or transcription factor."
I can't tell you how excited I am about these results, as I started blogging a little over one year ago raising exactly the point that junk DNA should NOT be called junk DNA.

I'm coming down with the flu (how do you explain to your kids NOT to cough in your face when they have a bug? Sigh), so this will be all for this time. But I've got all the Nature papers printed out and will be talking more about them in the next few weeks. A lot of new (and exciting) stuff to learn!

[1] The ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome Nature DOI: 10.1038/nature11247


Monday, July 30, 2012

Oedipus's dilemma


I love Greek mythology, and of all myths, Oedipus is probably the one that fascinates me the most. Nothing to do with the fact that it's become a psychiatric hallmark. I love this myth because it always makes me wonder: if somebody came to you and told you they knew with absolute certainty your future (how many years you'll live, what you'll accomplish, etc.), would you want to know? It's a paradox, because that knowledge would affect the future course of action you choose. Think about Laius: he fulfilled his destiny exactly because of the actions he took in order to avoid his destiny. Predestination paradoxes have been used forever in all mythologies, and even these days -- can you think of at least one novel or movie where it's been used?

I'm rambling, but I actually have a point for this post, I promise.

As you know, nobody's going to come and offer to tell you your exact destiny. But, they might offer to type your entire genome. And from that, they may argue they can tell you the exact risk you have of developing certain diseases. In fact, some of you may already have opted to have your entire genome typed. Such services have become more affordable, accurate, and efficient in just a handful of years. The benefits are numerous: drug therapy could be genetically targeted, and just by looking at your DNA your doctor could already know which drugs will be more effective and which could instead have adverse effects. Assessing one's risk for cancer, diabetes, or other diseases can be a good motivator to a healthier lifestyle and open up preventive treatment choices.

So, where's the catch?

The catch is that, as a new study in Science Translational Medicine shows [1], sequencing the entire genome doesn't tell us the whole story. In fact, in many cases, it doesn't tell us much at all.

Roberts et al. argue that the risk we need to be able to assess should be pretty strong in order to make preventive measures effective. For example, currently the general population risk of developing breast cancer within a woman's lifetime is 12%, obviously too low for women to opt for a preventive mastectomy. However, if a woman learned that her risk was 90%, she might reconsider. Any preventive measure carries consequences, and therefore, the risk reduction it ensures should be pretty strong in order to establish clinical utility.

After setting a meaningful risk threshold, Roberts et al. collected genetic data from numerous monozygotic twin registries and cohorts. (Little pet peeve of mine: couldn't find the exact number of pairs they had in the study, it's probably in the supplemental material, but I find sample size important enough to expect it in the main text). They then developed a mathematical model to estimate the maximum capacity of whole-genome sequencing to predict the risk for 24 common diseases, including autoimmune diseases, cancer, cardiovascular diseases, genito-urinary diseases, neurological diseases, and obesity-associated diseases. The idea behind the mathematical model is to assess the risk increment of an individual with a disease-associated genotype compared to someone with no genetic risk at all. Since monozygotic twins have nearly identical genomes, you would expect their genetic risks to have a nearly identical outcome.
"The general public does not appear to be aware that, despite their very similar height and appearance, monozygotic twins in general do not always develop or die from the same maladies. This basic observation, that monozygotic twins of a pair are not always afflicted by the same maladies, combined with extensive epidemiologic studies of twins and statistical modeling, allows us to estimate upper and lower bounds of the predictive value of whole-genome sequencing."
Using their model, the researchers showed that most individuals would show a risk predisposition to at least one of the 24 diseases tested. At the same time, they would test negative for most diseases. What does this mean? It means that we cannot predict the risk allele distribution of the actual population, and most often genetic testing will only say that individual X has the same risk of developing disease Y as the general population -- hardly enough to make whole genome testing surpass the clinical utility threshold.
"Thus, our results suggest that genetic testing, at its best, will not be the dominant determinant of patient care and will not be a substitute for preventative medicine strategies incorporating routine checkups and risk management based on the history, physical status, and life-style of the patient."

[1] Nicholas J. Roberts, Joshua T. Vogelstein, Giovanni Parmigiani, Kenneth W. Kinzler, Bert Vogelstein, & Victor E. Velculescu (2012). The Predictive Capacity of Personal Genome Sequencing Sci Transl Med 4, 133ra58 DOI: 10.1126/scitranslmed.3003380




Monday, June 11, 2012

Do rare variants hold the missing answers?


Most DNA is identical across subjects. However, some genes are polymorphic, which means different alleles of the same gene are present across individuals. Since we all have two copies of each gene, individuals who carry two identical copies are called homozygous, and those who carry different copies are called heterozygous. Typically, one allele is most common in the population, the "wild type," and the other ones, present at lower frequencies, are called "mutants." Single-base differences are called single nucleotide polymorphisms, or SNPs (pronounced "snips"), and, on average, they occur about every thousand base pairs.

For the past 20 years, genetic research has focused on finding associations between SNPs and major diseases like cancer, Alzheimer's, diabetes, etc. Back when I was doing this type of research, from 2004 until 2006, we used to exclude SNPs whose minor allele frequency (MAF) was lower than 0.5% in a given ethnic group. The logic was that such a variant was too rare to make any significant contribution. Back then we were sampling a few hundred people and we simply didn't have enough statistical power to detect an effect when the frequency was that low.
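To make the MAF cutoff concrete, here is a minimal sketch of how such a filter could look. The SNP names and genotype counts are invented for illustration; only the 0.5% threshold comes from the text.

```python
def minor_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """MAF of a biallelic SNP from genotype counts."""
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)  # two alleles per person
    alt_freq = (n_het + 2 * n_hom_alt) / n_alleles
    return min(alt_freq, 1 - alt_freq)

MAF_CUTOFF = 0.005  # the 0.5% threshold mentioned in the text

# Hypothetical counts: (homozygous wild type, heterozygous, homozygous mutant)
snps = {
    "snp_common": (320, 160, 20),
    "snp_rare": (497, 3, 0),
}

# Keep only the SNPs whose MAF reaches the cutoff.
kept = {}
for name, counts in snps.items():
    maf = minor_allele_frequency(*counts)
    if maf >= MAF_CUTOFF:
        kept[name] = maf
print(kept)
```

In a sample of 500 people, the rare SNP above shows up as just three mutant alleles out of a thousand, which gives a feel for why a study of a few hundred subjects had little to say about it.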

A note from the statistician: SNP association studies ask the question, "Does mutant allele X raise the risk of developing disease Y?" As with all statistical tests, the answer comes with a p-value: the probability of observing an association at least as strong as the one in the data if the allele had no real effect. P-values of 0.05 or lower are "good" because they mean that data like these would turn up less than 5% of the time by chance alone. On the other hand, we could make the opposite mistake: we could miss something real. The probability of not missing a true association is the "power" of the test. In general, the larger the dataset, the higher the power of the test; however, the smaller the effect one is trying to detect, the lower the power. The same holds for allele frequency: the rarer the variant, the fewer carriers there are in the sample. Therefore, if a rare variant does affect the risk of a certain disease, a very large dataset is needed in order to have enough power to detect the association.
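The interplay between sample size, allele frequency, and power can be sketched with a back-of-the-envelope calculation. What follows is a rough normal approximation for a two-sided test comparing allele frequencies in cases and controls. The odds ratio of 1.5, the sample sizes, and the function names are assumptions chosen for illustration, not values from any study.

```python
from statistics import NormalDist

def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)

def approx_power(control_freq, odds_ratio, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing allele frequencies."""
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    n_alleles = 2 * n_per_group  # each person contributes two alleles
    se = ((p0 * (1 - p0) + p1 * (1 - p1)) / n_alleles) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Normal approximation; the negligible opposite tail is ignored.
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)

# Compare a common and a rare variant at a small and a large sample size.
for maf in (0.20, 0.005):
    for n in (300, 30000):
        power = approx_power(maf, 1.5, n)
        print(f"MAF = {maf:.3f}, n = {n:>5} per group, power = {power:.2f}")
```

With a few hundred subjects per group, the common variant is detected most of the time while the rare one almost never is; it takes tens of thousands of subjects for the rare variant to catch up.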

In less than ten years sequencing technology has improved steadily and genotyping costs have decreased, allowing researchers to genotype many more people. Furthermore, though SNP association studies have been very informative, they still haven't answered the question of the missing heritability: a large portion of hereditary traits (including diseases) are not explained by known associations.

Bottom line: this has shifted the interest back to the "rare" variants, SNPs whose MAF is less than 0.5%.
"Rare and low frequency (MAF between 0.5%-1%) variants have been hypothesized to explain a substantial fraction of the heritability of common, complex diseases. [...] Common variants explain only a modest fraction of the heritability of most traits [1]."
Tennessen et al. sequenced 15,585 human protein-coding genes from over 2,000 individuals of either European or African ancestry, and identified more than 500,000 single nucleotide variants, 86% of which were rare.
"This excess of rare functional variants is due to the combined effects of explosive, recent accelerated population growth and weak purifying selection. Furthermore, we show that large sample sizes will be required to associate rare variants with complex traits."
In the last few thousand years populations have experienced a rapid growth that had likely gone undetected in previous studies due to small sample sizes. Most rare variants (58%) found by Tennessen et al. were population specific and nonsynonymous, meaning that they yielded different amino acids. Surprisingly, this study found that "the vast majority of protein-coding variation is evolutionarily recent, rare, and enriched for deleterious alleles. Thus, rare variation likely makes an important contribution to human phenotypic variation and disease susceptibility."

In the next couple of years we will see more and more studies looking at associations between rare variants and diseases using 454 and deep sequencing technology. Many more rare variants will be discovered and the question will be to find the meaningful ones that rise above the background noise.

[1] Tennessen, J., Bigham, A., O'Connor, T., Fu, W., Kenny, E., Gravel, S., McGee, S., Do, R., Liu, X., Jun, G., Kang, H., Jordan, D., Leal, S., Gabriel, S., Rieder, M., Abecasis, G., Altshuler, D., Nickerson, D., Boerwinkle, E., Sunyaev, S., Bustamante, C., Bamshad, M., Akey, J., et al. (2012). Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes Science DOI: 10.1126/science.1219240




Thursday, February 2, 2012

Missing heritability: the humble opinion of a mathematician


Tomorrow, February 3, is the birthday of Eric Lander, the director of the Broad Institute (the well-known MIT/Harvard genomic research center) and the first author of the historic 2001 Nature paper that marked the completion of the Human Genome Project [1]. I heard him speak once at USC, and without ever getting technical he managed to engage the whole audience and share his passion for genetics. As you know, I've been honoring famous geneticists by discussing one of their papers on their birthday, and today I'm facing a conundrum. You see, the natural choice would be to pick the latest PNAS paper, titled "The mystery of missing heritability" [2]. I want to talk about this paper and at the same time I don't want to talk about this paper.

I'm not a geneticist. I'm a computational biologist, which means my background is mostly analytical, not biological. I used to work on SNP associations and cancer epidemiology and now I work on HIV. I am NOT one of the players in this game. Hence, how much does my opinion count when it comes to a paper as highly debated as this one?

The thing is, this paper resonates with me. It makes a great point about a mathematical model that's been "assumed" for years now in the world of genetics. Often people don't get mathematical models. They don't get that mathematical models are tools, not the truth. Hence when one says "I present this model," you get two possible reactions: those who have seen data concordant with your model will smile and happily welcome your model. Those who instead have seen the opposite will boo you and challenge you. Problem is, models are neither right nor wrong. Models are tools. Do they help describe what we see? Fine, we keep the model. When they don't, we go back to the data and try to understand which of our assumptions failed. We use the model to discern the situations that meet the assumptions stated in the model from those that don't. Models help us shape our thinking, not the data! For example, evolution is a model, too. Go tell that to creationists and followers of intelligent design. They can challenge evolution as much as they want, but until they hand me a model that explains the genetic diversity we observe today better than evolution does, I will stick with evolution.

Back to the PNAS paper. It's a hot topic right now, and I'm kind of late discussing this particular paper in the blogosphere. Razib Khan discussed it here, Luke Jostins here and here, and I'm sure many others whom I don't know have talked about it too.

So, what is the missing heritability? Since I've already defined it in an earlier post of mine, for the time being, let me just quote Razib Khan:
"The issue is basically that there are traits where patterns of inheritance within the population strongly imply that most of the variation is due to genes, but attempts to ascertain which specific genetic variants are responsible for this variation have failed to yield much. For example, with height you have a trait which is ~80-90 percent heritable in Western populations, which means that the substantial majority of the population wide variation is attributable to genes. But geneticists feel very lucky if they detect a variant which can account for 1 percent of the variance."
The implications of this are clear: we want to find risk alleles to predict common diseases, but given the missing heritability, we can't predict common diseases.

Is this surprising?

Given the reactions I saw on the internet, apparently it is. People claim we still haven't found all variants and that's where the missing heritability's hiding. Maybe. However, after reading so much about epigenetics, RNA editing, and epistasis, allow me to be skeptical. Traits (proteins, diseases, etc.) are not genes. The path from genes to traits is long and convoluted.

So, what's Lander's point in this PNAS paper? Something I've also previously discussed: epistasis, or the way genes interact with one another. We're missing heritability because we think of risks as additive, but additivity doesn't account for interactions. If you take into account interactions between genes, the total heritability is much smaller than anticipated, and hence the percentage that the variants explain (all together) is much larger.
"Quantitative geneticists have long known that genetic interactions can affect heritability calculations. However, human genetic studies of missing heritability have paid little attention to the potential impact of genetic interactions."
Now here's the beauty of this paper. They do not deny the additive risk model. They extend it:
"We thus introduce the limiting pathway (LP) model, in which a trait depends on the rate-limiting value of k inputs, each of which is a strictly additive trait that depends on a set of variants (that may be common or rare). When k = 1, the LP model is simply a standard additive trait. For k > 1, we show that LP(k) traits can have substantial phantom heritability."
Again, mathematician thinking here, but that's exactly what models are for: some traits may very well be additive. However, the additive model does not fit all the data we observe. Hence we need a better model, one that encompasses the old one and at the same time goes beyond it. Gene-gene interactions need not explain all missing heritability. But since they've been observed, we need to account for them in those situations where they may be real.
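For the curious, here is a toy simulation of the idea. It is a sketch under my own simplifying assumptions, not the computation in the paper: the trait is the minimum of k pathway values, each pathway is a genetic part plus independent noise, there is no shared environment, and the apparent heritability is estimated from twins with the classic formula 2(r_MZ - r_DZ). All function names and parameter values are made up.

```python
import random
from statistics import fmean

def cov(x, y):
    """Sample covariance."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def corr(x, y):
    """Pearson correlation."""
    return cov(x, y) / (cov(x, x) * cov(y, y)) ** 0.5

def simulate_pairs(k, shared, h, n_pairs, rng):
    """Twin pairs whose pathway genetic values correlate at `shared`."""
    sd_g, sd_e = h ** 0.5, (1 - h) ** 0.5
    traits_1, traits_2, genetics_1 = [], [], []
    for _ in range(n_pairs):
        common = [rng.gauss(0, sd_g) for _ in range(k)]
        pair = []
        for _twin in range(2):
            g = [shared ** 0.5 * c + (1 - shared) ** 0.5 * rng.gauss(0, sd_g)
                 for c in common]
            # LP(k): the trait is the rate-limiting (smallest) pathway value.
            trait = min(gi + rng.gauss(0, sd_e) for gi in g)
            pair.append((trait, g))
        traits_1.append(pair[0][0])
        traits_2.append(pair[1][0])
        genetics_1.append(pair[0][1])
    return traits_1, traits_2, genetics_1

def heritabilities(k, h=0.8, n_pairs=20000, seed=1):
    """Return (apparent, additive) heritability for a simulated LP(k) trait."""
    rng = random.Random(seed)
    mz_1, mz_2, mz_g = simulate_pairs(k, 1.0, h, n_pairs, rng)  # identical twins
    dz_1, dz_2, _ = simulate_pairs(k, 0.5, h, n_pairs, rng)     # fraternal twins
    # Apparent heritability from the twin formula 2 * (r_MZ - r_DZ).
    apparent = 2 * (corr(mz_1, mz_2) - corr(dz_1, dz_2))
    # Additive heritability: trait variance captured by the best linear
    # predictor built from the k independent genetic values (variance h each).
    additive_var = sum(cov(mz_1, [g[i] for g in mz_g]) ** 2 / h for i in range(k))
    additive = additive_var / cov(mz_1, mz_1)
    return apparent, additive

results = {k: heritabilities(k) for k in (1, 3)}
for k, (apparent, additive) in results.items():
    print(f"LP({k}): apparent h2 = {apparent:.2f}, additive h2 = {additive:.2f}")
```

For k = 1 the two estimates should agree up to sampling noise, as expected for a purely additive trait. For k = 3 the twin-based estimate comes out well above the additive one, and that gap is the "phantom" part.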
"The potential magnitude of phantom heritability can be illustrated by considering Crohn's disease, for which GWAS have so far identified 71 risk associated loci (13). Under the usual assumption that the disease arises from a strictly additive genetic architecture, these loci explain only 21.5% of the estimated heritability. However, if Crohn's disease instead follows an LP(3) model, the phantom heritability is 62.8%, thus genetic interactions could account for 80% of the currently missing heritability."
"In short, genetic interactions may greatly inflate the apparent heritability without being readily detectable by standard methods. Thus, current estimates of missing heritability are not meaningful, because they ignore genetic interactions."
"The results show that mistakenly assuming that a trait is additive can seriously distort inferences about missing heritability. From a biological standpoint, there is no a priori reason to expect that traits should be additive. Biology is filled with nonlinearity: The saturation of enzymes with substrate concentration and receptors with ligand concentration yields sigmoid response curves; cooperative binding of proteins gives rise to sharp transitions; the outputs of pathways are constrained by rate-limiting inputs; and genetic networks exhibit bistable states."
Mother Nature did not create mathematics. We created mathematics to describe Mother Nature. We start with a simple model and build up on it. The data is always the reality check, we should never forget that.

[1] Lander, E., Linton, L., Birren, B., Nusbaum, C., Zody, M., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., et al. (2001). Initial sequencing and analysis of the human genome Nature, 409 (6822), 860-921 DOI: 10.1038/35057062

[2] Zuk, O., Hechter, E., Sunyaev, S., & Lander, E. (2012). The mystery of missing heritability: Genetic interactions create phantom heritability Proceedings of the National Academy of Sciences, 109 (4), 1193-1198 DOI: 10.1073/pnas.1119675109


Monday, January 30, 2012

The wondrous mitochondrion and its proteome


The sequencing of human mitochondrial DNA, a circular DNA molecule contained in mitochondria, was completed in 1981, and, since then, roughly 150 mutations have been found that are associated with maternally inherited diseases (if you don't remember why mtDNA is inherited from the mother and not the father, check out this earlier post of mine). Despite this, the majority of human mitochondrial syndromes are actually caused by defects in the nuclear genome. This is sort of obvious if you think about it, given that the human mitochondrial proteome consists of an estimated 1,100-1,400 distinct proteins, of which only 13 are encoded by the mitochondrial DNA. The majority of proteins targeted at the mitochondria are actually encoded by nuclear genes. In fact:
"The 13 proteins encoded by mammalian mtDNA are all components of the respiratory chain, which generates the majority of cellular ATP via oxidative phosphorylation (OXPHOS). However, the remaining respiratory chain subunits are encoded by nuclear genes, as are all proteins required for the transcription, translation, modification, and assembly of the 13 mtDNA proteins. All the components of numerous other mitochondrial pathways are also nuclear encoded, including the tricarboxylic acid (TCA) cycle, protein import, fatty acid and amino acid oxidation, apoptosis, and biosynthesis of ketone bodies, pyrimidines, heme, and urea. Furthermore, during the decades following the sequencing of the mtDNA, it became clear that maternally inherited mitochondrial disorders represent only 20% of all inherited human mitochondrial disorders [1]."
Mitochondria are amazing organelles. Roughly half of mitochondrial proteins are ubiquitous and found across all organs, while the rest are tissue specific, meaning that their function and structure vary across cell lines. For example, when comparing mitochondria across different tissues, researchers found about a 75% overlap. In addition to this cell-type specificity, some of the mitochondrial proteins are expressed at very low levels or only during certain specific developmental stages, making the characterization of the mitochondrial proteome a challenging task. Today, a little over 1,000 of all mitochondrial proteins have been identified, mainly through large-scale proteomics, microscopy, and computation.

The first half of Calvo and Mootha's review [1] is a detailed report on the progress made so far in extensively classifying the mitochondrial proteome. They then proceed to discuss how the inventory of mitochondrial proteins has led to a better understanding of mitochondrial disorders as well as the discovery of new disease genes.
"Traditionally, mitochondrial disease has referred primarily to disorders of oxidative ATP production, as discussed above. However the breadth of the mitochondrial proteome now implicates a large number of additional phenotypes, such as soft tissue tumors (paragangliomas) and diabetes mellitus. The discovery of new disease genes will further expand the clinical phenotypes associated with mitochondrial defects."

[1] Calvo, S., & Mootha, V. (2010). The Mitochondrial Proteome and Human Disease Annual Review of Genomics and Human Genetics, 11 (1), 25-44 DOI: 10.1146/annurev-genom-082509-141720


Monday, January 9, 2012

Chaperon proteins do more than... chaperon


The full human genome was typed for the first time in 2003. Ever since, there has been a "hunt" for mutations and, more generally, associations between genotypes and phenotypes. As I have pointed out multiple times on this blog, things have turned out more complicated than originally anticipated: what happens between DNA and proteins (what we could consider the "end" product) is still very much a "black box" in which epigenetic changes and RNA editing can completely turn around the outcome. Furthermore, the interaction between genes and mutated loci can either increase or decrease the likelihood of certain phenotypes, given the genotype.

Take molecular chaperones, for example. These are proteins that assist the folding and unfolding of other macromolecules. They are typically involved in protein folding, but they also assist the assembly of nucleosomes from folded histones and DNA in the nucleus (see this earlier post on chromatin) and thus, by changing the topology of the nucleus, they play an important role in regulating gene expression.

A study published in the last issue of Science [1] looks at the role of chaperon proteins in compensating for deleterious mutations in Caenorhabditis elegans. Casanueva et al. found that worms with higher expression of protective chaperon genes were more resistant to deleterious mutations: worms with a potentially deadly mutation received a mild heat stress when still larvae. The heat stress promoted the expression of protective chaperon genes, and in some of the worms this prevented the deleterious misfolding of proteins, resulting in a 35% increase in chance of survival.
"We subjected animals to a transient heat shock as larvae to induce a stress response, allowed them to develop to adults, and examined the proportion of individuals affected by late-acting mutations. When a mutation was chaperone-dependent, a mild environmental challenge stimulated a reduction in penetrance."
Paradoxically, they also found that individuals with higher chaperon expression reproduced less. Why, if the higher expression seems advantageous and protective? Casanueva et al. hypothesize that the net effect is to maintain a heterogeneous population in levels of expression, and this is more advantageous to the survival of the population than homogeneous levels of gene expression. In other words, what is advantageous to the individual is not necessarily advantageous to the species.

From the paper abstract:
"The induced mutation buffering varies across isogenic individuals because of interindividual differences in stress signaling. This variation has important consequences in wild-type animals, producing some individuals with higher stress resistance but lower reproductive fitness and other individuals with lower stress resistance and higher reproductive fitness. This may be beneficial in an unpredictable environment, acting as a “bet-hedging” strategy to diversify risk. These results illustrate how transient environmental stimuli can induce protection against mutations, how environmental responses can underlie variable mutation buffering, and how a fitness trade-off may make variation in stress signaling advantageous."

Of course, it's not clear how this could apply to humans. However, it does prompt caution when treating a person's full genome as a key to disease risks. We are still far from unraveling the complete interactions between genome, epigenome, and proteome, and, as I've often said before, Mother Nature has made us far more complex than any of our models can predict.

[1] Casanueva, M., Burga, A., & Lehner, B. (2011). Fitness Trade-Offs and Environmentally Induced Mutation Buffering in Isogenic C. elegans. Science, 335 (6064), 82-85 DOI: 10.1126/science.1213491

Photo: it's not what it looks like! This is eggs, water and vegetable oil all mixed in a blue bowl to make brownies. Seriously. You just set the bowl under a lamp and suddenly it behaves like a mirror. Eventually the mix turned into brownies, but not before my daughter and I had a little fun shooting pictures.

ResearchBlogging.org

Sunday, December 18, 2011

Enough with OXTR associations. Here's what I really want to know.


EDIT: After reading the post, please check out the comments. Luke, from Genomes Unzipped, helped me understand the matter better, so don't miss his comment!

Another OXTR paper came out in PNAS, the third since September. OXTR is the gene coding for the oxytocin receptor. Given the benefits of oxytocin (dubbed the "love hormone"), people have focused on studying this gene and, in particular, possible associations between a common OXTR polymorphism, rs53576, and various behaviors:
"One SNP in the third intron of OXTR has emerged as a particularly promising candidate in recent studies on human social behavior: rs53576 (G/A). In recent studies, the A allele of rs53576 has been associated with reduced maternal sensitivity to child behavior, lower empathy, reduced reward dependence, lower optimism and self-esteem, and, in men, negative affect. Moreover, the A allele has also been associated with a larger startle response and reduced amygdala activation during emotional face processing. Associations have also been reported between other variants of OXTR and amygdala volume, risk for autism, the quality of infants' attachment bonds with their caregivers, attachment anxiety in adult females, and autistic-like social difficulties in adult males [1]."
This study in particular [1] recruited 194 individuals and found an association between the SNP in question and the way the participants reacted to positive feedback during stressful situations. They did this by measuring cortisol responses to stress, based on the fact that psychosocial stress increases the levels of salivary cortisol. In AA carriers they found that these levels remained unchanged whether they received the support or not. The researchers conclude:
"Physiologically, it can be speculated that oxytocin released in the context of social support influences stress processing systems via oxytocin receptors in hypothalamic-limbic circuits. One likely important site of action is the amygdala, critically involved in basic emotional processing and the regulation of complex social behavior."
I confess I've been eagerly following these OXTR studies and indeed they make a great story. There's a part, though, that puzzles me, and the reason why I'm discussing this paper today is to ask a general question. If you're an expert on these things I welcome your input.

I understand these are important studies because, despite some recent criticism, they are still getting published, and PNAS, as we all know, is one of the top science journals out there. However, the thing I don't understand is that rs53576 is a silent SNP. That's actually not surprising, because, as it turns out, most common polymorphisms are silent. What is surprising, though, is that most silent SNPs are non-functional, and none of the studies I've read seems to raise the question. Let me explain.

Rs53576 sits in an intron, a part of the gene that is transcribed into RNA but spliced out before the RNA is translated and hence, in this case, does not affect the way the oxytocin receptor is made. In the analogous studies we do in my group, which are NOT on humans, we look for non-silent mutations because those are the ones that affect the crystal structure of the protein. We then look at what differences in structure these mutations yield to explain how more or fewer molecules bind to the protein, and this is how we explain the observed effects. If rs53576 were a non-silent mutation, I'd know where to look to explain these associations: I'd look at how the SNP affects the crystal structure of the receptor, the hypothesis being that the oxytocin receptor in AA carriers binds less oxytocin than in GG carriers (or something along those lines, I obviously don't know the details of this particular receptor). But rs53576 is silent. Hence, if the associations are real, there is something else going on. So, why hasn't anybody raised the question of what else is going on here?

The first thing that comes to mind is that this particular SNP could be in linkage disequilibrium with some other SNP or group of SNPs which, instead, are non-silent. We tend to inherit polymorphisms in groups, and so if rs53576 comes in the same "package" (they're called haplotype blocks) as some other functional SNP, then rs53576 is NOT the causal SNP for all these effects and we should really be looking elsewhere. The way to find out, of course, is to repeat all these studies with whole genome data. But it could also be an epigenetic change or a post-transcriptional modification occurring between the primary transcript RNA (which contains both introns and exons) and the mature messenger RNA (which then yields the protein). The positions of introns can indeed affect the translational properties of the RNA, and that's what gives rise to the so-called "functional intronic SNPs." The fact that intronic polymorphisms can be functional is extremely interesting, and in fact, last year, this study showed that one particular SNP found in one intron of GH1, the growth hormone gene, could indeed be functional.
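For the curious, linkage disequilibrium between two SNPs is usually summarized with the statistics D, D' and r². Here is a minimal sketch of how they are computed from haplotype counts. The function name and the counts are hypothetical (they are not rs53576 data), and the sketch assumes both SNPs are biallelic and actually vary in the sample.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) for two biallelic SNPs from haplotype counts.

    n_AB counts haplotypes carrying allele A at the first SNP and allele B
    at the second, and so on. Both SNPs are assumed to be polymorphic.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total

    # D: how far the AB haplotype frequency departs from independence.
    d = p_AB - p_A * p_B

    # D' rescales |D| by the largest value it could take given the
    # allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the alleles at the two SNPs.
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2


# Made-up counts: alleles A and B mostly travel together.
d, d_prime, r2 = ld_stats(45, 5, 5, 45)
print(f"D = {d:.2f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

An r² close to 1 would mean that typing the silent SNP is almost as good as typing its neighbor, which is exactly why an association with one cannot tell you which of the two is causal.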

Whatever it is, at this point, isn't it more interesting to investigate what's going on with this SNP at the molecular level rather than looking at all these association studies which may or may not be true?

[1] Chen, F., Kumsta, R., von Dawans, B., Monakhov, M., Ebstein, R., & Heinrichs, M. (2011). Common oxytocin receptor gene (OXTR) polymorphism and social support interact to reduce stress in humans. Proceedings of the National Academy of Sciences, 108 (50), 19937-19942 DOI: 10.1073/pnas.1113079108

Photo: Fall colors along the Rio Grande. Shutter speed 1/40, F-stop 5.6, ISO speed 100, and focal length 85mm.


Thursday, December 1, 2011

Genetic epistasis


A while ago, in a post titled the Missing Heritability, I discussed the fact that some risk alleles (gene copies that have been found to increase the risk for a certain disease) may turn out to be counteracted by other genes and thus explain why some people with these alleles never develop the particular disease. At the time I did a quick search on PubMed but couldn't come up with anything in the literature. Well, I was missing the keyword: epistasis. The word comes from the Greek "epi", which means "upon," and "stasis," which means to stop (I see my mom gloating out there in the audience!): compositional epistasis is the mechanism by which the effect of one allele is modified, and in some cases even blocked, by other gene alleles. This of course is hard to detect, but intuition tells us that it is a rather widespread phenomenon. Genes are far from being "push-buttons"; rather, they work in concert, initiating complex pathways, and therefore, more often than not, a single gene is unlikely to give us a complete picture.
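A toy example may help picture what "blocked" means in practice. In the sketch below a risk allele at a hypothetical locus A raises disease risk, unless a modifier allele at a hypothetical locus B is also present. The penetrance values and the assumption that the two loci are inherited independently are mine, chosen only for illustration.

```python
# Hypothetical penetrance table:
# P(disease | carries risk allele at A, carries modifier allele at B).
PENETRANCE = {
    (False, False): 0.10,
    (False, True): 0.10,
    (True, False): 0.40,
    (True, True): 0.10,  # the modifier blocks the effect of the risk allele
}


def risk_given_a(carries_a, modifier_freq):
    """Average disease risk for a given status at locus A, averaging over
    locus B, assuming the two loci are inherited independently."""
    return (modifier_freq * PENETRANCE[(carries_a, True)]
            + (1 - modifier_freq) * PENETRANCE[(carries_a, False)])


for freq in (0.0, 0.5, 0.9):
    carriers = risk_given_a(True, freq)
    non_carriers = risk_given_a(False, freq)
    print(f"modifier frequency {freq:.1f}: risk {carriers:.2f} in carriers "
          f"vs {non_carriers:.2f} in non-carriers")
```

The more common the modifier, the smaller the apparent effect of the risk allele when locus A is studied on its own, even though its effect in people without the modifier never changes.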

Back to my quest. I searched "genetic epistasis" on PubMed and this time I found a lot of interesting stuff. As a disclaimer I should say that for some of these studies there are contrasting outcomes in the literature (some results weren't reproduced in different populations). Nonetheless, I think that we are just starting to scratch the surface: gene-to-gene interactions are complex and poorly understood, but they certainly hold the key to the mysterious ways in which our genome works. Despite the skepticism expressed by some in the field, I do believe that single-gene studies are limited and should eventually give way to whole-genome studies.

[1] Evidence of biologic epistasis between BDNF and SLC6A4 and implications for depression.

SERT is a protein whose function is to terminate and recycle the neurotransmitter serotonin. Historically, serotonin has been associated with happiness and well-being, which explains why SERT is the target of numerous drugs addressing psychiatric disorders. SLC6A4, the gene encoding SERT, has been extensively studied, and one polymorphism in particular, 5-HTTLPR (which is not a SNP, a single-base mutation: rather, some individuals present a long allele with a 44 base-pair insertion, compared to the short allele), has been associated with the efficacy of some antidepressants and also with other psychiatric disorders. On the other hand, the brain-derived neurotrophic factor (BDNF) protein is involved in the growth, proliferation, and differentiation of certain neurons. The gene encoding BDNF has been associated with bipolar disorder and improved general cognitive ability. Two genes, two (apparently) distinct pathways and signaling systems. Using anatomical neuroimaging techniques in a sample of healthy subjects (n=111), Pezawas et al. showed
"that the BDNF MET allele, which is predicted to have reduced responsivity to 5-HT signaling, protects against 5-HTTLPR S allele-induced effects on a brain circuitry encompassing the amygdala and the subgenual portion of the anterior cingulate (rAC). Our analyses revealed no effect of the 5-HTTLPR S allele on rAC volume in the presence of BDNF MET alleles, whereas a significant volume reduction (P<0.001) was seen on BDNF VAL/VAL background. [...] These data provide in vivo evidence of biologic epistasis between SLC6A4 and BDNF in the human brain by identifying a neural mechanism linking serotonergic and neurotrophic signaling on the neural systems level, and have implications for personalized treatment planning in depression."

[2] Renin-angiotensin system gene polymorphisms and coronary artery disease in a large angiographic cohort: detection of high order gene-gene interaction.

Tsai et al. [2] recruited 1254 patients who underwent cardiac catheterization (735 with documented coronary artery disease and 519 without) and individually matched them with controls based on corresponding risk factors for coronary artery disease. The researchers genotyped several polymorphisms: one in the angiotensin-converting enzyme gene, six in the angiotensinogen gene, and one in the angiotensin II type I receptor gene. In single-locus analyses, no locus was associated with coronary artery disease or acute myocardial infarction. However:
"Significant three-locus (G-217A, M235T and I/D) gene-gene interactions were detected by multifactor-dimensionality reduction method (highest cross-validation consistency 10.0, lowest prediction error 40.56%, P=0.017) and many even higher order gene-gene interactions by multilocus genotype disequilibrium tests (16 genotype disequilibria exclusively found in the controls, all of which included at least two genes among AGT, ACE and AT1R genes). Our study is the first to demonstrate epistatic, high-order, gene-gene interactions between RAS gene polymorphisms and CAD. These results are compatible with the concept of multilocus and multi-gene effects in complex diseases that would be missed with conventional approaches."

I've added below a few more references on epistasis for those interested in researching the topic further.

Photo: Walt Disney Concert Hall, Los Angeles, CA. Shutter speed 1/15, focal length 24mm, F-stop 22, ISO speed 100.

[1] Pezawas, L., Meyer-Lindenberg, A., Goldman, A., Verchinski, B., Chen, G., Kolachana, B., Egan, M., Mattay, V., Hariri, A., & Weinberger, D. (2008). Evidence of biologic epistasis between BDNF and SLC6A4 and implications for depression. Molecular Psychiatry, 13 (7), 709-716 DOI: 10.1038/mp.2008.32

[2] Tsai CT, Hwang JJ, Ritchie MD, Moore JH, Chiang FT, Lai LP, Hsu KL, Tseng CD, Lin JL, & Tseng YZ (2007). Renin-angiotensin system gene polymorphisms and coronary artery disease in a large angiographic cohort: detection of high order gene-gene interaction. Atherosclerosis, 195 (1), 172-80 PMID: 17118372

[3] Wiltshire S, Bell JT, Groves CJ, Dina C, Hattersley AT, Frayling TM, Walker M, Hitman GA, Vaxillaire M, Farrall M, Froguel P, & McCarthy MI (2006). Epistasis between type 2 diabetes susceptibility Loci on chromosomes 1q21-25 and 10q23-26 in northern Europeans. Annals of human genetics, 70 (Pt 6), 726-37 PMID: 17044847

[4] Abou Jamra R, Fuerst R, Kaneva R, Orozco Diaz G, Rivas F, Mayoral F, Gay E, Sans S, Gonzalez MJ, Gil S, Cabaleiro F, Del Rio F, Perez F, Haro J, Auburger G, Milanova V, Kostov C, Chorbov V, Stoyanova V, Nikolova-Hill A, Onchev G, Kremensky I, Jablensky A, Schulze TG, Propping P, Rietschel M, Nothen MM, Cichon S, Wienker TF, & Schumacher J (2007). The first genomewide interaction and locus-heterogeneity linkage scan in bipolar affective disorder: strong evidence of epistatic effects between loci on chromosomes 2q and 6q. American journal of human genetics, 81 (5), 974-86 PMID: 17924339

[5] Coutinho AM, Sousa I, Martins M, Correia C, Morgadinho T, Bento C, Marques C, Ataíde A, Miguel TS, Moore JH, Oliveira G, & Vicente AM (2007). Evidence for epistasis between SLC6A4 and ITGB3 in autism etiology and in the determination of platelet serotonin levels. Human genetics, 121 (2), 243-56 PMID: 17203304


Monday, October 24, 2011

The missing heritability



It's been dubbed the "dark matter of the genome" because… we know it's there and yet we can't find it.

Ever since the completion of the Human Genome Project, the hunt for disease variants has taken up much, if not most, of genetic research. The idea is simple: we take a sample of healthy people (the controls), a matched sample of diseased people (the cases), we type their DNA, stratify by other possible factors (this one depends on the study, but think of things like smoking, age, family history, socio-economic status, etc.), and then look at what variants in the DNA are statistically more prevalent in the cases. If the experimental design is solid, and the statistical analyses are well done, the result should be one or more loci in the genome that increase the risk of developing the disease.
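In its simplest form, "statistically more prevalent in the cases" boils down to a test on a 2x2 table of allele counts. Below is a minimal sketch of that test for a single variant, using only the Python standard library. The counts are invented, and a real study would also handle the stratification factors mentioned above and correct for testing many variants at once, neither of which is shown here.

```python
import math


def allele_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Odds ratio and Pearson chi-square test (1 degree of freedom)
    on a 2x2 table of allele counts in cases and controls."""
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # With 1 degree of freedom the chi-square statistic is a squared
    # standard normal, so its tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Invented example: 500 cases and 500 controls, i.e. 1000 alleles each.
odds_ratio, chi2, p_value = allele_association(240, 760, 200, 800)
print(f"odds ratio = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

An odds ratio above 1 means the allele is more common among cases than among controls; the p-value says how surprising that difference would be if the allele had nothing to do with the disease.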

This has been done for numerous cancers (a widely known example is the pair of genes BRCA1 and BRCA2, variants of which have been found to increase the risk of breast cancer), and also for heart disease, type 2 diabetes, schizophrenia, and other genetic pathologies.

Is this it? Is all you need to do to find out whether or not you'll develop something nasty in your lifetime to look at your DNA and breathe easily if none of the "red flags" are raised?

No.

When you go back and combine the genetic variability of the trait and the environmental factors, you see that all together they explain only a small fraction of the disease's heritability. In other words, for any of these investigated maladies, the vast majority of the inherited cases remain unexplained. Think for example, of twin pairs where only one sibling develops the genetic disease.

First of all, a philosophical note: the above thinking falls within the so-called "gene-centered" view, which assumes a causal relationship between gene copies and phenotype. This may not be the case at all, as what I've learned so far is that genomes have a tendency to be far more complex than we can predict.

Having said that, here are some hypotheses on where the "dark matter" of the genome could hide.

(1) RARE VARIANTS: The causal relationship we're after could be hidden in what we call "rare variants," in other words, gene copies that can only be found in very few individuals. These alleles are so sparse in the population that even if you find a few, you have very little statistical power to detect their effects on the disease risk. This problem is currently being tackled with improved sequencing technology and new statistical methods to allow for these rare variants to be taken into account.
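To put a rough number on "very little statistical power," here is a back-of-the-envelope calculation. It is a sketch under several assumptions of mine: a simple comparison of allele frequencies between cases and controls, a normal approximation, an invented effect size and sample size, and the conventional genome-wide significance threshold of 5e-8 as the default. It is not one of the newer rare-variant methods mentioned above.

```python
from statistics import NormalDist


def approx_power(control_freq, odds_ratio, n_alleles_per_group, alpha=5e-8):
    """Approximate power of a two-sided test comparing the frequency of an
    allele in cases versus controls (normal approximation)."""
    # Allele frequency in cases implied by the control frequency and the
    # assumed odds ratio.
    odds_case = odds_ratio * control_freq / (1 - control_freq)
    case_freq = odds_case / (1 + odds_case)

    # Standard error of the difference between the two sample frequencies.
    variance = (case_freq * (1 - case_freq)
                + control_freq * (1 - control_freq)) / n_alleles_per_group
    z_effect = abs(case_freq - control_freq) / variance ** 0.5

    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(z_effect - z_crit)


# Same hypothetical effect size and sample, common versus rare allele.
for freq in (0.30, 0.005):
    power = approx_power(freq, odds_ratio=1.5, n_alleles_per_group=4000)
    print(f"control allele frequency {freq}: power = {power:.5f}")
```

With the same odds ratio and the same number of people, the common allele is detected almost every time, while the rare one is almost never detected. That is the whole problem in two lines of output.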

(2) EPIGENETICS: Recent studies have shown that epigenetic changes induced by environmental factors (such as diet, maternal physiology during pregnancy, parental behaviors, etc.) can be inherited across generations [1]. These "transgenerational genetic effects" are not encoded in the DNA itself, but in the way genes are expressed. They have been found in numerous mouse models, and they indicate that when we don't find anything and the disease is there, we may have missed the causal factor simply because we failed to look at the genetics and exposures of the parents and/or grandparents. Interestingly, as Nadeau points out in [1], "in the cases that have been studied, the phenotypic consequences of transgenerational effects persist beyond the first generation but with progressively weaker effects." And, "all genetically predisposed progeny are affected regardless of inheritance of the parental gene." Let me stress the significance of this last statement: a transgenerational genetic effect takes place when an individual presents a specific phenotypic trait, even though the genetic change is not present in the individual, but only in the parent. A study recently published in Nature [2], for example, showed that epigenetic changes induced in a first generation of worms in order to extend their life span were transmitted to the offspring, too. Another one published in Science showed a similar result in plants [3].

(3) POST-TRANSCRIPTIONAL REGULATION: A recent paper published in Cell [4] looked at an aggressive form of brain tumor called glioblastoma, and found an association between the disease and the way genes in the cancer cells were expressed. In other words, rather than looking at the actual gene copies, they looked at which genes were translated into their subsequent products, and through what processes. Quoting from the abstract, they found:
"~7,000 genes whose transcripts act as miR “sponges” and 148 genes that act through alternative, non-sponge interactions. Biochemical analyses in cell lines confirmed that this network regulates established drivers of tumor initiation and subtype implementation."
Let's try and understand this. Genes are transcribed into portions of RNA, which are then used to make proteins. However, in any given cell, some genes are expressed and some are not. In other words, genes can be "turned on" or "turned off," and this happens through very complicated processes. One way is to use tiny molecules of RNA (called miRNA or "micro" RNA) that are complementary to the gene RNA. After the gene has been transcribed, the miRNA binds to the complementary strand of RNA, making it double-stranded. Once the RNA is double-stranded it can no longer "produce" a protein, and therefore, the gene it came from is effectively "silenced," or turned off. So, the "miRNA sponges" found in the Cell paper are transcripts that soak up these miRNAs, which changes how strongly the miRNAs can silence the other genes in the network, and they have an important role in cancer pathogenesis. This process is not encoded in the genes themselves (and hence it wouldn't be found by simply looking at the different alleles in the population). Rather, it affects what happens to the RNA after the genes have been transcribed.
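"Complementary" has a very concrete meaning here: A pairs with U and G pairs with C, with the two strands running in opposite directions. The sketch below finds the places on a transcript where a short RNA could pair perfectly. Both sequences are invented, and real miRNAs usually pair only partially with their targets, so take this as a cartoon of the idea rather than a target-prediction method.

```python
# Base-pairing rules for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Sequence that pairs with `rna`, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_binding_sites(mirna, transcript):
    """Start positions (0-based) on the transcript where the miRNA could
    pair perfectly along its whole length."""
    site = reverse_complement(mirna)
    last_start = len(transcript) - len(site)
    return [i for i in range(last_start + 1)
            if transcript[i:i + len(site)] == site]


# Invented toy sequences, far shorter than real ones.
toy_mirna = "GGAUUC"
toy_transcript = "AUGGAAUCCGUAA"
print(reverse_complement(toy_mirna))
print(find_binding_sites(toy_mirna, toy_transcript))
```

A "sponge" transcript, in this picture, is simply one that carries many such sites and so keeps the miRNA busy.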

(4) PROTECTIVE ALLELES: So far the great focus has been on finding risk alleles. But what about protective alleles, or in other words, variants that counter-act the effect of the deleterious ones? I don't mean just alleles that carry a negative risk, but alleles that are proven to interact with the ones that induce a positive risk, and level them out. The existence of such alleles has been hypothesized and studies are under way to test this possibility too. I didn't find anything in the literature yet, but if you are aware of published studies on this, please let me know and I will include them here.

[1] Nadeau JH (2009). Transgenerational genetic effects on phenotypic variation and disease risk. Human molecular genetics, 18 (R2) PMID: 19808797

[2] Greer, E., Maures, T., Ucar, D., Hauswirth, A., Mancini, E., Lim, J., Benayoun, B., Shi, Y., & Brunet, A. (2011). Transgenerational epigenetic inheritance of longevity in Caenorhabditis elegans. Nature DOI: 10.1038/nature10572

[3] Schmitz, R., Schultz, M., Lewsey, M., O'Malley, R., Urich, M., Libiger, O., Schork, N., & Ecker, J. (2011). Transgenerational Epigenetic Instability Is a Source of Novel Methylation Variants. Science, 334 (6054), 369-373 DOI: 10.1126/science.1212959

[4] Sumazin P, Yang X, Chiu HS, Chung WJ, Iyer A, Llobet-Navas D, Rajbhandari P, Bansal M, Guarnieri P, Silva J, & Califano A (2011). An Extensive MicroRNA-Mediated Network of RNA-RNA Interactions Regulates Established Oncogenic Pathways in Glioblastoma. Cell, 147 (2), 370-81 PMID: 22000015

Photo: what happens when you put the camera on a tripod, leave the shutter open for thirty seconds, and three cars finally drive by. The original had a lamppost, but I edited out the post and left the lamp. You can find the original here.




Thursday, September 15, 2011

All you need is love... and the right alleles


It's been called the "love hormone" because studies have shown that it is released during labor and breastfeeding. Children soothed by their mothers produce it, and, apparently, it has a role in easing social interactions. Oxytocin is a hormone secreted by the pituitary gland. It is a neurotransmitter, which basically means that it helps send signals from the brain to the receiving cells.

OXTR is the oxytocin receptor gene; in other words, this gene codes for the protein that sits on the surface of the cell waiting to "grab" the oxytocin. So, if oxytocin has such beneficial effects on our behavior, it seems natural to look into this gene and see how it affects us, right?

That's exactly what a study published in this week's issue of PNAS [1] did. The researchers (from UCLA, UCSB, and Ohio State University) found one particular SNP in OXTR to be associated with three psychological traits: optimism, self-esteem, and mastery (the ability of making decisions, of being determined to achieve certain outcomes in life). This is an important finding, since the traits they found to be linked with OXTR are known to be correlated with positive health outcomes and good stress management.

Okay, let's back up a little. What's a SNP?

You and I share most of our DNA. We all do. There are very few loci where DNA differs across people, and SNPs are some of those loci. SNP (pronounced "snip") stands for Single Nucleotide Polymorphism, and it represents one particular base in the DNA that's found to be changing across the population (hence the "polymorphism"). It's a single base, but because we have two copies, it is represented by two nucleotides. The SNP found in the PNAS paper, for example, is represented by the following genotypes in the population: AA, AG, and GG. In other words, when you look at people's DNA at that particular position, you'll find that some carry a GG, some an AG, and some others an AA. So how was the association found? The researchers recruited a number of subjects and found out which alleles they carried. Then they measured their psychological traits, and they saw that individuals that carried the "A" allele had a tendency to have lower levels of optimism, self-esteem, and mastery, and higher levels of depression.
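If you like to see things spelled out, here is how genotypes turn into allele frequencies. The sample below is entirely made up (it is not the data from the PNAS paper), and the function name is just for illustration.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Fraction of each allele in a list of two-letter genotypes.

    Every person contributes two alleles, one per chromosome copy.
    """
    counts = Counter("".join(genotypes))
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Invented sample of 100 people typed at a single SNP.
sample = ["GG"] * 50 + ["AG"] * 40 + ["AA"] * 10
freqs = allele_frequencies(sample)
print(freqs)
```

In this made-up sample only 10 people out of 100 are AA, yet 50 out of 100 carry at least one A, which is why it matters whether a study compares genotypes (AA vs. AG vs. GG) or simply carriers vs. non-carriers of an allele.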

Now to the caveats.

In general, looking at one SNP only gives a somewhat limited picture. Genetics is not just DNA, but rather a very complicated hierarchy of interactions, mechanisms, and cascade effects. Genes often interact and "combine" forces. For example, groups of multiple SNPs tend to be inherited together, and "piggy-back" mutations appear as an effect of chromosomal recombination. In this case in particular, this hypothesis seems plausible given the fact that the SNP under investigation is silent, hence does not affect the structure of the protein OXTR encodes. Furthermore, one must keep in mind that certain traits can be altered by epigenetic changes. Caveats aside, it is certainly fascinating to see how genes can affect our behavior and state of mind, and I look forward to the next papers from this group.

[1] Saphire-Bernstein, S., Way, B., Kim, H., Sherman, D., & Taylor, S. (2011). Oxytocin receptor gene (OXTR) is related to psychological resources. Proceedings of the National Academy of Sciences, 108 (37), 15118-15122 DOI: 10.1073/pnas.1113137108

Photo: aspens at sunset. Canon 40D, focal length 81mm, F-stop 5.6, shutter speed 1/100. On a side note, those three aspens came down this summer. Too much wind, sadly.
