Debunking myths on genetics and DNA

Thursday, February 2, 2012

Missing heritability: the humble opinion of a mathematician

Tomorrow, February 3, is the birthday of Eric Lander, director of the Broad Institute (the well-known MIT/Harvard genomic research center) and first author of the historic 2001 Nature paper that presented the initial sequence of the human genome from the Human Genome Project [1]. I once heard him speak at USC, and without ever getting technical he managed to engage the whole audience and share his passion for genetics. As you know, I've been honoring famous geneticists by discussing one of their papers on their birthday, and today I'm facing a conundrum. You see, the natural choice would be his latest PNAS paper, titled "The mystery of missing heritability" [2]. I want to talk about this paper and at the same time I don't want to talk about this paper.

I'm not a geneticist. I'm a computational biologist, which means my background is mostly analytical, not biological. I used to work on SNP associations and cancer epidemiology and now I work on HIV. I am NOT one of the players in this game. So how much does my opinion count when it comes to a paper as highly debated as this one?

The thing is, this paper resonates with me. It makes a great point about a mathematical model that's been "assumed" for years now in the world of genetics. Often people don't get mathematical models. They don't get that mathematical models are tools, not the truth. Hence when one says "I present this model," you get two possible reactions: those who have seen data concordant with your model will smile and happily welcome it. Those who instead have seen the opposite will boo you and challenge you. The problem is, models are neither right nor wrong. Models are tools. Do they help describe what we see? Fine, we keep the model. When they don't, we go back to the data and try to understand which of our assumptions failed. We use the model to tell apart the situations that meet its assumptions from those that don't. Models help us shape our thinking, not the data! For example, evolution is a model, too. Go tell that to creationists and followers of intelligent design. They can challenge evolution as much as they want, but until they hand me a model that explains the genetic diversity we observe today better than evolution does, I will stick with evolution.

Back to the PNAS paper. It's a hot topic right now, and I'm kind of late discussing this particular paper in the blogosphere. Razib Khan discussed it here, Luke Jostins here and here, and I'm sure many others whom I don't know have talked about it too.

So, what is the missing heritability? Since I've already defined it in an earlier post of mine, for the time being, let me just quote Razib Khan:
"The issue is basically that there are traits where patterns of inheritance within the population strongly imply that most of the variation is due to genes, but attempts to ascertain which specific genetic variants are responsible for this variation have failed to yield much. For example, with height you have a trait which is ~80-90 percent heritable in Western populations, which means that the substantial majority of the population wide variation is attributable to genes. But geneticists feel very lucky if they detect a variant which can account for 1 percent of the variance."
The implications of this are clear: we want to find risk alleles to predict common diseases, but given the missing heritability, we can't predict common diseases.

Is this surprising?

Given the reactions I saw on the internet, apparently it is. People claim we still haven't found all variants and that's where the missing heritability's hiding. Maybe. However, after reading so much about epigenetics, RNA editing, and epistasis, allow me to be skeptical. Traits (proteins, diseases, etc.) are not genes. The path from genes to traits is long and convoluted.

So, what's Lander's point in this PNAS paper? Something I've also previously discussed: epistasis, or the way genes interact with each other. We're missing heritability because we think of risks as additive, but additivity doesn't account for interactions. If you take interactions between genes into account, the true heritability is much smaller than estimated, and hence the fraction of it that the known variants explain (all together) is much larger.
"Quantitative geneticists have long known that genetic interactions can affect heritability calculations. However, human genetic studies of missing heritability have paid little attention to the potential impact of genetic interactions."
Now here's the beauty of this paper. They do not deny the additive risk model. They extend it:
"We thus introduce the limiting pathway (LP) model, in which a trait depends on the rate-limiting value of k inputs, each of which is a strictly additive trait that depends on a set of variants (that may be common or rare). When k = 1, the LP model is simply a standard additive trait. For k > 1, we show that LP(k) traits can have substantial phantom heritability."
Again, mathematician thinking here, but that's exactly what models are for: some traits may very well be additive. However, the additive model does not fit all the data we observe. Hence we need a better model, one that encompasses the old one and at the same time goes beyond it. Gene-gene interactions need not explain all missing heritability. But since they've been observed, we need to account for them in those situations where they may be real.
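To make the quoted definition concrete, here is a minimal Python sketch. It assumes that the "rate-limiting value" of the k inputs is their minimum, and the function names, genotypes, and effect sizes are all made up for illustration; none of them come from the paper. It shows that LP(1) is just the additive sum, and that for k = 2 the same extra allele can raise the trait or leave it untouched depending on the other pathway, which is exactly what a strictly additive model rules out.

```python
def additive_input(allele_counts, effects):
    """Strictly additive input: sum of allele count times per-allele effect."""
    return sum(count * effect for count, effect in zip(allele_counts, effects))


def lp_trait(pathways):
    """LP(k) trait for k pathways, each given as (allele_counts, effects).

    Assumption: the rate-limiting input is taken to be the smallest one.
    """
    return min(additive_input(counts, effects) for counts, effects in pathways)


# Hypothetical per-allele effects for two pathways, A and B.
EFFECTS_A = [0.5, 0.25]
EFFECTS_B = [1.0]

# k = 1: the LP trait is the plain additive sum of pathway A.
lp1 = lp_trait([([1, 2], EFFECTS_A)])


def gain_from_extra_a_allele(b_counts):
    """Change in a k = 2 trait when one copy of the first A allele is added."""
    before = lp_trait([([1, 2], EFFECTS_A), (b_counts, EFFECTS_B)])
    after = lp_trait([([2, 2], EFFECTS_A), (b_counts, EFFECTS_B)])
    return after - before


gain_when_a_limits = gain_from_extra_a_allele([2])  # B input is 2.0, so A caps the trait
gain_when_b_limits = gain_from_extra_a_allele([1])  # B input is 1.0, so B caps the trait

print(lp1, gain_when_a_limits, gain_when_b_limits)
```

With these toy numbers the extra allele adds 0.5 to the trait when pathway A is the bottleneck and nothing at all when pathway B is, even though its "effect size" never changed.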
"The potential magnitude of phantom heritability can be illustrated by considering Crohn's disease, for which GWAS have so far identified 71 risk associated loci (13). Under the usual assumption that the disease arises from a strictly additive genetic architecture, these loci explain only 21.5% of the estimated heritability. However, if Crohn's disease instead follows an LP(3) model, the phantom heritability is 62.8%, thus genetic interactions could account for 80% of the currently missing heritability."
"In short, genetic interactions may greatly inflate the apparent heritability without being readily detectable by standard methods. Thus, current estimates of missing heritability are not meaningful, because they ignore genetic interactions."
"The results show that mistakenly assuming that a trait is additive can seriously distort inferences about missing heritability. From a biological standpoint, there is no a priori reason to expect that traits should be additive. Biology is filled with nonlinearity: The saturation of enzymes with substrate concentration and receptors with ligand concentration yields sigmoid response curves; cooperative binding of proteins gives rise to sharp transitions; the outputs of pathways are constrained by rate-limiting inputs; and genetic networks exhibit bistable states."
Mother Nature did not create mathematics. We created mathematics to describe Mother Nature. We start with a simple model and build on it. The data is always the reality check; we should never forget that.
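In that spirit, the Crohn's disease figures quoted earlier can be checked with a few lines of arithmetic. This Python sketch uses only the percentages from the quote; the variable names are mine, and the last step assumes that phantom heritability is expressed as a share of the apparent (estimated) heritability, which is the reading under which the 80% figure works out.

```python
# Figures from the quoted passage, as percentages of the estimated heritability.
explained_pct = 21.5  # explained by the 71 known loci under an additive model
phantom_pct = 62.8    # phantom heritability under an LP(3) model

# Whatever the known loci do not explain is "missing".
missing_pct = 100.0 - explained_pct

# Share of the missing heritability that would be phantom, i.e. due to interactions.
phantom_share_of_missing = phantom_pct / missing_pct

# Assumption: removing the phantom part leaves the real heritability, so the
# known loci explain a larger share of what is really there.
real_pct = 100.0 - phantom_pct
explained_share_of_real = explained_pct / real_pct

print(f"missing: {missing_pct:.1f}%")
print(f"phantom share of missing: {phantom_share_of_missing:.0%}")
print(f"explained share of real heritability: {explained_share_of_real:.1%}")
```

Under that reading, the same 71 loci would explain roughly 58% of the real heritability instead of 21.5% of the apparent one.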

[1] Lander, E., Linton, L., Birren, B., Nusbaum, C., Zody, M., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., et al. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860-921. DOI: 10.1038/35057062

[2] Zuk, O., Hechter, E., Sunyaev, S., & Lander, E. (2012). The mystery of missing heritability: Genetic interactions create phantom heritability. Proceedings of the National Academy of Sciences, 109(4), 1193-1198. DOI: 10.1073/pnas.1119675109


  1. It makes a great deal of sense that many, many factors influence a particular set of genetic triggers.

    1. Thanks, Steve, I agree. Of course, it's always a good practice to start with simpler models first -- we need to make simplifications and approximations, there's no way around it -- but when the simpler models don't work it means it's time to add onto the model.

  2. here's my reply re plastids, I'm glad you figured it out, because I was going to have to do a little research :) yeah ... plastids are different from mitochondria, what I remember is that they are pigment-containing organelles for "making food". Chloroplasts are the ones we're most familiar with. They have their own genomes, and are thought to have originated through symbiosis, like mitochondria ... or maybe we should call it theft! I just came across the word "kleptoplasts" for plastids that are "stolen" by other organisms ... like sea slugs!! Have you seen the video of a sea slug sucking chloroplasts from algae?

    The plastids are surprisingly stable in the sea slug, and there's a bit of evidence that some genes in the host genome are involved. Wow, that's wild! But then so is a lot of what we're learning these days!

    Thanks for the great molecular biology posts, I really enjoy hearing about all the new discoveries and thinking.

  3. Sorry for the strong words, but despite this well-done summary I cannot help but see how stupid this discussion on missing heritability is. First geneticists make the deliberate simplifying assumption that genetic factors (now: genomic loci made of DNA) contribute additively (linearly) to a phenotype and do not interact, and they neglect gene-gene interactions. Then they admit interactions through the backdoor of the vague notion of epistasis, which in their world is treated rather as an anomaly to be afraid of. Then they rediscover the possibility of genetic interactions when their non-interaction models fail. In the grander scheme this is simply utterly poor science from an entire community whose members only talk to each other and not to anyone outside that community.

    This led an entire community to take for granted an old, idiotic model: a view that spread like random genetic drift and became fixed without being tested for its fitness.

    There exist other communities, such as the fields of nonlinear systems dynamics, complex network theory etc, for which gene-gene interactions are the normal case, and direct linear, additive genotype-phenotype mapping is the rare uninteresting exception.

    1. I was almost tempted not to publish your comment because, while I welcome all opinions and I am open to discussion, I also appreciate full respect and your language is borderline. SCIENCE IS NOT EASY! MODELING IS NOT EASY! A model is never the full truth, and while sophisticated models may encompass more aspects, they may well not be solvable with the means we have today. So the first approach is the spherical cow, and if it works we have a new way of describing the cow. After all, if you sit on the Moon and all you have is a telescope, you might well see that cows are spherical, if you can see them at all. Then you build a better telescope and, surprise, you find out that cows aren't that spherical after all: they have four legs. So now, wiser from your better telescope, you build a new model that encompasses spheres with four legs. Then, in a decade or so, you build an even better telescope and realize that cows have necks, too, and mouths, and they ruminate. So now you build a new model.

      Get the picture? That's how science works. Models aren't the truth, and simplifications are a necessity. Theories work for as long as they do a good job describing what we see with the tools and means we have at the time. Newton devised a great theory called gravitation. Until Einstein expanded it, it was the best gravitation theory we had. That doesn't mean Newton was stupid.

      Thanks for your comment.

