Debunking myths on genetics and DNA

Saturday, November 12, 2011

An addendum on Haldane's dilemma and the use of mathematical models

Last week, my post on Haldane's dilemma garnered many views. I'm glad people are reading it and I hope they find it useful in clarifying the great impact of Haldane's 1957 paper [1]. For those of you interested in digging deeper into the topic, the Panda's Thumb discusses the matter in a 2007 post, and Gene Expression covers it here.

I have one additional note, on something of a pet peeve of mine. As I read the reactions to Haldane's paper scattered all over the Internet, I noticed that people tend to say things like "Haldane was wrong," or "Haldane was right, and such and such are wrong."

Let's get this straight: Haldane formulated a mathematical model. His work laid the foundations for the mathematical theory of population genetics. The usefulness of mathematical models is twofold: they either fit the data or they don't, and in either case they are informative. Let me explain.

You can break down scientific thinking into the following points:
  • Hypothesis.
  • Assumptions.
  • Model.
  • Conclusions.
There are usually one or more hypotheses you want to test. You come up with the set of assumptions you need to make. You design a model, you test it, you reach your conclusions. Once you have the model, you use it in a comparative way: if it represents the data well, then the assumptions of the model are consistent with that dataset. If it doesn't, then you go back and see which of your assumptions fail for that dataset.

Back to Haldane. He formulated a question: how many generations does it take for a minor allele under selection pressure to become fixed? He made certain assumptions (infinite population size, constant selection pressure, etc.), designed a model, and came to a conclusion. Now here's the power of the mathematical model: if we find an incongruity between the observed data and the model, then we know where to look for the fallacy. In the assumptions. Today we know that many mutations are selectively neutral or nearly so. Haldane wasn't wrong. He just formulated a model. A powerful one, one that nobody had thought of before him. One that later inspired Kimura's neutral theory and that made us understand evolution better because we realized that not all alleles are under selection pressure.
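To make that kind of question concrete, here is a minimal sketch in Python. It is not Haldane's calculation: it assumes a textbook deterministic haploid selection recursion, p' = p(1 + s) / (1 + sp), in an infinite population with a constant selection coefficient s. The function name, the starting frequency `p0`, and the cutoff `p_end` are illustrative choices (in an infinite population the frequency never reaches exactly 1, so a cutoff stands in for fixation).

```python
def generations_to_sweep(s, p0=0.01, p_end=0.99):
    """Count generations for a favored allele to rise from p0 to p_end.

    Toy model (an assumption, not Haldane's derivation): deterministic
    haploid selection in an infinite population with constant s,
        p' = p * (1 + s) / (1 + s * p)
    """
    if s <= 0:
        raise ValueError("s must be positive for the allele to increase")
    if not 0 < p0 < p_end < 1:
        raise ValueError("need 0 < p0 < p_end < 1")
    p = p0
    generations = 0
    while p < p_end:
        # One generation of selection: carriers leave (1 + s) times more offspring.
        p = p * (1 + s) / (1 + s * p)
        generations += 1
    return generations


# Weaker selection means a slower sweep.
for s in (0.1, 0.01, 0.001):
    print(f"s = {s}: {generations_to_sweep(s)} generations")
```

If an observed sweep were much faster or slower than a model like this predicts, the mismatch would point back at the assumptions baked into the function: constant s, infinite population size, and so on.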

Looking in my own backyard (I don't mean to promote my own work, but this is an example I can easily explain), in 2008 we published a mathematical model of viral evolution in early HIV-1 infections [2]. Our particular question was: how many genetically distinct viruses enter the host in any given sexually transmitted infection? And then, given that the immune system takes some time to mount its defense against the viral infection, we also asked: how early does selection pressure from the immune system kick in? In order to answer these questions, we designed a model that made several assumptions, including: (i) only one virus initiates the infection; (ii) the viral population grows under no selection. This second assumption raises many eyebrows when I present the model. The typical objection I hear is: "How can you be sure there's no selection?" Well, I'm not. But that's why I have the model.

Our samples (sequences of viral RNA from plasma) come from patients who acquired the virus only a few weeks earlier. If not much time has passed since the start of the infection, we expect little or no selection pressure on the virus because the host's immune system hasn't "prepared" its response yet. However, occasionally we will get a sample that does show the first evidence of selection pressure. How do we detect that there's selection? We take that particular dataset and see that the model doesn't fit. By using our model in "reverse" (so to speak) we were able to observe that the host's selection pressure in HIV-1 infections starts earlier than previously thought.
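Using a model in "reverse" can be illustrated with a toy goodness-of-fit check in Python. This is not the analysis from [2]. It assumes, purely for illustration, that under a single founder and no selection the number of mutations per sequence is roughly Poisson, so its variance should be close to its mean. The function names, the dispersion statistic, and both lists of counts are made up for this sketch.

```python
import math
import random
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio; close to 1 for Poisson-like counts."""
    m = mean(counts)
    if m == 0:
        return 0.0  # all-zero sample: no spread at all
    return variance(counts) / m


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def lack_of_fit_p_value(counts, n_sims=2000, seed=0):
    """Monte Carlo p-value: how often does the Poisson model produce
    a dispersion at least as large as the one observed?"""
    rng = random.Random(seed)
    lam = mean(counts)
    observed = dispersion_index(counts)
    hits = 0
    for _ in range(n_sims):
        simulated = [poisson_sample(lam, rng) for _ in counts]
        if dispersion_index(simulated) >= observed:
            hits += 1
    return (hits + 1) / (n_sims + 1)


# Made-up mutation counts per sequence (NOT real data).
fits_model = [0, 1, 1, 2, 0, 1, 2, 1, 0, 1, 3, 1, 0, 2, 1, 1, 0, 2, 1, 1]
breaks_model = [0, 0, 1, 0, 0, 1, 0, 9, 0, 0, 8, 0, 1, 0, 0, 10, 0, 0, 1, 0]

print("p-value, model-like sample:", lack_of_fit_p_value(fits_model))
print("p-value, overdispersed sample:", lack_of_fit_p_value(breaks_model))
```

A small p-value only says that the model does not fit that dataset. Working out which assumption failed (no selection, a single founder, or something else) still takes a closer look at the data.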

Bottom line: mathematical models are used not only to describe the data, but also to test whether certain assumptions are justified. And knowing which assumptions failed is just as informative as knowing that the model fits the data well. Yes, it is a subtlety, but it's an important one, because if you listen carefully to those who raise pseudo-scientific arguments against evolution, you'll see that the main point they are missing is exactly what I tried to illustrate above: the scientific use of a model.

[1] Haldane, J. (1957). The cost of natural selection. Journal of Genetics, 55 (3), 511-524. DOI: 10.1007/BF02984069

[2] Keele BF, Giorgi EE, Salazar-Gonzalez JF, Decker JM, Pham KT, Salazar MG, Sun C, Grayson T, Wang S, Li H, Wei X, Jiang C, Kirchherr JL, Gao F, Anderson JA, Ping LH, Swanstrom R, Tomaras GD, Blattner WA, Goepfert PA, Kilby JM, Saag MS, Delwart EL, Busch MP, Cohen MS, Montefiori DC, Haynes BF, Gaschen B, Athreya GS, Lee HY, Wood N, Seoighe C, Perelson AS, Bhattacharya T, Korber BT, Hahn BH, & Shaw GM (2008). Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection. Proceedings of the National Academy of Sciences of the United States of America, 105 (21), 7552-7. PMID: 18490657

