CHIMERAS: population genetics

Friday, February 21, 2014

Converging genes reveal how plagues have shaped our genome

Evolution is shaped by numerous factors. Selection is one such factor but, contrary to popular belief, it is not the only force acting on genomes. I cringe when I hear the expression "this gene has been selected for" because most of our alleles (we all have the same genes, but each gene can have different alleles across different ethnic groups/populations) haven't been selected at all. Things change even without any selection pressure from the environment, a phenomenon known as random drift. Every new generation is a (more or less) random sample from the previous generation, and this constant resampling ensures a background change in allele frequencies.

Because selection is not the only factor that shapes evolution, it is hard to look at how our genome evolved and pinpoint which changes were due to selection and which weren't. However, there are some rare situations where scientists get lucky. One such example is the Rroma people, also known as Gypsies. This ethnic group originated from Northern India and migrated to Europe around 1,000-1,500 years ago. Because they remained a homogeneous group throughout the centuries and rarely mingled with the local population, when we look back at some of the historical plagues that swept through Europe, the Rroma offer a unique snapshot of a distinct population undergoing the same selection pressure as the locals.

Here's the logic: alleles found in the Rroma population but not in their Indian ancestors must have risen recently in the Rroma population. If those alleles are also found in the local population, which is not related to the Rroma, then these alleles must have risen independently in the two populations. But how, if the two populations did not intermarry? Well, if you think about it, the part of our body that's most certainly under selection pressure is the immune system: a strong immune system enables the survival of not just one individual, but also of his/her offspring if they inherit the right alleles. Historical plagues that swept through Europe exerted a strong selection pressure on the immune system at the population level. Individuals with favorable alleles were able to survive these plagues, whereas the others succumbed. So, when the researchers found alleles that had risen independently in the Rroma and in the local population, they concluded that these alleles had been selected by severe epidemics in Europe.
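The filtering logic in the paragraph above can be written down as a toy sketch with plain sets. The allele labels and the function name `convergence_candidates` are made up for illustration, and real analyses work with allele frequencies and statistical tests rather than simple presence or absence, so this only mirrors the reasoning, not the study's method.

```python
def convergence_candidates(rroma, european, indian):
    """Return alleles that fit the 'risen independently' pattern.

    Each argument is a set of (hypothetical) allele labels observed in
    that population.
    """
    # Alleles present in the Rroma but absent from the ancestral Indian population
    risen_in_rroma = rroma - indian
    # Of those, keep the ones also present in the unrelated local population
    return risen_in_rroma & european


# Made-up labels, purely for illustration
rroma = {"A1", "B2", "C3", "D4"}
european = {"B2", "C3", "E5"}
indian = {"A1", "C3"}

candidates = convergence_candidates(rroma, european, indian)
print(sorted(candidates))
```

Only the allele that is new relative to the Indian set and shared with the local set survives the filter.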

The study, published in PNAS last week [1], aimed at finding "convergent evolution" between the two coexisting but genetically distinct populations. Convergent evolution means that, under selection pressure (such as a widespread epidemic), distinct genomes are forced to converge independently to the same allele because that particular allele confers protection against the epidemic.

"We hypothesized that despite their different ethnic and genetic backgrounds, the strong infectious pressure exerted by the major epidemics of the last millennium (of which epidemics of plague are probably the most significant) has led to convergent evolution: specific immune genes, selected during these European epidemics, become signatures that differ from those found in the Northwest Indian populations from whom the Rroma have derived [1]."

Laayouni et al. [1] found several gene clusters under positive selection, one of which in particular (TLR1, TLR6, and TLR10) codes for receptors that modulate responses to Yersinia pestis, the bacterium responsible for the bubonic plague.

[1] Hafid Laayouni, Marije Oosting, Pierre Luisi, Mihai Ioana, Santos Alonso, Isis Ricaño-Ponce, Gosia Trynka, Alexandra Zhernakova, Theo S. Plantinga, Shih-Chin Cheng, Jos W. M. van der Meer, Radu Popp, Ajit Sood, B. K. Thelma, Cisca (2014). Convergent evolution in European and Rroma populations reveals pressure exerted by plague on Toll-like receptors PNAS DOI: 10.1073/pnas.1317723111

Saturday, October 5, 2013

Sex Is Always Well Worth Its Two-Fold Cost

Title borrowed from Feigel et al. [1].

Sex is costly. In an asexual population, all individuals bear offspring, resulting in a higher growth rate than in a sexual population (the two-fold cost of sex). Finding a partner is risky and costly in terms of energy and resources, and it results in sexual selection, which may not always favor survival. Finally, in sexual populations each individual passes only 50% of its genetic make-up to its offspring and, furthermore, genetic recombination can break up alleles that are in an epistatic relationship with one another (they are advantageous when together, but once separated they may incur a fitness loss).

However:

"The advantages of sexual reproduction stem from quite various roots. For instance, sex increases genetic variability by recombination of the parental chromosomes. It makes a population more resistant against many unpredictable threats, such as deleterious mutations, parasites, a fluctuating environment, or competing groups. It also optimizes the evolutionary search for the best gene combinations in a single individual (epistasis) [1]."

Let's try to understand this better. Different alleles in the genome are not always independent, as they may affect fitness in conjunction, a mechanism called epistasis. For example, two alleles may be beneficial together, but their benefit may be lost when separated by a recombination event. Or, it could be the other way around: a mutation arises under certain constraints, and it's not until it is paired with a second mutation that it becomes beneficial. This is often observed in drug resistance, for example. A mutation that confers drug resistance on the organism (a virus or a bacterium) could potentially make it less fit (for example, if it makes the organism more "visible" to the immune system). In these cases, one often observes a new mutation arise in conjunction with the drug-resistant one, and the two together restore the organism's original fitness. These secondary mutations are called compensatory mutations because they compensate for the original loss of fitness.

Recombination of genomes can go either way: it can bring beneficial mutations together, or it can break them apart. In a Nature Reviews Genetics article [2], the authors mention a study done on segmented viruses: in this case, "sex" is equivalent to two viruses co-infecting the same cell, as when this happens the enzyme that replicates the genes jumps back and forth between the two genomes and the resulting new genome is a reshuffle of the two parental ones. The advantage of using viruses to study the effect of sex is that you can compare the result of sexual reproduction versus asexual reproduction in the same population. In the case of the segmented virus study, it was observed that an adverse mutation was slower to get cleared in the sexual population than in the asexual one.

The same review cites studies done on yeast that yielded mixed results: some showed that sex did increase the rate of adaptation of the population, and some showed the opposite. A paradox? Not quite, if you throw into the picture the size of the population.

"Two recent studies have also tested the effect of recombination on the rate of adaptation in evolving microbial populations. When populations of C. reinhardtii that initially lacked genetic variation were allowed to adapt to a novel growth medium in sexual and asexual populations of varying size, sex increased the rate of adaptation at all population sizes, but particularly in large populations [2]."

Another study, done on sexual and asexual yeast strains, compared adaptation in two environments: the mouse brain, which represented a highly variable environment, and a test tube with minimal growth medium.

"When sex was induced, the sexual strain won the competition in the mouse brain but not in the test tube, despite the fact that it also showed general adaptation to this environment. These results indicate an advantage to sex during adaptation to variable or harsh environments [2]."

Despite all these studies, it is still unclear what drove the evolution of sex. Did sex prevail thanks to epistasis? Or was it just drift, the random accumulation of mutations due to pure chance? More recent studies have looked at a combination of mechanisms that may have been responsible for the rise of sexual populations. For example, other aspects to account for, besides epistasis and drift, are redundancy and genome complexity. As organisms have evolved, their genomes have increased in size and complexity. Redundancy allows for more than one gene or pathway to have the same function, buffering the effect of deleterious mutations. It also maintains a reservoir of non-coding allele variants that are always available in the search for new evolutionary pathways. At the same time, sex and recombination together make genomes more robust and help them overcome short-term disadvantages in favor of long-term advantages like increased evolvability.

[1] Alexander Feigel, Avraham Englander, & Assaf Engel (2009). Sex Is Always Well Worth Its Two-Fold Cost PLoS ONE DOI: 10.1371/journal.pone.0006012

[2] J. Arjan G. M. de Visser & Santiago F. Elena (2007). The evolution of sex: empirical insights into the roles of epistasis and drift Nature Reviews Genetics DOI: 10.1038/nrg1985

Tuesday, October 1, 2013

Ms. Stick Insect

Image credit: funkman.org.

You're looking at a stick insect, a critter I was quite used to growing up, as my dad, an evolutionary biologist, used to raise them at home. I know, most households have cats, dogs, guinea pigs and rabbits; ours had cats, dogs, toads, fruit flies, and stick insects. :-)

Children have a tendency to personify everything, animals in particular, so imagine my shock when my dad told me that stick insects are all... ladies. Yup. It's Ms. Stick Insect. And the reason why I mention this is that today I'd like to talk about sex. Ha! You didn't see that coming, did you?

How does an all-female population manage to reproduce? Embryos develop from eggs using parthenogenesis, without the need to be fertilized. This doesn't mean that the offspring will be identical to the parent. "Reshuffling" of genes is still ensured by meiosis.

In organisms that reproduce sexually, meiosis produces gametes: cells that carry half of the chromosomes and that, once fused with a gamete of the opposite sex, produce a cell with the full number of chromosomes. In diploid organisms (organisms that have two copies of each chromosome), meiosis takes place in the following steps: (i) DNA replication, which creates two exact copies of each chromosome; (ii) pairing of the chromosome homologs, one maternal and one paternal; (iii) crossing-over between the homologs, which creates a unique mix of maternal and paternal DNA; (iv) two rounds of cell division, which create four cells, each with one set of chromosomes.

In parthenogenetic meiosis, step (i) is skipped. In order to restore the two copies of chromosomes, in some parthenogenetic animals the cell division in step (iv) creates two cells instead of four, each with two copies of chromosomes. However, stick insects employ a different strategy: step (iv) still creates four cells, of which only one has the cytoplasm. This cell then fuses with one of the other three, effectively creating an egg with two copies of chromosomes, perfectly equivalent to a fertilized egg.

Not all stick insects reproduce through parthenogenesis. Some populations do have males and mate, though usually only about 10% of offspring come from sexual reproduction. Morgan-Richards et al. [1] compared several populations of New Zealand stick insects (C. hookeri), and found that while mated females produced male and female offspring in equal numbers, virgin females that reproduced via parthenogenesis produced mostly females. That's right, I said "mostly".

"A single male hatched from an egg laid by a captive virgin mother. [...] This male may have arisen by the loss of an X chromosome during cell division (non-disjunction), a mechanism recorded for other stick insect species with the same XO⁄XX sex-determination mechanism seen in C. hookeri [1]."

So even in completely parthenogenetic populations, sexual reproduction is in principle not completely lost, as the reshuffling provided by meiosis can occasionally give rise to a male offspring. Furthermore, the authors confirmed that parthenogenetic and sexual populations of stick insects differ in their geographical distribution: all-female populations in New Zealand tend to be more common farther away from the equator and at higher altitudes, implying that parthenogens have an adaptive advantage in certain environments but not in others.

The fact that parthenogens would have an adaptive advantage intrigued me, so I dug a bit further and found out about a concept called the two-fold cost of sex. In a sexual population, only one of the two sexes bears offspring, while in a one-sex population all individuals bear offspring, which significantly increases its growth rate. This seems to indicate that asexual populations have a higher Darwinian fitness. So, how did we end up with so many sexual species, especially given that we all originated from asexual ancestors? How can sex be evolutionarily successful when the odds seem to be against it?
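To see where the "two-fold" comes from, here is a minimal sketch that counts offspring-bearing individuals generation by generation. It assumes (my simplifications, not taken from any of the papers) that every female has the same number of offspring, that sexual females produce sons and daughters in equal numbers, and that nothing else limits growth; the function name and parameters are made up for illustration.

```python
def reproducing_females(generations, offspring_per_female=2, start=1):
    """Track the number of offspring-bearing individuals over time.

    Returns two lists (asexual, sexual), one entry per generation.
    """
    asexual = [start]
    sexual = [start]
    for _ in range(generations):
        # Asexual lineage: every offspring is a daughter that reproduces
        asexual.append(asexual[-1] * offspring_per_female)
        # Sexual lineage: only half of the offspring are daughters
        sexual.append(sexual[-1] * offspring_per_female / 2)
    return asexual, sexual


asexual, sexual = reproducing_females(5)
ratio = [a / s for a, s in zip(asexual, sexual)]
print(ratio)
```

Under these assumptions the asexual lineage gains a factor of two on the sexual one every generation, which is the advantage that sex somehow has to make up for.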

I'll save that discussion for the next post. :-)

[1] Mary Morgan-Richards, Steve A. Trewick, & Ian A. N. Stringer (2010). Geographic parthenogenesis and the common tea-tree stick insect of New Zealand Molecular Ecology DOI: 10.1111/j.1365-294X.2010.04542.x

Thursday, April 19, 2012

Four decades of computational genomics.

"Every genome is the result of a mostly shared, but partly unique, 3.8-billion-year evolutionary journey from the origin of life. Diversity is created mostly by copy errors during replication."

The above is taken from a review in the latest issue of Science [1] that summarizes the progress made in the field of computational genomics since the first sequences were obtained back in the mid-seventies. I highly recommend reading the review. Here, I'd like to highlight a few relevant points.

Zerbino, Paten, and Haussler summarize nicely the different types of DNA edits that over those 3.8 billion years have brought us the genetic diversity we observe today. Replication copy errors give rise to single-base changes that can get fixed in the entire population (substitution) or can be present in only part of the population (single-nucleotide polymorphisms). Multiple sequential bases can be duplicated or erased, in which case we talk about indels. Rearrangements can occur, leading to changes in gene copies or even chromosome numbers.

There's so much more to a DNA sequence than just a string of four letters. Genes are not fully understood until you look at their history throughout evolution and throughout the single individual's life, their regulatory mechanisms, their interactions with other genes (epistasis), their epigenetic pathways, their function, etc. With this in mind, computational genomics has the arduous task of not only efficiently storing and retrieving enormous amounts of data, but also building models that encompass epigenetic mechanisms, metabolic pathways, and gene regulatory networks.

"Combining evolutionary, mechanistic, and functional models, computational genomics interprets genomic data along three dimensions. A gene is simultaneously a DNA sequence evolving in time (history), a piece of chromatin that interacts with other molecules (mechanism), and, as a gene product, an actor in pathways of activity within the cell that affect the organism (function). [. . .] Beyond the basics of storing, indexing, and searching the world's genomes, the three fundamental, interrelated challenges of computational genomics are to explain genome evolution, model molecular phenotypes as a consequence of genotype, and predict organismal phenotype."

Genomic evolution is studied using phylogenetic analyses. This presents its challenges, starting from finding optimal ways to align the sequences: in order to compare different sequences, one has to make sure that there is a one-to-one correspondence between each base in each sequence, as shown in the figure below.
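To make the idea of base-to-base correspondence concrete, here is a minimal sketch of a classic pairwise global alignment (the Needleman-Wunsch dynamic programming scheme). The scoring values are arbitrary choices of mine, and real tools use far more refined scoring and handle many sequences at once; the review does not prescribe this particular method.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # align two bases
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one best alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def column_types(aligned_a, aligned_b):
    """Count matches, substitutions, and indel columns in an alignment."""
    counts = {"match": 0, "substitution": 0, "indel": 0}
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            counts["indel"] += 1
        elif x == y:
            counts["match"] += 1
        else:
            counts["substitution"] += 1
    return counts


top, bottom, best = global_align("GATTACA", "GATACA")
print(top)
print(bottom)
print(best, column_types(top, bottom))
```

Once two sequences are aligned this way, every column is either a match, a substitution, or an indel, which are exactly the kinds of edits described above.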

Once aligned, one builds phylogenetic trees in order to represent the evolutionary history of the sequences: from the leaves of the tree all the way back to the root, each node in the tree represents a "coalescent" event in the evolutionary history, in other words the event when two distinct lineages shared a common ancestor.

"When applied to more than two species or to multiple gene copies within a species, phylogenetic methods provide an explicit order of gene descent through shared ancestry. [. . .] Finding the optimal phylogeny under probabilistic or parsimony models of substitutions (and also of indels) is NP-hard, and considerable effort has been devoted to obtaining efficient and accurate heuristic solutions."

Right now, algorithms that compute phylogenetic trees are computationally intensive and take a long time to run. As sequencing technology advances and it becomes possible to sequence more data and larger regions more efficiently, the challenge is to make phylogenetic analyses more computationally efficient as well.

The next big challenge computational genomics embraces is predicting causal variants. Whole-genome studies have to take into account population stratification due to the fact that we are a relatively young species and, as such, all related. New databases are emerging to provide epigenetic context and data, RNA expression, and protein levels. All this needs to be folded in to make causal predictions from genotype to phenotype.

The coming together of all this information will benefit medical research on multiple levels. Since nearly all cancers are caused by genetic modifications, computational genomics will help us understand cancer therapeutics and tumorigenesis. Stem cell research will also benefit from progress made in computational genomics as it involves the full understanding of variants and their effects not just on the genome, but also on the epigenome and gene expression.

"To face the challenges of obtaining the maximum information from every sequencing experiment, we must borrow advances from a spectrum of different research fields and tie them together into foundational mathematical models implemented with numerical methods. There is a tension between the comprehensiveness of models and their computational efficiency. [. . .] As a common language develops, shaped by our increasing knowledge of biology, we anticipate that computational genomics will provide enhanced ability to explore and exploit the genome structures and processes that lie at the heart of life."

[1] Zerbino, D., Paten, B., & Haussler, D. (2012). Integrating Genomes Science, 336 (6078), 179-182 DOI: 10.1126/science.1216830

Wednesday, January 25, 2012

Surfing the wave of genetics: the man who invented genetic landscapes

Today is the 90th birthday of the one and only Luigi Luca Cavalli-Sforza, professor emeritus at Stanford University and a pillar in population genetics. Oh, and in case you couldn't tell by the name, he's Italian, too. Not that I'm biased, mind you.

Cavalli-Sforza is best known for his book The History and Geography of Human Genes, in which he reconstructs the history of human migrations by mapping the distribution of gene alleles and correlating gene frequencies in populations with the geographic distances between them.

I had an interesting discussion a few months ago and it occurred to me then that many people outside the field of genetics still think that all traits are selected through evolution. This is not true. If you remember, another famous population geneticist came up with a mathematical model according to which it would take 300 generations for a trait under constant selection pressure to completely take over. That led to Haldane's dilemma and the realization that such a time scale was too slow to explain all the genetic variation observed today.

The fact that Haldane's model didn't fit the observations eventually led to the neutral theory of molecular evolution, and one of the greatest players in this new thinking was Motoo Kimura. Kimura's theory of "random genetic drift" is based on the assumption that most mutations are free of selective effects, and hence the rate of molecular evolution is determined by the mutation rate. This is backed up by the fact that most mutations we see are "silent" (which means they have no effect on the proteins) and that most of the DNA in eukaryotes is non-coding.

Genetic drift is the change in allele frequencies due to chance. Under selection, some individuals pass their genes on to the next generation because they are "fitter." However, since not all traits are under selection, the vast majority of changes are driven by chance. Some individuals will have offspring, others won't, and each generation represents a new random drawing from the gene pool. When a random mutation arises in a population, assuming the mutation is neutral (in other words, it doesn't affect the fitness of the individuals), the chance that it will get fixed in the population by random drift is 1/N, where N is the population size. Therefore, the smaller the population, the greater the chance that a random mutation becomes prevalent by "chance" (and not selection!).
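The 1/N rule is easy to check with a small simulation. The sketch below assumes the simplest textbook setup, a haploid Wright-Fisher population of constant size N in which each individual of the next generation picks its parent at random; in a diploid population the same argument applies to the number of gene copies. The function names, the population size, and the seed are arbitrary choices for illustration.

```python
import random


def mutant_fixes(N, rng):
    """Follow one new neutral mutant until it is lost or fixed."""
    count = 1  # the mutation starts as a single copy
    while 0 < count < N:
        p = count / N
        # Each of the N offspring independently inherits the mutant
        # with probability equal to its current frequency
        count = sum(rng.random() < p for _ in range(N))
    return count == N


def fixation_probability(N, replicates, seed=0):
    """Estimate the fixation probability of a new neutral mutant."""
    rng = random.Random(seed)
    fixed = sum(mutant_fixes(N, rng) for _ in range(replicates))
    return fixed / replicates


N = 20
estimate = fixation_probability(N, replicates=4000)
print(f"simulated: {estimate:.3f}   expected 1/N: {1 / N:.3f}")
```

With a few thousand replicates the simulated fraction should land close to 1/N, and rerunning with a smaller N shows the fixation chance going up, which is the point of the paragraph above.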

To celebrate Cavalli-Sforza's birthday, I chose a paper published in 2009 [1] that looks at genetic diversity in the Y chromosome and compares it to the expected variation under neutral drift. From the abstract:

"We observe geographic peculiarities with some Y chromosome mutants, most probably due to a drift-related phenomenon called the surfing effect. We also compare the overall genetic diversity in Y chromosome DNA data with that of other chromosomes and their expectations under drift and natural selection, as well as the rate of fall of diversity within populations known as the serial founder effect during the recent ‘‘Out of Africa’’ expansion of modern humans to the whole world. All these observations are difficult to explain without accepting a major relative role for drift in the course of human expansions."

The surfing effect is a really interesting phenomenon: mutations that arise in the wave front of an expanding population have an advantage over mutations that arise in individuals who are left behind with respect to the migrating portion of the population. This is because the front of the migration is a local, temporarily smaller population, and since the probability of a mutation to get fixed is inversely proportional to the population size, the fact that the mutants arise in a smaller population puts them at an advantage. Furthermore,

"The faster the population expansion, the greater the probability of success of a mutant that arises in the wave front, because then the wave front is longer."

In the paper, Chiaroni et al. look at the 18 major haplogroups (genetically similar groups that can be thought of as originating from the same ancestor) of Y chromosome genotypes and infer their place of origin.

"If migrations were random, the geographic distribution of individuals with a specific haplogroup would be approximately normal (Gaussian) around the place of origin of the oldest mutation defining the haplogroup, apart from irregularities due to vagaries of the environment: obstacles, like mountains and deserts, or favored routes, like coasts and rivers."

The interesting finding in the paper is that while the expected genetic diversity for chromosome X more or less matches the observed one, the expected diversity of chromosome Y is significantly higher than the observed one, indicating that, on average, there is more natural selection acting on the X chromosome and the autosomes than on the Y chromosome.

The authors conclude:

"The increasing role of human creativity and the fast diffusion of inventions seem to have favored cultural solutions for many of the problems encountered in the expansion. We suggest that cultural evolution has been subrogating biologic evolution in providing natural selection advantages and reducing our dependence on genetic mutations, especially in the last phase of transition from food collection to food production."

[1] Chiaroni, J., Underhill, P., & Cavalli-Sforza, L. (2009). Y chromosome diversity, human expansion, drift, and cultural evolution Proceedings of the National Academy of Sciences, 106 (48), 20174-20179 DOI: 10.1073/pnas.0910803106

Saturday, November 12, 2011

An addendum on Haldane's dilemma and the use of mathematical models

Last week, my post on Haldane's dilemma garnered many views. I'm glad people are reading it and I hope they find it useful in clarifying the great impact of Haldane's 1957 paper. For those of you interested in digging deeper into the topic, the Panda's Thumb discusses the matter in a 2007 post, and Gene Expression covers it here.

I just have an additional note, which is a bit of a pet peeve of mine, but as I read about the reactions to Haldane's paper scattered all over the Internet, I realized that people tend to say things like "Haldane was wrong," or, "Haldane was right, and such and such are wrong."

Let's get this straight: Haldane formulated a mathematical model. His work set the foundations for the mathematical theory of population genetics. The usefulness of mathematical models is twofold: they either fit the data or they don't, and in either case they are informative. Let me explain better.

You can break down scientific thinking in the following points:

Hypothesis.
Assumptions.
Model.
Conclusions.

There's usually one or more hypotheses you want to test. You come up with a set of assumptions you need to make. You design a model, you test it, you reach your conclusions. Once you have it, you use the model in a comparative way: if it correctly represents the data, then the assumptions of the model are met. If it doesn't, then you go back and see which of your assumptions have failed in the dataset.

Back to Haldane. He formulated a question: how many generations do I need in order for a minor allele under selection pressure to get fixed? He made certain assumptions (infinite population size, constant selection pressure, etc.), designed a model, came to a conclusion. Now here's the power of the mathematical model: if we find an incongruity between the observed data and the model, then we know where to look for the fallacy. In the assumptions. Today we know that most mutations arise under completely neutral conditions. Haldane wasn't wrong. He just formulated a model. A powerful one, one that nobody had thought of before him. One that later inspired Kimura's neutral theory and that made us understand evolution better because we realized that not all alleles are under selection pressure.

Looking in my own backyard (I don't mean to promote my own work, but this is an example I can easily explain), in 2008 we published a mathematical model of viral evolution in early HIV-1 infections [2]. Our particular question was: how many genetically distinct viruses enter the host in any given sexually transmitted infection? And then, given that the immune system takes some time to mount its defense against the viral infection, we also asked, how early does selection pressure from the immune system kick in? In order to answer these questions, we designed a model that made several assumptions, including: (i) one virus only initiates the infection; (ii) the viral population grows under no selection. This second assumption raises many eyebrows when I present the model. The typical objection I hear is: "How can you be sure there's no selection?" Well, I'm not. But that's why I have the model.

Our samples (sequences of viral DNA from plasma) come from patients that have acquired the virus only a few weeks earlier. If not much time has passed since the start of the infection, there won't be any selection pressure on the virus because the host's immune system hasn't "prepared" its response yet. However, occasionally we will get a sample that does show the first evidence of selection pressure. How do we prove there's selection? We take that particular dataset and see that the model doesn't fit. By using our model in "reverse" (so to speak) we were able to observe that the host's selection pressure in HIV-1 infections starts earlier than previously thought.

Bottom line: mathematical models are used not only to describe the data, but also to prove or disprove whether or not certain assumptions are justified. And knowing which assumptions failed is just as informative as knowing that the model fits the data well. Yes, it is a subtlety, but it's an important one, because if you listen carefully to those who raise pseudo-scientific arguments against evolution, you'll see that the main point they are missing is exactly what I tried to illustrate above: the scientific use of a model.

[1] Haldane, J. (1957). The cost of natural selection Journal of Genetics, 55 (3), 511-524 DOI: 10.1007/BF02984069

[2] Keele BF, Giorgi EE, Salazar-Gonzalez JF, Decker JM, Pham KT, Salazar MG, Sun C, Grayson T, Wang S, Li H, Wei X, Jiang C, Kirchherr JL, Gao F, Anderson JA, Ping LH, Swanstrom R, Tomaras GD, Blattner WA, Goepfert PA, Kilby JM, Saag MS, Delwart EL, Busch MP, Cohen MS, Montefiori DC, Haynes BF, Gaschen B, Athreya GS, Lee HY, Wood N, Seoighe C, Perelson AS, Bhattacharya T, Korber BT, Hahn BH, & Shaw GM (2008). Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection. Proceedings of the National Academy of Sciences of the United States of America, 105 (21), 7552-7 PMID: 18490657

Saturday, November 5, 2011

Haldane's dilemma

Today is JBS Haldane's 119th birthday. Together with Fisher and Wright, Haldane is considered one of the founders of the mathematical theory of population genetics. Population genetics studies how allele frequencies (the prevalence of different copies of genes) change in populations due to processes like natural selection and genetic drift. In other words, how mutations arise and how they undergo a turnover in the population.

To celebrate Haldane's birthday, I thought I'd discuss his 1957 paper, "The Cost of Natural Selection" [1], which, unfortunately, has gained popularity after some people started using it as an argument against evolution. In this paper, Haldane states that multiple favorable traits cannot be selected at once:

"In this paper I shall try to make quantitative the fairly obvious statement that natural selection cannot occur with great intensity for a number of characters at once unless they happen to be controlled by the same genes."

Haldane poses the following question: supposing you have constant selective pressure towards one trait in particular, how many individuals without the trait need to die before the new trait takes over? Too many deaths will cause the population to go extinct, but too few will never allow the turnover of the new trait. This is what he defines as the "substitution cost," or, in other words, the cost for a trait to become prevalent.

He calculates the substitution cost under a very particular scenario: suppose that a sudden change happens in the environment (like a shift in climate, or the introduction of a new predator), and this causes a certain species to be less adapted to the environment and, therefore, to have a lower reproduction rate. The less fit individuals will die first, thus allowing natural selection to push forward the fitter ones. Suppose that a particular mutated gene, until then rare in the population, favors adaptation to the new environment. Gradually, the population will see a shift in prevalence of the new trait. Individuals without the trait will progressively die out, and in this process other traits may get lost. As a result, under this scenario, no more than one gene can be selected at once.

The concepts in this paper were later referred to as "Haldane's dilemma" by paleontologist Van Valen, who formulated the dilemma as "for most organisms, rapid turnover in a few genes precludes rapid turnover in the others. A corollary of this is that, if an environmental change occurs that necessitates the rather rapid replacement of several genes if a population is to survive, the population becomes extinct."

In his 1957 paper Haldane concludes:

"Unless selection is very intense, the number of deaths needed to secure the substitution, by natural selection, of one gene for another at a locus, is independent of the intensity of selection. It is often about 30 times the number of organisms in a generation. It is suggested that, in evolution, the mean time taken for each gene substitution is about 300 generations. This accords with the observed slowness of evolution."

This may indeed sound surprising. If it takes 300 generations for one new trait to replace the old one, how can we possibly have achieved the kind of diversity we observe today? Two of Haldane's assumptions are problematic: (1) he assumed an infinitely large population; (2) he assumed the selective pressure on the new trait to be constant over time.
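A toy calculation makes both assumptions concrete. The Python sketch below is a simplification rather than Haldane's own derivation: it assumes a haploid population, treats allele frequencies deterministically (the infinite-population assumption), and applies the same selection coefficient `s` every generation (the constant-pressure assumption). The function name `substitution_cost` and the starting frequency of 1 in 10,000 are illustrative choices, not values taken from the paper. The code adds up the fraction of the population removed by selection in each generation until the favored allele is nearly fixed.

```python
import math


def substitution_cost(p0, s, p_end=0.999):
    """Total selective deaths (in units of population size) in a toy haploid model.

    p0:    starting frequency of the favored allele (illustrative assumption)
    s:     fraction of non-carriers removed by selection each generation
    p_end: frequency at which we call the substitution complete
    """
    p = p0
    cost = 0.0
    generations = 0
    while p < p_end:
        q = 1.0 - p                # frequency of the old allele
        deaths = s * q             # share of the population lost to selection
        cost += deaths
        p = p / (1.0 - deaths)     # carriers' share among the survivors
        generations += 1
    return cost, generations


p0 = 1e-4
for s in (0.001, 0.01, 0.1):
    cost, generations = substitution_cost(p0, s)
    print(f"s = {s}: cost = {cost:.2f} populations, {generations} generations")
print(f"-ln(p0) = {-math.log(p0):.2f}")
```

Under these assumptions the total stays close to -ln(p0), about 9.2 population-equivalents for a starting frequency of 1 in 10,000, whether selection is weak or strong (it comes out slightly lower when `s` is large). What changes is the number of generations it takes. Haldane's larger figure of about 30 comes from his diploid cases, which this sketch does not model. His 300 generations then follows if selection removes roughly a tenth of the population per generation: 30 / 0.1 = 300.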

Haldane's claims have been revised and re-elaborated by many scientists, probably the most famous being evolutionary biologist Motoo Kimura, who, in the early '60s, used a diffusion equation to recalculate the substitution cost. Kimura noticed that, under Haldane's model, it would take an enormous number of offspring to sustain the observed rate of genetic change through natural selection alone. This became the basis of Kimura's neutral theory of molecular evolution, in which he claimed that the vast majority of genetic changes are not "selected." Instead, according to Kimura, genetic changes are mostly random changes with no effect on fitness (neutral), which become fixed in the population simply because of the resampling from one generation to the next (a process called "genetic drift"). In other words, according to Kimura, the mere fact that some individuals reproduce and others don't causes certain variants to gradually disappear from the population while others take over.

So, which is it? Completely neutral mutations that get fixed because of random sampling, or complete selection on every single trait? Most likely, it's a combination of both. Very few traits are truly selected for. Mutations arise constantly and, in a small population, they can spread just because of genetic drift. Historically, people migrated and the geography of the landscape changed, causing populations to split. In a small population a rare mutation is more likely to spread and then become fixed because of genetic drift, whether the mutation is advantageous or not. However, if the mutation is indeed advantageous, a sudden selective sweep would pick it up. But this, most likely, didn't happen under constant pressure over the years, as Haldane originally formulated. It was more likely occasional sweeps (think of a particularly virulent flu season, for example) that turned the minor allele into the common one, and then genetic drift did the rest.
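The drift half of this picture can be illustrated with a small simulation. The sketch below is a standard Wright-Fisher resampling model, not something taken from Kimura's papers; the function names, population sizes, and trial counts are illustrative assumptions. A neutral allele starts at 10% frequency, and each generation is drawn at random from the previous one until the allele is either fixed or lost.

```python
import random


def drift_until_absorbed(n_copies, start_copies, rng):
    """Resample a neutral allele each generation until it is fixed or lost.

    Returns (fixed, generations), where fixed is True if the allele took over.
    """
    count = start_copies
    generations = 0
    while 0 < count < n_copies:
        p = count / n_copies
        # Every gene copy in the next generation is a random draw from this one.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        generations += 1
    return count == n_copies, generations


def summarize(n_copies, start_copies, trials=1000, seed=1):
    """Fraction of runs ending in fixation, and mean generations until fixed or lost."""
    rng = random.Random(seed)
    runs = [drift_until_absorbed(n_copies, start_copies, rng) for _ in range(trials)]
    fixed_fraction = sum(fixed for fixed, _ in runs) / trials
    mean_generations = sum(gens for _, gens in runs) / trials
    return fixed_fraction, mean_generations


for n_copies in (20, 100):
    fraction, generations = summarize(n_copies, start_copies=n_copies // 10)
    print(f"{n_copies} gene copies: fixed in {fraction:.1%} of runs, "
          f"{generations:.1f} generations on average")
```

With no selection at all, the allele should still reach fixation in roughly 10% of runs, matching its starting frequency, and it gets there (or disappears) in fewer generations when the population is small.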

As for Haldane's numbers, they're not as far off as one may think. I did, however, find a paper published in 1977 [2] in which the author, Darlington, argued that Haldane had overestimated the cost of natural selection by allele substitution. Darlington states: "The cost is reduced if recessive alleles are advantageous, if substitutions are large and few, if selection is strong and substitutions are rapid, if substitutions are serial, and if substitutions in small demes are followed by deme-group substitutions. But costs are still so heavy that the adaptations of complex organisms in complex and changing environments are never completed. The rule probably is that most species most of the time are not fully adapted to their environments, but are just a little better than their competitors for the time being."

In other words, evolution is a work in progress.

Dilemma aside, I love Haldane for this famous quote:

"Theories have four stages of acceptance. i) this is worthless nonsense; ii) this is an interesting, but perverse, point of view, iii) this is true, but quite unimportant; iv) I always said so."

And if you've ever tried to publish a scientific paper, or to publish anything at all for that matter, you know exactly what Haldane was talking about.

[1] Haldane, J. B. S. (1957). The cost of natural selection. Journal of Genetics, 55 (3), 511-524. DOI: 10.1007/BF02984069

[2] Darlington, P. J., Jr. (1977). The cost of evolution and the imprecision of adaptation. Proceedings of the National Academy of Sciences of the United States of America, 74 (4), 1647-1651. PMID: 266204

Photo: blue hour is the time of day when a longer exposure time gives you a blue sky and soft, yellow lights. It's particularly beautiful in an urban setting when all the city lights are lit.
