Debunking myths on genetics and DNA

Tuesday, November 29, 2011

Sample size, P-values, and publication bias: the positive aspects of negative thinking

If you follow the science blogging community, you may have noticed a lot of talk about sample size in the past couple of weeks. So I did my share of mulling things over, and this is what I came up with.

1- The study in question had a small sample size but reported a significant p-value (<0.05). Such a study is NOT underpowered. An underpowered study is one whose sample size is too small to give a reasonable chance of detecting a true effect. A significant result is by definition a p-value of less than 5%, which the study in question had. So, even though small sample size studies are indeed often underpowered, that wasn't the issue in this particular case. In general, you are not likely to see many underpowered studies published (see point 5 below).
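Power can be gauged before a study is even run. Here's a back-of-the-envelope calculation (my own sketch, not from the post; the effect sizes and sample sizes are made-up illustrations) using the standard normal approximation for a two-sided, two-sample comparison:

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_two_sample(d, n, alpha=0.05):
    """Approximate power of a two-sided two-sample test for an effect of
    size d (difference in means, in standard-deviation units) with n
    subjects per group, using the normal approximation."""
    z_crit = 1.959964  # 97.5th percentile of the standard normal
    return normal_cdf(d * math.sqrt(n / 2) - z_crit)

# A 'medium' effect (d = 0.5) with 30 subjects per group:
print(round(power_two_sample(0.5, 30), 2))   # well below the conventional 0.8 target
# The same effect with 100 subjects per group:
print(round(power_two_sample(0.5, 100), 2))
```

With 30 per group you'd catch a medium-sized effect only about half the time; with 100 per group you'd catch it almost always, which is exactly the grant-writing logic of point 3 below.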

2- The issue with ANY small sample size study is that you are not capturing the full variability in the population. And if you are not capturing that variability, chances are your error model is wrong, and a wrong error model leads to a wrong p-value. In other words, even if you do get a significant p-value, there's a question of whether that particular p-value is meaningful at all.
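To see how a wrong error model yields wrong p-values, here's a toy simulation (my own illustration, with made-up numbers): with only 5 observations, a test that naively treats the sample standard deviation as if it were the true population value rejects a true null hypothesis far more often than the nominal 5%.

```python
import math, random

random.seed(42)

def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def naive_p_value(sample, mu0=0.0):
    """One-sample test that treats the sample standard deviation as if it
    were the true one -- a normal error model that ignores how noisy the
    variance estimate is in a tiny sample."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    z = (mean - mu0) / math.sqrt(var / n)
    return 2 * (1 - normal_cdf(abs(z)))

# The null is true: every sample comes from a population with mean 0.
trials, false_positives = 5000, 0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(5)]  # only 5 observations
    if naive_p_value(sample) < 0.05:
        false_positives += 1

rate = false_positives / trials
print(f"nominal 5% test, actual false-positive rate: {rate:.3f}")
```

The "significant" results come in at roughly double the advertised rate, purely because the error model doesn't account for how poorly 5 data points pin down the population's spread.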

3- Why publish a study with a small sample size, then? Welcome to the life of a scientist. You set off with a grand plan, write a grant to sequence say 100 individuals, get the money to sequence 50, then you clean the data and end up with 30. Okay, those are made-up numbers, but you get the idea. So now you've got your 30 sequences and you try to make the best of them. You state all the caveats in the discussion section of your paper, advocate for further analyses, and discuss future directions. If your paper gets published you have some leverage in your next grant, as in: "Look! I saw something with 30 sequences, which is clearly not enough, so now I'm applying to get money to sequence 100." Many scientific advances have been made following exactly this route.

4- I've been talking a lot about p-values, but... What the heck is a p-value? A p-value of, say, 0.05 boils down to the following: if there were no real effect and your results were driven by chance alone, and you were to repeat your experiment 100 times, you would observe a result at least as extreme as your original one about 5 times out of pure chance. Suppose for example you want to see if a particular gene allele is associated with cancer. You do your experiment and come up with a p-value of 0.03. This means that if there really was no association whatsoever between the trait you measured and cancer, you would see a result like yours (or a more extreme one) only 3% of the time out of pure chance. Now you see why anything above 5% is not considered significant: if chance alone would produce your result 10% of the time, you can't rule out that what you are measuring is a random effect. But to see it only 3% of the time makes it rare enough that we are allowed to believe there may be something in there after all. Notice that this is pretty much how science works. Many science outsiders think that "scientific" means "certain." Not true. Scientific means we can measure the uncertainty, and when it's small enough we believe the result.
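This "how often would chance alone produce my result" logic can be made concrete with a little permutation test (my own sketch; the measurements and group labels below are entirely made up, not data from any real study): shuffle the labels many times and count how often chance produces a difference as large as the observed one.

```python
import random

random.seed(1)

# Hypothetical made-up data: some measurement in allele 'carriers' vs 'non-carriers'
carriers     = [5.1, 6.2, 5.8, 6.5, 5.9, 6.1, 6.4, 5.7]
non_carriers = [5.0, 5.3, 4.9, 5.6, 5.2, 5.4, 5.1, 5.5]

observed = sum(carriers) / len(carriers) - sum(non_carriers) / len(non_carriers)

# Null hypothesis: the labels are meaningless. Shuffle them many times and
# see how often pure chance gives a difference at least this large.
pooled = carriers + non_carriers
extreme, trials = 0, 10000
for _ in range(trials):
    random.shuffle(pooled)
    diff = sum(pooled[:8]) / 8 - sum(pooled[8:]) / 8
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / trials
print(f"observed difference: {observed:.2f}, permutation p-value: {p_value:.4f}")
```

The p-value that comes out is literally the fraction of label-shuffles in which chance alone matched or beat the observed difference.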

5- Now that we understand what p-values are, we get to another issue: publication bias. Follow the logic: I just said that we start believing a result whenever the p-value is less than 5%. Basically, you can forget publishing anything that has a p-value above 5%. But you won't know your p-value unless you do the experiment, and you won't publish unless you get a low p-value. Which means you will never see all the similar studies that were carried out and yielded a high p-value. Suppose an experiment with no real underlying effect were repeated across different labs 100 times. Then, by chance alone, about 5 of these experiments will yield a p-value of 5% or less. However, what you end up seeing in print are the experiments that yielded the "good" p-value, not the ones that yielded the negative results. As Dirnagl and Lauritzen put it [1],
"Only data that are available via publications‚ and, to a certain extent, via presentations at conferences‚ can contribute to progress in the life sciences. However, it has long been known that a strong publication bias exists, in particular against the publication of data that do not reproduce previously published material or that refute the investigators‚ initial hypothesis."
People address the issue with meta-analyses, in which several studies are examined and both positive and negative results are pooled together in order to estimate the "true" effects.
"In many cases effect sizes shrink dramatically, hinting at the fact that very often the literature represents the positive tip of an iceberg, whereas unpublished data loom below the surface. Such missing data would have the potential to have a significant impact on our pathophysiological understanding or treatment concepts."
A new movement is rising, which advocates the publication of negative results (i.e. results that did not substantiate the alternative hypothesis), and more journals are integrating this into either a "Negative Result" section or, as BioMed Central has done, even dedicating a journal to it, the Journal of Negative Results in Biomedicine.
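A toy simulation (my own, with made-up parameters) makes the file-drawer effect tangible: if many labs chase a tiny effect and only "significant" results see print, the published effect sizes are guaranteed to look inflated, exactly the shrinking-iceberg picture the meta-analyses reveal.

```python
import math, random

random.seed(7)

def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# 1000 labs each study the same tiny true effect (0.1 standard deviations)
# with 20 subjects per group; only "significant" results get published.
true_effect, n = 0.1, 20
published = []
for _ in range(1000):
    a = [random.gauss(true_effect, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    diff = sum(a) / n - sum(b) / n
    z = diff / math.sqrt(2 / n)          # unit variances, treated as known for simplicity
    p = 2 * (1 - normal_cdf(abs(z)))
    if p < 0.05:
        published.append(diff)           # into print; the rest go in the file drawer

avg_abs = sum(abs(d) for d in published) / len(published)
print(f"{len(published)} of 1000 studies published")
print(f"true effect: {true_effect}, average published |effect|: {avg_abs:.2f}")
```

Every published study clears the significance threshold, so the average published effect is several times the true one; pooling the unpublished results back in, as a meta-analysis tries to do, is what shrinks it back toward reality.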

I welcome and embrace the change in thinking. It's the same logic I advocate for mathematical models. My new motto: "Negative results? Bring them on!" Maybe I'll have a T-shirt made -- anyone want one too?

[1] Dirnagl, U., & Lauritzen, M. (2010). Fighting publication bias: introducing the Negative Results section. Journal of Cerebral Blood Flow & Metabolism, 30(7), 1263-1264. DOI: 10.1038/jcbfm.2010.51

Monday, November 28, 2011

Courtney Schafer, author of the Shattered Sigil trilogy, on writing, mountaineering, and solving complex algorithms

Sky, mountains, and sea: Courtney Schafer, my amazing guest today, masters them all. She's an electrical engineer working for an aerospace company, a mountaineer, and a scuba diver. Oh, and of course, the author of The Whitefire Crossing, the first book in the Shattered Sigil trilogy, a fantasy novel that beautifully blends all of Courtney's passions. I'm so excited to be talking with Courtney today!

EEG: You are an electrical engineer, your husband is a scientist, and you both love speculative fiction. How do the two -- science and speculative fiction -- mingle in your everyday life?

CS: Interestingly enough, though my husband’s background is in atmospheric science and mine is in electrical engineering, these days we both do the same sort of thing at work: signal and image processing algorithm development. Thankfully we are saved from discussing algorithms over the dinner table by the fact we work for different companies and so can’t discuss proprietary information with each other! But since we both love SF and science, we have plenty of conversations about subjects like the future of the space program, whether FTL or wormhole travel is more plausible from a scientific standpoint, that kind of thing. My husband is a slow reader and spends most of his reading time on technical articles, so we don’t discuss SF books much, but we both love watching SF movies and TV series (and mocking the science when they get it wrong). Farscape, Firefly, Fringe, Supernatural, Carnivale, Invisible Man, Nowhere Man, and many others grace our DVD shelves and have been watched many times over.

EEG: Your debut novel, The Whitefire Crossing, features amazing mountaineering, powerful magic, adventure, and complex characters that completely draw you into the story. Your love for mountaineering (you are a rock climber yourself, not to mention skier, scuba diver, etc.!) clearly shines through in your fantastic descriptions. But what about science? Do you find that your scientific background helped you develop the Shattered Sigil world, and if so, how?

CS: I find that working out the plot of a novel feels extremely similar to solving a complex algorithm problem. There’s the same mix of logical reasoning flavored with sudden sparks of inspiration. So in that sense, the analysis skills I’ve developed in years of working as an engineer have been quite helpful in writing the Shattered Sigil series! Heh, and as for direct influence… one day I was describing to a co-worker how my blood mages cast spells. He said, “So… they basically lay out giant circuit diagrams on the floor and channel power through them.” Me: “…OMG you are so right!” I guess I love electrical engineering too much to leave it out of my fantasy world entirely.

EEG: Ohhhh... High Five! I feel exactly the same about plotting. It's like solving a mathematical system with a lot of variables. You start with the equations (the premise and the characters), then you sit down and solve it. Okay, sorry, let's get back to the interview.

The Tainted City, the sequel to The Whitefire Crossing, is due in late 2012. Can you give us a little sneak peek at what's coming up next for Dev and the city of Ninavel? Are you working on any other projects? (Given how you make mountains come alive, I'm wondering what would happen if you turned to the ocean and used your scuba-diving experience for a completely new world...)

CS: I just turned over a bunch of scenes and a synopsis for The Tainted City to the cover artist (the amazingly talented David Palumbo, who also did the art for The Whitefire Crossing). It’s very exciting to see the book start the publication process (even though I haven’t yet finished writing it!). Tentative publication date is October 2012. As for a sneak peek…Dev gets out of the predicament he’s in at the end of The Whitefire Crossing, though not in the way he wanted. He and Kiran return to Ninavel – not entirely of their own choice – and find the city a far more dangerous place than they’d feared. Someone’s murdering mages in ways that mimic the most spine-chilling tales of demons, Tainted children like Dev’s young friend Melly are vanishing without a trace, and Kiran’s former master Ruslan intends to seize the opportunity to reclaim his wayward apprentice and revenge himself on Dev. Even though much of the action takes place in Ninavel, Dev’s climbing and wilderness skills still come into play, as do the Whitefire Mountains.

Right now I’m spending every spare second on finishing The Tainted City (it’s tough to find time to write between my day job and parenting my 2 year old!), but I have a few ideas for other projects. Ha, funny you should mention the scuba-diving – I confess I absolutely love stories set in ocean worlds. (Seriously, I even liked Waterworld. Well, the first part of it, at least, before Dennis Hopper started chewing scenery.) So who knows, perhaps I’ll write an ocean-based fantasy one day.

EEG: And when you do, I'll be the first one to buy it!

Courtney, thanks so much for sharing your love for the mountains, the sea, science and writing with us. To find out more about Courtney and her books, please visit her website. She also blogs at the Night Bazaar, a group blog where authors published by Night Shade Books share tips on writing and getting published.

Sunday, November 27, 2011

More HDRs

These are all attempts to balance the backlight, which can be pretty strong around here. They all have some degree of ghosting problems, either because I didn't have a tripod (first two) or because the clouds were moving too fast (last one). First and second pictures are looking down to Pueblo Canyon, and the third is the Rio Grande. Fourth is the cougar in town! (Jemez Mountains in the background.)

Saturday, November 26, 2011

I know that face! Sort of...

In graduate school I had a Chinese friend who one day asked me the name of the fellow student who'd just stopped by to borrow a book. I told her, she thanked me, and added, "It's so hard for me to remember faces. You guys look all alike to me."

Now, you have to understand that I'm petite, brunette with dark eyes (very Italian), and the girl she'd just asked about was the typical Northern European type: tall, blond, and blue-eyed. The concept was truly intriguing. I tend to mix up Eastern Asians, but that day I learned that Asians tend to mix up Caucasians.

You may have noticed this in other contexts, for example when people tell you who you or your child looks like and come up with the funniest matches. However, the "other-race" effect (less accurate recognition of faces of a race different from one's own) is real and has been documented in the literature [1,2]. In these studies, participants were shown faces from different ethnic groups, including their own. In a second phase, a mix of already-observed and never-before-seen faces was presented, and participants had to pick out the ones they had already seen. In [1], researchers measured different brain potentials (through EEG) and related the potential intensities to the act of remembering a face:
"Individuation may tend to be uniformly high for same-race faces but lower and less reliable for other-race faces. Individuation may also be more readily applied for other-race faces that appear less stereotypical. These electrophysiological measures thus provide novel evidence that poorer memory for other-race faces stems from encoding that is inadequate because it fails to emphasize individuating information."
An event-related potential (or ERP) is a brain response to a stimulus (the faces, in this particular case). ERPs are measured through EEG and have several components, as shown in the figure below (P1, N1, P2, N2, and P3):

In [1], researchers found interesting relationships between two components in particular, P2 and N200, and the ability to recognize a face:
"Among all the potentials examined, frontocentral N200 potentials and occipitotemporal P2 potentials were particularly informative because they yielded other-race-specific memory findings. We thus propose that these potentials indexed face individuation that tended to be uniformly high for same-race faces but lower and more variable for other-race faces."
Even though I don't have the expertise to understand all the technicalities presented in the paper (but I do welcome comments, if anybody out there wants to provide more insight), I find these results quite intriguing. For example, participants were also asked to rate the "racial typicality" of each face, and other-race faces rated "less typical" seemed to be easier to recognize. Also, the amount of exposure an individual has had to the other-race group affects the results, as the brain can indeed train itself to recognize such faces.

A curious bit of trivia is that both studies state at the beginning that all participants were right-handed. Is there a reason for this? Does being left-handed introduce a bias in this kind of study?

[1] Lucas, H., Chiao, J., & Paller, K. (2011). Why Some Faces won't be Remembered: Brain Potentials Illuminate Successful Versus Unsuccessful Encoding for Same-Race and Other-Race Faces. Frontiers in Human Neuroscience, 5. DOI: 10.3389/fnhum.2011.00020

[2] Herzmann, G., Willenbockel, V., Tanaka, J., & Curran, T. (2011). The neural correlates of memory encoding and recognition for own-race and other-race faces. Neuropsychologia, 49(11), 3103-3115. DOI: 10.1016/j.neuropsychologia.2011.07.019

Thursday, November 24, 2011

Happy Thanksgiving!!

Amazing pumpkin carving by artist Ray Villafane.

I know many of you don't live in the US and therefore don't celebrate Thanksgiving (or celebrate it at a different time). Still, I'd like to take the chance to thank all of my featured authors for taking the time to come over and chat with me about their books and science writing. And, most importantly, I'd like to thank each and every one of you for reading, following, tweeting, commenting, liking on Facebook, sending feedback, and actively participating in this blog. In four months I've had 9,500 pageviews, and about 500 unique weekly visits from over 50 different countries!


Tuesday, November 22, 2011

Don't forget the editor: the fundamental role of RNA editing

The genome is a plastic thing. Yes, that's right: the genome is plastic. It's not true that an individual's DNA doesn't change. It's not true that genes dictate what we are, that DNA is just a set of instructions, or that we can "build" an organism simply by spelling out a string of As, Gs, Ts, and Cs.


There's so much more to genomes than nucleotides and genes. If you've been following me from my very first post back in July, I hope the message has come through by now. Epigenetic changes can alter the way genes are expressed, and some of these changes are passed on to the following generations. RNA can act as a gene "silencer," "turning genes off" in response to a variety of environmental stimuli. Jumping genes can move around the genome, increase its size, and cause insertions and deletions, and most of these changes are somatic; in other words, they are not "coded" in the DNA. Genes interact and act as an orchestra rather than as push-buttons. Sense and antisense genes can compete for expression. RNA sequences can be altered through RNA editing, which can happen at different levels and result in different protein functionality.

Bottom line: who we are is the result of a very complex and intricate network of different mechanisms playing off one another. You can't just pluck one out and say, "A-ha! This is it!" any more than you can play Beethoven without the whole orchestra.

The idea that genes were the equivalent of proteins (not literally, but conceptually, in the sense that proteins are viewed as the direct product of genes) prevailed for many years, until we learned the true importance of RNA. Its role goes beyond that of a mere intermediate between DNA and proteins, and new studies have brought to light a new aspect of RNA: that of regulatory agent. Bits of microRNA can bind to their complementary strands, effectively silencing a gene. Or RNA can be modified through enzymes in ways that change the functionality of the proteins it codes for -- a mechanism called RNA editing. Together these processes confer a plasticity on the way RNA, DNA, and proteins interact, which allows adaptation to the environment and is especially active in the brain.

As John Mattick explains in [1],
"The ability to edit RNA, much of which occurs in noncoding sequences, suggests that not only proteins but also – and perhaps more importantly – regulatory sequences can be modulated in response to external signals and that this information may feedback via RNA-directed chromatin modifications into epigenetic memory."
Genes interact with the environment at two levels: short-term responses can alter gene expression, but more stable phenotypic changes can also occur in reaction to environmental stimuli, affecting the underlying epigenetic processes.
"RNA sequences can also be altered by RNA editing, which suggests an evolved ability to overwrite hard-wired genetic information, thereby providing the molecular basis for plasticity in the system."
Interestingly, one of the ways RNA gets edited is through APOBEC, a family of enzymes I first learned about through HIV. Every now and then you come across a sample of HIV sequences (from a single patient at a single point in time) with an extensive number of G-to-A mutations. That is the work of the APOBEC3G enzyme, which edits the viral genome and by doing so can impair the virus's life cycle [2]. Little did I know, APOBEC enzymes edit human RNA too. Mattick lists numerous examples where the activity of the APOBEC enzymes has been found, and even though he notes their importance as a defense against retroviral infections (as in the case of HIV), he also formulates an interesting hypothesis:
"An alternative and exciting possibility is that these enzymes have evolved and expanded not (simply) to defend against the movement and activity of endogenous retroviruses (ERVs) and retrotransposons, but to regulate evolved functions associated with the domestication of such sequences as agents of epigenetic regulation and somatic plasticity, especially in mammals and primates."
Basically, what Mattick is saying is that rather than just "destroying" the viral sequences, these enzymes may have played a role in integrating them in our DNA and "reusing" them, as for example in the case of endogenous viral sequences expressed in the placenta.

The more I read about these things the more I realize how much I don't know. The genome keeps surprising me with its amazing plasticity and adaptability. DNA is far more than a code. It's life, and there's no life without complexity and change.

[1] Mattick, J.S. (2010). RNA as the substrate for epigenome-environment interactions: RNA guidance of epigenetic processes and the expansion of RNA editing in animals underpins development, phenotypic plasticity, learning, and cognition. BioEssays, 32(7), 548-552. PMID: 20544741

[2] Chiu, Y., Soros, V., Kreisberg, J., Stopak, K., Yonemoto, W., & Greene, W. (2005). Cellular APOBEC3G restricts HIV-1 infection in resting CD4+ T cells. Nature, 435(7038), 108-114. DOI: 10.1038/nature03493

Monday, November 21, 2011

Author Mark Lawrence talks about artificial intelligence, publishing, and his debut novel, Prince of Thorns

Mark Lawrence's debut novel, Prince of Thorns, the first in a dark fantasy trilogy, came out last August and has already garnered rave reviews: Neal Asher called it "The best fantasy read I’ve had since Alan Campbell’s Scar Night," and Publishers Weekly called it "morbidly gripping." What you might not know about Mark is that his day job is as a research scientist "focused on various rather intractable problems in the field of artificial intelligence" (quoting from Mark's bio).

It is my great pleasure to have Mark Lawrence here as a guest today!

EEG: I confess I don't know much about Artificial Intelligence: can you tell me a little bit about your research?

ML: Artificial Intelligence is more a media expression, an umbrella term to cover a multitude of activities that sound far less interesting and take much longer to communicate. A "basic" building block used in many of these activities is Bayesian inference used to move from raw numbers toward reasoning. I have worked on a lot of image processing problems, tracking, detecting, classifying, sometimes to guide autonomous robots to one end or another. I have also worked on a lot of data fusion problems, bringing together information from different sensors and sources to achieve various goals. My claim to rocket science is tenuous and based on collaborating with NASA scientists to employ the constraints of orbital dynamics in tracking problems in space. Any readers still awake at this point are to be congratulated!

EEG: Hehe, Chimeras readers are trained in science jargon, right folks? How much of your research (or your scientific background, if you will) influences your writing, and how much of your writing, on the other hand, influences your research?

ML: Heh. I'd say none of my research influences my writing, possibly my general scientific knowledge creeps in on rare occasions. And I guess none of my writing influences my research. Dull, but there it is! Certainly an active imagination is a great help to a research scientist. Many good ideas come from pursuing unusual paths, but my actual writing is about character, not method.

EEG: Provocative question: what's harder, to publish research or to publish fiction?

ML: I found it rather easy to get my fiction published, at least to get a book published. Short stories were a harder sell for me. I don't think my experience is typical, though. To get a scientific paper published varies in difficulty depending on where you want to place it. Many conferences will accept almost anything. Even good conferences are not hugely picky -- they want your attendance fee. To get a paper into a good technical journal (an IEEE publication say) requires more effort. You need to do the good science (the bulk of the work) then shape it to fit the language and direction of the publication, and sadly it helps if you've networked at conferences and technical meetings.

In short, both are difficult. I found fiction easier to publish but I suspect I got lucky. In general fiction is harder to publish. Ironically I think I'm probably a better scientist than I am a writer, and yet it's taken me more effort to get published in technical journals, which whilst difficult is easier than getting a work of fiction published in hardback.

EEG: Some luck is required in just about everything, but I'd say a strong, compelling first-person narrative like the one you've mastered in your book helps a great deal! Indeed, Prince of Thorns, which came out last August, has already been translated into ten languages, and it's one of the ten finalists for the 2011 Goodreads Choice Awards in the category Fantasy. That's amazing, congratulations! On your website, you say the book is about the main character. Can you tell us what idea or concept inspired the story?

ML: The character was inspired by Anthony Burgess's A Clockwork Orange. I wanted to experiment with having an amoral but charismatic young man as the protagonist. There the two works diverge both in setting and intent. Burgess is critiquing society and doesn't explore his character's origins to any great extent. In Prince of Thorns the protagonist reveals a lot about his past and whilst he never offers any of it up as an excuse, if you read between the lines there's enough to make you think about the issues of nature vs nurture, the nature of evil, and how the possibilities before us as children get taken away.

EEG: That's fascinating. We all love dark, shady characters! Thanks for sharing this with us today. To find out more about Mark's book and his upcoming sequel, visit him at

Saturday, November 19, 2011

Of hierarchies, mice, and neurons

It's shared across very different species, from ants and bees all the way up to chimpanzees and humans: social hierarchy dictates the structure of a group, and the ability to correctly recognize another individual's status, as well as one's own, is crucial to successful interactions within the group.

Interestingly, social cognition is distinct from social status recognition, as demonstrated by studies on humans with brain lesions [1]. Neuroimaging also revealed that social status recognition has its own distinct network of brain regions, which includes the inferior parietal lobe (IPL), dorsolateral and ventrolateral prefrontal cortices (DLPFC and VLPFC), and portions of occipitotemporal lobe (OG). Social status is recognized through a range of nonverbal clues. For example, primates and humans are sensitive to facial expressions (such as direct eye contact) and body postures that make an individual "look" larger or more imposing.

These cues are processed through the DLPFC and VLPFC regions, which are usually associated with socioemotional responses and behavioral inhibition. They can overrule automatic responses in situations where the dominant individual imposes compliance to social norms.

As Chiao concludes in [1]:
"Given the ubiquitous presence of social hierarchy across species and cultures, an outstanding question in social neuroscience is to understand how adaptive mechanisms in the mind and brain support the production and maintenance of social hierarchy. Recent social neuroscience studies show that distinct neural systems are involved in the recognition and experience of social hierarchy, and that activity within these brain regions are modulated by individual and cultural factors."

A recent study published in Science [2] found a correlation between synaptic strength (the strength of the signals between neurons) and social rank. The researchers used a mouse model to investigate potential differences in synaptic properties in the medial PFC region (the homologue of the human dorsolateral and medial PFC regions) between dominant and subordinate mice. They used the tube test to rank the social hierarchy within cage groups of 4 mice each: the tube only lets one mouse through, and the challenge is to push the opponent out of the tube.

Researchers found that dominant mice have greater synaptic strength than subordinate ones. Neurons transmit signals through chemicals called neurotransmitters, which are stored in vesicles and released at the synapse (the structure that transfers chemical signals between neighboring neurons). The strength of a signal can be measured in terms of "quantal release," which is essentially the number of effective vesicles released in response to an impulse. Wang et al. detected a higher quantal release in dominant mice. Furthermore, they showed that the reverse also holds: lowering the strength of these signals caused mice to drop in social rank.

To prove this, they manipulated the synaptic transmission mediated by a receptor called AMPA. They delivered DNA to the mouse brain with a viral vector that preferentially infects pyramidal neurons. Using this mechanism, Wang et al. were able to either amplify or reduce the amplitudes of AMPA-mediated synaptic currents; mice with stronger synaptic signals moved up in the social hierarchy, whereas those with weaker signals moved down in rank.

In the Perspective review accompanying the paper [3], Maroteaux and Mameli conclude:
"Wang et al. provide two conceptual advances: the idea that a neurobiological substrate for social ranking is located in the mPFC, and that synaptic efficacy represents a cellular substrate determining social status. Although the mPFC has an established role in social behavior, it cannot be considered the only structure where dominance is encoded. Future studies will be necessary to determine the hierarchical organization among brain structures underlying this complex behavior."

[1] Chiao, J. (2010). Neural basis of social status hierarchy across species. Current Opinion in Neurobiology, 20(6), 803-809. DOI: 10.1016/j.conb.2010.08.006

[2] Wang, F., Zhu, J., Zhu, H., Zhang, Q., Lin, Z., & Hu, H. (2011). Bidirectional Control of Social Hierarchy by Synaptic Efficacy in Medial Prefrontal Cortex. Science, 334(6056), 693-697. DOI: 10.1126/science.1209951

[3] Maroteaux, M., & Mameli, M. (2011). Synaptic Switch and Social Status. Science, 334(6056), 608-609. DOI: 10.1126/science.1214713

Thursday, November 17, 2011

The immortality paradox

My friend Tim Bowen posed a really interesting question. Tim is a retired Los Angeles Police Officer, a writer, and a fantastic storyteller. If you don't believe me, check out his book (Kindle edition available from Amazon), a collection of stories from his days as an LAPD street cop. A forewarning, though: don't read it in public places unless you don't mind people staring at you. Before you know it, you'll burst out laughing and everybody will be wondering what you've been adding to your breakfast cereal.

So, here's Tim's question:

Is it true that our cells die and are replaced every 7 days? Now it is my understanding that as we age the memory of the cells is that of the previous cell’s age. Can we turn off that memory and allow the cell to be a youthful one that replaces our older one or to replace them in such a way as not to age or at least not as rapidly?

Every cell in our body undergoes a certain number of replications before it dies. In children, cells replicate ten, maybe twenty times. By the time we reach our senior years, cells replicate once or twice and then die.

The "memory" Tim's talking about is the telomere, a non-coding part of the DNA that sits at the end of our chromosomes. Every time cells duplicate, the telomeres shorten: they lose about 100 base pairs with every cell division, until they reach a point that "signals" it's time for the cell to die. This mechanism prevents cells from replicating too many times, as each replication carries a certain risk of damaging the DNA. Telomeres shorten as we age, hence our cells undergo less replication cycles. What keeps us young is the ability of cells to regenerate.

Now, here's the interesting bit. There's an enzyme, called telomerase, which allows for the replacement of the telomeres. Not all our cells have this enzyme. It is expressed where it's most needed: in embryonic cells, because those cells need to divide many times in order to form a new person; in the immune system; in tissues that undergo periodic renewal.

Of course, the concept is intriguing. What if we could use this amazing enzyme to rejuvenate our cells? That's what researchers from the Dana-Farber Cancer Institute did [1]: they engineered telomerase-deficient mice by knocking out the TERT gene, which codes for the telomerase enzyme. These mice, inbred through several generations, showed considerable damage to several organs, tissue atrophy, and half the life span of normal mice. The researchers then devised a clever way of reactivating the enzyme by activating TERT transcription only in the presence of a molecule called 4-OHT. In vitro cell cultures exposed to 4-OHT showed that the telomere ends lengthened and cell proliferation resumed. Furthermore, after a 4-week treatment with the 4-OHT molecule, the degenerative damage induced by the lack of telomerase in the knock-out mice was considerably reversed and their life span lengthened:
"Telomerase reactivation in such late generation TERT-ER mice extends telomeres, reduces DNA damage signalling and associated cellular checkpoint responses, allows resumption of proliferation in quiescent cultures, and eliminates degenerative phenotypes across multiple organs including testes, spleens and intestines."
Quite remarkable. So, is this the holy grail of anti-aging techniques?
Well, there's a catch in all this. It's called cancer.

A cancer cell is a cell that replicates abnormally. In 1951 George Otto Gey took a few cancer cells from his patient Henrietta Lacks and propagated them in vitro. That cell line, called HeLa cells, is still alive today (I'm sure you've all heard or read Rebecca Skloot's wonderful book The Immortal Life of Henrietta Lacks). Henrietta's cells, placed on a feeding substrate, continue to replicate. If you did the same experiment with healthy cells, the cell line would eventually die because of aging. But the HeLa cells don't. They don't age. Why? You've guessed it. The telomere ends never shorten and there's no signal for the cell to die.

Bottom line: a cell that never dies is a cancerous cell. That's the immortality paradox.

It reminds me of Jorge Luis Borges's story, The Immortal. I was in high school when I read it the first time, and I clearly remember that until then it had never occurred to me that immortality could be such a sad state of mind as in Borges's story. All fantasy stories I had read portrayed immortality as a god-like quality. I think Borges was onto something.

And on this sad note, I'm going back to Tim's book to cheer myself up.

Photo: pink clouds after the storm. Canon 40D, focal length 17mm, exposure time 1/20.

[1] Jaskelioff, M., Muller, F., Paik, J., Thomas, E., Jiang, S., Adams, A., Sahin, E., Kost-Alimova, M., Protopopov, A., CadiƱanos, J., Horner, J., Maratos-Flier, E., & DePinho, R. (2010). Telomerase reactivation reverses tissue degeneration in aged telomerase-deficient mice Nature, 469 (7328), 102-106 DOI: 10.1038/nature09603

Tuesday, November 15, 2011

The case of "junk DNA" and why it shouldn't be called junk: RNA.

This is part 5 of 5 in a series dedicated to the concept of "junk DNA". Links to the previous parts: Part 1, Part 2 (redundancy), Part 3 (epigenetics), and Part 4 (topology).

I recently discovered the work of John S. Mattick (who's written many beautiful reviews on RNA) and learned a new concept, which he discusses in [1]: while the number of protein-coding genes is relatively constant across complex species, non-coding DNA increases with developmental complexity.

Isn't it intriguing? You see, when it comes to DNA people tend to think that everything revolves around genes. They are the bits of DNA that get transcribed into coding RNA, from which proteins are made. However, as I have stated in my previous "junk DNA" posts, most of our DNA is non-coding -- it doesn't yield proteins. Well, it turns out, RNA transcripts from non-coding DNA are highly expressed during embryogenesis and in the brain, and they are involved in regulating epigenetic processes. They don't code for proteins but they do have a function, and in fact, that's why they are called non-coding functional RNAs (which almost sounds like an oxymoron).

In [2], Mattick and colleagues list examples of non-coding RNAs that were later identified to encode a functional protein (in a different context), and they hypothesize that this may be the case for many more non-coding RNA regions, as they may be "translated in very specific contexts or at very low levels." The opposite may be true for coding RNA; in other words, it could be that coding RNA also holds non-coding regulatory functions in other contexts. They conclude with an interesting analogy with cell phones, which were originally created to fulfill the need to communicate in the absence of a landline and then gradually evolved into calculators, internet browsers, cameras, and media players. Similarly, they hypothesize that RNA has gradually acquired numerous functions over the course of evolution, building a very complex platform for genetic innovation.

All this suggests that DNA alone is only one part of the picture, and together with DNA, we should be sequencing RNA as well to see whether or not putative mutations are indeed expressed. In fact, this has been done in a recent study in plants [doi:10.1038/nature10414], paving the way for future human studies as well. (Sequencing both RNA and DNA was indeed discussed in this wonderful post from Genomes Unzipped, together with the challenges that sequencing and aligning both DNA and RNA poses.)

[1] Mattick JS (2011). The double life of RNA. Biochimie, 93 (11) PMID: 21963144

[2] Dinger ME, Gascoigne DK, & Mattick JS (2011). The evolution of RNAs with multiple functions. Biochimie, 93 (11), 2013-8 PMID: 21802485

Saturday, November 12, 2011

An addendum on Haldane's dilemma and the use of mathematical models

Last week, my post on Haldane's dilemma garnered many views. I'm glad people are reading it and I hope they find it useful in clarifying the great impact of Haldane's 1957 paper. For those of you interested in digging deeper into the topic, the Panda's Thumb discusses the matter in a 2007 post, and Gene Expression covers it here.

I just have an additional note, which is a bit of a pet peeve of mine, but as I read about the reactions to Haldane's paper scattered all over the Internet, I realized that people tend to say things like "Haldane was wrong," or, "Haldane was right, and such and such are wrong."

Let's get this straight: Haldane formulated a mathematical model. His work set the foundations for the mathematical theory of population genetics. The usefulness of mathematical models is twofold: they either fit the data or they don't, and in either case they are informative. Let me explain.

You can break down scientific thinking in the following points:
  • Hypothesis.
  • Assumptions.
  • Model.
  • Conclusions.
There's usually one or more hypotheses you want to test. You come up with a set of assumptions you need to make. You design a model, you test it, you reach your conclusions. Once you have it, you use the model in a comparative way: if it correctly represents the data, then the assumptions of the model are met. If it doesn't, then you go back and see which of your assumptions have failed in the dataset.
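As a deliberately simple illustration of this workflow, here the "model" is a fair coin and the test is a normal approximation to the binomial; a poor fit sends us back to question the fairness assumption, not the data:

```python
import math

def fair_coin_fits(heads, flips, z_cutoff=1.96):
    """Test the model 'this coin is fair' against observed data.
    If the z-score is large, the model doesn't fit -- so we revisit
    the assumption (fairness), not the arithmetic."""
    expected = flips / 2
    sd = math.sqrt(flips * 0.5 * 0.5)      # binomial standard deviation
    z = (heads - expected) / sd
    return abs(z) <= z_cutoff

# 52/100 heads is consistent with a fair coin; 80/100 is not.
print(fair_coin_fits(52, 100))  # True
print(fair_coin_fits(80, 100))  # False
```

When the model fails, the interesting scientific question becomes *which* assumption broke, exactly the move described above.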

Back to Haldane. He formulated a question: how many generations do I need in order for a minor allele under selection pressure to get fixed? He made certain assumptions (infinite population size, constant selection pressure, etc.), designed a model, came to a conclusion. Now here's the power of the mathematical model: if we find an incongruity between the observed data and the model, then we know where to look for the fallacy. In the assumptions. Today we know that most mutations arise under completely neutral conditions. Haldane wasn't wrong. He just formulated a model. A powerful one, one that nobody had thought of before him. One that later inspired Kimura's neutral theory and that made us understand evolution better because we realized that not all alleles are under selection pressure.  

Looking in my own backyard (I don't mean to promote my own work, but this is an example I can easily explain), in 2008 we published a mathematical model of viral evolution in early HIV-1 infections [2]. Our particular question was: how many genetically distinct viruses enter the host in any given sexually transmitted infection? And then, given that the immune system takes some time to mount its defense against the viral infection, we also asked: how early does selection pressure from the immune system kick in? To answer these questions, we designed a model that made several assumptions, including: (i) only one virus initiates the infection; (ii) the viral population grows under no selection. This second assumption raises many eyebrows when I present the model. The typical objection I hear is: "How can you be sure there's no selection?" Well, I'm not. But that's why I have the model.

Our samples (sequences of viral DNA from plasma) come from patients who acquired the virus only a few weeks earlier. If not much time has passed since the start of the infection, there won't be any selection pressure on the virus because the host's immune system hasn't "prepared" its response yet. However, occasionally we will get a sample that does show the first evidence of selection pressure. How do we prove there's selection? We take that particular dataset and see that the model doesn't fit. By using our model in "reverse" (so to speak) we were able to observe that the host's selection pressure in HIV-1 infections starts earlier than previously thought.
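To give a flavor of what using a model in "reverse" can look like, here is a crude sketch of the underlying idea (not the published fitting procedure): under neutral, star-like growth from a single founder, pairwise Hamming distances between sampled sequences are approximately Poisson, so their variance should track their mean. A variance well above the mean flags the dataset as violating the no-selection, single-founder assumptions:

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def neutral_model_fits(seqs, tolerance=2.0):
    # Under neutral, star-like growth from a single founder, pairwise
    # Hamming distances are roughly Poisson, so variance ~ mean.
    # Variance far above the mean suggests selection or multiple founders.
    # (A crude sketch of the idea, not the published fitting procedure.)
    dists = [hamming(a, b) for a, b in combinations(seqs, 2)]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return True if mean == 0 else var / mean <= tolerance

# A homogeneous sample fits; a sample split into two distinct groups doesn't.
print(neutral_model_fits(["AAAA", "AAAT", "AATA", "ATAA"]))                  # True
print(neutral_model_fits(["AAAAAAAA", "AAAAAAAA", "AAAAAAAA", "TTTTTTTT"]))  # False
```

When the check fails, we don't throw the data away; the failure itself is the evidence that one of the assumptions (here, no selection) no longer holds.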

Bottom line: mathematical models are used not only to describe the data, but also to prove or disprove whether or not certain assumptions are justified. And knowing which assumptions failed is just as informative as knowing that the model fits the data well. Yes, it is a subtlety, but it's an important one, because if you listen carefully to those who raise pseudo-scientific arguments against evolution, you'll see that the main point they are missing is exactly what I tried to illustrate above: the scientific use of a model.

[1] Haldane, J. (1957). The cost of natural selection Journal of Genetics, 55 (3), 511-524 DOI: 10.1007/BF02984069

[2] Keele BF, Giorgi EE, Salazar-Gonzalez JF, Decker JM, Pham KT, Salazar MG, Sun C, Grayson T, Wang S, Li H, Wei X, Jiang C, Kirchherr JL, Gao F, Anderson JA, Ping LH, Swanstrom R, Tomaras GD, Blattner WA, Goepfert PA, Kilby JM, Saag MS, Delwart EL, Busch MP, Cohen MS, Montefiori DC, Haynes BF, Gaschen B, Athreya GS, Lee HY, Wood N, Seoighe C, Perelson AS, Bhattacharya T, Korber BT, Hahn BH, & Shaw GM (2008). Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection. Proceedings of the National Academy of Sciences of the United States of America, 105 (21), 7552-7 PMID: 18490657

Thursday, November 10, 2011

What shall we play today? How about a protein folding game?

So, after hearing all the talking about it, I finally did it. I checked out Foldit, the online scientific discovery game. I'm sure you've all heard of it, and many of you may have even played with it -- if so, please share your experience in the comments because I'd love to hear about it! Developed by researchers at the University of Washington, the beta version of Foldit was released in 2008. Players compete to solve protein structures that would otherwise require an enormous amount of computing time and resources.

THE BIOLOGY PROBLEM. Proteins are formed through the translation of RNA into a sequence of amino acids. As such, a protein is nothing more than a string of 20 letters (each letter represents one amino acid), just like DNA is a string of 4 letters. However, once formed, proteins coil and fold up into a 3-D structure that not only varies from protein to protein, but also determines the protein's ability to bind (or not) to other molecules, thus affecting the way it performs its function. Solving the 3-D structure of a protein is a very complicated business, mostly because it's a problem with an enormous number of variables: the folding structure is determined by the way the side chains of the amino acids interact with one another, by their electric charges and energy potentials, and by other factors such as temperature, pH, the presence of molecules able to "aid" the folding process, the formation of hydrogen bonds, etc.* Some proteins are essentially rigid, others undergo "conformational" changes as they perform their functions.

Because much of a protein's functionality depends on its shape, we can't understand how it works unless we completely understand its folding structure. The crystal structure of gp120, for example, the protein HIV uses for viral entry into target cells, was solved in 1998 [1], and this allowed us to understand how the virus docks with T-cells and how it dodges immune responses. This protein has regions that are highly variable (mutations and changes in the DNA arise constantly), enabling the virus to "hide" from the host's immune system. However, the bit of the protein that binds to the CD4 receptor (to initiate viral entry) is highly conserved and well hidden between the coils and folds of the protein itself. When the virus nears the target cell, gp120 undergoes a "conformational change": the docking bit "comes out of its hiding spot" and binds to the receptor. In other words, it changes its shape so it can "fit" into the receptor and initiate the infection.

THE COMPUTATIONAL PROBLEM. Mathematically, in order to solve a 3D structure, you have to determine a set of spatial coordinates (x,y,z) -- the "unknowns." The number of unknowns in the system is its "degrees of freedom," and in order to solve the system, you need one equation per unknown. A complete protein structure is encoded in a file like this one. Jargon aside, notice the lines that start with "ATOM": what you have there are the (x,y,z) coordinates (in Angstroms) of each atom. Now, human proteins are quite big: hundreds, sometimes thousands, of amino acids -- go figure how many atoms! On top of that, you have to consider all the additional variables I mentioned above -- pH, temperature factor, chemistry, energy potential, etc. You can see how the number of degrees of freedom and additional constraints can easily escalate to astronomical figures. Even the fastest computer will take a very long time to generate all the information needed to uniquely determine the structure of the protein.
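As a small illustration of those "unknowns," the (x,y,z) coordinates can be pulled out of a PDB file's ATOM records with a few lines of Python (PDB is a fixed-column format, with x, y, z occupying columns 31-54 of each ATOM line):

```python
def parse_atom_coords(pdb_lines):
    """Extract (x, y, z) coordinates (in Angstroms) from the ATOM
    records of a PDB file. PDB is a fixed-column format: the x, y, z
    fields occupy columns 31-38, 39-46, and 47-54 of each ATOM line."""
    coords = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            coords.append((x, y, z))
    return coords

# One realistic-looking ATOM record: 3 unknowns per atom, so a protein
# with thousands of atoms has thousands of degrees of freedom.
sample = ["ATOM      1  N   MET A   1      38.198  19.582  43.221  1.00 24.18           N"]
print(parse_atom_coords(sample))  # [(38.198, 19.582, 43.221)]
```

Every atom contributes three unknowns, which is exactly why the degrees of freedom escalate so quickly for large proteins.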

To optimize the run time of computationally intensive programs, the problem can be broken down into several parallel jobs that run across different machines, a process called distributed computing. That's the principle behind Rosetta@home, a program that anybody can download, and that will use the idle time on the volunteer's computer to run portions of the huge protein folding algorithm.

But what about the game? As you can imagine, folding a protein requires a lot of computing power and CPU time, whereas human brains have a natural knack for complicated visual tasks like spotting patterns. As I checked out the basic rules of Foldit, I realized that they sounded simple enough for a human brain to understand, but are extremely complicated to program into a machine: (a) you need to make sure that your protein is packed in its structure, with no empty spaces in between; (b) proteins have a hydrophobic part, which stays away from water, and a hydrophilic part, which instead can touch water molecules. Since proteins move in water all the time, you need to make sure that the hydrophobic part is packed inside the protein and that the hydrophilic part surrounds it completely; (c) respect the space constraints; in other words, no two atoms can be in the same position at the same time.
Things like minimizing the energy gradient are built-in tools.

THE GOAL. For now the main goal of Foldit is to see whether humans are better than machines at "guessing" protein structures. If this turns out to be the case, the next goal will be to have gamers predict unknown protein structures and also create new synthetic proteins for drug design. In June 2009 Foldit introduced a new feature called "recipes," which are basically algorithms and strategies that players implement themselves and, if they choose to, can save and share. In a recent PNAS paper [2], Khatib et al. analyzed the recipes uploaded by Foldit players, their use, and their success. While no single recipe allowed for the achievement of a structure without human intervention, players strategically combined many different ones in different parts of the game to solve specific within-game tasks.

This introduces a new aspect of the game, which I found quite intriguing -- social interactions: apparently the use of recipes across the population of players spreads mostly by word of mouth, and successful recipes are implemented, copied, and varied constantly by multiple players as their use increases. (The social evolution of these algorithms could probably constitute a quite interesting research study on its own.) One of the most popular recipes, Blue Fuse, was found to be strikingly similar to the Fast Relax algorithm developed by Foldit scientists, indicating that the social evolution of these algorithms can lead to independent discovery of optimal strategies. As the authors conclude,
"Benchmark calculations show that the new algorithm independently discovered by scientists and by Foldit players outperforms previously published methods. Thus, online scientific game frameworks have the potential not only to solve hard scientific problems, but also to discover and formalize effective new strategies and algorithms."
Does anybody need further evidence to prove that cooperation paves the way to success?

*There are also post-translational modifications called glycosylation and phosphorylation, enzymatic processes that attach additional molecules to the protein.

Edited to include a wonderful comment from Antisocialbutterflie, who works on protein crystallography:
Since you like photography I'll sum it up in that metaphor. Crystallography basically acts in the same way as a camera. In a camera the light reflects off an object and the image is scattered (or in some diagrams shown flipped upside down) by the first lens. The wavelength of visible light is too long to see things on an atomic scale (100s of nm versus seeing things on the angstrom level). We replace the light with x-rays and the crystal acts as both the object we are seeing and the first lens.

In photography there is a second lens that recollects the scattered light to reconstitute the image and makes it look bigger depending on the focal length. We don't have one of those for X-rays so that's where the math comes in. Each point of scatter indicates a slice of space in 3D and how intense it is tells us how many electrons there are in that spot. The math gets a little crazy and there is a whole aspect called the phase problem (which is one of the things that the program addresses) that we'll skip over, but for the purposes of this explanation we've solved it and what you get isn't positions of atoms per se but a cloud of electron density. We have to take that cloud and build where we think the atoms are, taking into account what we understand about chemistry. The higher the resolution the less ambiguous the solution is, but most high impact papers are low resolution so the "structure" is really a best guess model from the information we have. Wow that got really long.

FoldIt is a best case scenario where the algorithm is the human brain and all of its spacial reasoning power. The current means of predicting protein structure is sketchy at best. The FoldIt guys have one of the best ones right now but applying the stuff they learned from the study is like trying to teach a robot how to describe the color blue. It's still a long way off. There are methods in existence to try and detect how a protein would change in the face of a particularly mutation but frankly the very nature of protein folding is one of the big existential questions that isn't going to be answered any time soon.

I'll admit that the game was pretty cool though I think it would be cooler for someone who isn't staring at the exact same thing on a daily basis.

[1] Kwong PD, Wyatt R, Robinson J, Sweet RW, Sodroski J, & Hendrickson WA (1998). Structure of an HIV gp120 envelope glycoprotein in complex with the CD4 receptor and a neutralizing human antibody. Nature, 393 (6686), 648-59 PMID: 9641677

[2] Khatib, F., Cooper, S., Tyka, M., Xu, K., Makedon, I., Popovic, Z., Baker, D., & Players, F. (2011). Algorithm discovery by protein folding game players Proceedings of the National Academy of Sciences DOI: 10.1073/pnas.1115898108

Tuesday, November 8, 2011

Gene inactivation and the female immune system

Genes don't usually disappear from a genome. However, a mutation that affects the gene's expression can induce a loss of function. For example, a mutation could introduce a premature stop codon -- a bit of sequence that halts translation early. The result is a truncated protein that can't perform its function. Such a mutation inactivates the gene because it can no longer produce the protein it was coding for. Gene inactivation is a mechanism that has shaped evolution by allowing new genes to replace the old ones (which remain in the genome, only they become part of the non-coding DNA).
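A minimal sketch of what a premature stop codon does to translation (the codon table here is truncated to just the codons used in the demo; a real table has all 64):

```python
# A premature stop codon cuts translation short, yielding a truncated
# (and typically nonfunctional) protein. Minimal codon table for the demo.
CODONS = {"ATG": "M", "AAA": "K", "GGC": "G", "TGG": "W",
          "TAA": "*", "TAG": "*", "TGA": "*"}   # '*' = stop

def translate(dna):
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODONS.get(dna[i:i + 3], "X")     # 'X' = codon not in our table
        if aa == "*":                          # the ribosome releases here
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGAAAGGCTGG"))     # MKGW  (full-length product)
print(translate("ATGAAATAAGGCTGG"))  # MK    (truncated by the early stop)
```

The second sequence differs from the first only by an inserted TAA stop codon, yet the downstream amino acids are never made, which is exactly how such a mutation silences a gene.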

In humans, an example is given by CMAH, which is a gene in most mammals, but has turned into a pseudogene (which means it lost its functionality) in humans due to a mutation that's estimated to have appeared approximately three million years ago. (On a side note, these historical estimates are made through algorithms that compute the evolutionary phylogenetic tree of a genetic sample of sequences, something I'll try and explain in more detail in a future post.) CMAH codes for an enzyme that produces a sugar called Neu5Gc, but it so happens that Neu5Gc is an antigen in humans: when detected, our immune system attacks it and destroys it.

Researchers from UCSD have investigated the evolutionary process behind the loss of functionality in CMAH [1] and found that it was driven by the female immune system.

Neu5Gc is a sugar that covers the surface of the cell, and it's often used by pathogens as a docking site for cell targeting. Therefore, it's been hypothesized that individuals without it experienced a selective advantage because of immunity against such pathogens. However, pathogens change quite rapidly and find other ways to attack the host, so that alone is not a satisfactory explanation. The UCSD researchers used a transgenic mouse model carrying the human CMAH mutation to test the hypothesis that the female immune system can attack sperm or fetal tissue expressing Neu5Gc. Indeed, they found that female mice with anti-Neu5Gc antibodies showed reduced fertility when mating with Neu5Gc-positive males. Furthermore, they showed that human serum can attack chimpanzee sperm, which is rich in Neu5Gc.

Ghaderi et al. used these results to model the fixation of the mutated CMAH human allele and concluded that the female immune system, by attacking Neu5Gc-positive sperm, significantly reduced fertility with Neu5Gc-positive mates. This resulted in enhanced fertility between CMAH-negative pairs and quickly drove the CMAH mutation to fixation (thus causing the gene inactivation in the whole population).

The question is: if Neu5Gc was originally present in the organism, how did the first humans develop an immunity against it? Exactly how (and when) events unfolded is still a puzzle, but Ghaderi et al. suppose that first there had to be a loss of both wild-type alleles in a minority of the population. From a population genetics point of view, this is likely to happen when a group of individuals is isolated from the rest, either through migration or because of a geographical or cultural split in the population. New mutations arise and, the smaller the population size, the more likely they are to "survive" selective pressure. The individuals with the silenced CMAH gene later developed an immune response against Neu5Gc, to which they were exposed possibly through a diet rich in red meat (which carries high levels of Neu5Gc). The loss of functionality in the CMAH gene, combined with the new immune response, triggered the mechanism described by Ghaderi et al. These findings are compatible with the fact that no Neu5Gc was found on Neanderthal bones, only its equivalent, Neu5Ac.

[1] Ghaderi, D., Springer, S., Ma, F., Cohen, M., Secrest, P., Taylor, R., Varki, A., & Gagneux, P. (2011). Sexual selection by female immunity against paternal antigens can fix loss of function alleles Proceedings of the National Academy of Sciences, 108 (43), 17743-17748 DOI: 10.1073/pnas.1102302108

Monday, November 7, 2011

To HDR or not to HDR?

HDR stands for high dynamic range, and it's a technique that merges several different exposures into one image. Why would you want to do that? Because sometimes the light is far from ideal. Take a cloudy day, for example. The sky will have a fantastic texture given by the white layers of clouds, but in order to capture it you have to use a short exposure. However, when you do that, your foreground will be too dark. The human eye resolves this because as you shift your gaze from one focal point to the next, your pupil contracts and relaxes, allowing more or less light to get through. Your camera can't do that. By taking two or three images, of which one is under-exposed and one over-exposed, HDR allows you to take the best out of each shot.
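As a toy illustration of the blending idea (a simplification: real HDR software recovers a radiance map before tonemapping), each pixel can be averaged across exposures with weights that favor well-exposed values:

```python
def merge_exposures(exposures):
    """Blend several exposures of the same scene, pixel by pixel,
    weighting each 8-bit value by how close it is to mid-gray (128):
    well-exposed pixels dominate, blown-out or crushed ones fade out.
    A toy version of exposure fusion, not a full HDR pipeline."""
    merged = []
    for pixel_stack in zip(*exposures):              # same pixel, each shot
        weights = [1 + 255 - abs(v - 128) * 2 for v in pixel_stack]
        total = sum(w * v for w, v in zip(weights, pixel_stack))
        merged.append(round(total / sum(weights)))
    return merged

# Dark, mid, and bright shots of a 3-pixel "scene": the well-exposed
# mid-tones dominate the blend.
dark, mid, bright = [10, 40, 80], [60, 128, 200], [140, 220, 255]
print(merge_exposures([dark, mid, bright]))
```

The dark shot contributes most where the bright one is blown out, and vice versa, which is the whole point of shooting the same scene at several exposures.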

I'm still struggling to use it in a proper way, but these are a couple of examples I took this past summer:

The original shots for the first photo are here, and here are the ones for the second. As you can see, none of the original shots is ideal, but when combined together through HDR, the end result is quite pleasant. Both pictures were taken in Ogunquit, Maine. It was June, and the weather was often gloomy, so I ended up using HDR a lot. I wish I could use it better, but I'm still learning. Also, notice that none of these examples were taken with a tripod. Ideally, you really want a tripod to make sure the three images you take overlap perfectly. However, since carrying a tripod around happens to be a little inconvenient, I noticed that shooting architectural elements helps because the software has an easier time overlapping the images along straight lines. Finally, the softer colors compared to the originals are due to a post-HDR processing tool called tonemapping.

Besides tonemapping, HDR comes with a whole suite of editing options, and depending on which you pick, you might end with pretty dramatic effects, like this:

(Original shots here.) Slightly reminiscent of Armageddon. Well, the light was a little dull, and my mood reflected it, so I kept it. But, just so you know, the example above is NOT good HDR, IMHO. In fact, let's not call it HDR. Let's call it a personal interpretation, or an artistic rendition of an image. What usually makes HDR-processed images look surreal and dramatic is an exaggerated use of tonemapping, fusion, compression, and all those fancy processing options. In fact, some of those options, like tonemapping, are available even without HDR. All you're doing is changing the color map of the image. But a well-dosed, expertly used HDR does not make the image surreal. It should only blend the light exposures.

Unfortunately, unless you're willing to spend a fair amount of time learning the technique, the "dramatic" kind of HDR is what you'll end up doing for the most part. And that's why so many people feel strongly against HDR. They claim it's artificial and doesn't reflect reality. People who take that stand should be aware that they are not criticizing HDR. What they find unreal and artificial is the software that manipulates the image after the HDR has been done. HDR per se is a fantastic tool, and like with all tools, it takes time and dedication to learn how to properly use it.

Here's what a professional photographer can do with it. (I've been following Jeff Sullivan on Flickr and on Panoramio and his photos are absolutely stunning. I'm hoping that by staring at his amazing pictures some of his talent will eventually rub off... well, I gotta hope for something!) National Geographic also has a nice HDR gallery (and some of those examples I do find a little dramatic, but I admit they are very beautiful and well done).

So the bottom line is: photography is an art, and thanks to modern technology and software there are many, many new tools out there. Each tool gives you a little something and takes away a little something, and you have to pick and choose what you want and need. Do your research, and rather than nixing one tool completely while embracing another wholesale, keep an open mind about all of them, because you never know what you're going to love next.

Saturday, November 5, 2011

Haldane's dilemma

Today is JBS Haldane's 119th birthday. Together with Fisher and Wright, Haldane is considered the founder of the mathematical theory of population genetics. Population genetics studies how allele frequencies (the prevalence of different copies of genes) change in populations due to processes like natural selection and genetic drift. In other words, how mutations arise and how they undergo a turnover in the population.

To celebrate Haldane's birthday, I thought I'd discuss his 1957 paper, "The Cost of Natural Selection" [1], which, unfortunately, has gained popularity after some people started using it as an argument against evolution. In this paper, Haldane states that multiple favorable traits cannot be selected at once:
"In this paper I shall try to make quantitative the fairly obvious statement that natural selection cannot occur with great intensity for a number of characters at once unless they happen to be controlled by the same genes."
Haldane poses the following question: supposing you have constant selective pressure towards one trait in particular, how many individuals without the trait need to die before the new trait takes over? Too many deaths will cause the population to go extinct, but too few will never allow the turnover of the new trait. This is what he defines as the "substitution cost," or, in other words, the cost for a new trait to replace an old one.

He calculates the substitution cost under a very particular scenario: suppose that a sudden change happens in the environment (like a shift in climate, or the introduction of a new predator), and this causes a certain species to be less adapted to the environment and, therefore, to have a lower reproduction rate. The less fit individuals will die first, thus allowing natural selection to push forward the fitter ones. Suppose that a particular mutated gene, until then rare in the population, favors adaptation to the new environment. Gradually, the population will see a shift in prevalence of the new trait. Individuals without the trait will progressively go extinct, and in this process other traits may get lost. As a result, under this scenario, no more than one gene can be selected at once.

The concepts in this paper were later referred to as "Haldane's dilemma" by paleontologist Van Valen, who formulated the dilemma as "for most organisms, rapid turnover in a few genes precludes rapid turnover in the others. A corollary of this is that, if an environmental change occurs that necessitates the rather rapid replacement of several genes if a population is to survive, the population becomes extinct."

In his 1957 paper Haldane concludes:
"Unless selection is very intense, the number of deaths needed to secure the substitution, by natural selection, of one gene for another at a locus, is independent of the intensity of selection. It is often about 30 times the number of organisms in a generation. It is suggested that, in evolution, the mean time taken for each gene substitution is about 300 generations. This accords with the observed slowness of evolution."
This may indeed sound surprising. If each substitution takes about 300 generations, how can we possibly have achieved the kind of diversity we observe today? Two of Haldane's assumptions are problematic: (1) he assumed an infinitely large population; (2) he assumed the selective pressure on the new trait to be constant over time.

Haldane's claims have been revised and re-elaborated by many scientists, most famously by evolutionary biologist Motoo Kimura, who, in the early '60s, used a diffusion equation to recalculate the substitution cost. Kimura noticed that under Haldane's model it would take an enormous number of offspring to sustain the observed rate of substitution. This became the basis of Kimura's neutral theory of molecular evolution, in which he claimed that the vast majority of genetic changes are not "selected." Instead, according to Kimura, genetic changes are mostly random changes with little or no effect (neutral), which become fixed in the population simply through the resampling of gene copies from one generation to the next (a process called "genetic drift"). In other words, the sheer chance of which individuals happen to reproduce causes some variants to gradually disappear from the population while others take over.
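Kimura's point about drift is easy to demonstrate with a toy Wright-Fisher simulation (a standard population-genetics model, not Kimura's diffusion calculation; the population size and trial count below are arbitrary choices of mine). A neutral allele present in k of the 2N gene copies fixes with probability k/2N, purely through resampling:

```python
import random

def neutral_fixation_fraction(n_pop=50, copies=5, trials=2000, seed=1):
    """Wright-Fisher drift with no selection: each generation the 2N gene
    copies are drawn by binomial resampling from the previous one.
    Returns the fraction of trials in which the neutral allele fixes;
    theory says this equals its starting frequency copies / (2N)."""
    rng = random.Random(seed)
    two_n = 2 * n_pop
    fixed = 0
    for _ in range(trials):
        k = copies
        while 0 < k < two_n:  # loop until the allele is lost or fixed
            freq = k / two_n
            # binomial resampling of 2N copies = genetic drift
            k = sum(1 for _ in range(two_n) if rng.random() < freq)
        fixed += (k == two_n)
    return fixed / trials
```

With 5 copies out of 100, the simulated fixation fraction comes out close to the theoretical 5/100 = 0.05: most neutral variants are lost, but a predictable few take over with no selection at all.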

So, which is it? Completely neutral mutations that get fixed by random drift, or selection acting on every single trait? Most likely, it's a combination of both. Very few traits are truly selected for. Mutations arise constantly and, in a small population, they can rise in frequency through genetic drift alone. Historically, people migrated and the geography of the landscape changed, causing populations to split. In a small population a rare mutation is more likely to spread and then become fixed through genetic drift, whether the mutation is advantageous or not. However, if the mutation is indeed advantageous, a selective sweep can push it up much faster. But this most likely didn't happen under the constant, year-after-year pressure Haldane originally formulated. More plausibly, occasional sweeps (think of a particularly virulent flu season, for example) turned a minor allele into the wild-type, and then genetic drift did the rest.
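The gap between drift and a sweep can also be put in numbers: a classic result (due to Haldane, later refined by Kimura) is that a new advantageous mutation present as a single copy fixes with probability roughly 2s, versus only 1/(2N) for a neutral one. A small Wright-Fisher simulation with selection shows the difference (my own sketch; the values of N, s, and the trial count are arbitrary):

```python
import random

def selected_fixation_fraction(n_pop=50, s=0.1, trials=3000, seed=2):
    """Wright-Fisher model with genic selection: an allele at frequency p
    is sampled into the next generation with weighted probability
    p * (1 + s) / (1 + s * p). Starts from a single copy; returns the
    fraction of trials in which the allele fixes."""
    rng = random.Random(seed)
    two_n = 2 * n_pop
    fixed = 0
    for _ in range(trials):
        k = 1  # one new mutant copy
        while 0 < k < two_n:
            p = k / two_n
            p_sel = p * (1 + s) / (1 + s * p)  # frequency after selection
            k = sum(1 for _ in range(two_n) if rng.random() < p_sel)
        fixed += (k == two_n)
    return fixed / trials
```

With s = 0.1 and 2N = 100, the simulated fixation fraction lands near 0.18, on the order of 2s = 0.2 and far above the neutral value of 1/(2N) = 0.01; even a strongly favored mutation is usually lost, which is why drift still matters.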

As for Haldane's numbers, they're not as far off as one might think. I did, however, find a paper published in 1977 [2] in which the author argues that Haldane overestimated the cost of allele substitution by natural selection. Darlington states: "The cost is reduced if recessive alleles are advantageous, if substitutions are large and few, if selection is strong and substitutions are rapid, if substitutions are serial, and if substitutions in small demes are followed by deme-group substitutions. But costs are still so heavy that the adaptations of complex organisms in complex and changing environments are never completed. The rule probably is that most species most of the time are not fully adapted to their environments, but are just a little better than their competitors for the time being."

In other words, evolution is a work in progress.

Dilemma aside, I love Haldane for this famous quote:
"Theories have four stages of acceptance. i) this is worthless nonsense; ii) this is an interesting, but perverse, point of view; iii) this is true, but quite unimportant; iv) I always said so."
And if you've ever tried to publish a scientific paper, or to publish anything at all as a matter of fact, you know exactly what Haldane was talking about.

[1] Haldane, J.B.S. (1957). The cost of natural selection. Journal of Genetics, 55(3), 511-524. DOI: 10.1007/BF02984069

[2] Darlington, P.J. Jr (1977). The cost of evolution and the imprecision of adaptation. Proceedings of the National Academy of Sciences of the United States of America, 74(4), 1647-1651. PMID: 266204

Photo: blue hour is the time of day when a longer exposure grants you a blue sky and soft, yellow lights. It's particularly beautiful in an urban setting, when all the city lights come on.