In Part I of this post I discussed the Science paper that proved HIV was the result of a cross-species transmission from chimpanzees to humans. In that paper, Hahn et al. conclude with an open question:
"The timing of SIVcpz transmission to humans, leading ultimately to the HIV-1 pandemic, has been a challenging question. We know from analyses of stored samples that humans in west central Africa had been infected with HIV-1 group M viruses by 1959 and with group O viruses by 1963. But how much earlier were these viruses introduced into the human population? [...] It should be possible to estimate the timing of the onset of the pandemic by calculating the date of the last common ancestor of HIV-1 group M."
In a phylogenetic tree (see the definition I gave last time), the last common ancestor is the root of the tree: the "patriarch" of the sample, if you will, the one sequence from which the whole sample originated, one divergence event at a time. Phylogenetic analyses allow us not only to reconstruct the evolutionary history of the sequences but also, if we have a rough idea of the mutation rate (i.e. how often new mutations arise), to date the events in that history. The technique is often referred to as the "molecular clock." It originated from the observation that the number of molecular differences between lineages increases roughly linearly with time, with substitutions accumulating according to a Poisson distribution.
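To make the clock idea concrete, here is a minimal sketch in Python. It assumes a strict clock, that is, a single constant rate on every lineage, and all the numbers (the rate, the alignment length, the time span) are hypothetical values chosen for illustration, not figures from the paper. The function names are mine.

```python
import math

def divergence_time(distance, rate):
    """Years since two sampled lineages split, under a strict clock.

    Both lineages accumulate substitutions after the split, so the
    distance between them is 2 * rate * t, hence t = distance / (2 * rate).
    """
    return distance / (2.0 * rate)

def poisson_pmf(k, mean):
    """Probability of observing exactly k substitutions when `mean` are expected."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

# Hypothetical values, for illustration only.
rate = 0.002   # substitutions per site per year
sites = 1000   # alignment length in sites
years = 50     # time elapsed along a single lineage

expected = rate * sites * years   # expected substitutions on that lineage
spread = math.sqrt(expected)      # Poisson standard deviation: sqrt(mean)

print("expected substitutions:", expected, "+/-", round(spread, 1))
print("time since split for a distance of 0.2 subs/site:",
      divergence_time(0.2, rate), "years")
```

The point of the Poisson assumption is that the scatter around the expected count grows only as its square root, so longer time spans give proportionally tighter date estimates.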
Korber et al. used parallel computers to apply maximum-likelihood tree-building methods to the envelope sequences (the envelope is one of the HIV genes) from 159 individuals. They note:
"Although it is unrealistic to expect that HIV-1 evolution will always rigidly adhere to a molecular clock, it is, however, the average behavior of many sequences that we consider here, and our control estimates of known times were accurate."To this they combined another data point: the year of sampling of the sequences used to reconstruct the tree.
(A) The phylogenetic tree used for the calculation. (B) The branch lengths from the tree plotted versus the year of sampling and projected backwards in time.
Once they had reconstructed the phylogenetic tree, with the root sitting more or less in the middle and thus at about the same distance from the various HIV subgroups (the clusters marked with capital letters in panel A above), they plotted the branch lengths of the tree against the year of sampling (panel B) and fitted a line to extrapolate back to the date of the last common ancestor: 1931, with a 95% confidence interval of 1915 to 1941. Furthermore, testing a known HIV-1 group M isolate from 1959 gave an accurate estimate for the date of its origin, indicating that the assumptions of the method are reasonable.
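The extrapolation in panel B can be sketched in a few lines of Python. This is only an illustration of the general idea (fit a line to branch length versus sampling year, then find the year at which the line hits zero), not the authors' actual procedure, which also had to produce the confidence interval. The sampling years, the rate, the noise, and the "true" root date below are all synthetic, made-up values.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def root_date(years, distances):
    """Year at which the fitted line crosses zero branch length."""
    slope, intercept = fit_line(years, distances)
    return -intercept / slope

# Synthetic data, for illustration only: a root placed at 1930 and a
# constant rate, plus a little hand-picked noise on each branch length.
TRUE_ROOT = 1930.0
RATE = 0.0024  # hypothetical substitutions per site per year
years = [1983, 1986, 1990, 1993, 1996, 1998]
noise = [0.002, -0.003, 0.001, -0.001, 0.002, -0.001]
distances = [RATE * (y - TRUE_ROOT) + e for y, e in zip(years, noise)]

estimate = root_date(years, distances)
print("estimated date of the last common ancestor:", round(estimate, 1))
```

Note how far the line has to be projected: the samples span about fifteen years, while the root lies some fifty years before the earliest one. Small errors in the slope therefore translate into large errors in the date, which is why the confidence interval is so wide.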
Notice that 1931 marks the year in which the main HIV-1 lineage, the M group, started to spread and diversify in humans. It does not tell us whether the virus was transmitted to humans at the same time as it started to diversify. The virus could have crossed into humans earlier and remained confined to a small population until, around the 1930s, socioeconomic changes allowed it to spread. As the authors put it:
"Strictly speaking, our estimate is neither an upper nor a lower bound on the date of the actual zoonosis. Rather, it is the approximate time of the bottleneck event that was the genesis of the M group and captures the moment of the beginning of the expansion of the M group. If the M group originated in humans, then this would date the founder virus of the pandemic."Another important question is addressed in the following commentary by David Hillis:
"If HIV has been present in human populations since at least the 1930s (and probably much earlier), why did AIDS not become prevalent until the 1970s? The phylogenetic trees of HIV-1 indicate that the spread of the virus was initially quite slow‚ by 1950 there existed 10 or fewer HIV-1 M-group lineages that left descendants that have survived to the present. The epidemic exploded in the 1950s and 1960s, coincident with the end of colonial rule in Africa, several civil wars, the introduction of widespread vaccination programs (with the deliberate or inadvertent reuse of needles), the growth of large African cities, the sexual revolution, and increased travel by humans to and from Africa. Given the roughly 10-year period from infection to progression to AIDS, it was not until the 1970s that the symptoms of AIDS became prevalent in infected individuals in the United States and Europe."B. Korber, M. Muldoon, J. Theiler, F. Gao, R. Gupta, A. Lapedes, B. H. Hahn, S. Wolinsky, and T. Bhattacharya. (2000). Timing the Ancestor of the HIV-1 Pandemic Strains Science, 288 (5472), 1789-1796 DOI: 10.1126/science.288.5472.1789