One of my first and still most popular posts was on endogenous retroviruses, or ERVs: these are viral sequences that got integrated into the host DNA and became part of the noncoding genome. With time, Mother Nature found a way to reuse some of these viral proteins, for example in the placenta, as I explained in the earlier post, and those proteins became expressed again.
I was at a conference last week, and one of the talks discussed the evolution of these endogenous viral elements (EVEs) and how they have become part of a co-evolutionary process. The speaker compared the phylogenetic trees of many EVEs across different species with the phylogenetic trees of the species themselves, and these trees were topologically similar, suggesting that the viruses and their hosts have coevolved. What this means is that whenever there has been a divergence event in the evolutionary history of a certain species, that event is also reflected in the evolutionary history of the virus hosted by the species. This is not surprising if you think about it: as the host evolves, the virus has to evolve too in order to survive (the Red Queen effect I talked about here).
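To make "topologically similar" a bit more concrete, here is a minimal sketch of one common way to compare two trees: count the clades (groups of tips that share an ancestor) found in one tree but not in the other, a rooted version of the Robinson-Foulds distance. I don't know which method the speaker actually used, so take this as an illustration only. The species names and tree shapes are made up, and each EVE is labeled by the host species it was found in.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree written as nested tuples.

    Leaves are strings; each clade is a frozenset of leaf names.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    root = leaves(tree)
    found.discard(root)  # the clade holding every leaf is shared by all trees
    return found


def rf_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees (0 = same topology)."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical host tree and two hypothetical EVE trees (tips named by host).
host_tree = (("human", "chimp"), ("mouse", "rat"))
eve_congruent = (("chimp", "human"), ("rat", "mouse"))
eve_incongruent = (("human", "mouse"), ("chimp", "rat"))

print(rf_distance(host_tree, eve_congruent))
print(rf_distance(host_tree, eve_incongruent))
```

A distance of zero means every split in the host tree is mirrored in the EVE tree, which is the pattern you would expect if the sequences had been inherited along with the host lineages.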
How did these endogenous viral sequences end up in our genome? In order to replicate, retroviruses undergo reverse transcription, which turns their RNA into DNA, and then the DNA gets integrated into the host genome. When this happens in a germline cell, the integrated viral DNA is copied along with the rest of the genome every time the cell divides, including through meiosis. As a result, the viral genome is stuck there and gets passed on, as a non-coding sequence, to the offspring.
This explains the presence of endogenous retroviruses in our genome. More intriguing is how non-retroviral RNA viruses got there, given that those viruses replicate without getting integrated into the host genome. In fact, they never get transcribed into DNA. We still don't know how such viruses could have been integrated into the host genome, though one hypothesis is that reverse transcription (as a rare event) could have been triggered by the reverse transcriptase enzyme encoded by cellular retroelements.
"The endogenous viral elements (EVEs) we know today must only be a small subset of those that have existed in the past; many others will have been lost by the chance process of genetic drift, which is the fate of most mutations at low frequency, even those that are selectively advantageous. [. . .] Other EVEs may have been removed by purifying selection because they reduce organismal fitness. In particular, human endogenous retroviruses are usually located in genomic regions away from genes, whereas the integration sites of (presumably recent) exogenous retroviruses are often close to genes, suggesting that there is a selective cost in having EVEs located too close to genic regions."
Most endogenous viral elements are defective and hence are found in non-coding regions of the genome. However, as I discussed in my earlier post, it's not unusual for the sequences to find a new function and become expressed again. When this happens, the sequences can become advantageous to the host and hence get fixed in the population. For example, some endogenous viruses protect the host against similar exogenous viruses by interacting with the infecting virions and causing them to be defective.
These findings have greatly informed our understanding of viral evolution. Indeed, endogenous viral sequences represent a "fossil record" of past infections.
"The key point here is that once integrated into host genomes, EVEs cease to evolve with the very high substitution rates that characterize exogenous RNA and small DNA viruses and instead replicate using high-fidelity host DNA polymerases and probably experience fewer replications per unit time. This will result in a dramatic reduction in evolutionary rate, from the virus scale (usually around 10^-3 nucleotide substitutions per site, per year) to the host scale (~10^-9 subs/site/year)."
Basically, even though viruses evolve at a much faster rate than their hosts, once those sequences are integrated in the germline, from there on they evolve at the same rate as the host genome, which is much slower. That's how they become "fossils" compared to their exogenous counterparts. For example, studies looking at primate lentiviruses (such as SIV and HIV) have estimated the age of these viruses to be in the thousands of years at most. However, endogenous lentivirus elements in lemurs indicate that lentiviruses have been circulating for over a million years. Additionally, there is evidence of selection pressure derived from the fitness cost induced by viral infections that also points to the antiquity of some viral families.
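To get a feel for what a million-fold difference in rate does, here is a small back-of-the-envelope sketch. It uses the two rates from the quote and, as my own simplifying assumption (it is not from the review), the Jukes-Cantor model, under which every nucleotide change is equally likely and the observed fraction of differing sites can never exceed 75%. The one-million-year time span is just an example value.

```python
import math

# Rates taken from the quoted passage, in substitutions per site per year.
VIRUS_RATE = 1e-3
HOST_RATE = 1e-9


def expected_substitutions(rate, years):
    """Expected number of substitutions per site along one lineage."""
    return rate * years


def observed_difference_jc69(rate, years):
    """Fraction of sites expected to differ from the ancestor under Jukes-Cantor.

    Repeated hits at the same site hide earlier changes, so the observed
    fraction levels off at 0.75 no matter how many substitutions occurred.
    """
    d = expected_substitutions(rate, years)
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


years = 1_000_000  # example time span, chosen for illustration
for label, rate in (("virus scale", VIRUS_RATE), ("host scale", HOST_RATE)):
    subs = expected_substitutions(rate, years)
    diff = observed_difference_jc69(rate, years)
    print(f"{label}: {subs:g} substitutions/site, {diff:.4%} of sites differ")
```

At the virus-scale rate the sequence is fully saturated after a million years, so essentially no signal of the ancestor is left, while at the host-scale rate only about one site in a thousand has changed. That is why the integrated copy is still recognizable long after its free-living relatives have mutated beyond comparison.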
Holmes, E. (2011). The Evolution of Endogenous Viral Elements. Cell Host & Microbe, 10 (4), 368-377. DOI: 10.1016/j.chom.2011.09.002