I often blog about viruses because, well, I work on viruses. Here's a quick summary of things I've blogged about that I find absolutely mind-blowing:
1. About 10% of the human genome is made of genes we inherited from viruses that had replicated in our ancestors millions of years ago.
2. Viruses evolve as their hosts evolve (The Red Queen Effect), and in fact we can retrace their evolution in parallel with that of their hosts. The same is true within a single host, enabling us to retrace the evolution of a single virus in parallel with that of the host's antibodies.
3. Genes expressed by viruses and bacteria in our body can affect our phenotype.
4. We can use the ability of viruses to target certain cells to devise new cancer therapies.
5. We can use viruses to edit the genome of certain cells and cure genetic defects through gene therapy.
So yes, viruses are cool and they play a huge role in evolution. The fact that roughly 10% of our genome is made of viral elements (called human endogenous retroviruses, or HERVs) makes our DNA a "living fossil": these are viruses that infected our ancestors millions of years ago. Retroviruses in particular insert their genome inside the cell's DNA in order to replicate. In some instances, these viral genomes got stuck inside germ line cells and that's how they got passed on to the host's offspring and became part of our DNA.
Today these viruses are extinct, as they evolved into new forms, but by investigating the inactivated genes they left in our genome, researchers can find out what they looked like millions of years ago. It's like digging out fossils in our own cells.
That's exactly what two scientists from The Rockefeller University did with one family of HERVs in particular, HERV-K(HML-2), believed to have replicated in human ancestors less than one million years ago (making it one of the most recent forms found in the human genome). They looked at several of these genes across different subjects and reconstructed a "consensus genome", in other words, a genetic sequence that at each DNA position has the nucleotide most frequently found across all study subjects.
For example, if the samples across all subjects looked something like this, with the differences highlighted in red (made-up sequences!):
then the consensus sequence would be one of the sequences without red mutations, because those represent the majority. In other words:
GATACTTGGACAGGAGTTGAAGCTATAATAAGAATTCTACAACAACTGCT
Back to the HERV study, which was published in PLoS Pathogens in 2007: Lee and Bieniasz recreated the HERV-K consensus from ten full-length HERV-K(HML-2) sequences and then reconstituted the virus in the laboratory. The ten sequences were selected based on their similarity to HERV-K113, a relatively young and intact HERV-K provirus. While all ten sequences had defects that inactivated viral genes, selecting the most frequent base at each position eliminated these defects and yielded a full genome sequence (the consensus) with intact proteins. This derived consensus sequence may not be 100% identical to the actual virus that was integrated into the human genome close to a million years ago, but it's pretty close. This "closeness" was confirmed in the lab when the scientists saw that the virus they reconstructed based on the consensus genome was indeed able to infect T cells in vitro. All proteins of the reconstructed virus were functional and able to carry out the virus's replication cycle.
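If you'd like to see the "most frequent base at each position" idea in action, here is a minimal Python sketch. It is only an illustration, not the pipeline used in the paper: the function name `consensus` and the five short reads are made up for this post, and the code assumes the sequences are already aligned and of equal length (real proviruses carry insertions and deletions, so they would need a proper multiple sequence alignment first). Ties, if any, simply go to the base seen first.

```python
from collections import Counter


def consensus(sequences):
    """Majority-vote consensus of pre-aligned, equal-length sequences."""
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*sequences) walks the alignment one column (position) at a time;
    # Counter picks the most common base in that column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


# Made-up reads: each one carries a single, different "defect".
reads = [
    "GATTCTTGGA",
    "GATACTTGCA",
    "GACACTTGGA",
    "GATACTAGGA",
    "CATACTTGGA",
]

print(consensus(reads))
```

Notice that none of the five toy reads is free of mutations, yet the majority vote still recovers a clean sequence, because each defect sits at a different position and gets outvoted there. That is the same logic that let ten defective proviruses yield one intact consensus.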
It's like Jurassic Park... for viruses. :-)
Lee, Y., & Bieniasz, P. (2007). Reconstitution of an Infectious Human Endogenous Retrovirus. PLoS Pathogens, 3(1). DOI: 10.1371/journal.ppat.0030010