HIV has 10 genes spread throughout roughly 10 thousand nucleotides. The genes Rev and Tat (and Tev, when it’s present) completely overlap with the larger gene Env. When a gene lies within another, we say that the two genes are “nested.”
How does the virus know which protein to make if the information is overlapping? The key is the “reading frame.” Remember, a gene is a string of nucleotides (A, G, C, and T), and a protein is a string of amino acids (also denoted by letters), so it really boils down to translating a string of nucleotides into a string of amino acids. It takes three nucleotides (each triplet is called a “codon”) to encode one amino acid. So, suppose you have a string of DNA that looks like this (the example is taken from this wonderful site):
ATGCCCAAGCTGAATAGCGTAGAGGGGTTTTCATCATTTGAGGACGATGTATAA
The first three nucleotides on the left (ATG) sit at the five-prime (5′) end, where translation starts, and reading can begin at any of these three nucleotides. If you begin reading from the A, you get one reading frame; if you begin from the T, you get a second frame; and, lastly, if you begin from the G, you get a third one. Like this:
ATG|CCC|AAG|CTG|… becomes MPKL…
TGC|CCA|AGC|TGA|… becomes CPS…
GCC|CAA|GCT|GAA|… becomes AQAE…
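To make the partitioning concrete, here is a minimal Python sketch that translates the example sequence in each of the three frames. It assumes the standard genetic code, ignores start codons, and writes stop codons as `*` instead of halting at them; the names `CODON_TABLE` and `translate` are illustrative choices, not part of any library.

```python
# Standard genetic code: the 64 codons, enumerated with bases in T, C, A, G
# order, map onto this string of one-letter amino acid codes ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# The example sequence from the text.
SEQ = "ATGCCCAAGCTGAATAGCGTAGAGGGGTTTTCATCATTTGAGGACGATGTATAA"


def translate(seq: str, frame: int) -> str:
    """Translate seq starting at offset `frame` (0, 1 or 2).

    Only complete codons are read (leftover bases at the end are dropped),
    and translation continues past stop codons so each frame is shown in full.
    """
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)
    )


if __name__ == "__main__":
    for frame in range(3):
        print(frame, translate(SEQ, frame))
```

Running it prints `MPKLNSVEGFSSFEDDV*` for the first frame, `CPS*IA*RGFHHLRTMY` for the second and `AQAE*RRGVFII*GRCI` for the third, which match the prefixes above. The second and third frames run into stop codons (TGA, TAG) after only a few amino acids.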
As you can see, a single strand of DNA has three possible reading frames: depending on where you start partitioning the DNA, the triplets change, giving rise to different sequences of amino acids. At this point, you’re probably wondering why a virus would go through all this trouble.
Overlapping and nested genes are not uncommon in organisms like viruses and bacteria, which have very short genomes (compared to ours). For these organisms, a compact genome means a speedier replication process, which is evolutionarily advantageous [1].
But how do you explain overlapping genes in more complex organisms like mammals [2]? Our genome is huge compared to that of a virus, and, as I’ve said many times before, it’s mostly non-coding. If there’s plenty of room for extra genes, why do we have overlapping ones?
It gets even more complicated. HIV’s genome is made of RNA, which is single-stranded, hence the three reading frames. But our DNA has two strands, hence six possible reading frames, and some overlapping gene pairs in our genome are indeed transcribed from opposite strands. These pairs are called sense-antisense gene pairs, and we really don’t know their function. They could simply be a remnant of evolution [1]. However, recent studies have shown that these gene pairs may be associated with cancer [3] and with diseases such as Alzheimer’s [4]. After all, a mutation in an overlapping region in a sense “doubles” its effect, since it affects both genes.
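In a double-stranded genome, the other three frames come from reading the complementary strand in the opposite direction (its reverse complement). The minimal Python sketch below, assuming the standard genetic code and reusing the example sequence from earlier, lists all six frames and then checks which of them change after a single point mutation. The helpers `reverse_complement`, `six_frames`, `point_mutation` and `changed_frames`, as well as the chosen mutation, are hypothetical and only meant as an illustration.

```python
# Standard genetic code (bases in T, C, A, G order); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)
# Base pairing: A <-> T, C <-> G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SEQ = "ATGCCCAAGCTGAATAGCGTAGAGGGGTTTTCATCATTTGAGGACGATGTATAA"


def translate(seq: str, frame: int) -> str:
    """Translate complete codons from offset `frame`, reading through stops."""
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)
    )


def reverse_complement(seq: str) -> str:
    """The opposite strand, read in its own 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frames(seq: str) -> list[str]:
    """Frames 0-2 on the given strand, frames 3-5 on the opposite strand."""
    rc = reverse_complement(seq)
    return [translate(seq, f) for f in range(3)] + [translate(rc, f) for f in range(3)]


def point_mutation(seq: str, pos: int, base: str) -> str:
    """Return a copy of seq with the nucleotide at `pos` replaced by `base`."""
    return seq[:pos] + base + seq[pos + 1:]


def changed_frames(original: str, mutant: str) -> list[int]:
    """Indices of the frames whose translation differs after the mutation."""
    return [
        i
        for i, (before, after) in enumerate(zip(six_frames(original), six_frames(mutant)))
        if before != after
    ]


if __name__ == "__main__":
    # Hypothetical C -> A substitution at offset 4 of the example sequence.
    mutant = point_mutation(SEQ, 4, "A")
    print(changed_frames(SEQ, mutant))
```

For this substitution the sketch prints `[0, 1, 3, 4, 5]`: one nucleotide change alters the protein read in five of the six frames. Only the third forward frame is spared, because its affected codon goes from GCC to GCA and both encode alanine, so a mutation in an overlapping region can be silent in one frame while still changing another.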
Such associations should not be completely surprising; in fact, I believe they are the tip of some deeper regulatory mechanism that we have yet to understand. If we look at some of the most primitive organisms, bacteria, we see that they have evolved complex regulatory mechanisms based on sense-antisense genes. These mechanisms have been studied in particular in the context of drug resistance, where it has been shown that this type of “antagonist” transcription plays a role in controlling how bacteria exchange genetic material [5] and, as a result, can facilitate the rise of drug-resistant strains. I’ll explain this phenomenon in more detail in a later post.
[1] Kumar A (2009). An overview of nested genes in eukaryotic genomes. Eukaryotic Cell, 8 (9), 1321-9 PMID: 19542305
[2] Sanna CR, Li WH, & Zhang L (2008). Overlapping genes in the human and mouse genomes. BMC Genomics, 9 PMID: 18410680
[3] Yu W, Gius D, Onyango P, Muldoon-Jacobs K, Karp J, Feinberg AP, & Cui H (2008). Epigenetic silencing of tumour suppressor gene p15 by its antisense RNA. Nature, 451 (7175), 202-6 PMID: 18185590
[4] Guo JH, Cheng HP, Yu L, & Zhao S (2006). Natural antisense transcripts of Alzheimer's disease associated genes. DNA Sequence: The Journal of DNA Sequencing and Mapping, 17 (2), 170-3 PMID: 17076261
[5] Chatterjee A, Johnson CM, Shu CC, Kaznessis YN, Ramkrishna D, Dunny GM, & Hu WS (2011). Convergent transcription confers a bistable switch in Enterococcus faecalis conjugation. Proceedings of the National Academy of Sciences of the United States of America, 108 (23), 9721-6 PMID: 21606359
Photo: Green Anemone, New England Aquarium, Boston.
That's pretty interesting. In a certain sense, could the overlapping frames be said to be a form of compression?
Yes, they were certainly "born" as a form of compression. Tiny organisms like viruses and bacteria survive on fast replication, hence they need a "small" genome. As they got more complex, though, they needed more genes, so they "compressed" them into a genome that stayed small.
The thing I find fascinating is: why do we have them? I really think there's a lot to explore about what function these overlapping genes have. In a way, they are "risky," because a single mutation there affects multiple genes.
I'll talk more about how they work in bacteria next week. Thanks for your comment!