Debunking myths on genetics and DNA

Monday, February 15, 2016

Decoding the Dark Matter of the Human Genome

First appeared on my Huffington Post blog on February 15, 2016. 

In 1994, researchers from Harvard and Stanford published a paper in which they described three mice: one was yellow and fat, one mottled and fat, and the last one was brown and lean. An ordinary image, except for one thing: despite being so different, all three mice were genetically identical.

If their genes were exactly the same, what was causing such striking differences in the mice?

Three genetically identical mice that do not look the same. Why?
Photo credit: Nature Publishing, used with permission

At the time, Karissa Sanbonmatsu, a staff scientist at Los Alamos National Laboratory, was working on plasma physics and had no idea that one day she would tap into this mystery. Even though she came from a completely different field, from the very beginning she was obsessed with one question: What distinguishes life from matter?

"In order to answer that question, the first place to look is the ribosome," Karissa explains. "It's the oldest molecule found in life."

And for good reason: all living cells depend on proteins, and ribosomes are the "factory" inside the cell where these proteins are made.

The breakthrough came in 2003, when the Q Machine, at the time the second-fastest supercomputer in the world, was built at Los Alamos National Laboratory. Using the Q Machine, Karissa and colleagues ran what was then the largest simulation ever performed in biology, which allowed them to publish, in 2004, an atomic model of a complete ribosome developed in silico (that is, by computer).

This milestone laid the foundation for a deeper understanding of the ribosome. Possible future applications include new cancer therapies based on how ribosomes differ between healthy and cancerous tissue.

In the meantime, a new field had been revolutionizing the way we think of genetics and inheritance: epigenetics. The three lab mice from 1994 were one example of how, by switching genes on and off, genetically identical individuals can have different observable characteristics ("phenotypes"). Epigenetics studies the mechanisms by which the environment can trigger these on/off gene patterns (called gene expression patterns), and how these modifications can be passed on to the next generation.

Both animal and human studies have shown that traits acquired by the parents, such as stress responses or the ability to store fat, can be passed on to their offspring. The DNA itself remains unaltered; what drives these changes in phenotype is whether certain genes are activated or deactivated, that is, whether they produce the proteins they code for.

But how are genes turned on or off? Specific factors regulate whether a gene is expressed (turned on) or silenced (turned off). Some of these factors are recruited by RNA, the single-stranded molecule involved in numerous cellular processes, from coding and decoding genes to protein synthesis.

When they were first discovered, RNA and DNA sequences that didn't code for proteins were dubbed the "dark matter" of the genome because their function was unknown. Today we know that many of these molecules can affect gene expression and even change traits by turning certain genes on or off.

That RNA can turn genes off has been known since the early 2000s, when small RNAs were used to create mice whose cells had one particular gene silenced. Longer RNA molecules that don't code for proteins also exist inside the cell, in a wide range of sizes. Called long non-coding RNAs (lncRNAs), they are abundant in stem cells and embryos and are essential in many developmental processes.

"RNA could be the missing link in epigenetics," Karissa explains. "Ribosomes are made of RNA, and so, for me, the leap from ribosomes to lncRNAs was a natural one."

To understand how lncRNAs can turn genes on and off, scientists first need to uncover their molecular structure. Can lncRNAs fold into different shapes, or 3D structures, and change function accordingly, or are they two-dimensional molecules? Karissa and colleagues are determined to solve the puzzle. The same techniques the team used to model the ribosome in 2004 and 2005 can be applied to lncRNAs, but because lncRNAs are larger, the team will need faster and better computational tools than the ones they used a decade ago.

Luckily, next-generation supercomputing is underway at Los Alamos with the construction of Trinity, a machine fast enough to accommodate simulations of 3D atomic structures. This is where Karissa and colleagues are planning to run their lncRNA models.

Revealing the shape of lncRNAs would be a breakthrough. But for Karissa and her team, another even more ambitious project is on the way: "Thanks to the amazing resources offered by Trinity, we will be able to run the first atomistic simulation of human chromatin, the big 'yarn' of DNA and proteins that sits inside the cell nucleus."

Source: National Institutes of Health

This means simulating the 3D structure of three billion base pairs, plus all the proteins the DNA is wrapped around! The genes in the nucleus reside in chromatin, and this is where they are activated or deactivated. Solving the 3D structure of chromatin will therefore shed new light on the epigenetic mechanisms that regulate gene expression.

Many diseases are characterized by altered gene expression. In cancer cells, for example, DNA-repair genes can be turned off, while genes that promote cell proliferation are over-expressed. Understanding the mechanisms behind these altered on/off patterns, and how to reverse them, could pave the way to new therapies and more effective treatments: a bright future indeed for molecules once dismissed as the genome's dark matter.

Elena E. Giorgi is a computational biologist in the Theoretical Division (Theoretical Biology group) at the Los Alamos National Laboratory and the author of the science fiction thrillers Chimeras, Mosaics, and Gene Cards.

Karissa Sanbonmatsu's TEDx talk "How You Know You're in Love: Epigenetics, Stress & Gender Identity."

Duhl DM, Vrieling H, Miller KA, Wolff GL, & Barsh GS (1994). Neomorphic agouti mutations in obese yellow mice. Nature Genetics, 8 (1), 59-65 PMID: 7987393

Tung CS, & Sanbonmatsu KY (2004). Atomic model of the Thermus thermophilus 70S ribosome developed in silico. Biophysical Journal, 87 (4), 2714-22 PMID: 15454463

Sanbonmatsu KY, Joseph S, & Tung CS (2005). Simulating movement of tRNA into the ribosome during decoding. Proceedings of the National Academy of Sciences of the United States of America, 102 (44), 15854-9 PMID: 16249344

Novikova IV, Hennelly SP, & Sanbonmatsu KY (2012). Structural architecture of the human long non-coding RNA, steroid receptor RNA activator. Nucleic Acids Research, 40 (11), 5034-51 doi: 10.1093/nar/gks071 PMID: 22362738

Sanbonmatsu KY (2016). Towards structural classification of long non-coding RNAs. Biochimica et Biophysica Acta, 1859 (1), 41-5 PMID: 26537437



