CHIMERAS: Human Genome Project


Wednesday, May 21, 2014

The Human Knock-out: looking for non-working genes

The word "knock-out" in biology is used for lab animals, mice for example, in which one gene has been silenced in order to study the effects of not having that gene. Silencing a gene (or knocking it out, hence the nomenclature "knock-out mouse") means that the gene no longer produces the protein it codes for. This condition is sought when you have to test a drug, because the first step is to reproduce the genetic condition that caused the disease.

Mice are often "humanized", i.e. genetically engineered to carry human genes so that the experiment can be a better model for drug or therapy testing. Unfortunately, even when humanized, mice and lab animals in general are poor models for humans. When things don't work out in an animal model, we know that the experiment should not be carried over to humans, but when on the other hand things go well in an animal experiment, there is no guarantee that it will work in humans too.

"Human chip" technology is a very promising solution, as it would bypass the need for animal testing in drug discovery. The idea is to have cell cultures from different organs on a "chip" the size of a smartphone. Lung, liver, and kidney chips have already been designed and tested, but lately there has been a further advance in making the chips part of a network connected by "blood vessels": Athena (Advanced Tissue-engineered Human Ectypal Network Analyzer) is an ongoing project to see how four organ chips (liver, lung, heart and kidney), connected by tubes filled with artificial blood, can effectively simulate a human body for drug testing and toxin screening. Athena, also dubbed the "desktop human" because, given its size, it would conveniently sit on a desktop, is a $19 million project that will be built in the next five years.
You can read the full story here.

Athena, however, only has four organs and is still a poor surrogate of the human body. The ideal solution would be to have human knock-outs to study the true effect of drugs, which of course is a little unethical to pursue. Unless human knock-outs already exist in nature. Well, guess what? They do, and they are far more common than we originally thought: on average every person has about 20 inactivated genes [1]. Wait, it gets better. Because, you may wonder, if they are so common, how come we never noticed? The ~20 inactivated genes must have some effects and/or symptoms, right?

Not necessarily. Yes, that's the most amazing thing: how robust our DNA is. People can have inactivated genes and still be healthy. It doesn't always happen, yet there are cases in which deficient gene copies are somehow compensated for by other genes. And that's exactly why studying these human knock-outs is so relevant: we need to understand how people can stay healthy even when lacking important genes, as this can give new insight into drug discovery and therapy development.

In [1], MacArthur et al. screened close to 3,000 variants predicted to cause loss of gene function from 185 human genomes. The challenge is to distinguish the "true" loss-of-function variants from sequencing errors, so the researchers designed a "filter" to separate the two. To me, the most striking discovery they made is that loss of function doesn't work as an "on/off" switch; rather, it can lead to a range of possible scenarios:

"Homozygous inactivation of a gene can have a range of phenotypic effects: At one end of the spectrum are severe recessive disease genes, while at the other end are genes that can be inactivated without overt clinical impact, referred to here as LoF-tolerant genes. Clinical sequencing projects seeking to identify disease-causing mutations would benefit from improved methods to distinguish where along this spectrum each affected gene lies [1]."

Jocelyn Kaiser wrote a nice article in Science [2] on the recent developments of this type of research: the plan is to sequence the genomes of many more "healthy" people, find which genes they have inactivated, and then study their clinical characteristics. Some of these loss-of-function variants may end up being beneficial, as is the case for PCSK9, for example: the gene encodes the enzyme of the same name, which has been associated with high cholesterol. As it turns out, individuals who carry loss-of-function mutations in this gene have low cholesterol and a significantly reduced risk of stroke and heart disease [3].

[1] MacArthur, D., Balasubramanian, S., Frankish, A., Huang, N., Morris, J., Walter, K., Jostins, L., Habegger, L., Pickrell, J., Montgomery, S., Albers, C., Zhang, Z., Conrad, D., Lunter, G., Zheng, H., Ayub, Q., DePristo, M., Banks, E., Hu, M., Handsaker, R., Rosenfeld, J., Fromer, M., Jin, M., Mu, X., Khurana, E., Ye, K., Kay, M., Saunders, G., Suner, M., Hunt, T., Barnes, I., Amid, C., Carvalho-Silva, D., Bignell, A., Snow, C., Yngvadottir, B., Bumpstead, S., Cooper, D., Xue, Y., Romero, I., Wang, J., Li, Y., Gibbs, R., McCarroll, S., Dermitzakis, E., Pritchard, J., Barrett, J., Harrow, J., Hurles, M., Gerstein, M., & Tyler-Smith, C. (2012). A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes. Science, 335 (6070), 823-828. DOI: 10.1126/science.1215040

[2] Kaiser, J. (2014). The Hunt for Missing Genes. Science, 344 (6185), 687-689. DOI: 10.1126/science.344.6185.687

[3] Cohen, J., Pertsemlidis, A., Kotowski, I., Graham, R., Garcia, C., & Hobbs, H. (2005). Low LDL cholesterol in individuals of African descent resulting from frequent nonsense mutations in PCSK9. Nature Genetics, 37 (2), 161-165. DOI: 10.1038/ng1509

Monday, September 24, 2012

ENCODE sheds light on non-coding variants

Back when I started studying human genetics, we were still doing single-gene associations. Namely, we would type a bunch of variants in a single gene and then do a case-control association study to see which, if any, of those variants marked an increase in disease risk. That's how breast cancer markers such as BRCA1 and BRCA2 have been found.
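To make the idea of a case-control association concrete, here is a minimal sketch of the simplest version of the test: compare how often a variant allele shows up in cases versus controls. The function name and the allele counts are made up for illustration and do not come from any real study.

```python
def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Return (odds_ratio, chi_square) for a 2x2 table of allele counts.

    Rows are cases/controls, columns are variant/reference alleles.
    Assumes every cell count is greater than zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    # Pearson chi-square: compare observed counts with the counts
    # expected if the allele were unrelated to disease status.
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi_square += (table[i][j] - expected) ** 2 / expected

    # Odds of carrying the variant allele in cases relative to controls.
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return odds_ratio, chi_square


# Hypothetical counts: 200 case alleles and 200 control alleles.
odds_ratio, chi_square = allelic_association(60, 140, 40, 160)
print(f"odds ratio = {odds_ratio:.2f}, chi-square = {chi_square:.2f}")
```

With one degree of freedom, a chi-square above roughly 3.84 corresponds to p < 0.05, so in this toy example the variant would be flagged as associated with the disease.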

When the Human Genome Project was completed in 2003, scientists started looking for disease risk alleles across the whole genome. The findings were puzzling: more than 90% of the disease-associated variants fell in non-coding regions. Why? One issue I've previously discussed is that when looking at tens of thousands of loci, you need huge sample sizes and often, when huge sample sizes aren't feasible, these studies are underpowered. Another possible explanation lies in epistasis, and the detected signal may be the effect of some unknown correlation.
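A rough way to see why so many loci demand such large samples is to look at what happens to the significance threshold. The sketch below uses the Bonferroni correction, which is only one common way of handling multiple testing and is my choice for illustration; the numbers are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the overall
    false-positive rate at or below alpha across n_tests tests."""
    return alpha / n_tests


# Hypothetical study: 50,000 loci tested at an overall alpha of 0.05.
per_locus_threshold = bonferroni_threshold(0.05, 50_000)
print(f"each locus must reach p < {per_locus_threshold:.1e}")
```

Each locus now has to clear a threshold of about one in a million rather than one in twenty, and detecting a modest effect at that level takes far more cases and controls.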

However. You knew there was going to be a "however", right? Because thanks to the ENCODE project we now know that if a genetic variant falls in a non-coding region, it doesn't mean it has no effect whatsoever. ENCODE is bound to shed new light on the numerous non-coding risk alleles that genome-wide association studies (GWAS) have found.

Last time I discussed DHSs, or DNase I hypersensitive sites. These are chromatin regions where many regulatory elements have been found. In [1], Maurano et al. show that many of the non-coding variants associated with common diseases are concentrated in regulatory DNA marked by DHSs. The researchers performed genome-wide DNase I mapping across 349 cell and tissue types. As discussed last week, regions of DNase I accessibility harbor regulatory elements. The researchers also examined the distribution of 5654 non-coding SNPs (single base variants) that had been significantly associated with some disease or trait in genome-wide studies.

These are the main findings:

"Fully 76.6% of all noncoding GWAS SNPs either lie within a DHS (57.1%, 2931 SNPs) or are in complete linkage disequilibrium (LD) with SNPs in a near-by DHS (19.5%, 999 SNPs)."

To be in linkage disequilibrium with a SNP in a DHS means that the two variants are typically inherited together. Suppose the true causal variant is at locus A, but you haven't typed locus A, you've typed locus B, and A and B are inherited together. Then B is going to light up as a strong signal in your statistical analysis. So, what Maurano et al. are saying in the above paragraph is that the non-coding SNPs either turned up in a DHS, or there was evidence that they were strongly correlated with a SNP in one.
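The "inherited together" idea can be put into numbers. Below is a minimal sketch of the standard D and r² measures of linkage disequilibrium between two loci, computed from phased haplotypes. The function name, the 0/1 allele coding, and the toy haplotypes are my own assumptions for illustration.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic loci.

    haplotypes: list of (allele_at_A, allele_at_B) pairs, each coded 0 or 1.
    Assumes both loci are polymorphic in the sample (otherwise the
    denominator of r_squared is zero).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # D is how much the joint frequency departs from independence.
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Toy data: A and B always travel together (complete LD).
linked = [(1, 1)] * 3 + [(0, 0)] * 7
# Toy data: every combination equally common (no LD).
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]

d_linked, r2_linked = linkage_disequilibrium(linked)
d_unlinked, r2_unlinked = linkage_disequilibrium(unlinked)
print(f"linked: r^2 = {r2_linked:.2f}; unlinked: r^2 = {r2_unlinked:.2f}")
```

An r² close to 1 means that typing locus B tells you almost everything about locus A, which is why B can light up in the analysis even though A is the causal variant.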

"Many common disorders have been linked with early gestational exposures or environmental insults. Because of the known role of the chromatin accessibility landscape in mediating responses to cellular exposures such as hormones, we examined if DHSs harboring GWAS variants were active during fetal developmental stages. Of 2931 noncoding disease- and trait-associated SNPs within DHSs globally, 88.1% (2583) lie within DHSs active in fetal cells and tissues. Of DHSs containing disease-associated variation, 57.8% are first detected in fetal cells and tissues and persist in adult cells (“fetal origin” DHSs), whereas 30.3% are fetal stage–specific DHSs."

And finally:

"Enhancers may lie at great distances from the gene(s) they control and function through long-range regulatory interactions, complicating the identification of target genes of regulatory GWAS variants."

GWAS variants can control distant genes that need not even be on the same chromosome. Furthermore, these variants in DHSs tend to alter allelic chromatin state, thus modulating the accessibility of genes to transcription factors. Disease-linked variants were found to alter such accessibility, resulting in allelic imbalance (one allele gets transcribed more than the other one), possibly explaining their role in altering the disease risk or quantitative trait.
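One simple way to think about allelic imbalance is as a coin-flip test: at a heterozygous site, if both alleles behave the same, sequencing reads should come from each allele about half of the time. The sketch below is an exact two-sided binomial test under that assumption. It is an illustration only, not the procedure used in [1], and the function name and read counts are hypothetical.

```python
from math import comb


def allelic_imbalance_p_value(reads_allele_1, reads_allele_2):
    """Two-sided exact binomial p-value against a 50/50 split of reads."""
    n = reads_allele_1 + reads_allele_2
    k = max(reads_allele_1, reads_allele_2)
    # Probability of seeing k or more reads from one allele out of n
    # when each read is equally likely to come from either allele.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # The 50/50 null is symmetric, so double the tail (capped at 1).
    return min(1.0, 2 * upper_tail)


# Hypothetical read counts at two heterozygous sites.
p_skewed = allelic_imbalance_p_value(10, 0)
p_balanced = allelic_imbalance_p_value(5, 5)
print(f"skewed site: p = {p_skewed:.4f}; balanced site: p = {p_balanced:.1f}")
```

A small p-value says the reads favor one allele more than chance alone would explain, which is the kind of signal that points to a variant changing accessibility on one chromosome copy.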

[1] Matthew T. Maurano, Richard Humbert, Eric Rynes, Robert E. Thurman, Eric Haugen, Hao Wang, Alex P. Reynolds, Richard Sandstrom, Hongzhu Qu, Jennifer Brody, Anthony Shafer, Fidencio Neri, Kristen Lee, Tanya Kutyavin, & Sandra Stehling-Sun (2012). Systematic Localization of Common Disease-Associated Variation in Regulatory DNA. Science. DOI: 10.1126/science.1222794

Monday, July 30, 2012

Oedipus's dilemma

I love Greek mythology, and of all myths, Oedipus is probably the one that fascinates me the most. Nothing to do with the fact that it's become a psychiatric hallmark. I love this myth because it always makes me wonder: if somebody came to you and told you they knew your future with absolute certainty (how many years you'll live, what you'll accomplish, etc.), would you want to know? It's a paradox, because that knowledge would affect the future course of action you choose. Think about Laius: he fulfilled his destiny exactly because of the actions he took in order to avoid his destiny. Predestination paradoxes have been used forever in all mythologies, and even these days -- can you think of at least one novel or movie where it's been used?

I'm rambling, but I actually have a point for this post, I promise.

As you know, nobody's going to come and offer to tell you your exact destiny. But they might offer to type your entire genome. And from that, they may argue they can tell you the exact risk you have of developing certain diseases. In fact, some of you may already have opted to have your entire genome typed. Such services have become more affordable, accurate, and efficient in just a handful of years. The benefits are numerous: drug therapy could be genetically targeted, and just by looking at your DNA your doctor could already know which drugs will be more effective and which could instead have adverse effects. Assessing one's risk for cancer, diabetes, or other diseases can be a good motivator for a healthier lifestyle and open up preventive treatment choices.

So, where's the catch?

The catch is that, as a new study in Science Translational Medicine shows [1], sequencing the entire genome doesn't tell us the whole story. In fact, in many cases, it doesn't tell us much at all.

Roberts et al. argue that the risk we need to be able to assess should be pretty strong in order to make preventive measures effective. For example, currently the general population risk of developing breast cancer within a woman's lifetime is 12%, obviously too low for women to opt for a preventive mastectomy. However, if a woman learned that her risk was 90%, she might reconsider. Any preventive measure carries consequences, and therefore, the risk reduction it ensures should be pretty strong in order to establish clinical utility.

After setting a meaningful risk threshold, Roberts et al. collected data from numerous monozygotic twin registries and cohorts. (Little pet peeve of mine: I couldn't find the exact number of pairs they had in the study. It's probably in the supplemental material, but I find sample size important enough to expect it in the main text.) They then developed a mathematical model to estimate the maximum capacity of whole-genome sequencing to predict the risk for 24 common diseases, including autoimmune diseases, cancer, cardiovascular diseases, genito-urinary diseases, neurological diseases, and obesity-associated diseases. The idea behind the mathematical model is to assess the risk increment of an individual with a disease-associated genotype compared to someone with no genetic risk at all. Since monozygotic twins have nearly identical genomes, you would expect their genetic risks to have a nearly identical outcome.

"The general public does not appear to be aware that, despite their very similar height and appearance, monozygotic twins in general do not always develop or die from the same maladies. This basic observation, that monozygotic twins of a pair are not always afflicted by the same maladies, combined with extensive epidemiologic studies of twins and statistical modeling, allows us to estimate upper and lower bounds of the predictive value of whole-genome sequencing."

Using their model, the researchers showed that most individuals would show a risk predisposition to at least one of the 24 diseases tested. At the same time, they would test negative for most diseases. What does this mean? It means that we cannot predict the risk allele distribution of the actual population, and most often genetic testing will only say that individual X has the same risk of developing disease Y as the general population -- hardly enough to make whole genome testing surpass the clinical utility threshold.

"Thus, our results suggest that genetic testing, at its best, will not be the dominant determinant of patient care and will not be a substitute for preventative medicine strategies incorporating routine checkups and risk management based on the history, physical status, and life-style of the patient."

[1] Nicholas J. Roberts, Joshua T. Vogelstein, Giovanni Parmigiani, Kenneth W. Kinzler, Bert Vogelstein, & Victor E. Velculescu (2012). The Predictive Capacity of Personal Genome Sequencing. Sci Transl Med 4, 133ra58. DOI: 10.1126/scitranslmed.3003380

Sunday, July 17, 2011

The case of "junk DNA" and why it shouldn't be called junk

Human DNA is made of roughly three billion pairs of nucleotides. In other words, each of our chromosomes contains a long string of A's, G's, T's and C's, and all together those strings form a word that's three billion letters long.

When the Human Genome Project started, scientists expected to find far more coding genes than they eventually did: estimates of around a hundred thousand were common. Coding genes are strings of DNA that contain the "instructions" on how to make proteins. When the project was completed, in 2003, they had found roughly twenty thousand coding genes. The surprise? Most of our DNA is not made of genes.

What is it made of, then?

It's been called many names: pseudogenes; junk DNA; non-coding DNA. Of all terms, "junk DNA" is the most unfortunate. Just because it doesn't have a function that we know of, it doesn't mean it's not important. And it doesn't mean it can't affect our lives.

In the next few weeks I'll make a case that "junk DNA" is indeed important.
It's part of our history, our heritage, and our future.
I will make my case using three concepts:

See you soon!

Picture: Lion's Mane Jellyfish, New England Aquarium, Boston. Canon 40D, exposure time 1/30, focal length 30mm.