Debunking myths on genetics and DNA

Showing posts with label Statistical Tests.

Thursday, May 23, 2013

Should you worry about vitamin D deficiency? Maybe. Or maybe not.


Since my last blog post, where I shared my thoughts on BRCA1, BRCA2, and preventive mastectomies, I've been asked what else a woman can do to reduce her risk of breast cancer. I've heard a lot about vitamin D, so I did a bit of research on the matter.

As a disclaimer, I should tell you up front that, though many correlations between vitamin D deficiency and cancer risk have been found, just as many have been refuted or found inconclusive. You can read more about it on the Wikipedia page.

What is vitamin D?

The name "vitamin D" includes a group of steroid-like molecules (they are similar to steroids, but not quite steroids) that help our intestine absorb calcium and phosphates. Since calcium is essential in bone development, vitamin D deficiency has been most commonly associated to osteoporosis and other bone-related diseases. There aren't many foods rich in vitamin D, however, vitamin D can be endogenously synthesized when the skin is exposed to sunlight. Unfortunately, modern lifestyle keeps us cooped up many hours in office cubicles, or in the house during chores, or in malls. When we're out enjoying the sunshine we cover up with hats and super-protective sunscreens because we've been told that the sun is bad for the skin and can cause malignancies. As a consequence, vitamin D deficiency is increasing world-wide.

All the studies that have analyzed correlations between vitamin D and various diseases, including cancers, rest on a common foundation: (i) several ecological studies have found a trend toward higher incidence of certain cancers at higher latitudes, suggesting that longer exposure to the sun may have a protective effect. (ii) The vitamin D receptor (VDR) is expressed in many cells of the immune system, and mouse models have shown that vitamin D deficiency can promote certain auto-immune diseases. In a recent review, Sundaram and Coleman examine the link between vitamin D and influenza [Adv. Nutr. 2012 3: 517-525]. (iii) "VDR regulates a wide range of cellular mechanisms central to cancer development, such as apoptosis (cell death), cell proliferation (uncontrolled cell growth), differentiation, angiogenesis, and metastasis [1]". In line with this observation, Pereira, Larriba, and Munoz published a review on the evidence that vitamin D plays a protective role in colon cancer [Endocr. Relat. Cancer 2012 19: R51-R71].

In [1], Crew discusses the use of vitamin D supplementation as part of breast cancer prevention. She presents many interesting findings, for example:
"Colon, breast, and lung cancer have all demonstrated downregulation of expression of VDR when compared to normal cells and well-differentiated tumors have shown comparably more VDR expression as measured by immunohistochemistry when compared to their poorly differentiated counterparts. Higher tumor VDR expression has also been correlated with better prognosis in cancer patients [1]."
Crew looks at different types of studies: some suggest beneficial effects from using vitamin D (calcitriol) in combination with other anti-cancer treatments; some found an inverse association with mammographic density, a biomarker for breast cancer (high density supposedly increases the risk of cancer); some found an association between vitamin D deficiency and worse breast cancer prognosis. However, many of these studies have limitations. For example, some only assess vitamin D levels through dietary intake, which is not a good measure of circulating levels because it doesn't account for vitamin D synthesized through sun exposure. Some were confounded by obesity, since fat is known to sequester vitamin D and also to raise breast cancer risk. In light of all these considerations, Crew concludes:
"Even with substantial literature on vitamin D and breast cancer, future studies need to focus on gaining a better understanding of the biologic effects of vitamin D in breast tissue. Despite compelling data from experimental and observational studies, there is still insufficient data from clinical trials to make recommendations for vitamin D supplementation for breast cancer prevention or treatment [1]."

As I often do in my posts, rather than giving you answers, I make an effort to provide you with pointers and food for thought: in the end you have to make your own decisions about your health and the wellbeing of your family. As a personal note, I'll add that on my last blood report my vitamin D circulating levels were undetectable. I had no symptoms whatsoever, but I am now taking a vitamin D supplement. I'm also much less paranoid about smothering my kiddos with sunscreen when they play outside (which has made them much happier, two birds with one stone).

[1] Crew, K. (2013). Vitamin D: Are We Ready to Supplement for Breast Cancer Prevention and Treatment? ISRN Oncology, 2013, 1-22. DOI: 10.1155/2013/483687


Monday, July 30, 2012

Oedipus's dilemma


I love Greek mythology, and of all myths, Oedipus is probably the one that fascinates me the most. Nothing to do with the fact that it's become a psychiatric hallmark. I love this myth because it always makes me wonder: if somebody came to you and told you they knew your future with absolute certainty (how many years you'll live, what you'll accomplish, etc.), would you want to know? It's a paradox, because that knowledge would affect the future course of action you choose. Think about Laius: he fulfilled his destiny precisely because of the actions he took in order to avoid it. Predestination paradoxes have been used forever in all mythologies, and even these days -- can you think of at least one novel or movie where they've been used?

I'm rambling, but I actually have a point for this post, I promise.

As you know, nobody's going to come and offer to tell you your exact destiny. But they might offer to type your entire genome. And from that, they may argue they can tell you your exact risk of developing certain diseases. In fact, some of you may already have opted to have your entire genome typed. Such services have become more affordable, accurate, and efficient in just a handful of years. The benefits are numerous: drug therapy could be genetically targeted, and just by looking at your DNA your doctor could already know which drugs will be more effective and which could instead have adverse effects. Assessing one's risk for cancer, diabetes, or other diseases can be a good motivator for a healthier lifestyle and can open up preventive treatment choices.

So, where's the catch?

The catch is that, as a new study in Science Translational Medicine shows [1], sequencing the entire genome doesn't tell us the whole story. In fact, in many cases, it doesn't tell us much at all.

Roberts et al. argue that a predicted risk needs to be quite strong in order to make preventive measures worthwhile. For example, the general population risk of developing breast cancer within a woman's lifetime is currently 12%, obviously too low for women to opt for a preventive mastectomy. However, if a woman learned that her risk was 90%, she might reconsider. Any preventive measure carries consequences, and therefore the risk reduction it buys has to be substantial in order to establish clinical utility.

After setting a meaningful risk threshold, Roberts et al. collected genetic data from numerous monozygotic (identical) twin registries and cohorts. (A little pet peeve of mine: I couldn't find the exact number of pairs in the study; it's probably in the supplemental material, but I find sample size important enough to expect it in the main text.) They then developed a mathematical model to estimate the maximum capacity of whole-genome sequencing to predict the risk for 24 common diseases, including autoimmune diseases, cancer, cardiovascular diseases, genito-urinary diseases, neurological diseases, and obesity-associated diseases. The idea behind the mathematical model is to assess the risk increment of an individual with a disease-associated genotype compared to someone with no genetic risk at all. Since monozygotic twins have nearly identical genomes, you would expect them to carry nearly identical genetic risks and, therefore, to have nearly identical outcomes.
"The general public does not appear to be aware that, despite their very similar height and appearance, monozygotic twins in general do not always develop or die from the same maladies. This basic observation, that monozygotic twins of a pair are not always afflicted by the same maladies, combined with extensive epidemiologic studies of twins and statistical modeling, allows us to estimate upper and lower bounds of the predictive value of whole-genome sequencing."
Using their model, the researchers showed that most individuals would be flagged as predisposed to at least one of the 24 diseases tested. At the same time, they would test negative for most of the others. What does this mean? It means that, for the majority of diseases, genetic testing will only tell individual X that he or she has roughly the same risk of developing disease Y as the general population -- hardly enough to push whole-genome testing past the clinical utility threshold.
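The "positive for at least one disease, negative for most" pattern is also just what you would expect when 24 diseases are tested at once. A back-of-envelope sketch, with an invented flag rate and assuming independence between diseases:

```python
# Made-up figure: suppose the test flags a person as high-risk for any given
# disease 10% of the time. If the 24 diseases were independent, most people
# would get at least one positive result while testing negative for the rest.
p_flag, n_diseases = 0.10, 24
p_at_least_one = 1 - (1 - p_flag) ** n_diseases
print(f"P(at least one positive out of {n_diseases} diseases) = {p_at_least_one:.2f}")  # ~0.92
```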
"Thus, our results suggest that genetic testing, at its best, will not be the dominant determinant of patient care and will not be a substitute for preventative medicine strategies incorporating routine checkups and risk management based on the history, physical status, and life-style of the patient."

[1] Nicholas J. Roberts, Joshua T. Vogelstein, Giovanni Parmigiani, Kenneth W. Kinzler, Bert Vogelstein, & Victor E. Velculescu (2012). The Predictive Capacity of Personal Genome Sequencing. Sci Transl Med, 4, 133ra58. DOI: 10.1126/scitranslmed.3003380




Tuesday, November 29, 2011

Sample size, P-values, and publication bias: the positive aspects of negative thinking


If you follow the science blogging community, you may have noticed a lot of talk about sample size in the past couple of weeks. So I did my share of mulling things over, and this is what I came up with.

1- The study in question had a small sample size but reported a significant p-value (<0.05). Such a study is NOT underpowered. An underpowered study is one whose sample size is too small to allow detection of a significant result. A significant result is by definition a p-value below 5%, which the study in question had. So, even though small-sample studies are in general indeed underpowered, that wasn't the issue in this particular case. In general, you are not likely to see many underpowered studies published (see point 5 below).
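As an aside, power is something you can estimate before running a study, for instance by simulation. A minimal sketch with hypothetical numbers (two groups compared with a t-test, a true difference of half a standard deviation):

```python
# Estimate statistical power by simulation: the fraction of simulated
# experiments, with a real effect present, that reach p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def power(n_per_group, effect_size=0.5, n_sims=5000, alpha=0.05):
    hits = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n_per_group)          # control group
        b = rng.normal(effect_size, 1.0, n_per_group)  # true effect present
        _, p = stats.ttest_ind(a, b)
        hits += p < alpha
    return hits / n_sims

for n in (10, 30, 100):
    print(f"n = {n:3d} per group -> power ~ {power(n):.2f}")
# Roughly 0.19, 0.47, and 0.94 for a half-standard-deviation effect:
# with small samples, a real effect is usually missed.
```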

2- The issue with ANY small-sample study is that you are not capturing the full variability in the population. And if you are not capturing that variability, chances are your error model is wrong, and a wrong error model leads to a wrong p-value. In other words, even if you do get a significant p-value, there's a question of whether that particular p-value is at all meaningful.

3- Why publish a study with a small sample size, then? Welcome to the life of a scientist. You set off with a grand plan, write a grant to sequence, say, 100 individuals, get the money to sequence 50, then you clean the data and end up with 30. Okay, those are made-up numbers, but you get the idea. So now you've got your 30 sequences and you try to make the best of them. You state all the caveats in the discussion section of your paper, advocate for further analyses, and discuss future directions. If your paper gets published, you have some leverage in your next grant, as in: "Look! I saw something with 30 sequences, which is clearly not enough, so now I'm applying to get money to sequence 100." Many scientific advances have been made following exactly this route.

4- I've been talking a lot about p-values, but... what the heck is a p-value? A p-value of, say, 0.05 boils down to the following: if there were really no effect and your results were due to chance alone, then, repeating your experiment many times, you would observe a result at least as extreme as yours 5% of the time out of pure chance. Suppose, for example, you want to see if a particular gene allele is associated with cancer. You do your experiment and come up with a p-value of 0.03. This means that if there really was no association whatsoever between the trait you measured and cancer, you would see a result at least as extreme as yours only 3% of the time out of pure chance. Now you see why anything above 5% is not considered significant: to observe something 10% of the time out of pure chance means that whatever you are measuring is perfectly compatible with a random effect. But to see it only 3% of the time makes it rare enough that we are allowed to believe there may be something in there after all. Notice that this is pretty much how science works. Many science outsiders think that "scientific" means "certain." Not true. Scientific means we can measure the uncertainty, and when it's small enough we believe the result.
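If the definition still feels abstract, you can compute a p-value by brute force with a permutation test. A small sketch with invented measurements (not data from any real study): shuffle the group labels many times and count how often chance alone produces a difference at least as large as the observed one.

```python
# One-sided permutation test on made-up numbers: how often does label
# shuffling alone give a difference at least as large as the observed one?
import numpy as np

rng = np.random.default_rng(1)

carriers     = np.array([5.1, 6.3, 5.8, 6.9, 6.1, 5.7, 6.4, 6.0])  # hypothetical
non_carriers = np.array([5.0, 5.4, 5.2, 5.9, 5.1, 5.6, 5.3, 5.5])  # hypothetical
observed = carriers.mean() - non_carriers.mean()

pooled = np.concatenate([carriers, non_carriers])
n_perm, count = 100_000, 0
for _ in range(n_perm):
    rng.shuffle(pooled)                  # break any real association
    diff = pooled[:len(carriers)].mean() - pooled[len(carriers):].mean()
    count += diff >= observed            # chance result at least as extreme
print(f"observed difference = {observed:.2f}, permutation p-value = {count / n_perm:.4f}")
```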

5- Now that we understand what p-values are, we get to another issue: publication bias. Follow the logic: I just said that we start believing a result whenever the p-value is less than 5%. Basically, you can forget publishing anything that has a p-value above 5%. But you won't know your p-value unless you do the experiment, and you won't publish unless you get a low p-value. Which means you will never see all the similar studies that were carried out and yielded a high p-value. Suppose an experiment with no real effect were repeated across different labs 100 times. Then, by chance alone, about 5 of those experiments would yield a p-value of 5% or less. However, what you end up seeing in print are the experiments that yielded the "good" p-value, not the ones that yielded the negative results. As Dirnagl and Lauritzen put it [1],
"Only data that are available via publications‚ and, to a certain extent, via presentations at conferences‚ can contribute to progress in the life sciences. However, it has long been known that a strong publication bias exists, in particular against the publication of data that do not reproduce previously published material or that refute the investigators‚ initial hypothesis."
People address the issue with meta-analyses, in which several studies are examined and both positive and negative results are pooled together in order to estimate the "true" effects.
"In many cases effect sizes shrink dramatically, hinting at the fact that very often the literature represents the positive tip of an iceberg, whereas unpublished data loom below the surface. Such missing data would have the potential to have a significant impact on our pathophysiological understanding or treatment concepts."
A new movement is rising that advocates the publication of negative results (i.e., results that did not substantiate the alternative hypothesis), and more journals are accommodating this, either with a "Negative Results" section or, as BioMed Central has done, by dedicating an entire journal to it, the Journal of Negative Results in Biomedicine.

I welcome and embrace the change in thinking. It's the same logic I advocate for mathematical models. My new motto: "Negative results? Bring them on!" Maybe I'll have a T-shirt made -- anyone want one too?

[1] Dirnagl, U., & Lauritzen, M. (2010). Fighting publication bias: introducing the Negative Results section. Journal of Cerebral Blood Flow & Metabolism, 30(7), 1263-1264. DOI: 10.1038/jcbfm.2010.51


Wednesday, August 24, 2011

Intelligent people live longer... really?


I came across this abstract from a 2008 Nature essay:

Why do intelligent people live longer? 

We must discover why cognitive differences are related to morbidity and mortality in order to help tackle health inequalities.

The statistics show that children with high IQs tend to live longer than those with less intelligence. What the statistics don't tell us is why. What thing or things do intelligent people do that can delay mortality? Ian Deary explains how cognitive epidemiologists are trying to answer the question, and potentially contribute to the redistribution of health.

Prof. Deary is the director of the Centre for Cognitive Epidemiology at the University of Edinburgh. I've never met him, but I looked up his research, and what he does is certainly impressive. The essay looks at a number of retrospective studies in which IQ was measured early in life and the subjects' longevity was then tracked. The article goes on to offer possible explanations in order to, as the abstract says, "help tackle health inequalities."

The title seemed provocative enough to spark some discussion, so I thought I'd start by giving my two cents. I won't get into the whole issue of "how do we measure intelligence," as that is not my field (though I'd love to hear from experts). Instead, I tried to read the paper from a purely statistical point of view. And this is the part that got me puzzled:

"First, what occurs to many people as an obvious pathway of explanation, is that intelligence is associated with more education, and thereafter with more professional occupations that might place the person in healthier environments. Statistical adjustment for education and adult social class can make the association between early-life intelligence and mortality lessen or disappear."

You see, to a statistician that statement settles the argument. If correcting for education and social class makes the association disappear, then the association is spurious. Instead, the Nature essay deems it an "over-adjustment."

Whenever you fit a statistical model, you have to make sure that your independent variables are truly independent of hidden factors that drive the outcome. Example: suppose I take a population of ten Asians and ten Caucasians, follow them for forty years, and learn that after forty years all the Caucasians are dead and all the Asians are still alive. I might naively conclude that Asians live longer than Caucasians. Now suppose I tell you that eight out of ten Caucasians were smokers. Smoking turns out to be what statisticians call a confounding factor, in other words, a variable that's correlated with both the dependent and the independent variables. Not including it in the analysis leads to spurious relationships. In my made-up example, if I stratify the analysis between smokers and non-smokers and repeat the statistical test, this time I will find no significant difference in longevity between Caucasians and Asians.
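Here is the same made-up scenario as a quick simulation, with smoking as the confounder: mortality depends only on smoking, yet the crude comparison shows a large difference between the two groups, and the difference vanishes once you stratify.

```python
# Confounding demo with invented numbers: smoking is more common in group B
# and drives mortality; group membership itself has no effect.
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

group_b = rng.random(n) < 0.5                        # two equal-sized groups
smoker = np.where(group_b, rng.random(n) < 0.8,      # 80% smokers in group B
                           rng.random(n) < 0.2)      # 20% smokers in group A
died = rng.random(n) < np.where(smoker, 0.40, 0.10)  # risk depends on smoking only

print("Crude mortality:")
print(f"  group A: {died[~group_b].mean():.2f}   group B: {died[group_b].mean():.2f}")
print("Stratified by smoking:")
for label, mask in (("smokers    ", smoker), ("non-smokers", ~smoker)):
    print(f"  {label}: group A {died[~group_b & mask].mean():.2f}   "
          f"group B {died[group_b & mask].mean():.2f}")
```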

In the case of intelligence and longevity, income/social class is an obvious confounding factor. We all know that a healthy lifestyle is expensive. A diet rich in vitamins and fiber, the time to exercise, medications, regular medical check-ups: sadly, in today's world, these are all privileges of the well-off. Furthermore, the earlier you make healthy lifestyle choices, the better your odds later in life. So income at birth also weighs in: children born into poor environments may not have access to vaccinations, medications, healthy foods, and other healthy lifestyle choices in general, all things that will affect them as adults. I may be missing something crucial, but it really doesn't seem like an over-adjustment to me.

In a way, we're saying the same thing. There is an association. But: is the association causal or is it masking an underlying, stronger association? The essay seems to suggest that the association is indeed indicative of something else, but somehow it leaves the question open-ended, concluding: "The things that people with higher intelligence have and do that makes them live longer may be found and, we hope, shared, towards the goal of better and more equal health."

Still, I would really like to see a similar study with education and social class folded in. Because if it turns out that the true underlying driver of a better and longer life is income, well, in that case I do have a suggestion to answer the question above: let's make health care and healthy lifestyles more affordable.

Picture: Colors at the Pike Place Market, Seattle. Canon 40D, focal length 70mm, exposure time 1/40.

Monday, July 25, 2011

The "eye test"


(Update: the next post in the "junk DNA" series will be up on Friday. Thanks to all of you who pitched in suggesting new topics, asking questions, and proposing guest blogs. We have a rich schedule coming up!)

Here's the checklist for a scientific paper:
  • you come up with a hypothesis; 
  • you design an experiment to test the hypothesis; 
  • you gather the data; 
  • you look at the data and decide whether or not your original hypothesis was correct.
No, wait, something's missing.

Oh, yeah. We forgot the analyst! Well, you're in luck, because that's exactly my job.

After they gather the data, the experimentalists show it to us, jumping up and down in excitement: "Look what we found!"
And we, the analysts, raise a brow, click our tongue, and reply: "Yeah, but can we prove it?"

So we design a new statistic, we write code to implement it, we run, graph, and debug until we've proven what the experimentalists saw in the first place. Or the opposite, it can go either way. Because the truth is, the human eye naturally looks for patterns. It's not objective. What you "see" is not always real. A good eye can help you make a conjecture, but then you have to prove your hypothesis. If you can't prove it, it's not real.

That's the core of scientific thinking.

Right?

Right.

So the other day we got the reviews back on a paper we submitted for publication a few months ago. We had some data, which we summarized with a set of nice graphs, and then did some statistical analysis to prove the claim. The response? Rejected.

Turns out, the reviewer looked at our analysis, acknowledged the highly significant p-value (just so you know, a "highly significant p-value" means the data strongly support our hypothesis), then stared at the data. He stared, stared, and stared, and just couldn't see it. So he wrote: "The data doesn't pass the eye test."

Ahem.

May we suggest an eye doctor, kind Sir?


Picture: Ogunquit Beach, ME. Canon 40D, focal length 60mm, exposure time 1/800.