Effect Size vs Statistical Significance

What makes a clinical study significant?

What is statistical significance?

One of the most confusing aspects of a clinical trial is how researchers can claim that the results they found are due to the treatment and not to random chance. This judgment rests on statistical significance. Significance is a complex (and highly controversial) topic, but most studies set the significance threshold at .05. This means that, if the treatment actually had no effect, results at least as extreme as those observed would be expected less than 5% of the time. In many studies, this figure (called the p-value) falls below 1%, indicating strong evidence that the difference is not simply a product of chance.
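To make the idea concrete, here is a minimal sketch of a permutation test in Python. The blood pressure numbers are invented for illustration and the function name is our own; real trials use more formal methods. The sketch shuffles the group labels many times and counts how often chance alone produces a difference at least as large as the one observed.

```python
import random

def permutation_p_value(treatment, placebo, n_shuffles=10_000, seed=0):
    """Estimate how often random relabeling matches the observed difference."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_treat = len(treatment)
    n_plac = len(placebo)
    observed = abs(sum(treatment) / n_treat - sum(placebo) / n_plac)

    pooled = list(treatment) + list(placebo)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend group membership was assigned by chance
        fake_treat = pooled[:n_treat]
        fake_plac = pooled[n_treat:]
        diff = abs(sum(fake_treat) / n_treat - sum(fake_plac) / n_plac)
        if diff >= observed:
            hits += 1
    return hits / n_shuffles

# Hypothetical drops in blood pressure (points) for eight people per group.
treatment_drop = [12, 9, 15, 11, 14, 10, 13, 12]
placebo_drop = [3, 5, 2, 6, 4, 1, 5, 4]

p = permutation_p_value(treatment_drop, placebo_drop)
print(f"Estimated p-value: {p:.4f}")
print("Below the .05 threshold" if p < 0.05 else "Not below the .05 threshold")
```

The smaller the returned value, the harder it is to explain the observed difference by chance alone.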

What is the effect size?

It's easy to think of results as a dichotomy: total cure or complete failure. But that's not how researchers look at the effects of treatments. A treatment can be effective at reducing blood pressure without curing hypertension. It's possible, and even common, for clinical trials to find that a substance consistently reduces blood pressure by 2-3 points. When judging whether something is effective, it's important to also consider the size of the effect, known as the effect size. The effect size tells us how large or small the effect actually is.
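Effect size can be reported in raw units (points of blood pressure) or in standardized form. One common standardized measure is Cohen's d, which divides the difference in group averages by the pooled standard deviation. The sketch below is illustrative only: the data are made up, and the helper name is our own.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical drops in blood pressure (points); not real trial data.
treated = [4, 2, 5, 1, 3, 3, 2, 4]
placebo = [1, 0, 2, -1, 1, 0, 2, -1]

raw_difference = mean(treated) - mean(placebo)
print(f"Raw effect: {raw_difference:.1f} points")
print(f"Cohen's d:  {cohens_d(treated, placebo):.2f}")
```

The raw difference is usually the easier number to judge against a health goal; the standardized version helps when comparing studies that measured outcomes on different scales.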

Combined, these two concepts mean that it is entirely possible for a study to find that the treatment group had far better results than the placebo group, yet for the researchers to conclude that there is no evidence that the substance works. How can there be no evidence when the study clearly shows that the treatment group did better? The answer is found in statistical significance. When the researchers tested whether the difference could plausibly have arisen by chance alone, they could not rule that out. That's not to say that the researchers are ignoring the difference between the two groups, rather that the difference, in the study in question, is not large enough relative to the natural variation between participants to exclude random chance as the explanation.

When this happens, it is not the same thing as producing evidence that something does not work. A non-significant result could be caused by a treatment that just does not work. It could also be caused by the study's design not being a good match for the way in which the treatment works. Perhaps the dose was too small or the application method was all wrong. Perhaps the effect is smaller than believed and the sample size therefore wasn't large enough to detect it. Perhaps this study is just a fluke. There are many reasons a study may not confirm that something works, and when this happens, researchers often continue investigating to see if other approaches produce a different outcome.
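The sample-size point can be illustrated with a rough calculation of statistical power, meaning the chance that a study reaches significance when the effect is real. This is a simplified sketch: it uses a normal approximation, treats the spread between participants as known, and assumes a true 5-point effect with a standard deviation of 15 points. Both numbers are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(true_diff, sd, n_per_group, alpha=0.05):
    """Approximate chance that a two-group trial reaches p < alpha.

    Normal approximation; sd is treated as known (an assumption).
    """
    std_normal = NormalDist()
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = true_diff / se
    # Probability the test statistic lands beyond either critical value.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

for n in (20, 80, 200):
    power = approx_power(true_diff=5, sd=15, n_per_group=n)
    print(f"{n:>3} per group: power = {power:.2f}")
```

Under these assumptions, a trial with 20 people per group would detect the real effect only roughly one time in five, so a "no evidence" result from such a study says little about whether the treatment works.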

Statistically significant yet practically meaningless?

Similarly, a study might find that statistical significance does exist yet the treatment is still not recommended. This is also a cause for head-scratching when interpreting scientific studies. How can researchers say that the effect was significant and then turn around and say that the treatment is not recommended? This is where the effect size comes in. Remember, the effect size reveals the strength of the treatment. It tells us whether the treatment will reduce blood pressure by 2-3 points or by 20-25 points.

This usually happens when a clinical trial is too large. We dispel the "bigger is better" myth elsewhere on this site. The takeaway here is that larger samples shrink the uncertainty around a measured difference, so when too many people are recruited into a study, just about anything can become statistically significant, even tiny effects on the body. Clearly the goal of taking a medicinal treatment or dietary supplement is not to produce tiny effects; the goal is usually to maintain health or help the body return to health. If someone has hypertension, a 2-3 point reduction in blood pressure isn't going to accomplish that goal. So when you read or hear that a study found that a product was effective, be sure to follow through and find the size of the effect.
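The pattern is easy to see in a simplified calculation. The sketch below holds the observed effect fixed at 2 points and only changes the number of participants. It uses a normal approximation and an assumed standard deviation of 15 points, so treat the output as an illustration rather than a real analysis.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(observed_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = sd * sqrt(2 / n_per_group)  # uncertainty shrinks as n grows
    z = observed_diff / se
    return 2 * NormalDist().cdf(-abs(z))

# Same 2-point effect, three different study sizes.
for n in (50, 500, 5000):
    p = two_sided_p(observed_diff=2, sd=15, n_per_group=n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{n:>5} per group: p = {p:.4f} ({verdict})")
```

The effect never changes; only the study size does. That is why a significant p-value alone cannot tell you whether a result matters.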

Can you have both?

The best studies are designed in a way that, if the treatment really does work, the study will find both statistically significant and meaningful results. Properly designed studies should not miss reasonably sized effects due to being too small, and similarly, they should not produce effect sizes that are statistically significant but useless to the average person.

The key is to design a study that is the right size. There is no perfect size to a study. Sometimes the ideal size is 30-40 people. Sometimes it may be 200. In certain types of studies, it may even be 5,000 or more. Researchers are trained to conduct multiple analyses (often called power analyses) during the design phase of a study to find that sweet spot: not too big, not too small. This extra work during the design phase leads to more useful results at the end.
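As a rough illustration of what such a design-phase calculation looks like, the sketch below uses a common textbook approximation for comparing two group averages. The inputs are assumptions made for this example: a standard deviation of 15 points, a .05 significance threshold, and an 80% chance of detecting the effect. Actual study planning involves more considerations than this.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(min_diff, sd, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect min_diff.

    Normal-approximation formula: n = 2 * ((z_alpha + z_power) * sd / diff)^2
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_power = std_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / min_diff) ** 2)

# Smallest effect worth detecting (points) -> required size per group.
for diff in (10, 5, 2):
    print(f"{diff:>2}-point effect: about {n_per_group(diff, sd=15)} per group")
```

The smaller the effect the researchers consider worth detecting, the more participants they need, which is why the right size differs so much from one study to the next.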

Meet Dr. Hawkins

Dr. Hawkins brings 20 years of expertise in the integrative health field to her role as Executive Director of the Franklin School of Integrative Health Sciences and the leader of our clinical research team.

She holds a Bachelor’s Degree in Environmental Health from Union Institute and University, a Master’s Degree in Health Education & Promotion from the University of Alabama, a post-graduate certificate in epidemiology from the London School of Hygiene and Tropical Medicine, a PhD in Health Research from Middle Tennessee State University, and is completing the post-doctoral Global Scholars Research Training Program at Harvard Medical School. She also holds certifications in numerous natural health fields including aromatherapy, aromatic medicine, herbalism, childbirth education, and labor support.