Why Non-Correlations Tell Us More Than Correlations Do

Most readers here will know the problem with correlation and causation: cum hoc ergo propter hoc (with this, therefore because of this), a close cousin of post hoc ergo propter hoc (after this, therefore because of this). Both are classic logical fallacies.

Suppose you find that people with good teeth have higher lifetime incomes. Are the good teeth the cause? Or is there some other factor that causes both (well-to-do families who can afford both orthodonture and good educations, or caring, attentive parents who are the types who take their kids to the dentist, etc.)?

Researchers try to correct for this statistically by holding those other variables constant. “If all those kids had grown up with the same level of income, you’d still see higher lifetime income for those with good teeth.”
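To make "holding a variable constant" concrete, here is a minimal Python sketch on made-up data. Everything in it is an assumption for illustration: a hypothetical `wealth` score drives both `teeth` and `income`, and teeth have no direct effect on income at all. Regressing `wealth` out of both variables and correlating what is left is one simple way to control for it.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

def simulate(n=20000, seed=0):
    """Return (raw, controlled) teeth-income correlations on synthetic data."""
    rng = random.Random(seed)
    # Hypothetical confounder: family wealth, in arbitrary units.
    wealth = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: wealth drives both outcomes; teeth never touch income.
    teeth = [w + rng.gauss(0, 1) for w in wealth]
    income = [w + rng.gauss(0, 1) for w in wealth]
    raw = pearson(teeth, income)
    controlled = pearson(residuals(teeth, wealth), residuals(income, wealth))
    return raw, controlled

raw, controlled = simulate()
print(f"raw correlation:         {raw:.2f}")
print(f"controlling for wealth:  {controlled:.2f}")
```

In this toy world the raw correlation is clearly positive, and it falls to roughly zero once wealth is controlled for. That only works because the simulation knows which confounder to remove.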

But there’s the rub: which possible “confounding variables” do you control for? There are an infinite number, probably including some or many that you haven’t even thought of. And the more variables you control for, the more statistically iffy and downright opaque the research becomes — getting more and more distanced from the basic facts on the ground. (It gets hard to see through and mentally internalize all that statistical legerdemain, so you can make judgments about how much to believe the research.)

And this doesn’t even count things like positive and negative feedback loops — perhaps with threshold levels and the like — between and among variables.

That myriad of potentially confounding variables and interactions means that — absent a convincing explanation of the causative effect — a +.7 or -.2 correlation could be completely spurious; it could be caused by variables or interaction effects that aren’t even being considered. (This is simplifying matters some, but the fact remains.)

But there is one huge exception to this: Zero.

Why is it so important? Because the overwhelmingly likely correlation between any two randomly chosen variables is zero: between the position of Betelgeuse on people’s birthdays and their likelihood of divorce, say, or between the average page count of English-language books and African malaria rates.
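A small Python sketch of that claim, with the caveat that it uses synthetic data only: it draws many pairs of variables that are independent by construction (the pair count, sample size, and seed are arbitrary choices) and looks at how their sample correlations behave.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def unrelated_correlations(pairs=1000, n=200, seed=1):
    """Correlations between pairs of variables that share no cause at all."""
    rng = random.Random(seed)
    results = []
    for _ in range(pairs):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # drawn independently of xs
        results.append(pearson(xs, ys))
    return results

rs = unrelated_correlations()
mean_r = sum(rs) / len(rs)
share_big = sum(abs(r) > 0.2 for r in rs) / len(rs)
print(f"mean correlation across pairs: {mean_r:+.3f}")
print(f"share of pairs with |r| > 0.2: {share_big:.1%}")
```

The sample correlations cluster tightly around zero rather than landing on it exactly, and with enough pairs a few stray past any fixed cutoff by chance alone.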

If you find a correlation of zero between two variables, there are two possibilities:

1. There is no correlation (and hence, almost certainly, no causative effect).

2. The effects of all of the confounding variables and interaction effects that you’re not considering combine and resolve out to … zero.

The odds of #2, in any complex system, are almost nonexistent. Which leaves #1 as the choice from Occam’s Razor.
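To show what possibility #2 would take, here is a toy Python model. It is a sketch under stated assumptions, not a claim about any real system: `x` has a genuine direct effect on `y` with hypothetical strength `a`, while a hidden confounder `z` pushes on `y` with hypothetical strength `b`. With the variances chosen here, the covariance of `x` and `y` works out to 2a + b, so the correlation vanishes only when b equals -2a.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def corr_xy(a, b, n=20000, seed=2):
    """Correlation of x and y when x truly affects y and z confounds both.

    Assumed toy model (all noise terms are independent unit-variance normals):
        x = z + noise
        y = a * x + b * z + noise
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [a * xi + b * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return pearson(x, y)

# A real effect with no offsetting confounder shows up clearly.
print(f"a=0.5, b= 0.0: r = {corr_xy(0.5, 0.0):+.2f}")
# The confounder only partly offsets it: still visibly nonzero.
print(f"a=0.5, b=-0.6: r = {corr_xy(0.5, -0.6):+.2f}")
# Exact balance (b = -2a): the real effect is hidden behind r near zero.
print(f"a=0.5, b=-1.0: r = {corr_xy(0.5, -1.0):+.2f}")
```

In this model the cancellation needs the two strengths to balance exactly, and any other value of `b` leaves a visible correlation. How often real systems strike that kind of balance is the question at issue, and a toy simulation cannot settle it.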

Which is why I find the research on government size and economic growth so compelling. (Regular readers can stop yawning now. Newer readers can follow a few of the “Related Posts” below — and keep following them from there — and you’ll find plenty of that research.)

Whether you’re comparing prosperous countries over the last 30 to 90 years, or U.S. states over similar periods, looking at all different measures of growth and prosperity, the results are the same: little or no correlation between government size, taxing and spending levels, etc., and economic growth and prosperity.

To me, this is quite convincing. Since little or no correlation is found, see #1 above: there is little or no correlation, hence little or no causation.

Which means we should be concentrating our thinking and efforts on other things.

Update: I just came across this amusing set of pretty-much-random correlations, apparently auto-cherry-picked to show only those that are greater than .2 (20%). Causations anyone?


Update II – Nov. 20: Andrew Gelman, who knows more about statistics than … pretty much anyone, was nice enough to respond to an email query asking his opinion on this post. He does not find the argument at all persuasive. So take it for what you will.







4 responses to “Why Non-Correlations Tell Us More Than Correlations Do”

  1. Big Sis

    I remember a strong correlation that delighted me as a teenager. At the time at least, people used to argue that because X% of people who used heroin had started with marijuana, it meant that marijuana was a gateway drug that led to heroin use.

    However, as someone pointed out, 100% of heroin users had started on milk.

  2. Bill White

    Mike Kimel, author of the book Presimetrics, has pointed out several unexpected non-correlations. For example, GDP growth is not correlated with low taxes or with high taxes. We are told that raising taxes will slow the national economy. However, when you look at GDP growth during times of raised or lowered tax rates and also times of raised or lowered tax burden, you see almost no correlation. (GDP growth is slightly positively correlated with higher taxes, but not significantly.) This seems to be true even if you correlate tax changes with GDP growth in subsequent years. That is to say, look at tax changes versus GDP growth k years later.

  3. Robert Johnson

    “The odds of #2, in any complex system, are almost nonexistent.”

    Really? I would have thought differently. I would have thought that it is easy to swamp a weak correlation with noise from other inputs.

    As a crude example, maybe good teeth and high lifetime incomes are correlated but the correlation is hard to pick out from the noise of genetics, GDP of country of residence, culture and parenting, etc.

    Obviously any good researcher would attempt to control for these kinds of variables, but usually that’s not so easy to do. In fact, controlling for other variables is often done specifically to try to tease out a suspected correlation that IS lost in the noise presented by other variables.

    Another problem with finding no correlation is that you can’t be sure that you’ve chosen the correct length of time. Over a longer or shorter term a correlation may appear.

    I think this is the great weakness in the social sciences – for complex systems correlation doesn’t conclusively prove anything, and lack of correlation doesn’t either.

  4. […] that it’s a difficult issue. (Though I give a tentative and perhaps only somewhat useful response here.) But what concerns me most is what I discern to be the unstated implication of his […]