The Correlation Fallacy

Did you know that the divorce rate in Maine and the per capita consumption of margarine are related?

It’s true.

Whenever one goes up, so does the other. When one goes down, same thing.

So, is margarine consumption a result of divorces in Maine? Do the prospects of court deliberations, split assets and alimony have Mainers running to the store for some Country Crock with their lobster dinner?

Not necessarily.

The relationship between Maine’s divorce rate and margarine consumption is a prime example of the adage that correlation is not causation.

In other words, even if two things move in step, one might not cause the other. They might not be related at all.
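To see how easily this happens, here is a small Python sketch. It is an illustration under stated assumptions, not an analysis of the real Maine figures: both series are made up, the names and numbers are hypothetical, and the only thing the two series share is that each drifts downward over time.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Two made-up series over ten "years" (hypothetical values, not real data).
# Each drifts downward on its own; neither is computed from the other.
series_a = [10.0 - 0.5 * t + rng.uniform(-0.2, 0.2) for t in range(10)]
series_b = [5.0 - 0.2 * t + rng.uniform(-0.2, 0.2) for t in range(10)]

# Correlation of the raw levels, which is dominated by the shared drift.
r_levels = pearson(series_a, series_b)

# Year-over-year changes; correlating these removes the constant drift.
changes_a = [later - earlier for earlier, later in zip(series_a, series_a[1:])]
changes_b = [later - earlier for earlier, later in zip(series_b, series_b[1:])]
r_changes = pearson(changes_a, changes_b)

print(f"correlation of raw levels:     {r_levels:.2f}")
print(f"correlation of yearly changes: {r_changes:.2f}")
```

The raw levels should come out strongly correlated simply because both lines slope the same way. The yearly changes, where the shared drift cancels out, have no built-in reason to agree.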

We’ve heard this time and again. Yet we continue to search for correlations, seemingly everywhere.

This has as much to do with innovation as anything.

With the growth of technology and the proliferation of big data sets, we have more raw records to peruse than ever before. More than we know what to do with.

There is no guidebook for turning this data into intelligible information. No rinse-and-repeat process to transform the data at hand into knowledge and solutions to make the world a better place.

With no roadmap to follow, we try to find needles in haystacks. We dive into the data, trying to find whatever relationships we can.

On the surface, this seems innocent enough. And it would be — if we were robots. Or Spock.

But we’re not.

We’re humans. Hot-blooded, emotion-driven and filled with inherent biases.

A search for meaning is at the heart of our actions. We’re hardwired for this quest.

So, a simple dive through terabytes of data is actually a complex treasure hunt for causality. The objective: Find relationships that support our assertions and complete our narratives.

Instead of panning for gold, we’re data mining for affirmations. We’re finding whatever ammunition we can to support five words: I’m right and you’re wrong.

Those words are subjective. But with more access to data than ever before, we feel we have license to treat them as objective. Even if we must commit the correlation fallacy to do so.

This is how we end up with a world of alternative facts. A world of filter bubbles, chronic mistrust and divisiveness.

All because we refuse to abide by the rules of data assessment.
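Diving into the data to find whatever relationships we can has a predictable result: search enough haystacks and a needle turns up, whether or not it means anything. The Python sketch below is a toy illustration with assumed, hypothetical settings (10 data points, 1,000 candidate series, pure random noise throughout). It keeps the candidate that correlates best with a target series.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)  # fixed seed so the run is repeatable

N_POINTS = 10        # assumed length of each series
N_CANDIDATES = 1000  # assumed number of unrelated series we search through

# The "target" is random noise standing in for whatever we want to explain.
target = [rng.gauss(0, 1) for _ in range(N_POINTS)]

# Every candidate is also independent random noise. None is related to the
# target by construction, yet we keep whichever one correlates best.
best_r = 0.0
for _ in range(N_CANDIDATES):
    candidate = [rng.gauss(0, 1) for _ in range(N_POINTS)]
    r = pearson(target, candidate)
    if abs(r) > abs(best_r):
        best_r = r

print(f"strongest correlation found among {N_CANDIDATES} random series: {best_r:.2f}")
```

None of the candidates has anything to do with the target. The winner looks impressive only because we went looking for it.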


The world of statistics is filled with obscure names. While the dawning of America made the names Washington, Jefferson and Franklin renowned, fewer people know of Bayes, Boole, Pearson and Box.

The difference is as unsurprising as it is stark. One group of historic figures addressed its audience as We the People and spoke of Life, Liberty and the Pursuit of Happiness. The other group came up with hypotheses and then rejected — or failed to reject — them using math.

One group did work that was invigorating and captivating. (Heck, they even made Broadway hip-hop musicals about it.) The other did work that was arcane and ambiguous.

It’s no surprise that we’re drawn to the narrative of the Founding Fathers over that of the Fathers of Statistics. The underdog story of how the United States came to be has spawned centuries of free enterprise, free speech and freedom to pursue the American dream. The story of statistics has left us running regressions in Excel and figuring out how Z-scores work on a normal distribution.

Yet, ideas and ideals can only get us so far. While it’s a blessing to live in a free society, it’s also true that hopes, dreams and $3 can get us a cup of coffee at Starbucks.

In order to thrive, we must be able to quantify our impact. Use of data is critical.

This is why the government has a Census every 10 years. It’s why companies and investors track their stock market performance. It’s why we monitor the number of steps we take when we exercise.

We are effectively data-driven. Particularly when something is up for debate.

When we need answers quickly, there are few resources to turn to that are more universal than numbers. The strategy is simple: Pull the right data. Win the argument. Seize the day.

Yet, in our zeal to make data our Excalibur, we forget one key point. Statistics are not set up to be definitive.

On the contrary, they’re intentionally ambiguous.

There are too many strange factors out there — from freak occurrences to that which we cannot explain — for us to confidently say that a set of statistical equations can explain the whole world around us. It’s just not true.

The best we can do is point out which factors are related to — or correlated with — other factors. And then use that knowledge to make our arguments.

When we do this, time after time, we say we’re letting the numbers speak.

But the numbers are not speaking. Our inherent bias is.

By looking to settle a debate, we dive into the numbers with a narrative in mind. The correlations and relationships we find are those that either fulfill our narrative or reframe it in a way that still paints it in a positive light.

This is sleazy enough when it comes to matters of opinion. (Hence the issues with the filter bubble society we live in.) But it’s downright reckless when it comes to matters of healthcare treatment, financial wellness, security and public policy.

The decisions we affect in these areas have wide-ranging implications. Whether our role is that of an industry professional, a politician, a journalist, a civic voter or something else, a subjective set of correlation analyses won’t cut it.

Yet, time and again, that’s what key decisions rest on. And we suffer the consequences, whether we notice them or not.


It’s time we break with this destructive pattern.

It’s time we stop treating statistics as our white horse, and correlations as our armor.

It’s time that we get some common sense.

When making key decisions, key arguments and key points, let us do more than hold blindly to the data.

Let us open our eyes and consider what’s going on in the world around us.

Let us consider opposing viewpoints, and how they might be valid.

Let us treat learning as discovery, not validation.

It’s only when we do all this that the data speak volumes. It’s only then that the resulting decisions bring the most good.

Statistics are a powerful tool, but a delicate one.

Handle with care.