Both correlation and causation are statistical terms that are often misunderstood. Understanding both of these concepts is vital before making a decision based on data.

Above is a prime example of how reasoning with these two concepts can go wrong. From 2000 to 2009, the divorce rate in Maine and the per capita consumption of margarine were almost perfectly correlated. It would be easy to conclude that one must cause the other, i.e. that as fewer people got divorced in Maine, people across the country ate less margarine. That theory is far-fetched, and there is most likely another reason why margarine consumption was decreasing, for example a growing emphasis on health over the past decade.

What is Correlation?

Correlation is a measure of the linear relationship between two quantifiable variables. For example, height and weight are correlated: on average, the taller you are, the more you weigh.
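The post does not say which correlation measure it has in mind, but "linear relationship" matches Pearson's correlation coefficient, the most commonly used one. Below is a minimal plain-Python sketch, assuming Pearson is what is meant: the covariance of the two variables divided by the product of their standard deviations, which always lands between -1 and +1. `pearson_r` is a hypothetical helper name, and the heights and weights are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Covariance of the two variables divided by the product of their
    standard deviations; the result lies between -1 and +1.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Unnormalized variances; the 1/n factors cancel in the ratio below.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical heights (cm) and weights (kg) for five people,
# made up purely for illustration.
heights = [160, 165, 170, 175, 180]
weights = [55, 62, 66, 70, 79]
print(round(pearson_r(heights, weights), 3))
```

Because these made-up weights rise almost in step with height, the coefficient comes out close to +1.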

There are two types of correlation: positive and negative. A positive correlation means that the two variables move in the same direction, as in our height and weight example. A negative correlation means that the variables move in opposite directions: as one increases, the other tends to decrease. One example of a negative correlation is air conditioning expenses and how cold it is outside. The colder the weather, the lower your air conditioning expenses, since you are less likely to run the air conditioner when it is cold.

What is Causation?

Causation, or causality, is the capacity of one variable to influence another. Causation therefore occurs when one event happens because another event occurred. A commonly used example:

Smoking cigarettes causes lung cancer.

Correlation vs Causation

When you find a correlation in your data, it is sometimes obvious whether A caused B, and at other times it can be difficult to tell. Whenever a relationship is found, ALWAYS consider other underlying causes before concluding that A did in fact cause B.

I have half a decade of experience working in data science and data engineering across a variety of fields, both professionally and in academia. I have demonstrated advanced skills in developing machine learning algorithms, econometric models, intuitive visualizations and reporting dashboards, which I use to communicate data and technical terminology in an easy-to-understand manner to clients of varying backgrounds.
