When you receive information about events that occur over time, do you look for patterns or relationships in the information? For example, when you see a tall couple walking down the street, do you think their children will be tall? When you learn that a new children’s toy is becoming popular, do you expect the toy’s price to increase? When you hear that someone is highly educated, do you assume the person is also wealthy?
When we receive and process information, we frequently look for meaningful relationships in it. Doing so is usually helpful because we learn about the individual events and develop knowledge we can use to make predictions. At the same time, it is important to remember that even if two events are related, that does not mean one necessarily causes the other.
Are These Two Variables Correlated?
Because of our desire to know whether two events are related, and if so, how closely, we have developed methods for measuring the direction and strength of the relationship between phenomena. One such method is the Pearson correlation coefficient, which measures the degree to which there is a linear relationship between two variables. The Pearson correlation coefficient ranges from +1 to –1.
If the correlation coefficient is close to +1 there is a strong positive relationship, meaning that as one variable increases the other tends to increase as well. If the correlation coefficient is close to –1 there is a strong negative relationship, meaning that as one variable increases the other tends to decrease. Finally, a correlation coefficient close to zero signals the lack of a linear relationship between the two variables.
Perception of Government Quality and Willingness to Pay Taxes
Let’s say you’re interested in understanding the relationship between how highly your community rates its local government and your community’s willingness to pay additional taxes. You’ve collected survey data from the community for the past five years. Over those years, the overall ratings for the local government were 91, 87, 83, 92, and 89, respectively (scores range from 0 to 100, where 0 is awful and 100 is wonderful).
Over the same five years, your community’s willingness to pay additional taxes was 62, 10, 55, 63, and 91, respectively (scores range from 0 to 100, where 0 is unwilling to pay additional taxes under any circumstances and 100 is completely willing to pay additional taxes). Given these data, what can you say about the relationship between how highly your community rates its local government and your community’s willingness to pay additional taxes?
The Pearson correlation coefficient will provide you with the direction and strength of the linear relationship between these two variables. In this case, the correlation coefficient is 0.31*, so you can say there is a weak positive relationship between how highly your community rates its local government and your community’s willingness to pay additional taxes (remember, these data are made up). A weak positive relationship means that years with a higher government rating tend to be years with a higher willingness to pay additional taxes, but only loosely: the ratings alone tell you little about how willing the community will be.
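To make the calculation concrete, here is a minimal Python sketch that computes the coefficient for the made-up survey data above. The function name `pearson_r` and the variable names are illustrative choices, not part of any particular library. It uses the deviations-from-the-mean form of the formula, which is algebraically equivalent to the sums-based form given in the footnote.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations around each mean."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of cross-products and sums of squared deviations
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

# Made-up survey scores from the example (one value per year)
government_rating = [91, 87, 83, 92, 89]
tax_willingness = [62, 10, 55, 63, 91]

r = pearson_r(government_rating, tax_willingness)
print(round(r, 2))  # 0.31
```

With only five pairs of scores, a single unusual year (such as the 10 in the second year) can move the coefficient substantially, which is one reason to treat a value like 0.31 cautiously.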
Correlation Does Not Imply Causation
An important point to remember is that the correlation coefficient only provides information about the direction and strength of the linear relationship between two variables. It does not capture a non-linear relationship between two variables, and it does not imply that one variable causes the other. Sometimes we jump to the conclusion that because two variables are correlated, one must cause the other. That conclusion is not justified by the correlation alone. Remember, correlation does not imply causation.
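The point about non-linear relationships can be illustrated with a small sketch. The data here are hypothetical and chosen only for illustration: y is completely determined by x (y equals x squared), yet because the relationship is a symmetric curve rather than a line, the Pearson coefficient comes out at zero. The helper `pearson_r` is again an illustrative name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations around each mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: y depends perfectly on x, but not linearly
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r_curve = pearson_r(xs, ys)
print(abs(r_curve) < 1e-12)  # True: no linear relationship detected
```

A coefficient near zero therefore rules out a linear relationship only; plotting the data is a simple way to check for other patterns.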
Guard Against Assumptions of Causality
Whether at home or at work, we are frequently taking in and processing information. We are often trying to identify meaningful patterns and relationships in the data so we can understand what we’re interpreting and improve our ability to make predictions with the data. The Pearson correlation coefficient is an important tool you can use to measure the relationship between two variables because it provides you with the direction and strength of the linear relationship between the variables. While the correlation coefficient is a useful measure of association, it is important to remember that correlation does not imply causation. Guard against the urge to assume, or be easily persuaded, that a causal relationship exists simply because two events are correlated. By doing so, you will reduce the likelihood of making an unfounded (and perhaps costly) assumption that one variable causes another and increase your chances of making informed, defensible decisions.
*Pearson correlation coefficient (r):
r = [n(Sum xy) – (Sum x)(Sum y)] / square root([n(Sum x^2) – (Sum x)^2][n(Sum y^2) – (Sum y)^2])
n = number of pairs of scores
Sum xy = sum of the products of paired scores
Sum x = sum of x scores
Sum y = sum of y scores
Sum x^2 = sum of squared x scores
Sum y^2 = sum of squared y scores
In the example:
n = 5
Sum xy = 24,972
Sum x = 442
Sum y = 281
Sum x^2 = 39,124
Sum y^2 = 19,219
Therefore, the Pearson correlation coefficient, r, equals:
r = [5(24,972) – (442)(281)] / square root([5(39,124) – (442*442)][5(19,219) – (281*281)])
r = (124,860 – 124,202) / square root([195,620 – 195,364][96,095 – 78,961])
r = 658 / square root([256][17,134])
r = 658 / square root(4,386,304)
r = 658 / 2,094
r = 0.31
In this case, the correlation for the made-up data is 0.31, which indicates a weak positive relationship between how highly a community rates its local government and the community’s willingness to pay additional taxes.
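The hand calculation above can be checked step by step with a short Python sketch. The variable names mirror the quantities in the formula (for example, `sum_x2` stands for Sum x^2) and are illustrative choices only.

```python
from math import sqrt

# Made-up survey scores from the example
x = [91, 87, 83, 92, 89]   # rating of local government
y = [62, 10, 55, 63, 91]   # willingness to pay additional taxes

n = len(x)                                   # 5
sum_xy = sum(a * b for a, b in zip(x, y))    # 24,972
sum_x = sum(x)                               # 442
sum_y = sum(y)                               # 281
sum_x2 = sum(a * a for a in x)               # 39,124
sum_y2 = sum(b * b for b in y)               # 19,219

numerator = n * sum_xy - sum_x * sum_y       # 124,860 - 124,202 = 658
x_term = n * sum_x2 - sum_x ** 2             # 195,620 - 195,364 = 256
y_term = n * sum_y2 - sum_y ** 2             # 96,095 - 78,961 = 17,134
denominator = sqrt(x_term * y_term)          # square root of 4,386,304

r = numerator / denominator
print(round(r, 2))  # 0.31
```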