How Significant Is Our Relationship? I Don’t Think It’s a Game Anymore

A few days ago, I played a board game called Loaded Questions with three family members.  Everyone’s game pieces start at one end of a color-coded multi-block path on the board, and the object of the game is to get to the other end of the path before anyone else.

You begin by rolling a die and moving your game piece forward as many spaces as are showing on the face of the die.  Then you select one of the game’s numerous question cards and read one of the four questions on it.  Which question you read depends on the color of the space on which you landed.

Your opponents write their responses on sheets of paper, and then one of them reads the responses to you (while keeping the responses hidden from view).  The challenge is to guess who wrote which response.  For each correct guess, you get to move forward one additional space.  Therefore, if you want to win the game, it behooves you to make it difficult for your opponents by submitting responses that they wouldn’t associate with you.

Diligently Detecting Deceptive Dishonesty

The four of us played the game several times, so we submitted dozens and dozens of responses.  Some of my responses were truthful and some of them were not (remember, I wanted to win the game, so I didn’t want my family members to be able to guess which responses were mine).  Despite my best attempts to deceive my wife (and other opponents), she was still able to correctly guess which responses were mine more frequently than I thought probable (she knows me too well!).

I am intrigued by my wife’s apparent “supernatural” ability to correctly guess my responses, so I have decided to use the chi-square test, a test of association between categorical variables, to assess whether the relationship between ‘I told the truth’ and ‘My wife guessed correctly’ is statistically significant.

Tell Me Chi-square, Can I Categorically Deny the Relationship?

For a chi-square test, it is common to organize the data into a table.  The following table displays the data for my analysis:

                               I Told the Truth
My Wife Guessed Correctly      Yes      No      Total
Yes                             25      15         40
No                               9      27         36
Total                           34      42         76

I submitted a total of 76 responses: 34 were truthful and 42 were not.  Out of the 34 responses in which I told the truth, my wife correctly guessed my response 25 times.  Out of the 42 responses in which I did not tell the truth, my wife correctly guessed my response 15 times.  In all, my wife correctly guessed my response 40 out of 76 times.  Is there a statistically significant association between the two categorical variables?  What do you think?

The chi-square test compares the observed count in each cell with the count we would expect in that cell if the two variables were unrelated, to assess whether the relationship between the two variables is statistically significant.  The expected count in a cell is calculated as follows: (Row Total / Grand Total) * Column Total.  In this case, the expected count in the first cell of Row 1, i.e., Yes/Yes, is (40 / 76) * 34 = 18 (rounded to the nearest whole number).  The expected values for the remaining cells are calculated in the same fashion.
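If you would like to check the expected counts yourself, here is a minimal Python sketch of the calculation described above. The variable names are my own choices for illustration, and the only data used are the counts from the table.

```python
# Observed counts from the table above.
# Rows: my wife guessed correctly (Yes, No); columns: I told the truth (Yes, No).
observed = [
    [25, 15],
    [9, 27],
]

row_totals = [sum(row) for row in observed]        # [40, 36]
col_totals = [sum(col) for col in zip(*observed)]  # [34, 42]
grand_total = sum(row_totals)                      # 76

# Expected count for each cell: (row total / grand total) * column total.
expected = [
    [row_total / grand_total * col_total for col_total in col_totals]
    for row_total in row_totals
]

for row in expected:
    print([round(value, 1) for value in row])
# [17.9, 22.1]
# [16.1, 19.9]
```

Rounded to whole numbers, these are the expected values of 18, 22, 16, and 20.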

The chi-square statistic is calculated from the observed and expected counts as follows: Sum[(Observed – Expected)² / Expected].  For my data, the chi-square statistic is 10.8 (see Note 1), and there is 1 degree of freedom (see Note 2).  For the test to be significant at the 0.05 level with 1 degree of freedom, the chi-square statistic has to be at least 3.8 (see Note 3).  Since 10.8 is greater than 3.8, we can reject the null hypothesis of no association between the two variables.
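The same calculation can be sketched in a few lines of Python. This is only an illustration using the standard library: the function name is hypothetical, and the p-value shortcut with `math.erfc` is valid only for 1 degree of freedom. No continuity correction is applied, which matches the formula above; some statistical packages apply one to 2x2 tables by default, so their result may differ.

```python
import math

# Observed counts: rows = wife guessed correctly (Yes, No),
# columns = I told the truth (Yes, No).
observed = [[25, 15], [9, 27]]


def chi_square_statistic(table):
    """Return Sum[(observed - expected)^2 / expected] over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


chi2 = chi_square_statistic(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# For 1 degree of freedom only, the upper-tail probability of a
# chi-square value equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(round(chi2, 1), degrees_of_freedom, p_value < 0.05)  # 10.8 1 True
```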

Testing the Strength of Many Relationships with Chi-square

While I used it in a fun, simple example in this article, the chi-square test is an important technique for assessing whether an association between two categorical variables is statistically significant.  Think about all of the situations in which we want to learn critical information by comparing categorical variables: Is receipt of treatment related to mortality?  Is receipt of in-kind transfers related to employment?  Is receipt of resources related to project success?

By using the chi-square test, you’ll be able to identify relationships between variables in need of additional analysis, assess the significance of relationships presented to you, and focus attention on the relationships between variables that are meaningful to the issue at hand.

Notes:

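1 Chi-square statistic:

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

With the expected counts shown rounded to whole numbers:

$$\chi^2 \approx \frac{(25 - 18)^2}{18} + \frac{(9 - 16)^2}{16} + \frac{(15 - 22)^2}{22} + \frac{(27 - 20)^2}{20}$$

Computed from the unrounded expected counts (17.9, 16.1, 22.1, and 19.9), the four terms are approximately 2.8 + 3.1 + 2.3 + 2.5, and the total is 10.8 (adding the rounded terms gives a slightly different decimal).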

2 Degrees of freedom: (number of rows – 1)*(number of columns – 1) = (2 – 1)*(2 – 1) = 1*1 = 1

3 The following table shows chi-square statistic values that must be reached, given the level of significance desired and the degrees of freedom, in order to reject the null hypothesis of no association: http://www.medcalc.org/manual/chi-square-table.php
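For the single degree of freedom used in this example, the 3.8 cutoff can also be reproduced without a table. The sketch below relies on the fact that a chi-square variable with 1 degree of freedom is a squared standard normal variable; it uses Python's standard `statistics` module, and the variable names are mine. It does not generalize to other degrees of freedom.

```python
from statistics import NormalDist

alpha = 0.05  # significance level used in the article

# With 1 degree of freedom, the chi-square critical value is the square
# of the two-sided standard normal critical value.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
chi2_critical = z_critical ** 2

print(round(chi2_critical, 2))  # 3.84
```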

Just Because We’re In a Relationship Doesn’t Mean I’m the Cause of Your Behavior

When you receive information about events that occur over time, do you look for patterns or relationships in the information?  For example, when you see a tall couple walking down the street, do you think their children will be tall?  When you learn that a new children’s toy is becoming popular, do you expect the toy’s price to increase?  When you hear that someone is highly educated, do you assume the person is also wealthy?

When we receive and process information, we frequently look for meaningful relationships in the information.  It’s usually helpful to do so because we learn about the individual events and develop knowledge we can use to make predictions.  At the same time, it is important to remember that even if two events are related it doesn’t mean that one necessarily causes the other.

Are These Two Variables Correlated?

Because of our desire to know whether two events are related, and if so, how closely, we have developed methods for measuring the direction and strength of the relationship between phenomena.  One such method is the Pearson correlation coefficient, which measures the degree to which there is a linear relationship between two variables.  The Pearson correlation coefficient ranges from +1 to –1.

If the correlation coefficient is close to +1 there is a strong positive relationship, meaning that as one variable increases the other tends to increase as well.  If the correlation coefficient is close to –1 there is a strong negative relationship, meaning that as one variable increases the other tends to decrease.  Finally, a correlation coefficient close to zero signals the lack of a linear relationship between the two variables.

Perception of Government Quality and Willingness to Pay Taxes

Let’s say you’re interested in understanding the relationship between how highly your community rates its local government and your community’s willingness to pay additional taxes.  You’ve collected survey data from the community for the past five years.  Over those five years, the overall ratings for the local government were 91, 87, 83, 92, and 89, respectively (scores range from 0 to 100, where 0 is awful and 100 is wonderful).

At the same time, your community’s willingness to pay additional taxes has been 62, 10, 55, 63, and 91, respectively (scores range from 0 to 100, where 0 is unwilling to pay additional taxes under any circumstances and 100 is completely willing to pay additional taxes).  Given these data, what can you say about the relationship between how highly your community rates its local government and your community’s willingness to pay additional taxes?

The Pearson correlation coefficient will provide you with the direction and strength of the linear relationship between these two variables.  In this case, the correlation coefficient is 0.31*, so you can say there is a weak positive relationship between how highly your community rates its local government and your community’s willingness to pay additional taxes (remember, these data are made up).  The weak positive relationship means that as your community’s rating of the local government increases, your community’s willingness to pay additional taxes also tends to increase, but only weakly.
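Here is a minimal Python sketch of that calculation for the made-up survey data. It uses the deviation-from-the-mean form of the Pearson formula, which is algebraically equivalent to the version in the note at the end of this article. The function and variable names are my own.

```python
import math

government_rating = [91, 87, 83, 92, 89]
tax_willingness = [62, 10, 55, 63, 91]


def pearson_r(x, y):
    """Sum of cross-deviations divided by the product of the root sums
    of squared deviations (equivalent to the Pearson formula)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)


r = pearson_r(government_rating, tax_willingness)
print(round(r, 2))  # 0.31
```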

Correlation Does Not Imply Causation

An important point to remember is that the correlation coefficient only provides information about the direction and strength of the linear relationship between two variables.  It does not provide information about a non-linear relationship between two variables and it does not imply that one variable causes the other.  Sometimes we assume, or jump to the conclusion, that because two variables are correlated one necessarily causes the other, but this is not always the case and is an improper assumption to make.  Remember, correlation does not imply causation.
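To see why a correlation near zero does not rule out a relationship, consider a small made-up example (the numbers are mine, not from any survey) in which one variable is completely determined by the other, yet the Pearson coefficient is zero because the relationship is not linear.

```python
import math


def pearson_r(x, y):
    """Pearson correlation using the sums-based formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum(a * a for a in x) - sum_x ** 2)
        * (n * sum(b * b for b in y) - sum_y ** 2)
    )
    return numerator / denominator


x = [-2, -1, 0, 1, 2]
y = [value ** 2 for value in x]  # y depends entirely on x, but not linearly

print(pearson_r(x, y))  # 0.0
```

Knowing x tells you y exactly, but the coefficient alone would suggest the two are unrelated.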

Guard Against Assumptions of Causality

Whether at home or at work, we are frequently taking in and processing information.  We are often trying to identify meaningful patterns and relationships in the data so we can understand what we’re interpreting and improve our ability to make predictions with the data.  The Pearson correlation coefficient is an important tool you can use to measure the relationship between two variables because it provides you with the direction and strength of the linear relationship between the variables.  While the correlation coefficient is a useful measure of association, it is important to remember that correlation does not imply causation.  Guard against the urge to assume, or be easily persuaded, that a causal relationship exists simply because two events are correlated.  By doing so, you will reduce the likelihood of making an unfounded (and perhaps costly) assumption that one variable causes another and increase your chances of making informed, defensible decisions.

*Pearson correlation coefficient (r):

$$r = \frac{n \sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$$

where:

n = number of pairs of scores

Sum xy ($\sum xy$) = sum of the products of paired scores

Sum x ($\sum x$) = sum of x scores

Sum y ($\sum y$) = sum of y scores

Sum x^2 ($\sum x^2$) = sum of squared x scores

Sum y^2 ($\sum y^2$) = sum of squared y scores

In the example:

n = 5

Sum xy = 24,972

Sum x = 442

Sum y = 281

Sum x^2 = 39,124

Sum y^2 = 19,219

Therefore, the Pearson correlation coefficient, r, equals:

r = [5(24,972) – (442)(281)] / square root([5(39,124) – (442*442)][5(19,219) – (281*281)])

r = (124,860 – 124,202) / square root([195,620 – 195,364][96,095 – 78,961])

r = 658 / square root([256][17,134])

r = 658 / square root(4,386,304)

r = 658 / 2,094

r = 0.31

In this case, the correlation for the made-up data is 0.31, which indicates a weak positive relationship between how highly a community rates its local government and the community’s willingness to pay additional taxes.
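If you want to verify the intermediate sums in the worked example, the following Python sketch recomputes each of them from the made-up data. The variable names are mine and simply mirror the labels used above.

```python
import math

x = [91, 87, 83, 92, 89]  # local government ratings
y = [62, 10, 55, 63, 91]  # willingness to pay additional taxes

n = len(x)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(n, sum_xy, sum_x, sum_y, sum_x2, sum_y2)  # 5 24972 442 281 39124 19219
print(numerator, round(denominator), round(r, 2))  # 658 2094 0.31
```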