A few days ago, I played a board game called Loaded Questions with three family members. Everyone’s game pieces start at one end of a color-coded multi-block path on the board, and the object of the game is to get to the other end of the path before anyone else.
You begin by rolling a die and moving your game piece forward as many spaces as are showing on the face of the die. Then you select one of the game’s numerous question cards and read one of the four questions on it. Which question you read depends on the color of the space on which you landed.
Your opponents write their responses on sheets of paper, and then one of them reads the responses to you (while keeping the responses hidden from view). The challenge is to guess who wrote which response. For each correct guess, you get to move forward one additional space. Therefore, if you want to win the game, it behooves you to make guessing difficult for your opponents by submitting responses that they wouldn’t associate with you.
Diligently Detecting Deceptive Dishonesty
The four of us played the game several times, so we submitted dozens and dozens of responses. Some of my responses were truthful and some of them were not (remember, I wanted to win the game, so I didn’t want my family members to be able to guess which responses were mine). Despite my best attempts to deceive my wife (and other opponents), she was still able to correctly guess which responses were mine more frequently than I thought probable (she knows me too well!).
I am intrigued by my wife’s apparent “supernatural” ability to correctly guess my responses, so I have decided to use the chi-square test, a test of association between categorical variables, to assess whether the relationship between ‘I told the truth’ and ‘My wife guessed correctly’ is statistically significant.
Tell Me Chi-square, Can I Categorically Deny the Relationship?
For a chi-square test, it is common to organize the data into a table. The following table displays the data for my analysis:
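| | I Told the Truth: Yes | I Told the Truth: No | Total |
|---|---|---|---|
| My Wife Guessed Correctly: Yes | 25 | 15 | 40 |
| My Wife Guessed Correctly: No | 9 | 27 | 36 |
| Total | 34 | 42 | 76 |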
I submitted a total of 76 responses: 34 were truthful and 42 were not. Out of the 34 responses in which I told the truth, my wife correctly guessed my response 25 times. Out of the 42 responses in which I did not tell the truth, my wife correctly guessed my response 15 times. In all, my wife correctly guessed my response 40 out of 76 times. Is there a statistically significant association between the two categorical variables? What do you think?
The chi-square test compares the observed count in each cell with the count that would be expected if the two variables were unrelated, to assess whether the relationship between the two variables is statistically significant. The expected count for a cell is calculated as follows: (Row Total / Grand Total) * Column Total. In this case, the expected count for the first cell of Row 1, i.e. Yes/Yes (my wife guessed correctly and I told the truth), is (40 / 76) * 34 = 18 (rounded to the nearest whole number). The expected values for the remaining cells are calculated in the same fashion: 22 for the 15 correct guesses when I did not tell the truth, 16 for the 9 incorrect guesses when I told the truth, and 20 for the 27 incorrect guesses when I did not.
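For readers who like to check the arithmetic, here is a minimal Python sketch of the expected-count formula. The function name `expected_counts` and the list-of-lists layout are my own choices for illustration; the counts are the ones from my game, with rows for "my wife guessed correctly" (Yes, No) and columns for "I told the truth" (Yes, No).

```python
# Observed counts from the game.
# Rows: my wife guessed correctly (Yes, No).
# Columns: I told the truth (Yes, No).
observed = [
    [25, 15],
    [9, 27],
]


def expected_counts(table):
    """Return (row total / grand total) * column total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [row_total * col_total / grand_total for col_total in col_totals]
        for row_total in row_totals
    ]


expected = expected_counts(observed)
for row in expected:
    print([round(value, 1) for value in row])
# prints:
# [17.9, 22.1]
# [16.1, 19.9]
```

Rounded to whole numbers, these are 18, 22, 16, and 20.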
The chi-square statistic is calculated from the observed and expected counts as follows: Sum[(Observed – Expected)² / Expected]. Using the observed and expected counts, the chi-square statistic for my analysis is 10.8 [1]. There is 1 degree of freedom [2]. For the test to be significant at the 0.05 level, given 1 degree of freedom, the value for the chi-square statistic has to be at least 3.84 [3]. Since 10.8 is greater than 3.84, we can reject the null hypothesis of no association between the two variables.
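The same calculation can be sketched in plain Python. This is only an illustration under a few assumptions: the function name `chi_square_statistic` is hypothetical, the 3.841 threshold is the tabulated value for the 0.05 level with 1 degree of freedom, and the p-value line relies on a closed form that holds only when there is exactly 1 degree of freedom. No continuity correction is applied, which matches the calculation in this article.

```python
import math

# Rows: my wife guessed correctly (Yes, No).
# Columns: I told the truth (Yes, No).
observed = [
    [25, 15],
    [9, 27],
]


def chi_square_statistic(table):
    """Sum (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            expected_count = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed_count - expected_count) ** 2 / expected_count
    return statistic


statistic = chi_square_statistic(observed)

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Tabulated threshold for the 0.05 level with 1 degree of freedom.
CRITICAL_VALUE_05_DF1 = 3.841

# Upper-tail probability; this closed form is valid only for 1 degree of freedom.
p_value = math.erfc(math.sqrt(statistic / 2))

print(round(statistic, 1), dof, statistic > CRITICAL_VALUE_05_DF1)
# prints: 10.8 1 True
```

Because the code keeps the expected counts unrounded, the individual terms differ slightly from what you would get by plugging in the whole-number expected values.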
Testing the Strength of Many Relationships with Chi-square
While I used it in a fun, simple example in this article, the chi-square test is an important technique for assessing whether an association between two categorical variables is statistically significant. Think about all of the situations in which we want to learn critical information by comparing categorical variables: Is receipt of treatment related to mortality? Is receipt of in-kind transfers related to employment? Is receipt of resources related to project success?
By using the chi-square test, you’ll be able to identify relationships between variables in need of additional analysis, assess the significance of relationships presented to you, and focus attention on the relationships between variables that are meaningful to the issue at hand.
[1] Chi-square statistic: Sum[(Observed – Expected)² / Expected] =
[(25 – 18)²/18] + [(9 – 16)²/16] + [(15 – 22)²/22] + [(27 – 20)²/20] =
2.8 + 3.1 + 2.3 + 2.5 ≈ 10.8 (each term is computed from the unrounded expected values of 17.9, 16.1, 22.1, and 19.9, so the decimals differ slightly from what the whole-number expected values shown here would give)
[2] Degrees of freedom: (number of rows – 1)*(number of columns – 1) = (2 – 1)*(2 – 1) = 1*1 = 1
[3] The following table shows chi-square statistic values that must be reached, given the level of significance desired and the degrees of freedom, in order to reject the null hypothesis of no association: http://www.medcalc.org/manual/chi-square-table.php