We hear about differences between groups every day. We hear that the average test score for one classroom of students is different from (or greater than) that of another classroom. We hear that the average financial return for one set of investments is different from that of another set. We hear that the average customer approval rating for one product is different from that of another product.
When people want to show us that two groups are different, based on a particular measure, sometimes they only show us the average (i.e. mean) values for the two groups. This is the case when we hear statements like, “The average test score for the first classroom is 700 while the average test score for the second classroom is only 650; therefore, the students in the two classrooms have different aptitudes,” or, “The average financial return on our investments is 15 percent per year while the average return on those other investments is only 10 percent per year; therefore, our financial results are different from (i.e. better than) their results.” Implicit in these statements is the assumption that the difference between the averages is meaningful (i.e. statistically significant).
Whether There’s a Difference Makes a Difference
There are many reasons why people might not present the calculation showing that the difference between the two groups is meaningful: they may not know the calculation exists, they may not have time for the calculation, or they may not want to acknowledge that the difference they care about isn’t actually statistically meaningful.
Regardless of why someone might not initially present the evidence, if the difference will affect your decisions or actions, then it’s critical to know whether the difference is statistically significant.
Does Training Affect Performance Scores?
Let’s look at a quick example to see how you can evaluate whether the difference between the average values for any two groups is meaningful. Let’s say you manage a training program and want to know whether the average performance score for those who go through the training program is different from the average performance score for those who don’t go through the program [1].
Fifteen people go through the training program. Fifteen other people do not go through the program. The average performance score for the group that goes through the training program is 86, and the variance of the scores is 10.32. The average performance score for the group that doesn’t go through the program is 83, and the variance of the scores is 13.84. Based on this information, would it be appropriate to say that the difference between the scores for the two groups is statistically significant? Take a moment. What do you think? Do you think you can say the 3-point difference between the two averages is statistically significant?
Additional Information for Additional Insight
As you probably guessed, we need a few additional pieces of information to make this determination. In addition to the 3-point difference between the averages, we need an estimate of the population variance. Because the two groups are the same size, we can compute it as the average of the two group sample variances. In this case, the estimate of the variance is about 12.1 [2].
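The variance estimate can be checked with a few lines of Python. This is only a sketch of the arithmetic described above; the variable names are ours, and the simple average applies only because both groups have the same number of people.

```python
# Sample variances reported for the two groups in the example.
var_trained = 10.32
var_untrained = 13.84

# With equal group sizes, the pooled variance estimate is the
# simple average of the two sample variances.
pooled_variance = (var_trained + var_untrained) / 2

print(round(pooled_variance, 2))  # 12.08, which rounds to 12.1
```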
Next, we need an estimate of the standard error of the difference between the averages. In this case, the standard error of the difference is 1.27 [3]. Next, we need to calculate the t-statistic for the difference between the averages. The t-statistic is 2.4 [4].
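As a rough check, the standard error and the t-statistic can be reproduced in Python from the numbers in the example. The variable names below are illustrative, and the formula assumes two groups of equal size n.

```python
import math

n = 15                    # people in each group
mean_diff = 86 - 83       # difference between the two group averages
pooled_variance = (10.32 + 13.84) / 2

# Standard error of the difference between two means (equal group sizes).
standard_error = math.sqrt(2 * pooled_variance / n)

# t-statistic: the observed difference measured in standard-error units.
t_statistic = mean_diff / standard_error

print(round(standard_error, 2), round(t_statistic, 2))  # 1.27 2.36
```

Carrying full precision gives a t-statistic of about 2.36, which matches the 2.4 quoted above when rounded to one decimal place.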
Finally, we need to calculate the probability, or p-value, of getting a t-statistic as large or larger than 2.4 or as small or smaller than -2.4. Using Excel’s TDIST function with 28 degrees of freedom, we see that this p-value is about 0.02 [5]. Since this p-value is less than 0.05, we can say that the difference between the scores for the two groups is statistically significant [6].
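If you don’t have a spreadsheet handy, the two-sided p-value can also be approximated in plain Python. The sketch below is not how Excel computes it; it simply integrates the Student’s t density numerically with Simpson’s rule, and the function names (t_pdf, two_sided_p_value) are our own. In practice a statistics library would normally do this step for you.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """Probability of a t-statistic at least as extreme as t_stat, in either tail.

    Integrates the density from 0 to |t_stat| with Simpson's rule
    (steps must be even), doubles it to get the central area, and
    returns what is left over in the two tails.
    """
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total
    return 1 - central_area

df = (15 - 1) + (15 - 1)          # 28 degrees of freedom
p_value = two_sided_p_value(2.4, df)

print(p_value < 0.05)  # True: the difference is statistically significant
```

The result lands between 0.02 and 0.03, in line with the rounded value of 0.02 reported above.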
Whether the Difference is Significant Isn’t an Insignificant Determination
As you can see, it is not enough to look only at the size of the difference between the averages to determine whether the difference is statistically significant. Doing so requires a few additional steps; however, you can carry out the additional steps relatively quickly and easily in any spreadsheet program.
Also, while the preceding example focused on a single area of application, the difference in performance scores between people who did and did not go through training, it’s important to recognize that there are many other situations in which it is important to determine whether the difference between average scores for two groups is statistically significant and meaningful.
By understanding that you need more than the two group averages alone to determine whether the difference between them is significant, and by being able to calculate the significance of the difference, you will be able to keep from rushing to unwarranted judgments, determine when meaningful differences exist, and make informed decisions.
[1] This example discusses a two-sided test for the difference between the means of two samples of equal size. If, as the manager of the program, you want to know whether the average score for those who go through the program is greater than the average score for those who don’t go through the program, then you would need to conduct a one-sided test (it’s only slightly different from the two-sided test). A slightly different calculation handles samples of different sizes. I plan to discuss one-sided tests and tests between samples of different sizes in future articles.
[2] MSE = (10.32 + 13.84) / 2 = 12.08, or about 12.1
[3] S(Group1 - Group2) = Square root[2*(MSE)/n] = Square root[2*(12.1)/15] = Square root[1.61] = 1.27
[4] t-statistic = 3 / 1.27 = 2.36, or about 2.4
[5] Degrees of freedom: (n1-1) + (n2-1) = (15-1) + (15-1) = 14 + 14 = 28
[6] The p-value is the probability of obtaining a t-statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. In this case, the null hypothesis is that the difference between the two group averages is 0. Therefore, the p-value is the probability of obtaining a t-statistic as large or larger than 2.4 or as small or smaller than -2.4, assuming that the difference between the two group averages is 0.
Generally, we “reject the null hypothesis,” in favor of the alternative hypothesis (i.e. that the difference between the two group averages is not 0), when the p-value is less than 0.05, corresponding to a 5 percent chance of rejecting the null hypothesis when it is true.