Do you ever hear the word “average” in a conversation and wonder which definition the speaker is (implicitly) using? Sure, the speaker is usually referring to the mean of a distribution, or more specifically the arithmetic mean, but there are many measures of central tendency, including the geometric mean, harmonic mean, weighted mean, median, and mode, and it’s not always clear from context which one the speaker is treating as the “average”.
And which “average” the speaker uses to describe a distribution matters because some versions, like the arithmetic mean, are influenced by outliers, that is, observations that are very different from the rest of the data. If outliers are present, the arithmetic mean may not be the best description of the data’s central tendency. If you then naïvely make inferences or decisions based on an “average” derived from a dataset that contains outliers, you may draw inaccurate conclusions, or at least conclusions you wouldn’t have drawn if you had used a different measure of central tendency.
One alternative measure of central tendency is the median. The median is the “average” in the sense that it splits the distribution in half: roughly 50 percent of the observations are at or above the median and roughly 50 percent are at or below it. To find the median, order the observations from lowest to highest and then repeatedly remove the lowest and highest remaining observations as a pair until either one or two observations are left. If only one observation is left, it is the median. If two observations are left, the median is the arithmetic mean of those two.
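The pair-removal procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended implementation: the function name `median_by_trimming` is made up for this example, and in practice the standard library's `statistics.median` should give the same answer more directly.

```python
def median_by_trimming(values):
    """Find the median by repeatedly dropping the lowest and highest values.

    Hypothetical helper written for illustration only.
    """
    ordered = sorted(values)  # order the observations from lowest to highest
    if not ordered:
        raise ValueError("the median of an empty dataset is undefined")

    # Remove the lowest/highest pair until one or two observations are left.
    while len(ordered) > 2:
        ordered = ordered[1:-1]

    # One observation left: it is the median.
    # Two observations left: the median is their arithmetic mean.
    return sum(ordered) / len(ordered)
```

Sorting first is what makes "lowest" and "highest" simply the two ends of the list, so each pass of the loop only has to drop the first and last entries.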
Let’s go through a short example to illustrate the difference between the arithmetic mean and the median of a distribution. Let’s say there are five homes in your cul-de-sac. Four of the homes are each worth $100,000. The fifth home was recently built by a new owner and is worth $1,000,000. In this case, the arithmetic mean of the home values is ($100,000 + $100,000 + $100,000 + $100,000 + $1,000,000) / 5 = $280,000, whereas the median home value is only $100,000, the observation in the middle of the ordered list of five home values. As you can see, the “average” value of the homes in your cul-de-sac is greater when you use the arithmetic mean because, unlike the median, it is pulled upward by the much greater value of the fifth home. As a seller, which “average” would you consider more representative of the cul-de-sac? Which would you consider more representative as a buyer?
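If you want to check the cul-de-sac arithmetic yourself, here is a small sketch using Python's built-in `statistics` module. The variable names are just illustrative choices; the home values are the ones from the example.

```python
from statistics import mean, median

# Four homes worth $100,000 each, plus the new $1,000,000 home.
home_values = [100_000, 100_000, 100_000, 100_000, 1_000_000]

mean_value = mean(home_values)      # sum of the values divided by 5
median_value = median(home_values)  # middle value of the sorted list

print(f"Arithmetic mean: ${mean_value:,.0f}")
print(f"Median:          ${median_value:,.0f}")
```

Try changing the fifth value to something closer to the others and rerunning it: the mean moves with every change, while the median stays at $100,000.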
While the preceding example is fairly simple, there are other situations, such as discussions about income distributions, investment returns, and customer service wait times, in which it is important to know which “average” is being used so you can gauge whether it truly represents the data. By knowing which “average” is being presented and understanding whether it can be affected by outliers, you can avoid being misled. You will also be in a better position to make accurate inferences and decisions. So next time you hear people say the word “average” without providing a definition or context, speak up and ask them what they mean.