On average, how long does it take you to get to work in the morning? 10 minutes? 30 minutes? Over 60 minutes? However long it takes you on average, because of many factors like traffic and weather conditions, I’m sure the length of time fluctuates from day to day. How much does it fluctuate? Does your travel time fluctuate within a narrow interval, such as a few minutes on either side of the average (i.e. mean), or does it display greater variability?

Unless you’ve recorded your travel times ever since you began working (highly unlikely!), you’re probably trying to answer these questions by thinking about the travel times of some of your most recent trips to work. In this case, you’re making inferences about a population (i.e. all of your trips to work) using information from a sample of that population (i.e. some of your most recent trips to work). Using sample information is helpful because it enables us to learn about a population without having to collect data from the entire population, which can be extremely difficult and expensive (consider how “fun” it would be to have to record your travel time to work every day).

**Understanding Standard Deviation**

Now that we know we’re using sample information, let’s get back to talking about the variability of your travel times. An important measure of the variability, or dispersion, of sample data is the sample standard deviation. The sample standard deviation is relatively easy to understand and interpret because it is in the same units as the data you’re evaluating. So if you’re evaluating data on travel time in minutes, then the sample standard deviation will also be in minutes. For example, let’s say it took you 25 minutes to get to work on Monday, 27 minutes on Tuesday, and 35 minutes on Wednesday. In this case, the average length of time it took you to get to work is 29 minutes^{1} and the standard deviation for this sample of travel times is 5.3 minutes^{2}.
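To make this concrete, here is a short sketch using Python’s standard-library `statistics` module with the three travel times from the example above:

```python
from statistics import mean, stdev

# Travel times from the example, in minutes
times = [25, 27, 35]

avg = mean(times)   # sample mean: 29.0 minutes
sd = stdev(times)   # sample standard deviation: sqrt(28) ≈ 5.3 minutes

print(f"mean = {avg} minutes, standard deviation = {sd:.1f} minutes")
```

Note that `stdev` divides by n − 1 (the sample standard deviation), matching the formula in footnote 2; `pstdev` would give the population version.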

**Two Standard Deviation Rule of Thumb**

It is important for you to care about, calculate, and consider the sample standard deviation in addition to the average because it gives you a richer understanding of the sample you’re evaluating and enables you to use a simple but powerful rule of thumb to make inferences about the entire population.

According to the rule of thumb, there’s roughly a 95 percent chance that a normally distributed random variable will fall within two standard deviations (on either side) of its mean. This implies that if your average travel time is 29 minutes and the standard deviation is 5.3 minutes, then about 95 percent of your travel times should fall between 18.4 minutes and 39.6 minutes^{3} (assuming travel times are approximately normally distributed).
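The rule-of-thumb interval takes only a few lines to compute (a sketch using the same three travel times and the standard library’s `statistics` module):

```python
from statistics import mean, stdev

times = [25, 27, 35]  # travel times in minutes

avg = mean(times)
sd = stdev(times)

# Two-standard-deviation rule of thumb: mean ± 2 * sd
low = avg - 2 * sd
high = avg + 2 * sd

print(f"about 95% of travel times between {low:.1f} and {high:.1f} minutes")
```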

To get a feel for how powerful this rule of thumb is, think about all of the random variables you deal with every day. Budget figures. Investment returns. Service delivery times. By calculating the mean and standard deviation for a sample of data, you can quickly make informed inferences about the entire population of interest.

The sample standard deviation is a useful statistic that is widely underutilized. By calculating and using standard deviations, you will have a better understanding of the nature of the data you’re evaluating than if you only look at averages. And by using the rule of thumb for standard deviations, you’ll be able to use sample information to quickly infer likely values for an entire population.

Take time to develop your ability to use standard deviations in addition to means. If you do, you’ll be able to complement your intuition with the new information and improve your decision-making ability.

^{1} ([25 minutes + 27 minutes + 35 minutes] / 3) = 29 minutes

^{2} Square root(1/(n−1) × [(x_{1} − x̄)^{2} + (x_{2} − x̄)^{2} + (x_{3} − x̄)^{2}]) =

Square root(1/(3−1) × [(25 − 29)^{2} + (27 − 29)^{2} + (35 − 29)^{2}]) =

Square root(0.5 × [16 + 4 + 36]) =

Square root(28) ≈ 5.3 minutes

^{3} 29 minutes plus or minus (2 * 5.3 minutes) = 18.4 minutes and 39.6 minutes

Excellent description with a good example that is familiar to all. A simple but quick way to make decisions with real data.

Is it possible to quantify the increase in the confidence level with a larger sample of data? What are the criteria used for judging the increased accuracy and benefit of the additional effort and cost of collecting a larger data set?

Thanks

Thank you for reading and commenting. Your comment touches on an important point about confidence intervals and sample size, which is: an increase in sample size will narrow the confidence interval without reducing the level of confidence, because the standard error of the mean (the standard deviation divided by the square root of the sample size) decreases as the sample size increases. Also, it is a judgment call as to whether you need greater accuracy in your specific situation and whether you have the resources to afford the level of accuracy you desire.
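As a rough illustration of this point (a sketch that uses two standard errors as an approximate 95 percent margin of error for the mean, and assumes the sample standard deviation stays near the 5.3 minutes from the example):

```python
from math import sqrt

s = 5.3  # sample standard deviation from the example, in minutes

# Approximate 95% margin of error for the sample mean: 2 * s / sqrt(n).
# Larger samples pin down the mean more tightly.
for n in (3, 30, 300):
    margin = 2 * s / sqrt(n)
    print(f"n = {n:>3}: mean known to within about ±{margin:.2f} minutes")
```

Going from 3 trips to 30 cuts the margin from about ±6 minutes to about ±2 minutes; whether that extra precision justifies the extra data collection is the judgment call described above.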