Imagine you’re a 40-year-old woman who recently visited the doctor for a routine breast cancer screening. During your visit, the doctor informed you that only one percent of 40-year-old women who get screened have breast cancer. The doctor also reminded you that the screening is not perfectly accurate: only 80 percent of women with breast cancer will receive positive results (i.e., results indicating cancer is present), and 10 percent of women without breast cancer will also receive positive results. You just received your test results, and they’re positive. What is the probability you actually have breast cancer?

Take a moment and think about it. What do you think? If you’re like most people, you probably think the probability you have breast cancer is around 80 percent; maybe slightly less than that, but still a frighteningly high probability of cancer. If your guess was around 80 percent, then you probably reasoned something like: 80 percent of women with breast cancer get positive results, I got a positive result, therefore the probability I have breast cancer is around 80 percent. However, there are two problems with this reasoning that make your estimate far too high.

The first problem is that you’ve forgotten to take into account the base rate (also called the prior, or *a priori*, probability) of breast cancer among women your age. As your doctor mentioned, it is one percent, which means only 1 out of 100 women who get screened have breast cancer. Therefore, before you learn the results of your test, the probability you have breast cancer is one percent. When you receive your positive test result you probably want to adjust your estimate of the probability you have breast cancer upward, but since the test isn’t perfect you shouldn’t increase it from one percent all the way to 80 percent. In just a moment we’ll discuss how you can use probability theory to adjust the base rate using the new information from your test result, but first let’s discuss the second problem with your earlier reasoning.

The second problem is that you’ve confused two very different probabilities. One probability is the probability you receive a positive test result given that you have breast cancer. You can write this probability as P(+T|C), which can be read as the probability of a positive test result (+T) given the presence of breast cancer (C). This probability, often called the test’s sensitivity, is the 80 percent figure stated above.

The other probability is the probability you have breast cancer given that you receive a positive test result. You can write this probability as P(C|+T). Notice how the +T and C switched places. This probability, which is very different from the first, is the one you want to figure out, so let’s do that now.

The information you already have is:

P(C) = 0.01: the prior probability of breast cancer among women your age is one percent.

P(+T|C) = 0.80: the probability you receive a positive test result given that you have breast cancer is 80 percent.

P(+T|NC) = 0.10: the probability you receive a positive test result given that you do not have breast cancer is 10 percent.

You actually know more than this. Since you know P(C), you also know P(NC), the prior probability that you do not have cancer, because the two probabilities must sum to one. Since P(C) is 0.01, P(NC) = 1 – 0.01 = 0.99. Using the same reasoning, you also know the probabilities P(-T|C) and P(-T|NC), which are 0.20 and 0.90, respectively.
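The complement rule above can be sketched in a few lines of Python (the variable names are my own, chosen to mirror the notation in the text):

```python
# Given quantities from the problem statement
p_c = 0.01             # P(C): prior probability of breast cancer
p_pos_given_c = 0.80   # P(+T|C): positive test given cancer
p_pos_given_nc = 0.10  # P(+T|NC): positive test given no cancer

# Each event and its complement must sum to one
p_nc = 1 - p_c                       # P(NC)  = 0.99
p_neg_given_c = 1 - p_pos_given_c    # P(-T|C)  = 0.20
p_neg_given_nc = 1 - p_pos_given_nc  # P(-T|NC) = 0.90
```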

Finally, remember that you can use Bayes’ Theorem to update a prior probability with new information. To do so, use the following Bayes’ equation (I am using symbols specific to the problem we’re discussing, but you can use any symbols so long as you use them consistently):

P(C|+T) = [ P(C) * P(+T|C) ] / P(+T)

The denominator, P(+T), follows from the law of total probability: P(+T) = P(C) * P(+T|C) + P(NC) * P(+T|NC)

In words: the probability you have breast cancer given a positive test result equals the prior probability of breast cancer among women your age, times the probability of a positive result given breast cancer, divided by the overall probability of a positive result.
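As a quick sketch, the denominator can be computed by summing over the two ways a positive result can occur, with cancer and without (variable names are my own):

```python
# Given probabilities from the problem
p_c, p_nc = 0.01, 0.99               # P(C), P(NC)
p_pos_given_c = 0.80                  # P(+T|C)
p_pos_given_nc = 0.10                 # P(+T|NC)

# Total probability of a positive test: P(+T)
p_pos = p_c * p_pos_given_c + p_nc * p_pos_given_nc  # ≈ 0.107
```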

Armed with all of this information, you’re now ready to calculate P(C|+T), the probability you have breast cancer given that you received a positive test result. This probability is:

P(C|+T) = [ P(C) * P(+T|C) ] / [ P(C) * P(+T|C) + P(NC) * P(+T|NC) ]

P(C|+T) = [ 0.01 * 0.80 ] / [ 0.01 * 0.80 + 0.99 * 0.10 ]

P(C|+T) = 0.008 / (0.008 + 0.099)

P(C|+T) = 0.008 / 0.107

P(C|+T) ≈ 0.075, or 7.5 percent

By using Bayes’ equation you now know that, given the positive test result, the probability you have breast cancer has increased from one percent to 7.5 percent. Of course, any increase in the probability of breast cancer is unsettling, but at least the probability hasn’t increased to 80 percent.
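The same answer falls out of simple counting. Here is a sketch using a hypothetical cohort of 1,000 screened women and the rates given above:

```python
cohort = 1000
with_cancer = cohort * 0.01            # 10 women have cancer
without_cancer = cohort - with_cancer  # 990 women do not

true_positives = with_cancer * 0.80      # 8 positives among women with cancer
false_positives = without_cancer * 0.10  # 99 positives among women without cancer

# Of the 107 women who test positive, only 8 actually have cancer
p = true_positives / (true_positives + false_positives)  # 8 / 107 ≈ 0.075
```

Framing the problem as counts of people, rather than probabilities, often makes the answer feel more intuitive.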

The preceding is but one example of how you can use Bayes’ equation to update a prior probability with additional information. There are so many more – the probability an employee will be highly productive given that he receives a specific score on an interview test, the probability a company’s stock will reach a specific level given that it receives a specific risk rating, the probability a student will excel in school given that she scores well on a test – the list could go on and on. I hope you find more fun and interesting ways to learn and use the power of Bayes.
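Since the same calculation applies to every example in the list above, it can be wrapped in a small reusable function. This is my own sketch, not code from the post; the name `bayes_update` and its parameters are assumptions:

```python
def bayes_update(prior, likelihood, likelihood_if_not):
    """Posterior probability of a hypothesis H given evidence E.

    prior:             P(H), the base rate
    likelihood:        P(E|H), probability of the evidence if H is true
    likelihood_if_not: P(E|not H), probability of the evidence if H is false
    """
    # Denominator P(E) via the law of total probability
    evidence = prior * likelihood + (1 - prior) * likelihood_if_not
    return prior * likelihood / evidence

# The screening example: P(C|+T)
posterior = bayes_update(prior=0.01, likelihood=0.80, likelihood_if_not=0.10)
```

Swapping in a different prior and likelihoods handles the interview, stock, or exam examples the same way.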

I enjoyed all the posts but thought this one especially interesting because the fallacies you discuss are often present in articles by consumer and other non-scientific groups looking at the increased chance (say) of having a bad reaction to a test drug or a drug (usually) already on the market. I think the false reasoning goes that if the chance that a person would get cancer is 1 in 100 and taking a drug appears to increase the chance by another 30%, the reports will often talk about a 30% greater chance of getting cancer, forgetting to note that the original chance was 1 in 100, so that in absolute terms the number of persons actually getting cancer would rise from 1 to 1.3 per 100 (I think I have that right).

Absolutely. Your point is incredibly important. People need to receive both the initial rate and the change to be able to understand and make inferences from the information.

It would be interesting to know how often the authors of the articles you mention simply don’t understand the point we’re discussing, and so miss the opportunity to teach others about meaningful statistics, versus how often they deliberately withhold data or carefully choose which data to present in order to advocate a position or sensationalize a point.

I’ll try to touch on related topics in the future so we can continue to discuss the use and interpretation of statistics in the areas of policy making and regulation. If you come upon any interesting applications please let me know. Thanks so much for reading and commenting.