
Statistical thinking in science: Crash course scientific thinking #2

Hank Green and the Crash Course team tackle a question that keeps many up at night: not just when we might die, but how to stop panicking about the numbers that claim to tell us. The piece's most striking claim is that statistics are not lying to us; rather, we are failing to ask the right questions of the data we consume. In an era of headline-driven anxiety, this shift from blaming the numbers to blaming our interpretation is a vital, if uncomfortable, pivot.

The Illusion of the Average

Crash Course opens by dismantling the intuitive trust we place in the word "average." The author writes, "Statistics are vital for so much of what goes on around us... But statistics can be misleading. It's not because the numbers are lying. It's that if we don't understand how the numbers are being used, we might get the wrong impression about their meaning." This distinction is the entire engine of the argument. By separating the data from the context, Green exposes why a simple mean can be a trap. When discussing the average age of death for US men, the text notes that the mean is "dragged down by people who died way younger than 70, even though there are fewer of them." This skewing effect is a classic statistical pitfall that often goes unreported in news cycles.
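The skewing effect described above is easy to see in code. A minimal sketch, using made-up ages rather than the actual national data set, of how a few early deaths drag the mean below the median:

```python
from statistics import mean, median

# Hypothetical ages at death: most cluster near 80, a few are much younger.
ages = [82, 79, 85, 81, 78, 84, 80, 83, 35, 52]

print(mean(ages))    # 73.9 — pulled down by the two young deaths
print(median(ages))  # 80.5 — the middle value resists the skew
```

Even though only two of the ten values are "young," the mean lands almost seven years below the median, which is exactly the gap the piece warns about.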


The commentary here is sharp because it moves quickly from the problem to the solution: looking at the median and mode. As Crash Course puts it, "The median is always the number directly in the middle of the data set. It is less likely to be skewed one way or the other the way a mean might be." This reframing is essential for busy readers who need to make quick judgments about risk. However, the piece acknowledges that even the median is not a crystal ball. The core of the argument is that we must understand the spread of the data, not just the center. "If the standard deviation is small, that tells me most people in this sample are dying at ages pretty close to the average age," Green explains. This is a crucial reminder that "typical" is a spectrum, not a single point. A counterargument worth considering is that for an individual, knowing the spread offers little comfort; the uncertainty remains personal and absolute regardless of the statistical distribution.

It is way better to be roughly right than precisely wrong.

The Trap of Relative Risk

The discussion takes a darker, more practical turn when addressing how media reports health risks. The text introduces a guest, Sage, to illustrate a common deception: the difference between relative and absolute risk. "With the old pill, 1 in 7,000 people were at risk of developing blood clots. With the new pill, the risk doubled," Crash Course writes. The immediate reaction is panic, but the reality is a shift from 1 in 7,000 to 2 in 7,000. The author argues that "that's just what scientists call the relative risk... which can be helpful to know, but it doesn't tell us the whole story." This is the piece's most actionable insight. By demanding the absolute risk, the reader is empowered to see that a "100% increase" can still be a negligible threat in real-world terms.
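The arithmetic behind the pill example is worth spelling out, since the whole deception lives in one division versus one subtraction. A sketch using the figures from the video:

```python
# Risk of blood clots, per the video's example.
old_risk = 1 / 7000
new_risk = 2 / 7000

relative_increase = new_risk / old_risk  # 2.0, i.e. "the risk doubled"
absolute_increase = new_risk - old_risk  # 1 extra case per 7,000 people

print(f"Relative risk: {relative_increase:.0f}x the old risk")
print(f"Absolute increase: {absolute_increase:.5%}")  # roughly 0.014 percentage points
```

The headline number (a 100% increase) and the real-world number (about one additional case per seven thousand people) come from the same two figures; only the framing differs.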

The commentary here is effective because it connects the math directly to human behavior. The text notes that when people only heard the relative risk, "people switched to less effective pregnancy prevention methods," inadvertently increasing their risk of blood clots, because pregnancy itself carries a higher clotting risk than the pill did. This is a powerful example of how statistical illiteracy can have lethal consequences. The argument holds up well, though it relies on the assumption that the audience will actually seek out the absolute numbers rather than just reacting to the headline. The piece wisely concludes this section by stating, "The more we understand numbers in context, the better we'll be at making informed decisions for our lives."

Correlation, Causation, and the Confounding Variable

Moving beyond single data points, the text tackles the relationship between variables. Green explains that a correlation is "a relationship between two or more variables which are basically anything that can be measured or counted." The classic example of ice cream sales and shark attacks is used to illustrate that "warm weather is indirectly leading to both." Here, the author introduces the concept of the confounding variable, defined as "a factor that influences the outcome of a study without being controlled for." This is where the argument becomes most sophisticated. It warns that even when scientists control for variables, we must ask, "is it possible this result was just a fluke in our data?"
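The ice-cream-and-sharks pattern can be simulated directly. In this sketch (with entirely invented data), temperature drives both series and neither causes the other, yet the two still show a strong positive correlation:

```python
import random

random.seed(0)

# Hypothetical daily data: warm weather drives both series independently.
temps = [random.uniform(10, 35) for _ in range(200)]
ice_cream = [5 * t + random.gauss(0, 10) for t in temps]   # sales rise with heat
sharks = [0.3 * t + random.gauss(0, 2) for t in temps]     # attacks rise with heat

def corr(xs, ys):
    # Pearson correlation coefficient (the "R value" the video mentions).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(corr(ice_cream, sharks))  # strongly positive, despite no direct link
```

Because `temps` is the confounding variable, controlling for it (for example, comparing only days with similar temperatures) would make the apparent ice-cream/shark relationship largely vanish.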

Crash Course writes, "Statistical significance means the result is strong enough that it would be surprising to get by random chance." But the author immediately cautions, "In science, significant doesn't mean important." This distinction is often lost in public discourse, where a statistically significant finding is treated as a major breakthrough. The text clarifies that it simply means the result is unlikely to be random noise. This is a necessary correction to the hype cycle of scientific reporting. Critics might note that explaining "confounding variables" and "R values" to a general audience risks oversimplifying complex methodologies, but the use of the sunscreen example grounds the theory in tangible reality. The piece argues that understanding these nuances helps us "make more informed judgments about our own lives."
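The gap between "significant" and "important" shows up cleanly in a toy calculation. This sketch (with made-up effect sizes, not from the video) runs a one-sample z-test on the same tiny effect at two sample sizes:

```python
from statistics import NormalDist

def z_test_p(effect, sd, n):
    # Two-sided p-value for a one-sample z-test of a mean shift `effect`.
    z = effect / (sd / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The same trivial effect (0.1 units against sd 10) at two sample sizes:
print(z_test_p(0.1, 10, 100))        # p ≈ 0.92: not significant
print(z_test_p(0.1, 10, 1_000_000))  # p well below 0.05: "significant"
```

With a million data points, the result clears the significance bar easily, yet the effect itself is exactly as small as before, which is precisely the sense in which "significant doesn't mean important."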

Bottom Line

The strongest part of this argument is its relentless focus on context over calculation; it teaches readers to question the framing of a number rather than the number itself. Its biggest vulnerability is the inherent difficulty of applying these statistical filters in the heat of a breaking news cycle. Readers should watch for the next episode, which promises to address how rare it truly is for a single experiment to shift scientific consensus.

Sources

Statistical thinking in science: Crash course scientific thinking #2

by Crash Course · Watch video

I am going to die eventually, which is pretty important to me personally. So, I'd like to know roughly at what age I am most likely to die. You might guess something like 70, which based on a national data set was the average age of death in the US for men who died between 2018 and 2023. But it might be that 79 is the more accurate answer, which is an extra 9 years.

So, how can I make sure I'm using the best number to answer my question? Can stats really tell me when I might die? And is there a way to look at these numbers and not have an existential crisis? Hi, I'm Hank Green and this is Crash Course Scientific Thinking.

Do not worry, I'm not going to teach you how to do statistics today. We have a whole other course about that. What we're talking about here is how to make sense of the stats you encounter in your everyday life. Statistics are vital for so much of what goes on around us, from designing video games to creating impactful government health policies.

But statistics can be misleading. It's not because the numbers are lying. It's that if we don't understand how the numbers are being used, we might get the wrong impression about their meaning. Scientists use statistics to understand data, but when they're looking at those numbers, they have all of the context that goes along with them.

By the time these stats are reported on in the news, they often lose some of that context, which can have big impacts on the ways that we see the world as consumers of science news. Scientists rely on numbers to build knowledge. But since they can't measure every person, they use samples, smaller groups they can measure to better understand a larger group, which means there's always some uncertainty. So, while stats could never tell me, Hank Green, exactly when I will die, they can tell me when a person like me is most likely to die.

So, what is the typical age of death for an American man? Well, when it comes to statistics, there's a few different ways of determining what's typical. One of the most common is to find the mean or average, the sum of all the numbers in a sample divided by how many numbers are in that ...