Wikipedia Deep Dive

Meta-analysis

Based on Wikipedia: Meta-analysis

In 1904, a statistician named Karl Pearson published a paper in the British Medical Journal that would quietly revolutionize how we understand knowledge itself. He gathered data from several clinical studies about typhoid inoculation—aggregating outcomes across multiple investigations to find patterns invisible in any single study. It was the first time anyone had systematically combined results from independent research projects to produce something greater than the sum of its parts. Nobody called it meta-analysis then. The term wouldn't be coined until 1976, when statistician Gene Glass defined it as "the analysis of analyses." But Pearson's work embodied exactly what meta-analysis would become: a statistical method for synthesizing quantitative data from multiple independent studies addressing a common research question.

The discipline has since grown from a niche statistical technique into one of the most influential tools in modern science. In 1991, only 334 meta-analyses were published; by 2014, the annual tally had climbed to 9,135, a nearly thirtyfold increase in just over two decades. Meta-analyses now underpin treatment guidelines, shape health policies, and determine whether research grants get funded. They are the backbone of evidence-based medicine, the reason we can say with confidence that something works or doesn't work.

The Art of Synthesis

At its core, meta-analysis solves a fundamental problem: individual studies often contradict each other. One paper finds a treatment effective; another finds it useless. Policymakers, clinicians, and scientists need answers—not chaos. Meta-analysis resolves these uncertainties by computing a combined effect size across all relevant studies, improving statistical power and precision.
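
To see how that combination works in the simplest case, below is a minimal sketch of fixed-effect, inverse-variance pooling in Python: each study's effect estimate is weighted by the reciprocal of its variance, so more precise studies count for more. The effect sizes and variances are hypothetical.

    import math

    def fixed_effect_pool(effects, variances):
        """Inverse-variance weighted average of study effect sizes."""
        weights = [1.0 / v for v in variances]
        pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
        se = math.sqrt(1.0 / sum(weights))  # standard error of the pooled effect
        return pooled, se

    # Hypothetical effect sizes (e.g., standardized mean differences) and variances
    effects = [0.42, 0.10, 0.35, 0.28]
    variances = [0.04, 0.09, 0.02, 0.05]
    pooled, se = fixed_effect_pool(effects, variances)
    print(f"pooled effect = {pooled:.3f}, "
          f"95% CI = ({pooled - 1.96 * se:.3f}, {pooled + 1.96 * se:.3f})")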

The process begins with data collection. Researchers identify appropriate keywords and search limits, using Boolean operators to craft precise queries (for example, typhoid AND (vaccin* OR inoculat*)), and scan databases like PubMed, Embase, and PsycINFO. They also search the reference lists of eligible studies, a technique called "snowballing" that can uncover relevant work missed by the initial database searches. The first pass returns a large volume of studies, and often the abstract or title alone reveals that a study doesn't fit the pre-specified criteria.

These discards aren't arbitrary. Researchers document everything in a PRISMA flow diagram, detailing how many studies were returned, how many were discarded, and why. The date range of included studies must be specific enough that another researcher could reproduce the search. This rigor exists because meta-analyses serve as summaries guiding future research—they're expected to be reliable.

Measuring What Matters

For studies examining correlational data, effect size information is typically collected as Pearson's r statistic, a measure of the relationship between two variables. Researchers must be careful here: partial correlations, which control for the influence of other variables, often inflate relationships relative to zero-order correlations. Because the controlled-for variables differ from study to study, many meta-analyses exclude partial correlations entirely.
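
When correlations are pooled, one common approach, assumed in the sketch below, is to convert each r to Fisher's z, whose sampling variance depends only on sample size, pool on the z scale, and back-transform the result. The study values are hypothetical.

    import math

    def r_to_z(r):
        """Fisher's z transformation of a correlation coefficient."""
        return 0.5 * math.log((1 + r) / (1 - r))  # equivalently math.atanh(r)

    def z_to_r(z):
        """Back-transform Fisher's z to a correlation."""
        return math.tanh(z)

    # Hypothetical zero-order correlations with their sample sizes
    studies = [(0.30, 50), (0.45, 120), (0.18, 80)]
    zs = [r_to_z(r) for r, _ in studies]
    ws = [n - 3 for _, n in studies]  # weight = 1 / var(z), and var(z) = 1 / (n - 3)
    pooled_z = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    print(f"pooled r = {z_to_r(pooled_z):.3f}")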

If scatterplots are available, plot digitizers can extract data points directly—a powerful tool for calculating effect sizes when reported statistics are incomplete.

Beyond the core effect size, researchers collect moderating variables, such as the mean age of participants: characteristics that might explain why effects differ across studies. A measure of study quality is also included to assess reliability; over eighty tools exist for assessing quality and risk of bias in observational studies alone, a reflection of how differently research is approached across fields.

These tools usually evaluate how dependent variables were measured, whether participant selection was appropriate, and whether confounding factors were adequately controlled. For correlational studies specifically, sample size, psychometric properties, and reporting methods become essential quality indicators.
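
As a rough illustration, a per-study record on a meta-analytic coding sheet might look something like the sketch below; the field names and values are invented for the example.

    from dataclasses import dataclass

    @dataclass
    class StudyRecord:
        """One row of a hypothetical meta-analytic coding sheet."""
        study_id: str
        effect_r: float      # zero-order Pearson's r
        sample_size: int
        mean_age: float      # a moderating variable
        quality_score: int   # e.g., 0-10 on a chosen appraisal tool

    records = [
        StudyRecord("Smith2018", 0.30, 50, 21.4, 8),
        StudyRecord("Lee2020", 0.45, 120, 34.0, 6),
    ]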

The Gray Literature Question

Researchers face an additional decision: whether to include so-called gray literature—research not formally published in peer-reviewed journals. This includes conference abstracts, dissertations, and pre-prints.

Including gray literature reduces publication bias—the tendency for positive results to get published while negative findings disappear. But gray literature often (though not always) has lower methodological quality than formal publications. Conference proceedings, the most common source, are poorly reported; subsequent publications often differ from conference data, with inconsistencies observed in almost twenty percent of cases.

Two Types of Evidence

When performing a meta-analysis, researchers distinguish between individual participant data and aggregate data.

Aggregate data is more commonly available—it typically represents summary estimates like odds ratios or relative risks, found in published literature. This can be synthesized directly across conceptually similar studies using several approaches: fixed-effect models, random-effects models, or other statistical frameworks.
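
Because aggregate data often arrives as odds ratios, it helps to see how a single study's log odds ratio and its variance are conventionally derived from a 2x2 outcome table; these per-study values are what the pooling step then combines. The counts below are hypothetical.

    import math

    def log_odds_ratio(a, b, c, d):
        """Log odds ratio and its variance from a 2x2 table:
        a = treated events, b = treated non-events,
        c = control events, d = control non-events."""
        log_or = math.log((a * d) / (b * c))
        var = 1 / a + 1 / b + 1 / c + 1 / d  # standard large-sample approximation
        return log_or, var

    log_or, var = log_odds_ratio(12, 88, 25, 75)
    print(f"OR = {math.exp(log_or):.2f}, var(log OR) = {var:.3f}")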

Individual participant data represents raw data as collected by study centers—actual measurements from each participant, not just summaries. This distinction matters because different evidence types require different methods.

One-stage methods model all IPD simultaneously while accounting for the clustering of participants within studies. Two-stage methods first compute summary statistics from each study's data, then combine those per-study estimates into an overall statistic as a weighted average.
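
Below is a minimal sketch of that two-stage route using the DerSimonian-Laird random-effects estimator, a standard choice that widens each study's variance by an estimated between-study variance (tau squared) before re-weighting. The per-study inputs are hypothetical.

    import math

    def dersimonian_laird(effects, variances):
        """Two-stage random-effects pooling (DerSimonian-Laird)."""
        w = [1.0 / v for v in variances]
        fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
        # Cochran's Q: weighted squared deviations from the fixed-effect estimate
        q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
        k = len(effects)
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance estimate
        w_star = [1.0 / (v + tau2) for v in variances]
        pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
        se = math.sqrt(1.0 / sum(w_star))
        return pooled, se, tau2

    # Hypothetical per-study summary effects and variances (stage-one output)
    effects = [0.42, 0.10, 0.35, 0.28]
    variances = [0.04, 0.09, 0.02, 0.05]
    pooled, se, tau2 = dersimonian_laird(effects, variances)
    print(f"pooled = {pooled:.3f} (SE {se:.3f}), tau^2 = {tau2:.3f}")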

The trade-off is significant: reducing individual participant data to aggregate data allows two-stage methods to handle both evidence types, making them appealing for many meta-analyses. But recent research shows one-stage and two-stage methods can occasionally lead to different conclusions—sometimes dramatically different.

The Criticism Paradox

Despite its growth, meta-analysis faced serious backlash after Mary Lee Smith and Gene Glass published the first modern meta-analysis, on psychotherapy outcomes, in 1977. Psychologist Hans Eysenck responded aggressively, calling it "an exercise in mega-silliness" in a 1978 article. Later he described it as "statistical alchemy," implying that, like turning lead into gold, the method promised the impossible.

The criticism was fierce, but it didn't slow adoption. Meta-analysis expanded rapidly in the decades that followed and now touches psychology, medicine, ecology, and countless other disciplines. Evidence synthesis communities have emerged to cross-pollinate ideas, methods, and software tools across fields; meta-analysis has become fundamental to metascience itself.

The Stakes

Meta-analyses are often (though not always) crucial components of systematic reviews—they summarize existing research to guide future studies. They support grant proposals, shape treatment guidelines, and influence health policies at national and international levels. When you read that a treatment works or doesn't work, meta-analysis is usually the reason.

In an era of information overload, when thousands of studies are published annually in any given field, meta-analysis isn't just useful—it's essential. It turns noise into signal, chaos into clarity, and isolated findings into cumulative knowledge.

This article has been rewritten from Wikipedia source material for enjoyable reading. Content may have been condensed, restructured, or simplified.