Comparing climate models with observations

In an era of record-breaking heat, the most urgent question isn't whether the planet is warming, but whether our best tools for predicting the future are finally catching up to reality. Andrew Dessler and Zeke Hausfather tackle this head-on, dismantling the simple narrative that models are either "right" or "wrong" by revealing a complex, generational evolution in how we simulate the climate system. Their analysis suggests that while the latest generation of models may have overestimated warming in the short term, the most recent decade of data is pushing observations toward the higher end of the spectrum, forcing a reckoning with how much of this acceleration is human-driven versus a temporary atmospheric spike.

The Generational Shift

Dessler and Hausfather begin by tracing the lineage of climate modeling, moving from the Coupled Model Intercomparison Project 3 (CMIP3) through CMIP5 to the current CMIP6. They note that earlier generations, specifically CMIP3 and CMIP5, "better reproduce the rate of warming observed since 1970," aligning closely with historical data. However, the narrative shifts with the arrival of CMIP6, which accompanied the IPCC's Sixth Assessment Report. Here, the authors point out a significant divergence: "CMIP6 shows considerably higher warming compared with the post-1970 observed trend, with the multi-model mean warming around 25% faster than observations over the past 55 years."

This discrepancy stems from a subset of models in the latest generation that feature a much higher climate sensitivity—meaning they predict more warming for a given increase in carbon dioxide—than the scientific consensus deems likely. Dessler and Hausfather explain that "due to the disproportionate amount of 'hot models' in CMIP6, the IPCC AR6 report chose to develop its own Assessed Warming Projections" to stay within realistic bounds. They even reference their own 2022 work in Nature, where they proposed filtering these models to match the IPCC's more conservative projections while retaining the broader ensemble. This historical context is crucial; just as the Special Report on Emissions Scenarios (SRES) shaped the scenarios used in CMIP3, the shift to Shared Socioeconomic Pathways (SSPs) in CMIP6 reflects a more nuanced understanding of future human behavior, even if the physical response in the models has become more volatile.

"While a bit unsatisfying as an answer, it probably remains too early to tell."

The authors are careful not to declare a definitive victory for the "hot models." They argue that the jury is still out on whether the recent acceleration is a permanent shift or a statistical fluke driven by internal variability. Critics might argue that dismissing the high-end models too quickly ignores the possibility that the physics of cloud feedbacks are indeed more sensitive than previously thought, a risk that could leave policymakers underprepared. However, Dessler and Hausfather maintain a disciplined approach, insisting that we must distinguish between the signal of human-driven warming and the noise of natural cycles.

Decoding the Noise

To cut through the confusion of long-term averages, the commentary pivots to a more granular analysis of time periods. Dessler and Hausfather write, "One way around this problem is to look at trends in climate models and observations over multiple time periods," specifically examining 56-year, 25-year, and 15-year windows. The data reveals a striking pattern: while the 15-year trend from 2011 to 2025 places observed warming "well outside of the range of any CMIP3 models and all but one CMIP5 model," it sits comfortably on the higher side of the CMIP6 range.
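The multi-window comparison described above can be sketched as ordinary least-squares trends computed over each period. This is a minimal illustration, not the authors' actual pipeline: the synthetic anomaly series and the 0.02 C/yr slope are assumptions chosen only to make the example self-contained.

```python
import numpy as np

def decadal_trend(years, temps):
    """OLS slope of temperature vs. year, expressed in deg C per decade."""
    slope = np.polyfit(years, temps, 1)[0]  # deg C per year
    return slope * 10

# Synthetic annual anomalies, 1970-2025: a 0.02 C/yr trend plus noise
# (stand-in for an observational record such as Berkeley Earth).
rng = np.random.default_rng(0)
years = np.arange(1970, 2026)
temps = 0.02 * (years - 1970) + rng.normal(0.0, 0.1, years.size)

# Trends over windows analogous to the 56-, 25-, and 15-year periods
# examined in the article.
for start in (1970, 2001, 2011):
    mask = years >= start
    print(start, round(decadal_trend(years[mask], temps[mask]), 3))
```

The same function applied to each model run's output would give the model-side trend distributions that observations are compared against.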

This finding is particularly illuminating when contrasted with the so-called "hiatus" of the early 2000s. The authors remind readers that "the divergence between observations and the CMIP5 models featured in Fyfe et al is actually greater in the recent period than it was back in 2012 – albeit with an acceleration rather than a hiatus." They caution against overinterpreting these short-term spikes, noting that "it is quite possible – I'd even say quite likely – that a portion of the 0.4C per decade warming over the past 15 years is attributable to internal climate variability." The shift from a strong La Niña in 2011 to a moderately strong El Niño in 2024 provides a natural explanation for part of the surge, independent of long-term emissions trends.

"Observed warming trends start to be on the high end of the model range after 2000 or so, and are outside the 95th percentile of CMIP5 models over the past 18 years."

By visualizing trends across all possible start dates—a method they call "picking the whole cherry tree"—Dessler and Hausfather show that observations have consistently moved toward the upper bounds of model predictions since the early 2000s. This approach avoids the cherry-picking of specific start dates that often plagues climate debates. Yet, the authors admit that "trends in warming may be more useful for comparing models and observations than spaghetti plots of warming over time, but they are still quite sensitive to the choice of start and end dates." This sensitivity is a reminder that while the data is robust, the interpretation requires a nuanced understanding of statistical variance.
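The "whole cherry tree" idea — computing the trend-to-present from every possible start date rather than one hand-picked window — can be sketched as follows. The function name, the minimum window length, and the data layout are assumptions for illustration, not the authors' code.

```python
import numpy as np

def trends_by_start_year(years, temps, min_length=10):
    """Trend (deg C per decade) from each possible start year to the
    end of the record.

    Windows shorter than min_length years are excluded, since trends
    over only a few years are dominated by internal variability.
    """
    end = years[-1]
    out = {}
    for start in years:
        if end - start + 1 < min_length:
            break
        mask = years >= start
        out[int(start)] = np.polyfit(years[mask], temps[mask], 1)[0] * 10
    return out
```

Plotting these trends against their start years, for observations and for each model run, is what reveals observations drifting toward the upper bound of the model range after the early 2000s.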

The Bottom Line


Dessler and Hausfather provide a masterclass in scientific humility, refusing to let the latest heatwaves dictate a premature conclusion about model failure or success. Their strongest argument lies in the distinction between short-term variability and long-term forcing, suggesting that while the "hot models" of CMIP6 may have been too aggressive in the past, recent data is forcing a convergence toward their higher estimates. The biggest vulnerability in this analysis, however, is the inherent uncertainty of the next decade; if the current acceleration proves to be the new normal rather than a temporary spike, the filtered models may underestimate the urgency of the crisis. Readers should watch for the next few years of data, which will determine whether the world is warming at the ~0.2C per decade of the past or the ~0.3C per decade suggested by recent evidence.

Deep Dives

Explore these related deep dives:

  • Special Report on Emissions Scenarios

    Linked in the article (15 min read)

  • Coupled Model Intercomparison Project

    The article extensively discusses CMIP3, CMIP5, and CMIP6 model generations but assumes reader familiarity with what CMIP actually is. Understanding the history, methodology, and international coordination behind these model comparisons provides essential context for evaluating the article's analysis.

  • Climate sensitivity

    The article references 'hot models' with high climate sensitivity (>5C per doubling CO2) and transient climate response filtering, but doesn't explain what climate sensitivity means or why it matters. This concept is fundamental to understanding why different models produce different warming projections.

Sources

Comparing climate models with observations

by Andrew Dessler & Zeke Hausfather · The Climate Brink

The extreme global temperatures of the past few years have led a lot of people to ask me if the world is warming faster than expected.

To answer that, we need to look at how well climate models reproduce observed global mean surface temperatures. Here I will look at the last three generations of climate models (CMIP3, CMIP5, and CMIP6) as well as a version of the latest generation of models (CMIP6) that excludes the so-called “hot models” whose climate sensitivity is higher than the range deemed likely in the most recent IPCC report.

It turns out that the resulting picture is complex. Earlier generations of models better reproduce the rate of warming observed since 1970, while the latest generation better captures the rate of warming in the last two decades. Whether this is evidence that warming is occurring faster than earlier generations of climate models expected will depend on how much of the recent warming acceleration is here to stay, and how much is being driven by short-term climate variability. While a bit unsatisfying as an answer, it probably remains too early to tell.

This post represents an update of my 2023 TCB post on model-observation comparisons, albeit with a fair bit of new analysis included (and two more years of observations!).

Comparing warming over time.

One classic way to compare models and observations is to look at how temperatures have changed over time, compared to the multi-model mean and 5th to 95th percentile range across individual climate model runs. This is generally done using a single run for each unique climate model (in cases where modeling groups submit more than one modeling run) in order to ensure that each gets equal weight in the analysis.
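The equal-weighting described here — one run per model, then a multi-model mean and a 5th–95th percentile envelope — can be sketched with numpy. The function name and the dictionary layout are assumptions for illustration; the real CMIP archives are NetCDF files, not in-memory arrays.

```python
import numpy as np

def ensemble_envelope(runs_by_model):
    """Multi-model mean and 5th-95th percentile range over time.

    runs_by_model maps a model name to a list of runs, each run an
    array of annual-mean temperatures. Only the first run per model is
    used, so every model gets equal weight regardless of how many runs
    its modeling group submitted.
    """
    one_run_each = np.stack([runs[0] for runs in runs_by_model.values()])
    mean = one_run_each.mean(axis=0)
    p5, p95 = np.percentile(one_run_each, [5, 95], axis=0)
    return mean, p5, p95
```

Observations are then overlaid on the mean and the shaded p5–p95 band to judge whether they fall inside the ensemble spread.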

For example, here are the 23 unique CMIP3 climate models that accompanied the IPCC 4th Assessment Report published in 2007. These models were run around 2004, and use historical data on CO2 concentrations, volcanoes, and other climate forcings through the year 2000 (and the SRES scenarios thereafter, with the middle-of-the-road A1B scenario shown here).

These are compared to observations from the Berkeley Earth surface temperature product on a monthly basis, and generally show quite a good agreement over the past century, with observations close to the multi-model mean except for brief periods in the early 1900s and the 1940s.

We can zoom into the more recent post-1970 period when the bulk of warming ...