Raj Chetty's Just-So Stories

In a policy landscape obsessed with quick fixes for educational inequality, Freddie deBoer delivers a pointed takedown of one of the most influential economic arguments in recent education policy. He challenges the widely accepted notion that replacing a small percentage of teachers can unlock hundreds of thousands of dollars in lifetime earnings per classroom, and he exposes the statistical fragility of the underlying estimates. This is not just an academic dispute; it is a critique of a movement that has shaped federal policy and state laws based on what deBoer calls a "neoliberal just-so story."

The Illusion of Precision

DeBoer begins by dismantling the core premise of Raj Chetty's research: the idea that "value-added models" (VAMs) can isolate a teacher's impact from the complex web of a student's life. He first restates the headline claim: "In his famous 2014 American Economic Review papers, he and his coauthors reported that students assigned to better... teachers were more likely to attend college, earn higher salaries... and that replacing a teacher in the bottom five percent... would raise the present value of a single classroom's lifetime earnings by roughly $250,000." The figure proved compelling enough that President Obama cited it in his 2012 State of the Union address, and the judge in Vergara v. California leaned on it to strike down the state's teacher tenure laws, in a case that relied heavily on the assumption that tenure protected ineffective educators. DeBoer's objection is that the methodology behind the figure is circular by design.

DeBoer argues that the metric itself is a statistical trap. "It's not a measure of pedagogical skill, content knowledge, classroom climate, the cultivation of curiosity, or any other property normally meant by 'good teaching.' It's a statistical residual on a narrow set of assessments." By labeling the unexplained variance in test scores "teacher quality," the model assumes the conclusion it seeks to prove. The framing is effective because it strips away the veneer of scientific objectivity and suggests that the policy rests on a logical fallacy rather than on empirical evidence.

"Any portion of variability in student outcomes that Chetty et al cannot or will not identify otherwise is assumed to be a product of teacher inputs."
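The residual logic can be made concrete with a toy simulation. The sketch below is a deliberate simplification, not the model Chetty and his coauthors estimate: it regresses current scores on prior scores only, then averages the leftover residual by classroom. Every number in it (the 0.7 slope, the size of the hidden classroom factor, the class size) is an assumption chosen for illustration. Teachers are given no true effect at all, yet each one still receives a nonzero "value-added" score, because the unobserved classroom factor has nowhere else to go.

```python
import random


def simulate_value_added(n_teachers=20, class_size=25, seed=0):
    """Toy value-added scores: mean regression residual per classroom.

    No teacher has any true effect here. All parameters are illustrative
    assumptions, not values taken from the research under discussion.
    """
    rng = random.Random(seed)
    rows = []  # (teacher_id, prior_score, current_score)
    for teacher in range(n_teachers):
        # Hypothetical unobserved classroom-level factor the model cannot see.
        hidden = rng.gauss(0, 0.3)
        for _ in range(class_size):
            prior = rng.gauss(0, 1)
            current = 0.7 * prior + hidden + rng.gauss(0, 0.5)
            rows.append((teacher, prior, current))

    # Ordinary least squares of current score on prior score.
    n = len(rows)
    mean_x = sum(r[1] for r in rows) / n
    mean_y = sum(r[2] for r in rows) / n
    sxx = sum((r[1] - mean_x) ** 2 for r in rows)
    sxy = sum((r[1] - mean_x) * (r[2] - mean_y) for r in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # "Value added" = average unexplained residual in each classroom.
    totals = [0.0] * n_teachers
    for teacher, prior, current in rows:
        totals[teacher] += current - (intercept + slope * prior)
    return [total / class_size for total in totals]


scores = simulate_value_added()
spread = max(scores) - min(scores)
print(f"Spread of 'teacher effects' with zero true effect: {spread:.2f}")
```

In this sketch the classroom scores differ from one another even though no simulated teacher does anything differently from any other; the spread comes entirely from the hidden factor and sampling noise.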

The stakes of this methodological flaw are high. If the metric is merely a placeholder for unknown variables, then policies designed to fire teachers based on these scores are not only ineffective but potentially destructive. DeBoer notes that even the literature itself admits teachers control a tiny fraction of student outcomes, yet the narrative has been expanded by pundits to suggest teachers are the dominant force in a child's academic life.

The Instability of the Metric

The argument deepens when deBoer examines the reliability of these scores over time. If teacher quality were a stable trait, a teacher's score should remain relatively consistent from year to year. The data suggests otherwise. DeBoer cites a 2015 review showing that year-to-year correlations for value-added estimates range from 0.18 to 0.64. "At the low end, that's noise. In human research, in educational research? That's noise!" He highlights the absurdity of a system where a teacher can be ranked as one of the worst in the district one year and one of the best the next, without any change in their actual performance.
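To see what correlations in this range imply for a "bottom five percent" rule, consider a simple simulation. It assumes, purely for illustration, that measured scores in two consecutive years follow a bivariate normal distribution with the stated correlation; real value-added estimates need not behave this way. Note first that a correlation of 0.18 means one year's score accounts for only about 3 percent of the variance in the next (0.18² ≈ 0.032), against about 41 percent at 0.64 (0.64² ≈ 0.41). The hypothetical function `bottom_group_persistence` then asks what share of teachers flagged in year one would be flagged again in year two.

```python
import math
import random


def bottom_group_persistence(r, n_teachers=10000, cutoff=0.05, seed=1):
    """Share of year-one bottom-group teachers still in the bottom group in year two.

    Assumes (for illustration only) standard-normal scores whose
    year-to-year correlation is r.
    """
    rng = random.Random(seed)
    year1 = [rng.gauss(0, 1) for _ in range(n_teachers)]
    # Build year-two scores with population correlation r to year one.
    year2 = [r * s + math.sqrt(1 - r * r) * rng.gauss(0, 1) for s in year1]

    k = int(n_teachers * cutoff)
    bottom1 = set(sorted(range(n_teachers), key=lambda i: year1[i])[:k])
    bottom2 = set(sorted(range(n_teachers), key=lambda i: year2[i])[:k])
    return len(bottom1 & bottom2) / k


low = bottom_group_persistence(0.18)
high = bottom_group_persistence(0.64)
print(f"r = 0.18: {low:.0%} of the bottom group repeats")
print(f"r = 0.64: {high:.0%} of the bottom group repeats")
```

Under these assumptions, most teachers placed in the bottom group at the low end of the range would not reappear there the following year, which is the practical meaning of deBoer's "noise."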

This volatility is not just a statistical curiosity; it undermines the fairness of any employment decision based on these numbers. DeBoer points out that the results shift depending on the statistical model used or the demographic composition of the class. "Crucially, teachers whose students were less advantaged systematically received lower effectiveness ratings than the same teachers did when teaching more advantaged students... even with statistical controls supposedly accounting for student background." This suggests the model penalizes educators working in high-need communities, reinforcing the very inequities the policy claims to solve.

Critics might argue that while the models are imperfect, they still offer a better tool than subjective principal evaluations. However, deBoer counters that a construct that is "mostly noise" cannot serve as a foundation for high-stakes decisions, regardless of whether it is better than the alternative. The inconsistency renders the data useless for identifying a "bottom five percent" of teachers.

When Theory Meets Reality

The most damning evidence deBoer presents comes from the real-world application of these models. He details the experience of the Houston Independent School District, which used a proprietary algorithm to evaluate and terminate teachers. The result was chaos. "Results bounced around, teachers beloved by parents and students received poor scores, school administrators felt that the outcomes were fickle and their position undermined." The dispute escalated to a federal lawsuit, Houston Federation of Teachers v. HISD, in which a court found that the "black box" algorithm raised due process problems because teachers could not audit or replicate their scores.

This legal setback underscores the fundamental flaw in the Chetty school of thought: a system that cannot be explained or challenged is not a measurement. DeBoer writes, "A construct that cannot be reproduced, challenged, or transparently explained to the people being measured is not, in any meaningful operational sense, a measurement." That a federal court took these objections seriously gives the theoretical criticisms concrete legal weight.

Furthermore, deBoer addresses the scientific defense of the research: the quasi-experiment involving teacher switching. He notes that an independent researcher, Jesse Rothstein, attempted to replicate the findings using North Carolina data and found that the identifying assumption failed. "Once Rothstein adjusted for this, [he] found moderate bias in VA scores... and reported that the long-run earnings and college-attendance estimates are sensitive to control choices and cannot support strong conclusions." This replication failure strikes at the heart of the causal claims, suggesting that the massive economic benefits attributed to teacher quality may be statistical artifacts.

"The bottom-five-percent teacher whose dismissal would supposedly net $250,000 per classroom is largely a statistical artifact."

Bottom Line

Freddie deBoer's critique is a necessary correction to a policy narrative that has dominated education reform for a decade, exposing how a seductive statistical story can override nuance and fairness. The argument's greatest strength lies in its synthesis of theoretical flaws, statistical instability, and real-world legal failures, creating a cohesive case against value-added models. However, the piece leaves the reader with a difficult question: if teacher quality cannot be measured this way, and systemic factors account for most variance, what is the viable path forward for improving educational equity? The answer remains elusive, but deBoer has successfully cleared the ground of a dangerous illusion.

Deep Dives

Explore these related deep dives:

  • The Death and Life of the Great American School System by Diane Ravitch

  • Vergara v. California

    This 2014 California court case relied directly on Chetty's value-added research to strike down teacher tenure laws, illustrating the real-world policy impact of the disputed data.

  • Just-so story

    The author uses this term to characterize Chetty's narrative as a convenient but scientifically unproven fable that simplifies complex educational outcomes into a single cause.

  • Value-added modeling

    This article explains the specific statistical methodology Chetty uses to isolate teacher effects, revealing how the 'residual variation' the author critiques is calculated and why it often fails to account for non-random student assignment.

Sources

Raj Chetty's Just-So Stories

by Freddie deBoer

For a long time I’ve been getting some version of the comment, “What about Chetty!” in response to my perspective on education, as in Raj Chetty, the economist who for the past decade has made a lot of waves asserting that our education problems are straightforwardly the product of bad teachers and that replacing them will have implausibly large economic effects. I tend to try and work from a broader perspective than “this is why I think this guy is wrong,” but I get this request so often, here you go. This is why I think Raj Chetty is wrong.

Few empirical claims in modern education policy have traveled farther than Chetty et al’s findings on teacher “value-added.” In his famous 2014 American Economic Review papers, he and his coauthors reported that students assigned to better (excuse me, higher value-added) teachers were more likely to attend college, earn higher salaries, save for retirement, and avoid teen pregnancy, and that replacing a teacher in the bottom five percent of the distribution with an average teacher would raise the present value of a single classroom’s lifetime earnings by roughly $250,000. Chetty’s research findings in this domain had been floating around for awhile at the time of publication, and President Obama cited the figure in his 2012 State of the Union address, and the judge who decided Vergara v. California leaned on it to strike down California’s teacher tenure laws. Take that, teachers! The findings are arresting, the dataset is impressive - 2.5 million children, linked to IRS tax records! - and the policy implications are clean: identify and remove bad (pardon me, low “value-added”) teachers, watch outcomes improve. It’s exactly the kind of story our neoliberal policy establishment is desperate to tell, and was clearly catnip to the Obama administration, which was doggedly attached to a simplistic vision of delivery through better education, where the gutting of the uneducated labor market was ameliorated by turning every last child in the United States into a genius, scaling up the Stanford-to-Google pipeline until every American could pass through it.

Unfortunately, the Chetty story is ultimately another neoliberal just-so story, that is to say, a fable, a legend, a myth. The closer you look at what the “value-added” construct actually measures, how stable those measurements are, and how the Chetty results have fared under replication, the more reason there is to doubt both the magnitude ...