In a policy landscape obsessed with quick fixes for educational inequality, Freddie deBoer delivers a devastating takedown of one of the most influential economic arguments in education policy of the last decade. He challenges the widely accepted notion that swapping out a small percentage of teachers can unlock hundreds of thousands of dollars in lifetime earnings for students, exposing the statistical fragility of the estimates behind it. This is not just an academic dispute; it is a critique of a movement that has shaped federal policy and state laws based on what deBoer calls a "neoliberal just-so story."
The Illusion of Precision
DeBoer begins by dismantling the core premise of Raj Chetty's research: the idea that "value-added models" (VAMs) can isolate a teacher's impact from the complex web of a student's life. He argues that the methodology is circular by design, but first he recalls the claims that made the work famous: "In his famous 2014 American Economic Review papers, he and his coauthors reported that students assigned to better... teachers were more likely to attend college, earn higher salaries... and that replacing a teacher in the bottom five percent... would raise the present value of a single classroom's lifetime earnings by roughly $250,000." These figures were so compelling that they were cited by the Obama administration and invoked by a judge to strike down teacher tenure laws in Vergara v. California, a case that relied heavily on the assumption that tenure protected ineffective educators.
However, deBoer argues that the metric itself is a statistical trap. "It's not a measure of pedagogical skill, content knowledge, classroom climate, the cultivation of curiosity, or any other property normally meant by 'good teaching.' It's a statistical residual on a narrow set of assessments." By defining the unexplained variance in test scores as "teacher quality," the model assumes the conclusion it seeks to prove. This framing is effective because it strips away the veneer of scientific objectivity, revealing that the policy is built on a logical fallacy rather than empirical reality.
"Any portion of variability in student outcomes that Chetty et al cannot or will not identify otherwise is assumed to be a product of teacher inputs."
The stakes of this methodological flaw are high. If the metric is merely a placeholder for unknown variables, then policies designed to fire teachers based on these scores are not only ineffective but potentially destructive. DeBoer notes that even the literature itself admits teachers control a tiny fraction of student outcomes, yet the narrative has been expanded by pundits to suggest teachers are the dominant force in a child's academic life.
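The residual logic criticized in this section can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the specification used by Chetty and his coauthors: the function names, the coefficients, and the single unmeasured classroom-level factor are all hypothetical, and real value-added models include many more controls. In the simulated data teachers contribute nothing to scores, yet averaging regression residuals by classroom still produces a spread of "value-added" numbers that track the unmeasured factor.

```python
import random


def simulate_classrooms(n_teachers=40, class_size=25, seed=0):
    """Simulate scores in which teachers have NO effect (hypothetical setup).

    Each score depends on the student's prior score, an unmeasured
    classroom-level factor, and student-level noise.
    """
    rng = random.Random(seed)
    rows = []    # (teacher_id, prior_score, current_score)
    hidden = []  # unmeasured factor for each classroom
    for teacher_id in range(n_teachers):
        factor = rng.gauss(0, 1)
        hidden.append(factor)
        for _ in range(class_size):
            prior = rng.gauss(0, 1)
            current = 0.7 * prior + 0.5 * factor + rng.gauss(0, 1)
            rows.append((teacher_id, prior, current))
    return rows, hidden


def value_added(rows, n_teachers):
    """Regress current on prior score (one-predictor OLS), then
    average the residuals within each teacher's classroom."""
    n = len(rows)
    mean_x = sum(x for _, x, _ in rows) / n
    mean_y = sum(y for _, _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for _, x, y in rows)
    sxx = sum((x - mean_x) ** 2 for _, x, _ in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    totals = [0.0] * n_teachers
    counts = [0] * n_teachers
    for teacher_id, x, y in rows:
        totals[teacher_id] += y - (intercept + slope * x)
        counts[teacher_id] += 1
    return [totals[t] / counts[t] for t in range(n_teachers)]


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rows, hidden = simulate_classrooms()
scores = value_added(rows, n_teachers=40)
# Correlation between the "teacher effect" and the factor the model never saw.
print(round(pearson(scores, hidden), 2))
```

Because the simulation gives teachers no influence at all, any ranking built from `scores` is ranking classrooms by the omitted factor plus noise. Whether real models leave that much unexplained is an empirical question the sketch cannot answer; it only shows why a residual is not automatically a measure of teaching.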
The Instability of the Metric
The argument deepens when deBoer examines the reliability of these scores over time. If teacher quality were a stable trait, a teacher's score should remain relatively consistent from year to year. The data suggests otherwise. DeBoer cites a 2015 review showing that year-to-year correlations for value-added estimates range from 0.18 to 0.64. "At the low end, that's noise. In human research, in educational research? That's noise!" He highlights the absurdity of a system where a teacher can be ranked as one of the worst in the district one year and one of the best the next, without any change in their actual performance.
This volatility is not just a statistical curiosity; it undermines the fairness of any employment decision based on these numbers. DeBoer points out that the results shift depending on the statistical model used or the demographic composition of the class. "Crucially, teachers whose students were less advantaged systematically received lower effectiveness ratings than the same teachers did when teaching more advantaged students... even with statistical controls supposedly accounting for student background." This suggests the model penalizes educators working in high-need communities, reinforcing the very inequities the policy claims to solve.
Critics might argue that while the models are imperfect, they still offer a better tool than subjective principal evaluations. However, deBoer counters that a construct that is "mostly noise" cannot serve as a foundation for high-stakes decisions, regardless of whether it is better than the alternative. The inconsistency renders the data useless for identifying a "bottom five percent" of teachers.
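To see what the year-to-year correlations of 0.18 to 0.64 cited above imply for a "bottom five percent" list, consider a toy simulation. It is a sketch under simplifying assumptions, not a reanalysis of any real dataset: each hypothetical teacher has a fixed underlying quality, each year's score adds independent noise, and the mix is chosen so that the correlation between the two years equals `r`. Under that assumption, `r` is also the share of a single year's score variance that reflects anything stable. All function names are illustrative.

```python
import random


def simulate_two_years(r, n_teachers=20000, seed=0):
    """Draw two years of scores whose expected correlation is r.

    Each score = sqrt(r) * stable_quality + sqrt(1 - r) * yearly_noise,
    so both years have unit variance and share only the stable part.
    """
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_teachers):
        stable = rng.gauss(0, 1)
        year1.append(r ** 0.5 * stable + (1 - r) ** 0.5 * rng.gauss(0, 1))
        year2.append(r ** 0.5 * stable + (1 - r) ** 0.5 * rng.gauss(0, 1))
    return year1, year2


def bottom_share_repeated(year1, year2, share=0.05):
    """Fraction of year-1 bottom-`share` teachers who are also
    in the bottom `share` in year 2."""
    k = int(len(year1) * share)
    order1 = sorted(range(len(year1)), key=year1.__getitem__)
    order2 = sorted(range(len(year2)), key=year2.__getitem__)
    bottom1 = set(order1[:k])
    bottom2 = set(order2[:k])
    return len(bottom1 & bottom2) / k


for r in (0.18, 0.64):
    y1, y2 = simulate_two_years(r)
    print(r, round(bottom_share_repeated(y1, y2), 2))
```

Running the loop prints, for each end of the cited range, the fraction of teachers flagged in the first year who would be flagged again in the second. The lower that fraction, the more a dismissal list reflects which year was sampled rather than who the teacher is.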
When Theory Meets Reality
The most damning evidence deBoer presents comes from the real-world application of these models. He details the experience of the Houston Independent School District, which used a proprietary algorithm to evaluate and terminate teachers. The result was chaos. "Results bounced around, teachers beloved by parents and students received poor scores, school administrators felt that the outcomes were fickle and their position undermined." The situation escalated to a federal lawsuit, Houston Federation of Teachers v. HISD, in which the court held that a "black box" algorithm whose scores teachers could not audit or replicate raised a genuine due process problem.
This legal defeat underscores the fundamental flaw in the Chetty school of thought: a system that cannot be explained or challenged is not a measurement. DeBoer writes, "A construct that cannot be reproduced, challenged, or transparently explained to the people being measured is not, in any meaningful operational sense, a measurement." The fact that a federal court took the teachers' challenge seriously gives the theoretical criticisms concrete legal weight.
Furthermore, deBoer addresses the scientific defense of the research: the quasi-experiment involving teacher switching. He notes that an independent researcher, Jesse Rothstein, attempted to replicate the findings using North Carolina data and found that the identifying assumption failed. "Once Rothstein adjusted for this, he found moderate bias in VA scores... and reported that the long-run earnings and college-attendance estimates are sensitive to control choices and cannot support strong conclusions." This replication failure strikes at the heart of the causal claims, suggesting that the massive economic benefits attributed to teacher quality may be statistical artifacts.
"The bottom-five-percent teacher whose dismissal would supposedly net $250,000 per classroom is largely a statistical artifact."
Bottom Line
Freddie deBoer's critique is a necessary correction to a policy narrative that has dominated education reform for a decade, exposing how a seductive statistical story can override nuance and fairness. The argument's greatest strength lies in its synthesis of theoretical flaws, statistical instability, and real-world legal failures, creating a cohesive case against value-added models. However, the piece leaves the reader with a difficult question: if teacher quality cannot be measured this way, and systemic factors account for most variance, what is the viable path forward for improving educational equity? The answer remains elusive, but deBoer has successfully cleared the ground of a dangerous illusion.