Most discussions about scientific replication focus on whether studies can be repeated; this piece asks a far more urgent question: when should we repeat them to get the most value? Stuart Buck engages with Jordan Dworkin's new modeling work to argue that the timing of a replication study is just as critical as the study itself, challenging the assumption that older, highly cited papers are past the point of no return.
The Timing Trap
The conversation begins with a provocative claim about the lifecycle of scientific attention. Dworkin suggests that because papers accumulate the majority of their citations in the first five years, the window to prevent a flawed study from spreading is narrow. Stuart Buck pushes back on this timeline, noting that influence often outlasts direct citation counts. He writes, "The mere fact that time has elapsed and direct citations have dropped doesn't necessarily mean that the paper is any less influential."
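To see why Dworkin treats the first five years as the critical window, it helps to make the citation lifecycle concrete. The sketch below is a hypothetical illustration, not Dworkin's actual model; the peak rate, decay rate, and horizon are all assumed values, chosen so that roughly 80% of citations fall in the first five years.

```python
# Toy model of a paper's citation lifecycle (illustrative assumptions only).
# Yearly citations decay exponentially, so most accrue in the first few years.

import math

PEAK_CITATIONS = 100   # citations in year 1 (assumed)
DECAY_RATE = 0.35      # per-year decay (assumed; tuned so ~80% fall in years 1-5)
HORIZON = 30           # years of citation life we track

def citations_in_year(year: int) -> float:
    """Citations received in a given year after publication."""
    return PEAK_CITATIONS * math.exp(-DECAY_RATE * (year - 1))

def future_citations(replication_year: int) -> float:
    """Citations still to come if a replication lands in `replication_year`."""
    return sum(citations_in_year(y) for y in range(replication_year, HORIZON + 1))

total = future_citations(1)
for t in (1, 3, 5, 10, 15):
    share = future_citations(t) / total
    print(f"replicate in year {t:2d}: {share:5.1%} of lifetime citations still ahead")
```

Under these assumptions, a replication landing in year 5 can still shape only about a quarter of the paper's lifetime citations, and by year 10 the figure falls below 5%. That arithmetic is the heart of the timing argument; Buck's rejoinder, developed next, is that direct citations understate a seminal paper's reach in the first place.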
Buck illustrates this with the case of a 2006 Alzheimer's paper that was later exposed as fraudulent. Even fifteen years later, it was still cited hundreds of times a year, serving as a foundation for a vast subfield. The argument here is that a paper's "seminal status" can become so embedded that direct citations fail to capture its true reach. This is a crucial distinction for policymakers: if we only fund replications of new papers, we might miss the chance to correct errors that have quietly shaped decades of research. However, a counterargument worth considering is that by the time a paper is this entrenched, the cost of overturning it may be prohibitively high, regardless of the replication's quality.
"The paper's seminal status is now so embedded in the field that direct citations aren't capturing anywhere near the amount of influence it has." (Stuart Buck)
The Limits of Correction
Dworkin responds by refining the model, arguing that while older papers can still be valuable to replicate, the return on investment drops sharply as influence compounds. He suggests that once a paper has spawned hundreds of follow-on studies, a single failed replication may not be enough to change the field's trajectory. As Dworkin puts it, "Once there are hundreds of follow-on studies and a few subfields driven by a paper, un-ringing that bell takes far more than an independent replication."
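Dworkin's point about diminishing returns can be made concrete with a small Bayesian sketch. This is my illustration rather than anything in his model, and every likelihood value here is an assumption: suppose each supportive follow-on study counts as weak evidence (plausible if the literature leans toward confirmation), then ask how much one failed replication moves the posterior.

```python
# Bayesian sketch: how much does one failed replication move belief in a
# finding that already has many (weakly) confirming follow-on studies?
# All likelihood values are illustrative assumptions.

def posterior_true(n_followups: int,
                   prior: float = 0.5,
                   p_support_if_true: float = 0.6,   # assumed
                   p_support_if_false: float = 0.4,  # assumed (biased literature)
                   p_fail_if_true: float = 0.2,      # assumed
                   p_fail_if_false: float = 0.8) -> float:
    """P(finding is true | n supportive follow-ups AND one failed replication)."""
    like_true = (p_support_if_true ** n_followups) * p_fail_if_true
    like_false = (p_support_if_false ** n_followups) * p_fail_if_false
    return prior * like_true / (prior * like_true + (1 - prior) * like_false)

for n in (0, 10, 50, 200):
    print(f"{n:3d} follow-on studies: P(true | failed replication) = {posterior_true(n):.3f}")
```

With no follow-on literature, the failed replication drags belief down to 0.2; with 50 weakly supportive studies it barely registers. The sketch also gestures at the deeper problem: follow-on studies built on the original's methods are not independent evidence at all, which is exactly why Dworkin argues a single replication cannot un-ring the bell.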
This creates a paradox: the most influential papers are the hardest to correct. Dworkin notes that in the Alzheimer's case, it took "egregious and unambiguous fraud" layered on top of years of small failures to finally shift the consensus, not just a standard replication study. Buck acknowledges this, agreeing that if a finding is deeply baked into the literature, simply replicating the original paper might not be sufficient. Instead, he suggests we might need to replicate the "descendants": the follow-on papers that are currently driving research. This reframing shifts the strategy from cleaning up the foundation to stabilizing the structure built upon it.
Engineering Attention
Perhaps the most actionable part of the exchange concerns the role of institutions like the National Institutes of Health. Both authors agree that the current system treats the community's response to replications as a fixed, low-impact variable, and Buck argues that this assumption is self-fulfilling: if funders simply publish replication results and move on, those results will predictably be ignored. He writes, "A large-scale replication initiative... might not just fund replication studies and then let the citations settle out wherever they may."
Instead, Buck proposes active interventions: updating PubMed to prominently display replication results, working with major journals to link findings, and even penalizing grant applicants who ignore replication data. Dworkin agrees, calling the PubMed idea "clever and tractable." This moves the discussion from passive observation to active policy design. It suggests that the low impact of replications isn't an inevitable law of nature, but a failure of system design. Critics might note that such administrative changes could face resistance from a scientific culture that prioritizes novelty over verification, but the authors' consensus on the need for institutional leverage is a strong signal for future reform.
Bottom Line
The strongest part of this exchange is the realization that scientific correction is not a one-time event but a dynamic process in which timing and institutional support determine success. The biggest vulnerability remains the practical difficulty of "un-ringing the bell" once a flawed idea has become a subfield's foundation. Readers should watch whether agencies like the NIH adopt the proposed administrative levers to force the scientific community to pay attention to replication results.