Most discussions about scientific replication focus on whether studies can be repeated; this piece asks a far more urgent question: when should we repeat them to get the most value? Stuart Buck engages with Jordan Dworkin's new modeling work to argue that the timing of a replication study is just as critical as the study itself, challenging the assumption that older, highly cited papers are past the point of no return.
The Timing Trap
The conversation begins with a provocative claim about the lifecycle of scientific attention. Dworkin suggests that because papers accumulate the majority of their citations in the first five years, the window to prevent a flawed study from spreading is narrow. Stuart Buck pushes back on this timeline, noting that influence often outlasts direct citation counts. He writes, "The mere fact that time has elapsed and direct citations have dropped doesn't necessarily mean that the paper is any less influential."
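To see why Dworkin treats the first five years as the critical window, it helps to make the citation lifecycle concrete. The sketch below is a hypothetical illustration, not Dworkin's actual model; the peak rate, decay rate, and horizon are all assumed values, chosen so that roughly 80% of citations fall in the first five years.

```python
# Toy model of a paper's citation lifecycle (illustrative assumptions only).
# Yearly citations decay exponentially, so most accrue in the first few years.

import math

PEAK_CITATIONS = 100   # citations in year 1 (assumed)
DECAY_RATE = 0.35      # per-year decay (assumed; tuned so ~80% fall in years 1-5)
HORIZON = 30           # years of citation life we track

def citations_in_year(year: int) -> float:
    """Citations received in a given year after publication."""
    return PEAK_CITATIONS * math.exp(-DECAY_RATE * (year - 1))

def future_citations(replication_year: int) -> float:
    """Citations still to come if a replication lands in `replication_year`."""
    return sum(citations_in_year(y) for y in range(replication_year, HORIZON + 1))

total = future_citations(1)
for t in (1, 3, 5, 10, 15):
    share = future_citations(t) / total
    print(f"replicate in year {t:2d}: {share:5.1%} of lifetime citations still ahead")
```

Under these assumptions, a replication landing in year 5 can still shape only about a quarter of the paper's lifetime citations, and by year 10 the figure falls below 5%. That arithmetic is the heart of the timing argument; Buck's rejoinder, developed next, is that direct citations understate a seminal paper's reach in the first place.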
Buck illustrates this with the case of a 2006 Alzheimer's paper that was later exposed as fraudulent. Even fifteen years later, it was still cited hundreds of times a year, serving as a foundation for a vast subfield. The argument here is that a paper's "seminal status" can become so embedded that direct citations fail to capture its true reach. This is a crucial distinction for policymakers: if we only fund replications of new papers, we might miss the chance to correct errors that have quietly shaped decades of research. However, a counterargument worth considering is that by the time a paper is this entrenched, the cost of overturning it may be prohibitively high, regardless of the replication's quality.
"The paper's seminal status is now so embedded in the field that direct citations aren't capturing anywhere near the amount of influence it has." (Stuart Buck)
The Limits of Correction
Dworkin responds by refining the model, arguing that while older papers can still be valuable to replicate, the return on investment drops sharply as influence compounds. He suggests that once a paper has spawned hundreds of follow-on studies, a single failed replication may not be enough to change the field's trajectory. As Dworkin puts it, "Once there are hundreds of follow-on studies and a few subfields driven by a paper, un-ringing that bell takes far more than an independent replication."
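Dworkin's point about diminishing returns can be made concrete with a small Bayesian sketch. This is my illustration rather than anything in his model, and every likelihood value here is an assumption: suppose each supportive follow-on study counts as weak evidence (plausible if the literature leans toward confirmation), then ask how much one failed replication moves the posterior.

```python
# Bayesian sketch: how much does one failed replication move belief in a
# finding that already has many (weakly) confirming follow-on studies?
# All likelihood values are illustrative assumptions.

def posterior_true(n_followups: int,
                   prior: float = 0.5,
                   p_support_if_true: float = 0.6,   # assumed
                   p_support_if_false: float = 0.4,  # assumed (biased literature)
                   p_fail_if_true: float = 0.2,      # assumed
                   p_fail_if_false: float = 0.8) -> float:
    """P(finding is true | n supportive follow-ups AND one failed replication)."""
    like_true = (p_support_if_true ** n_followups) * p_fail_if_true
    like_false = (p_support_if_false ** n_followups) * p_fail_if_false
    return prior * like_true / (prior * like_true + (1 - prior) * like_false)

for n in (0, 10, 50, 200):
    print(f"{n:3d} follow-on studies: P(true | failed replication) = {posterior_true(n):.3f}")
```

With no follow-on literature, the failed replication drags belief down to 0.2; with 50 weakly supportive studies it barely registers. The sketch also gestures at the deeper problem: follow-on studies built on the original's methods are not independent evidence at all, which is exactly why Dworkin argues a single replication cannot un-ring the bell.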
This creates a paradox: the most influential papers are the hardest to correct. Dworkin notes that in the Alzheimer's case, it took "egregious and unambiguous fraud" layered on top of years of small failures to finally shift the consensus, not just a standard replication study. Buck acknowledges this, agreeing that if a finding is deeply baked into the literature, simply replicating the original paper might not be sufficient. Instead, he suggests we might need to replicate the "descendants": the follow-on papers that are currently driving research. This reframing shifts the strategy from cleaning up the foundation to stabilizing the structure built upon it.
Engineering Attention
Perhaps the most actionable part of the exchange concerns the role of institutions like the National Institutes of Health. Both authors agree that the current system treats the community's response to replications as a fixed, low-impact variable, and Buck argues that this assumption is self-fulfilling: if funders simply publish replication results and move on, those results will predictably be ignored. He writes, "A large-scale replication initiative... might not just fund replication studies and then let the citations settle out wherever they may."
Instead, Buck proposes active interventions: updating PubMed to prominently display replication results, working with major journals to link findings, and even penalizing grant applicants who ignore replication data. Dworkin agrees, calling the PubMed idea "clever and tractable." This moves the discussion from passive observation to active policy design. It suggests that the low impact of replications isn't an inevitable law of nature, but a failure of system design. Critics might note that such administrative changes could face resistance from a scientific culture that prioritizes novelty over verification, but the authors' consensus on the need for institutional leverage is a strong signal for future reform.
Bottom Line
The strongest part of this exchange is the realization that scientific correction is not a one-time event but a dynamic process in which timing and institutional support determine success. The biggest vulnerability remains the practical difficulty of "un-ringing the bell" once a flawed idea has become a subfield's foundation. Readers should watch whether agencies like the NIH adopt the proposed administrative levers to force the scientific community to pay attention to replication results.