Jack Clark doesn't just predict the future of artificial intelligence; he argues we are approaching the moment when the very people building these systems become obsolete. His latest analysis for Import AI presents a startling, data-driven case that fully automated AI research is not a distant sci-fi trope, but a likely reality by 2028. For busy leaders tracking the pace of technological disruption, this is not abstract theory: it is a forecast backed by a mosaic of benchmark scores that suggest the engine of scientific discovery is about to switch from human to machine.
The Coding Singularity
Clark anchors his argument in the rapid acceleration of AI's ability to write and manage code, the fundamental substrate of modern software. He points to SWE-Bench, a rigorous test of an AI's ability to solve real-world GitHub issues, to illustrate the speed of this shift. "When SWE-Bench launched in late 2023 the best score at the time was Claude 2 which had an overall success rate of ~2%," Clark writes. "Claude Mythos Preview gets 93.9%, effectively saturating the benchmark." This isn't a marginal improvement; it is a saturation point that suggests the engineering bottleneck is vanishing.
The implication here is profound: if AI can write the code that builds AI, the feedback loop tightens dramatically. Clark notes that the vast majority of engineers he encounters now code entirely through AI systems, using them to write tests and verify results. "In other words, AI systems have gotten good enough to automate a major component of AI R&D, speeding up all the humans that work on it." This reframes the current landscape not as humans using tools, but as humans being rapidly displaced by tools that are becoming their own supervisors.
Critics might argue that benchmark saturation is a known artifact of overfitting, where models memorize test data rather than demonstrating generalizable intelligence. However, Clark anticipates this, acknowledging that benchmarks have idiosyncratic flaws but insisting that the aggregate trend across multiple datasets tells a consistent story.
"I now believe we are living in the time that AI research will be end-to-end automated. If that happens, we will cross a Rubicon into a nearly-impossible-to-forecast future."
The Time Horizon of Autonomy
Beyond just writing code, Clark argues that AI systems are gaining the stamina to work independently for increasingly long durations. He relies on data from METR, which tracks the length of tasks an AI can complete without human intervention. The progression is stark: from tasks taking 30 seconds in 2022 to roughly 12 hours by 2026. "Ajeya Cotra, a longtime AI forecaster who works at METR, thinks it isn't unreasonable to expect AI systems to do tasks that take ~100 hours by the end of 2026," Clark notes.
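A rough sanity check makes the trend concrete. The sketch below, a back-of-the-envelope calculation and not METR's actual methodology, takes the two endpoints cited above (~30-second tasks in 2022, ~12-hour tasks by 2026) and asks what exponential doubling time they imply, and how long it would then take to reach Cotra's ~100-hour mark.

```python
import math

# Back-of-the-envelope check of the METR trend cited above.
# Endpoints are taken from the paragraph: ~30-second tasks in 2022,
# ~12-hour tasks by 2026. This is illustrative arithmetic only.

start_seconds = 30             # ~30-second tasks in 2022
end_seconds = 12 * 3600        # ~12-hour tasks by 2026
years_elapsed = 4              # 2022 -> 2026

# Number of doublings between the two endpoints, and the implied
# doubling time in months, assuming smooth exponential growth.
doublings = math.log2(end_seconds / start_seconds)
doubling_time_months = years_elapsed * 12 / doublings

# How many additional doublings (and months) to go from ~12 hours
# to Cotra's ~100-hour figure at the same rate.
extra_doublings = math.log2(100 / 12)
months_to_100h = extra_doublings * doubling_time_months

print(f"implied doubling time: {doubling_time_months:.1f} months")
print(f"months from 12h to 100h at that rate: {months_to_100h:.1f}")
```

On this simple extrapolation the doubling time comes out to roughly 4.6 months, and ~100-hour tasks arrive about 14 months after the 12-hour mark, which is in the same ballpark as Cotra's end-of-2026 forecast.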
This extension of the "time horizon" is critical because it aligns with the actual workflow of a researcher. Much of AI R&D involves cleaning data, running experiments, and sanity-checking results—tasks that previously required hours of human attention. Now, these fall squarely within the autonomous window of modern systems. Clark suggests that as these systems get better at working independently, "the complexity and importance of the work being delegated" rises in tandem.
From Replication to Innovation
The most compelling evidence Clark marshals is AI's ability not just to write code, but to conduct scientific research itself. He highlights CORE-Bench, which tests an agent's ability to reproduce results from a research paper. The jump from a 21.5% success rate in late 2024 to 95.5% by late 2025 indicates that AI can now reliably replicate the foundational work of science. Even more telling is the progress on PostTrainBench, where AI systems are fine-tuning smaller models to improve performance. "As of March 2026, AI systems are able to post-train models to get about half as much of the uplift as ones trained by humans," Clark observes.
This is where the argument moves from automation to acceleration. If AI can replicate human research and optimize models at 50% of human efficiency, it is only a matter of time before it surpasses the human baseline. Clark points to Anthropic's automated alignment research as a proof-of-concept where AI agents autonomously developed techniques that beat human-designed baselines on safety problems. "Nonetheless, it's proof that you can apply today's AI systems to contemporary cutting-edge research problems and we already see meaningful signs of life."
However, a counterargument worth considering is whether AI can truly generate the "paradigm-shifting ideas" that drive scientific revolutions, or if it is merely optimizing within existing frameworks. Clark addresses this by distinguishing between discovery and engineering. He asks, "Is AI research more like discovering general relativity or Lego?" His conclusion is that while AI may not yet invent radical new architectures like the transformer, it does not need to. "The technology may not need to [invent new ideas] for it to automate its own development," he argues, because the field advances largely through scaling and methodical experimentation.
"If scaling trends continue, we should prepare for models to get creative enough that they may be able to substitute for human researchers at having creative ideas for novel research paths, thus pushing forward the frontier themselves."
The Management Layer
Finally, Clark identifies a meta-skill emerging in these systems: management. Modern AI products are already deploying single agents that supervise multiple sub-agents, creating a hierarchy of software workers. This "AI for AI" management structure allows for parallel processing of complex tasks, effectively creating a self-driving research lab. The transition from a tool that helps a human researcher to a system that manages a team of researchers is the final piece of the puzzle Clark assembles.
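The supervisor pattern Clark describes can be sketched in a few lines. The example below is a hypothetical illustration, not a description of any real product: `run_subagent` is a stand-in for an actual model call, and the task decomposition is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task: str) -> str:
    # Stand-in for a real model call (hypothetical): a production
    # system would prompt a sub-agent model with this subtask.
    return f"result for: {task}"

def supervisor(goal: str, subtasks: list[str]) -> dict[str, str]:
    """Fan subtasks out to sub-agents in parallel, then collect
    their outputs for the supervising agent to review."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(run_subagent, subtasks))
    # A real supervisor would verify and synthesize these outputs
    # before reporting back on the overall goal.
    return dict(zip(subtasks, results))

report = supervisor(
    "improve fine-tuning recipe",
    ["clean dataset", "run ablation", "sanity-check metrics"],
)
```

The design point is the hierarchy itself: the supervisor owns the goal and delegates bounded subtasks, which is what allows the parallelism, and the eventual self-management, that Clark highlights.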
Bottom Line
Jack Clark's argument is a sobering, data-rich case that the era of human-led AI discovery is nearing its end, with a 60%+ probability of fully automated R&D by 2028. The strongest part of his analysis is the convergence of evidence across coding, task duration, and scientific replication benchmarks, which collectively suggest a tipping point is imminent. The biggest vulnerability remains the uncertainty of whether AI can truly innovate beyond the patterns it has learned, but even if it cannot, the sheer speed of its optimization capabilities may render that distinction moot. Leaders must watch not just for new models, but for the moment when the next generation of models is designed entirely by their predecessors.