
AI improves at self-improving

Alpha Evolve is a coding agent at heart. It takes code submitted by humans, along with evaluation metrics telling it what success looks like, and then iteratively improves that code.

The human provides the problem to solve, some code they have tried, and, critically, evaluation metrics. The more metrics they can give, the better the performance. Then Gemini Flash — the smaller, quicker version of Google's language model — iterates on that code to generate plentiful ideas, while Gemini Pro contributes solid suggestions.
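As a rough sketch of that generate-evaluate-select loop (every name below is hypothetical, not from the paper), the idea can be condensed to a few lines: keep a small pool of candidate programs, ask a model for a variation on the current best, score it against the human-supplied metrics, and retain the top performers.

```python
import random

def evolve(initial_code, evaluate, mutate, generations=100):
    """Toy sketch of an evolutionary code-improvement loop.

    evaluate: scores a candidate against the human-supplied metrics
    mutate:   stands in for an LLM proposing an edited variant
    """
    population = [(evaluate(initial_code), initial_code)]
    for _ in range(generations):
        _, parent = max(population)           # exploit the current best
        child = mutate(parent)                # propose a variation
        population.append((evaluate(child), child))
        population.sort(reverse=True)
        population = population[:10]          # keep only the top candidates
    return max(population)[1]

# Toy usage: the "code" is just a number string, and the metric rewards
# closeness to 42, standing in for real evaluation metrics.
random.seed(0)
score = lambda s: -abs(int(s) - 42)
step = lambda s: str(int(s) + random.choice([-3, -1, 1, 3]))
best = evolve("0", score, step, generations=300)
```

The real system replaces `mutate` with LLM calls and samples parents far more diversely; the point here is only the shape of the loop.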


The system uses prompt sampling, drawing on previous prompts humans have tried that worked before, and on a database of programs that performed well in other situations. All with one goal: improving the code the human submitted against those evaluation metrics.

State-of-the-Art Results

Alpha Evolve eventually comes back with code improvements that match the state of the art on roughly 75% of dozens of given tasks. Twenty percent of the time, its constructions actually surpass the state of the art.

The most famous achievement: it found a rank-48 tensor decomposition for 4x4 complex matrix multiplication — an unexpected improvement on the 50-year-old record for algorithms suitable for recursive application. Tensor decomposition here means discovering a more fundamental recipe that performs matrix multiplication with fewer core multiplication steps. Because the recipe can be applied recursively, it dramatically speeds up calculations for very large matrices.
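To get a feel for why rank 48 versus 49 matters, standard recursion analysis says a rank-r scheme for multiplying 4x4 blocks, applied recursively to n x n matrices, runs in O(n^(log4 r)) time. A quick back-of-envelope check (my own arithmetic, not code from the paper):

```python
import math

def exponent(rank, block=4):
    """Asymptotic exponent of a rank-`rank` scheme for block x block
    matrix multiplication applied recursively: O(n ** log_block(rank))."""
    return math.log(rank, block)

naive = exponent(64)         # 4^3 = 64 multiplications: the schoolbook method
strassen = exponent(49)      # Strassen's 2x2 scheme applied twice
alpha_evolve = exponent(48)  # the new rank-48 decomposition
```

The gap looks tiny (about 2.807 versus 2.793), but for recursive algorithms even a small drop in the exponent compounds as the matrices grow.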

Google also applied Alpha Evolve to Borg, the system that manages its data centers. Alpha Evolve helped recover 0.7% of Google's worldwide compute resources — soon amounting to billions of dollars in savings.

The Next Generation

The natural next step is distilling Alpha Evolve's augmented performance into the next generation of base language models. This has intrinsic value and will likely uplift the next version of Alpha Evolve.

The paper makes clear this creates a recursive loop: code that proves to be good becomes great data for training the next base model, which then gets better at coming up with improved programs.

Alpha Evolve also helped refine Google's Ironwood chips, its specialized AI processors. When given Gemini's own training as a problem, it delivered a 1% reduction in training time — another recursive loop, where a better or more efficient Gemini leads to a better future Alpha Evolve.

Future Improvements

Google admits several areas for improvement going forward.

First, solutions and their scores are kept in an evolutionary database. With Gemini models confirmed to have up to a 10 million token context window — though public versions only go up to 2 million — that evolutionary database could one day get incredibly large, giving future models a vast library to draw upon.
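A minimal sketch of what such an evolutionary database could look like (a hypothetical API of my own; the real database also tracks diversity and multiple metrics per entry): store every candidate with its score, and sample parents with a bias toward high performers while occasionally exploring.

```python
import random

class ProgramDatabase:
    """Toy store of (score, program) pairs for an evolutionary loop."""

    def __init__(self):
        self.entries = []

    def add(self, score, program):
        self.entries.append((score, program))

    def sample_parent(self, explore_prob=0.2):
        # Usually exploit the best entry; sometimes pick at random so
        # promising-but-imperfect lineages are not lost.
        if random.random() < explore_prob:
            return random.choice(self.entries)[1]
        return max(self.entries)[1]
```

A larger context window lets more of these stored programs and scores be shown to the model at once, which is why the database growing "incredibly large" matters.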

Second, Alpha Evolve is model agnostic. As hardware improves and training times fall, its knowledge can be distilled to help make a better Gemini 3, which in turn makes a much better language model inside Alpha Evolve.

The ablations in the paper showed every part of the coding agent was crucial. Using only a small base language model caps performance at a lower point; without the massive context window and full-file evolution, performance also caps out at a much lower point.

Third, the code snippet Alpha Evolve improves doesn't have to be the final function that generates the solution directly. It can be a search algorithm that is later used to find an optimal final function, so Alpha Evolve can essentially keep improving how we search for optimal programs.

The Limitations

Alpha Evolve is not yet confirmation of an imminent fast takeoff. The main limitation: it only handles problems for which it is possible to devise and submit an automated evaluator.

While this is true of many problems in the mathematical and computational sciences, there are domains, such as the natural sciences, where only some experiments can be simulated or automated. Alpha Evolve can still help scientists evaluate new experiments, and Google is working on making it a better co-scientist.
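Concretely, an "automated evaluator" is just a function that scores any candidate program with no human in the loop. A hypothetical example for candidates that claim to sort a list (my illustration, not from the paper): wrong answers are disqualifying, and among correct candidates, faster is better.

```python
import time

def evaluate_sort(candidate, trials=100):
    """Score a candidate sorting function fully automatically."""
    cases = [[5, 3, 1], [], [2, 2, 1], list(range(50, 0, -1))]
    # Correctness first: any wrong output disqualifies the candidate.
    if any(candidate(list(c)) != sorted(c) for c in cases):
        return float("-inf")
    # Then speed: less elapsed time means a higher (less negative) score.
    start = time.perf_counter()
    for _ in range(trials):
        for c in cases:
            candidate(list(c))
    return -(time.perf_counter() - start)
```

Anything Alpha Evolve proposes can be scored by a function like this in milliseconds; a wet-lab experiment cannot, which is exactly the bottleneck described above.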

Even the famously bullish Anthropic CEO Dario Amodei has said intelligence will initially be heavily bottlenecked by other factors of production — test tubes can only be tested so fast.

Still No Skynet

Alpha Evolve could not yet create Alpha Evolve. It can improve parts of itself, as discussed, but it couldn't build the entire system from scratch.

As Demis Hassabis, head of Google DeepMind, puts it: we have systems that are superhuman at the game of Go, yet they could not have invented Go. Humans are still in the driver's seat — at least for now.


Sources

AI improves at self-improving

by AI Explained (video transcript)

AI that can help improve AI is actually almost everywhere, if you know where to look. Not least coding tools like the new Codex from OpenAI, which didn't just help me find a bug that Claude within Cursor missed, but is helping AI researchers too. The coding agents might be doing the easier bits, but that's freeing up AI researchers' time to, well, work on AI improvement. But rarely is the process of AI self-improvement so direct as it is in the Alpha Evolve agent from Google DeepMind.

It can generate better prompts for itself so that it can evolve better code for useful tasks. Tasks which lead to efficiencies in its own next version. This was published less than 100 hours ago, but don't worry, it isn't Skynet. The real world does not yet allow for the speed of iteration that Alpha Evolve involves.

But I would say that this agent is the final proof, for anyone left doubting it, that LLMs are not a dead end and have barely even begun to make their mark. I'm going to draw on plenty of analogies and multiple interviews to give you guys at least a gut sense of what is going on with this recursive Ronin, this agent that has already led to real-world efficiencies in the Google data center fleet and mathematical breakthroughs decades in the making. First though, let's just cut to the chase.

What on earth is this thing? Basically, the human comes along and has to provide the problem to solve, some code that they may have tried, and, critically, some evaluation metrics. Those details are kind of crucial if you don't want to get an overhyped sense of what Alpha Evolve can do. Anyway, the human provides all of that, and the more metrics they can give, the better the performance.

Then, essentially, the human can just vibe as Gemini 2 (not Gemini 2.5, the far more impressive successor, but Gemini 2) iterates on that code. The system uses the Flash version of Gemini, the smaller and quicker one, for plentiful ideas, but the Pro version, Gemini 2 Pro, for solid suggestions. Notice the prompt sampler, wherein the system draws on previous prompts humans have tried that worked before, and, via the program database, on programs that were great in other situations.

All with a goal of improving the code that the human submitted against ...