Alpha Evolve is a coding agent at heart. It takes code submitted by humans, along with evaluation metrics telling it what success looks like, and then iteratively improves that code.
The human provides the problem to solve, some code they have already tried, and, critically, evaluation metrics; the more metrics they can give, the better the performance. Gemini Flash, the smaller and quicker version of Google's language model, then iterates on that code to generate a breadth of ideas, while Gemini Pro contributes fewer but more considered suggestions.
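The loop this describes, propose edits with a model, score them against the user's metrics, and keep the best candidates, can be sketched in a few lines. This is a toy illustration, not the actual system: `evaluate` and `propose_edits` are placeholder callables standing in for the user's metric and the model call.

```python
import random

def improve(code, evaluate, propose_edits, generations=20, population_size=8):
    """Toy evolutionary loop in the spirit of Alpha Evolve: keep a small
    population of candidate programs, ask a model for an edited variant,
    and retain whatever scores best on the user-supplied metric."""
    population = [(evaluate(code), code)]
    for _ in range(generations):
        # Tournament selection: sample a few candidates, pick the fittest.
        sample = random.sample(population, min(3, len(population)))
        parent = max(sample)[1]
        child = propose_edits(parent)        # stand-in for a model call
        population.append((evaluate(child), child))
        population.sort(reverse=True)        # best (score, program) first
        population = population[:population_size]
    return population[0][1]
```

In the real system the fast model would supply many cheap candidates and the stronger model fewer, deeper ones; here a single `propose_edits` function plays both roles.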
The system also uses prompt sampling, drawing on previous prompts that have worked well and on a database of programs that proved great in other situations. All with one goal: improving the code the human submitted against those evaluation metrics.

State-of-the-Art Results

Alpha Evolve eventually comes back with code improvements that match state-of-the-art performance on roughly 75% of the dozens of tasks it was given. Around 20% of the time, its constructions are actually better than the previous state of the art.
The most famous achievement: it found a rank-48 tensor decomposition for 4x4 complex matrix multiplication, an unexpected improvement on the 50-year-old record for algorithms suitable for recursive application. A tensor decomposition of lower rank amounts to a more fundamental recipe that performs the multiplication with fewer core scalar multiplications, and because the method can be applied recursively, those savings compound, dramatically speeding up calculations for very large matrices.
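To see why one fewer multiplication matters, note that a scheme multiplying b x b block matrices with m scalar multiplications recurses with cost T(n) = m * T(n/b), giving running time on the order of n^(log_b m). The exponents below are standard arithmetic; only the small helper function is illustrative, and the rank-48 result applies over the complex numbers:

```python
import math

def exponent(m, b):
    """Cost exponent of a recursive scheme that multiplies b x b block
    matrices using m scalar multiplications: T(n) = m * T(n/b)."""
    return math.log(m, b)

print(exponent(8, 2))    # schoolbook 2x2 blocks: 3.0
print(exponent(7, 2))    # Strassen, 1969: ~2.807
print(exponent(49, 4))   # Strassen applied twice to 4x4: ~2.807
print(exponent(48, 4))   # a rank-48 decomposition for 4x4: ~2.792
```

The drop from 49 to 48 multiplications looks tiny, but it lowers the exponent of the whole recursion, which is why the record matters at scale.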
Google also applied Alpha Evolve to Borg, the cluster management system that schedules work across its data centers. The system helped recover 0.7% of Google's worldwide compute resources, an amount that soon adds up to billions of dollars in savings.

The Next Generation

The natural next step is distilling Alpha Evolve's augmented performance into the next generation of base language models. This has intrinsic value and will likely uplift the next version of Alpha Evolve.
The paper makes clear this creates a recursive loop: code that proves to be good becomes strong training data for the next base model, which then gets better at coming up with improved programs.
Alpha Evolve also helped refine Google's Ironwood chips, its specialized AI processors. Given that as a problem, it delivered a 1% reduction in Gemini's training time, another recursive loop: a better, more efficient Gemini leads to a better future Alpha Evolve.

Future Improvements

Google admits several areas for improvement going forward.
First, solutions and their scores are kept in an evolutionary database. Gemini models have been shown to handle context windows of up to 10 million tokens, though public versions top out at 2 million; as that grows, the evolutionary database could one day become incredibly large, giving future models a vast library to draw upon.
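A hypothetical sketch of how such a database could feed prompt sampling: pull a handful of high-scoring prior programs into the prompt as inspiration. The function and its fields are invented for illustration; with a 10-million-token window, both `k` and the programs themselves could grow enormously.

```python
def build_prompt(problem, database, k=3):
    """database: list of (score, program) pairs from earlier runs.
    Returns a prompt showing the k highest-scoring prior programs."""
    top = sorted(database, reverse=True)[:k]   # highest-scoring first
    examples = "\n\n".join(f"# score={score:.2f}\n{program}"
                           for score, program in top)
    return (f"{problem}\n\n"
            f"Prior programs that scored well:\n\n{examples}\n\n"
            f"Propose an improved program.")
```

The larger the context window, the more of the library's accumulated history each new model call can actually see.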
Second, Alpha Evolve is model-agnostic. As hardware improves and training time shrinks, its knowledge can be distilled to help build a better Gemini 3, which in turn becomes a much stronger language model inside Alpha Evolve.
The ablations in the paper showed that every part of the coding agent is crucial. Using only a small base language model caps performance at a lower point, and removing the massive context window or full-file evolution caps it lower still.
Third, the code snippet Alpha Evolve improves doesn't have to be the final function that generates the solution directly. It can be a search algorithm that is later used to find an optimal final function, so Alpha Evolve can essentially keep improving how we search for optimal programs.

The Limitations

Alpha Evolve is not yet confirmation of an imminent fast takeoff. The main limitation: it can only tackle problems for which it is possible to devise and submit an automated evaluator.
While this is true of many problems in the mathematical and computational sciences, in domains such as the natural sciences only some experiments can be simulated or automated. Alpha Evolve can still help scientists evaluate new experiments, and Google is working on making it a better co-scientist.
Even the famously bullish Anthropic CEO Dario Amodei has said that intelligence will initially be heavily bottlenecked by other factors of production: test tubes can only be run so fast.

Still No Skynet

Alpha Evolve could not yet create Alpha Evolve. It can improve parts of itself, as discussed, but it couldn't build the entire system from scratch.
As Demis Hassabis, head of Google DeepMind, puts it: we have systems that are superhuman at the game of Go, yet they could not have invented Go. Humans are still in the driver's seat, at least for now.