
Google's New AI Is Smarter Than Everyone Else's, But It Costs a SEVENTH as Much. Here's Why They Don't Care.

{"author": "Nate B Jones", "pieces": [{"text": "## The Real Story Nobody's Talking About

Here's what's happening: Google built the smartest AI model on the planet. It's called Gemini 3.1 Pro. It leads on 13 of 16 benchmarks. And it costs roughly a seventh of what Anthropic's Opus 4.6 charges.

But that's not the story.

The story is that Google doesn't care if you use Claude or ChatGPT for your daily work. They don't need you. And that indifference represents the most important strategic signal in AI right now.

## Why This Matters

Most people covering Gemini 3.1 Pro are focused on benchmark scores. The author argues that focus is exactly what's causing the real story to be missed.

The real question is this: why does the richest company in tech—generating over $100 billion in annual free cash flow—build the most powerful reasoning engine on the market, price it at the floor, and remain perfectly comfortable if you keep using Claude or ChatGPT?

The answer reshapes how you should think about every model release from here on out. It changes how you evaluate your own skills. And it explains why most of the conversation about which AI you should use is really asking the wrong question.

## The Strategy Behind the Silence

Demis Hassabis has been saying the same sentence for 15 years: step one, solve intelligence; step two, use it to solve everything else.

He said it when DeepMind was a London startup nobody had heard of. He said it after AlphaGo beat a Go grandmaster. He said it at Davos last month. He said it on the Fortune podcast where he predicted artificial general intelligence would arrive within five years.

This is not how anyone else in the AI industry talks. Sam Altman talks about products, partnerships, distribution—the race to a billion users. When OpenAI put ads in ChatGPT, they did so because they needed to monetize that user base.

But Google doesn't need to monetize Gemini. They have the world's largest search engine and its profit streams funding them. They generate over $100 billion in annual free cash flow from search, YouTube, and cloud. They're spending $93 billion on capital expenditure this year—most of it on AI.

Google can afford to let Gemini be a research vehicle because their economic engine has nothing to do with whether you prefer Claude or ChatGPT for your daily workflow.

## The Impregnable Fortress

It's not just the profit streams. Over the last decade, Google has deliberately built a vertical AI stack that nobody else has.

They design their own silicon. The Ironwood TPU, the seventh generation of the line, announced earlier this year, delivers ten times the compute of the previous generation at roughly half the energy cost per operation. It can link up to 9,216 chips in a single pod.

Google trains their own models on that silicon. They deploy those models through their own cloud infrastructure—Google Cloud, which nine out of ten AI research labs use in some capacity. They distribute them to 650 million monthly active Gemini users, plus billions more through search, Android, YouTube, and Chrome.

And they fund the fundamental research through DeepMind, which won the Nobel Prize in Chemistry 18 months ago for AlphaFold—a system that predicted the structure of virtually every known protein, a problem biologists had been working on for 50 years.

This vertical integration from transistor design to protein folding is the architecture of a company that believes intelligence is a problem in computer science, that the problem is solvable, and that solving it requires controlling the entire stack from physics up to software.

## What Gemini 3.1 Pro Is—and Isn't

Gemini 3.1 Pro is not a coding agent, though of course it can write code very well. It's also not an agent manager, although it can manage agents. It's not trying to autonomously close issues across a fifty-person engineering org the way Opus 4.6 did at Rakuten.

What 3.1 Pro actually is: the strongest pure reasoner available at scale, at a price point that makes it viable for any problem where reasoning depth matters more than tool orchestration.

At $2 per million input tokens and $12 per million output tokens, it's roughly seven and a half times cheaper than Opus 4.6 on input and more than six times cheaper on output. For a workload processing a billion input tokens a month, that is the difference between a $15,000 bill and a $2,000 bill.
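To sanity-check that arithmetic, here's a minimal sketch in Python. The Gemini price is the figure quoted above; the Opus input price is inferred from the seven-and-a-half-times ratio rather than quoted directly, and the comparison covers the input side only.

```python
# Back-of-the-envelope version of the billing math above.
GEMINI_INPUT_PER_M = 2.00   # $ per million input tokens (quoted above)
OPUS_INPUT_PER_M = 15.00    # inferred: 7.5x Gemini's input price

MONTHLY_TOKENS = 1_000_000_000  # the billion-token monthly workload from the text

def monthly_cost(price_per_million: float, tokens: int) -> float:
    """Dollar cost for `tokens` at `price_per_million` dollars per 1M tokens."""
    return price_per_million * tokens / 1_000_000

print(f"Gemini 3.1 Pro: ${monthly_cost(GEMINI_INPUT_PER_M, MONTHLY_TOKENS):,.0f}")  # $2,000
print(f"Opus 4.6:       ${monthly_cost(OPUS_INPUT_PER_M, MONTHLY_TOKENS):,.0f}")    # $15,000
```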

The model also ships with configurable thinking levels—low, medium, high, and max—so you can dial reasoning depth and cost up or down per request. Simple classification or summarization? Low thinking, fast and cheap. Novel scientific problem requiring multi-step deduction? Turn it up to max and let it work.
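As a rough illustration of what that dial could look like in code, here's a minimal sketch written against the google-genai Python SDK. Treat the `thinking_level` parameter, the "medium" and "max" values, and the model identifier as assumptions drawn from the tiers described above, not confirmed API.

```python
# Hypothetical sketch: dialing reasoning depth (and cost) per request.
# The thinking_level knob, its accepted values, and the model id are
# assumptions based on the article, not a confirmed API surface.
from google import genai
from google.genai import types

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

def ask(prompt: str, thinking_level: str) -> str:
    response = client.models.generate_content(
        model="gemini-3.1-pro",  # the model as the article names it
        contents=prompt,
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_level=thinking_level),
        ),
    )
    return response.text

# Simple classification: keep it fast and cheap.
label = ask("Classify this support ticket as billing, technical, or other: ...", "low")

# Novel multi-step deduction: turn the dial to max and let it work.
answer = ask("Prove or find a counterexample to the following claim: ...", "max")
```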

This is cost engineering for reasoning at a granularity nobody else really offers.

## The Real Comparison

When you give these models tools—web search, code execution, database access, file systems—and measure their performance on complicated real-world tasks that require using those tools together to get work done, Opus 4.6 catches up and often pulls ahead.

On Humanity's Last Exam with search and code tools, Opus scores 53.1% versus Gemini's 51.4%. On GPAI, which measures expert-level office and financial tasks, Opus leads by 289 Elo points—a massive gap.

The pattern here is unambiguous: Gemini 3.1 Pro is the strongest naked reasoner. Opus 4.6 is the strongest equipped reasoner—the model that's best at combining intelligence with the ability to use tools, call APIs, read files, write code, and sustain that work over hours and days. GPT 5.3 is the strongest specialist coder.

If intelligence is the engine, tools are the drivetrain. Google built a better engine, Anthropic built a better car, and OpenAI built a better racing transmission for individual tasks.

The question isn't which model is smartest. The question is whether a task is bottlenecked by raw thinking or by the ability to act on that thinking across tools and over time.

That question turns out to be way more interesting than any benchmark, and we are not talking about it enough.
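One way to make the question concrete is to route work by its bottleneck rather than by leaderboard rank. A toy sketch of that routing logic, following the three-way split above (the flags and model identifiers are illustrative, not any lab's published API):

```python
# Toy router: choose a model by the task's bottleneck, not its benchmark rank.
# The three-way split mirrors the framing above; everything here is illustrative.

def pick_model(code_heavy: bool, needs_tools: bool, long_horizon: bool) -> str:
    if code_heavy:
        return "gpt-5.3"         # the strongest specialist coder
    if needs_tools or long_horizon:
        return "opus-4.6"        # the strongest equipped reasoner
    return "gemini-3.1-pro"      # the strongest naked reasoner: depth per dollar

# A novel math problem with no tool use is reasoning-bound...
assert pick_model(code_heavy=False, needs_tools=False, long_horizon=False) == "gemini-3.1-pro"
# ...while closing issues across an engineering org is tool- and time-bound.
assert pick_model(code_heavy=False, needs_tools=True, long_horizon=True) == "opus-4.6"
```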

## Where Pure Reasoning Wins

A good example is Gemini's DeepThink, released February 12th—a specialized reasoning mode that sits above 3.1 Pro on the intelligence curve. DeepThink collaborated with human researchers to solve 18 previously unsolved problems across mathematics, physics, computer science, and economics.

These aren't incremental improvements or benchmark tricks. They're original research contributions.

A conjecture in online submodular optimization had stood unproven since 2015. Gemini DeepThink engineered a precise three-item combinatorial counterexample and proved the conjecture false in a single run.

That wasn't even the most interesting result. On the max cut problem—a classic network optimization challenge—the model reached into geometric functional analysis to make progress; elsewhere in the set, it tackled problems in physics and caught a critical error in a cryptography paper. Human algorithm researchers would not typically reach into geometric functional analysis to solve a graph theory problem; they're very different domains of mathematics.

The model crossed disciplinary boundaries that human specialists very rarely cross, because the model doesn't see disciplinary boundaries. That is one of the real strengths of an AI model.

## Bottom Line

Google's strategy reveals something critical: they're not trying to win the model race through monetization. They're playing a fundamentally different game—building pure intelligence while their profit engine funds everything else.

The strongest part of this argument is the distinction between raw reasoning and tool orchestration as bottlenecks. It's a framing that most AI coverage misses entirely.

The biggest vulnerability is whether pure reasoning actually matters for most real-world business problems, which typically require tool use—coding agents, API calls, file management—that pure reasoners can't handle.

What readers should watch: whether Google's bet on pure reasoning pays off in actual scientific breakthroughs. DeepThink's 18 solved problems suggest it might. And if those results hold, the entire conversation about which AI model to use becomes secondary to the question of what you actually need your AI to do.

## Transcript

Google just shipped the smartest AI model on the planet. It's Gemini 3.1 Pro. It costs a seventh of the competition, and they don't even need you to use it. That's right.

They shipped a model that leads on 13 of 16 benchmarks. It costs roughly a seventh of what Opus 4.6 charges. And Google really doesn't care. That's not a weird flex on their part.

It might be the most important strategic signal in AI right now. And almost nobody is talking about it. The coverage of Gemini 3.1 Pro has been all about those benchmarks. And what's been missing is the question underneath.

Why does the richest company in tech, a company generating over a hundred billion dollars in annual free cash flow, build the most powerful reasoning engine on the market, price it at the floor, and remain perfectly comfortable if you keep using Claude or ChatGPT for your daily work? The answer reshapes how you should think about every model release from here on out. It changes how you evaluate your own skills, and it explains why most of the conversation about which AI you should use is really asking the wrong question at this point.

So a couple weeks ago I wrote about Opus 4.6 and the way they used 16 AI agents to build a C compiler.

That piece was about a new kind of labor: agents coordinating in teams, managing engineering orgs, doing weeks of sustained autonomous work. This video is about something different. It's about why the company with the deepest pockets and the widest distribution in the history of computing is playing a fundamentally different game from anybody else.

And what that means for how you evaluate AI models, choose your tools, and understand which of your problems are about to get dramatically easier to solve versus which ones are not. So we're going to talk about just one benchmark out of those 16, because I don't usually talk about benchmarks. That number is 77.1%, and it's on the ARC-AGI-2 benchmark. Why do we care?

It's not about pattern matching from training data. ARC-AGI-2 tests whether a model can solve logic problems it has never seen before. So it's not about retrieval from memorized examples, but about genuinely novel reasoning. Can the model look at a problem it's never ...