Gemini 3 Pro: Breakdown

In the last 24 hours, Google released Gemini 3 Pro. And for me, it genuinely marks a new chapter in the race to true artificial intelligence. Not only because Google is now clearly ahead, but also because it will be pretty hard for other companies to match their rate of acceleration. I have tested Gemini 3 hundreds of times, including through early access, and it is indeed a significant leap, not just a nudge forwards.

On my own private, independent benchmark, Simple Bench, it crushed its rivals, or I should say beat its own record, to be clearly number one. I will show you a sample question in a moment, but you may think that's a fluke. Well, that would be a pretty hard line to maintain with the 20 other benchmarks in which it reaches record performance.

So, while Gemini 3 is not perfect, it will be a deafening wake-up call to companies like OpenAI and Anthropic. I'm also going to touch on benchmarks where it didn't perform as well, as well as the fascinating new tool, Google Antigravity. Above all, I'm going to try and give you at least 11 details that you wouldn't get from just reading the headlines that are going viral about the new Gemini 3. Let's start with the benchmark with the scariest name, Humanity's Last Exam.

And the reason the author of that benchmark, whom I've spoken to, called it that was because he solicited the hardest possible questions that he could derive using any expert out there. They paid for any question that the frontier models couldn't get right at the time, which was around a year ago. Now, the name of that benchmark has become somewhat ironic because even without doing a web search, just using its own knowledge, so no tools, Gemini 3 Pro gets 37.5%, a huge leap above GPT 5.1, and that's a theme that you'll see recurring throughout these benchmarks.

And sticking with knowledge for a second, what about scientific knowledge in STEM subjects? That's tested in the Google-Proof Q&A benchmark, GPQA Diamond. Even the creator of this benchmark thought that model performance had plateaued, but no, Gemini 3 Pro sets a record of almost 92%. That compares to GPT 5.1 getting 88.1%.

Now, I know what many of you are thinking. Oh, well, that's only a 4% improvement. Don't go too wild. But imagine ...

Watch the full video by AI Explained on YouTube.