GPT-5 Has Arrived
Well, GPT-5 is here and it's in the free tier. I've tested it a bunch, read the system card in full, and even sat through that full live stream. Wow. But actually, I think it's pretty huge that free users of ChatGPT will get access to GPT-5.
In other words, approaching a billion people will experience a significantly more intelligent AI model, at least before they hit the limits. But if you watched the live stream and demo, you may have been underwhelmed. And I don't just mean the mathematically impossible bar graphs, and there were multiple of those. There were even hallucinations in the segment describing how the model hallucinates less.
For sure, it would be easy to make a video just taking the mick out of those mistakes. But the thing is, GPT-5 is actually a pretty great model. So, here are my first impressions. First, my own logic benchmark, or as some people call it, a trick-question benchmark.
I can confirm that GPT-5 indeed does crush the public questions of SimpleBench. Whoever it was that came out with this viral thread of it getting 9 out of 10 on those 10 public questions from SimpleBench wasn't lying. Technically, in some of my early testing, it got questions right that no other model had gotten right. When I saw this, I was like, man, I'm gonna have to bring out V2 really early.
Everyone's going to get super hyped. This is crazy. However, if you're newer to AI, you might not know that the performance of language models is heavily dependent on the training data they're fed. And I suspect some of these 10 public questions have made it into the training data, at least indirectly.
Not deliberately, I think, but given that the models are trained on things like Reddit and other forums, it's definitely not impossible. Given how long I normally take to update the leaderboard, you guys might be quite shocked to hear that we're doing the runs tonight. And so far, it's not setting a new record. That surprised even me actually.
I was expecting, honestly, 70%. I'll be honest with you guys. So far, in the three runs we've done, it's getting around 57 to 58%. So at this point we can be clear it's not a new paradigm of AI, and if you didn't believe models were AGI now, this model won't convince ...
Watch the full video by AI Explained on YouTube.