GPT 5.2: OpenAI Strikes Back

In the last 24 hours, OpenAI have released a new model and plenty of record-breaking results. GPT 5.2 might not be a Christmas miracle, however, as to get frontier performance, it often needs to spend more tokens thinking. But just setting tokens aside for one moment, GPT 5.2 is, in many benchmarks, among the best language models out there. For me, this is a tiny bit like us all getting luxury Christmas presents, though, where we don't know which results were bought by the labs with the last of their intellectual or financial overdraft, and which results will be superseded early in the new year with something even shinier. Either way, it's a genuinely good model.

So, let me give you nine details about GPT 5.2 that you wouldn't get from just reading the headlines, so you can decide for yourself. Plus, I'm going to end with a sheep analogy, which I think is quite good. First, let's talk about the bold claim right at the top of the release page for GPT 5.2, which is that GPT 5.2 Thinking sets a new state-of-the-art score on GDPval and is the first model that performs at or above a human expert level. It beats or ties top industry professionals on 71% of comparisons on that benchmark, according to expert judges, and it's the best model yet for real-world professional use, apparently.

I will say that both OpenAI and Sam Altman were relatively specific about the claim they were making for this benchmark, calling it an eval measuring well-specified knowledge work tasks across 44 occupations. Nevertheless, seeing models exceed expert level in real-world professional tasks may lead many to misinterpret this chart and this benchmark. I have tested GPT 5.2 heavily and covered this benchmark specifically in great detail in a previous video, but let me give you a 10-second recap. Yes, the questions for GDPval were crafted by industry experts, but the jobs must be predominantly digital jobs.

Any that weren't were excluded. Only a subset of the tasks within each of those occupations was selected, and the "well-specified" adjective they gave was intentional, because the full context of each task is given to the models beforehand. Even OpenAI say in the release notes that real tasks often involve tacit knowledge, where basically you have to search out or intuit or know the contextual information ...

Watch the full video by AI Explained on YouTube.