
OpenAI Tests if GPT-5 Can Automate Your Job - 4 Unexpected Findings

In the last 24 hours, OpenAI have released research on, essentially, whether current language models can automate your job. The big claim, albeit carefully worded, is that the current best frontier models are approaching industry experts in deliverable quality. But as you'll see from the title, there are plenty of unexpected findings in this research. Before I dive into that, there is one job we seem intent on automating, and that is the job of being a UFC fighter.

You can laugh at the lack of performance now, but like me, you might be laughing somewhat nervously. Take a look at this Unitree G1 robot, which maybe hasn't mastered kung fu, but he's getting a bit closer. Quick prediction: do you reckon billionaires will have humanoid robot bodyguards by 2035? Let me know.

Back to the paper: they are only focusing on the most important sectors according to their contribution to GDP. What makes things more interesting is that the questions weren't designed by OpenAI. They were designed by industry professionals themselves, with an average of 14 years of industry experience.

They had to meet all sorts of criteria just to design the questions. And here are the headline results, which you may have seen go viral, with Claude Opus 4.1, a model by Anthropic, beating out OpenAI's models and coming quite close to parity with industry experts. This I am obviously going to class as the first surprising finding. Not that Opus is the best model, because Opus 4.1, if you haven't tried it, is indeed an amazing model.

So no, that's not the most surprising bit. It's that OpenAI published this result showing Opus beating its own models. I think that's great, honest science, by the way, and I commend OpenAI for publishing this. Now you might be thinking, "No, Phillip, the most surprising bit is how close we're getting to parity with industry experts," but I'll come back to that in just a moment.

Right now I want to cover this second, you could say somewhat surprising, result, which is that the win rate when compared to humans depended quite heavily on the file type involved. If your workflow involves submitting or producing a PDF, PowerPoint, or Excel spreadsheet, you might well find that Opus 4.1 is a league ahead. All these figures, by the way, are on how often a model beats a human expert output ...


Watch the full video by AI Explained on YouTube.