
Anthropic: Our AI just created a tool that can ‘automate all white collar work’, me:

## The Pitch

A major AI lab CEO predicted that by 2026 essentially all code written at his company would come from AI. By early 2026, that prediction looks eerily accurate. Now Anthropic has released a new tool called Claude Co-work, itself built with AI, which suggests knowledge work beyond coding faces the same trajectory. But here's what's surprising: despite viral posts claiming this equals AGI, the real story is more nuanced. A recent study shows human-AI collaboration really does increase productivity, yet these models still make embarrassing mistakes, like inventing false league positions for football clubs and failing basic logical reasoning tasks.

## The Tool That's Getting Everyone Talking

Anthropic's Claude Co-work has generated tens of millions of views in just days. The tool was built using their flagship model, Claude Opus 4.5, essentially writing its own code, a milestone that seems to validate predictions about automating white collar work within the year.

But the viral coverage invites two extreme reactions, both worth pushing back against. One says these tools are useless because they hallucinate constantly. The other claims they are already AGI and your career is doomed if you don't adopt them immediately. Both miss the actual value.

## Where Productivity Actually Stands

The real question isn't whether AI will replace everything; it's whether human-AI collaboration actually produces better results than humans working alone. An October 2025 OpenAI study provides an answer: we've already passed a tipping point. Using blind human grading across dozens of white collar industries, the research shows that having models attempt tasks repeatedly while humans review and edit produces significantly more value than humans doing the work solo.

The same pattern holds in coding. The lead developer for Claude Code acknowledged that creating Co-work wasn't zero-intervention: humans planned, designed, and iterated with the AI.
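That plan-draft-review-redraft loop can be sketched as a simple harness. Note that `model_draft` and `human_accepts` below are hypothetical stubs, not Anthropic's actual workflow or API; in practice the draft would come from an LLM call and the review from a person.

```python
# Minimal sketch of a human-in-the-loop drafting cycle (hypothetical stubs).

def model_draft(task: str, attempt: int) -> str:
    """Stub generator: label each attempt so we can watch the loop run."""
    return f"{task} (draft v{attempt})"

def human_accepts(draft: str) -> bool:
    """Stub reviewer: reject the first two drafts, simulating human pushback."""
    return "v3" in draft

def collaborate(task: str, max_attempts: int = 5) -> str:
    """Let the model retry until the reviewer accepts, then hand off for edits."""
    draft = ""
    for attempt in range(1, max_attempts + 1):
        draft = model_draft(task, attempt)
        if human_accepts(draft):
            return draft  # human makes final light edits from here
    return draft  # out of attempts: hand over the last draft anyway

print(collaborate("league-positions slide deck"))
```

The design point is simply that rejection is cheap: the loop retries automatically, and the human only steps in to approve or lightly edit, rather than authoring from scratch.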
But here's the key finding: it's faster to have the AI draft, fail, and redraft, and then have a human step in, than to do the work from scratch.

One concrete example shows this clearly. A test asking for a comparison chart of a football club's league positions across five seasons produced a visually impressive PowerPoint in minutes. But checking the data revealed errors: two of the entries showed incorrect league standings. The presentation looked professional but contained factual mistakes.

The takeaway isn't to dismiss these tools or to panic about them. It's that even imperfect AI-generated work can be edited quickly enough to beat doing the task yourself, and that's where the productivity gains are real.

## What Job Data Actually Shows

Despite headlines about mass layoffs, the data doesn't support an apocalypse narrative. A January 2026 report from Oxford Economics found no significant rise in unemployment across the US or other countries. New graduates do face slightly higher unemployment, but it falls within normal historical ranges.

The more honest picture shows companies using AI announcements to frame layoffs for investors: connecting job cuts to AI rather than to weak demand or over-hiring creates a more positive narrative. Some sectors, such as customer service, have strong incentives to adopt the technology, but the broader labor market impact remains limited.

## Why Models Are Brilliant and Broken

The inconsistency isn't mysterious; it reveals something fundamental about how these models work. Research papers from early 2026 identify three levels of understanding in large language models.

First: simple conceptual understanding, registering connections between different manifestations of an entity. Second: contingent understanding, knowing things are true only under specific circumstances. Third: principled understanding, deriving the underlying rules that unify diverse facts.

These models don't aspire to perfect understanding.
They learn whatever connections work, whether deep or brittle. The same model that can navigate complex codebases and find tiny bugs has also randomly deleted 11 gigabytes of files from a user's desktop.

The core issue is that we lack a fully intuitive definition of what understanding means. Models can grasp addition algorithms and produce powerful poems but still fail basic logical tasks, like deducing that if Tom Smith's wife is Mary Stone, then Mary Stone's husband is Tom Smith. GPT-5.2 even struggles to count the letters in the word "orange".

The inconsistency isn't a flaw waiting to be fixed; it's inherent to how these systems work. They can reach high levels of understanding on some tasks while remaining brittle on others.

> "We don't have a fully intuitive definition of what understanding means, and that's why we struggle to ascribe it to machines."

## Bottom Line

The strongest argument in this piece is that the productivity tipping point has genuinely arrived: human-AI collaboration now outperforms humans alone. The biggest vulnerability is that AI's impact remains gated behind expensive subscriptions and technical barriers, limiting who actually benefits. For readers, the takeaway is neither panic nor dismissal: these tools are real productivity boosters, but they're not magic, and they make mistakes that require human verification.


Sources

Anthropic: Our AI just created a tool that can ‘automate all white collar work’, me:

by AI Explained (video)

The CEO of one of the major AI labs predicted last year that by around now, 100% of the code written by that company would be produced by one of their AI models. Next up within 2026 would be all other knowledge work, and a new tool released by Anthropic in the last couple of days seems to back that up. It's called Claude Co-work. Not only has it gone omega viral at 42 million views for its ability to automate non-coding tasks, the tool itself was produced within Claude Code, powered by their latest frontier model, Claude Opus 4.5, thereby seeming to justify the prediction that essentially all of the code would by now be written by AI.

So wait, if they got that right, does that mean that Anthropic, and those like Sholto Douglas, are correct when they say that in 2026, this year, the same will be true of automating all white collar work? >> The most striking thing about next year is that the other forms of knowledge work are going to experience what software engineers are feeling right now, where they went from typing most of their lines of code at the beginning of the year to typing barely any of them at the end of the year. I think of this as the Claude Code experience, but for all forms of knowledge work. I also think that probably continual learning gets solved in a satisfying way.

>> Well, I've been using Claude Code for quite a while and yes, have been playing about with the new Claude Co-work. And for me, those predictions are just not true. But so many of us might then throw the baby out with the bathwater and miss out on some pretty crazy productivity gains. So I'm going to show why we shouldn't underestimate the gains to be had either.

Then, for those who want to go a bit deeper, I'm going to end with the why. Why can models produce genius, like seeing tiny bugs in large codebases and writing powerful poems for me, but also still fail at such basic tasks? No, I don't mean how many A's are in the word "orange", although surprisingly GPT-5.2 still can't get that right.

No, why are they still sometimes so brittle, memorizing that Tom Smith's wife is Mary Stone but not deducing that Mary Stone's husband is ...
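For contrast, the deduction the transcript describes, inverting a symmetric relation like marriage, is mechanical in code, as is the letter count the video says GPT-5.2 fumbles. A minimal illustration (the names and the word are just the examples from the transcript):

```python
# A fact a model might memorize in one direction only.
spouse = {"Tom Smith": "Mary Stone"}

# In code, the reverse direction is a mechanical inversion, not a deduction.
spouse.update({wife: husband for husband, wife in list(spouse.items())})
print(spouse["Mary Stone"])  # Tom Smith

# Counting letters is equally mechanical.
print("orange".count("a"))  # 1
```

That gap, between tasks that are trivial to state procedurally and a model's pattern-matched recall, is exactly the brittleness the video is pointing at.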