## The Pitch

A major AI lab CEO predicted that by 2026, essentially all code written at his company would come from AI. By early 2026, that prediction looks eerily accurate. Now Anthropic has released a new tool called Claude Co-work, created using AI to build AI tools, which suggests knowledge work beyond coding faces the same trajectory. But here's what's surprising: despite viral posts claiming this equals AGI, the real story is more nuanced. A recent study shows human-AI collaboration actually does increase productivity, yet these models still make embarrassing mistakes, like inventing false league positions for football clubs and failing at basic logical reasoning tasks.

## The Tool That's Getting Everyone Talking

Anthropic's Claude Co-work has drawn tens of millions of views in just days. The tool was built using the company's flagship model, Claude Opus 4.5, essentially writing its own code, a milestone that seems to validate predictions about automating white-collar work within the year.

But the viral coverage has produced two extreme reactions worth pushing back against. One says these tools are useless because they hallucinate constantly. The other claims they're already AGI and your career is doomed unless you adopt them immediately. Both miss the actual value.

## Where Productivity Actually Stands

The real question isn't whether AI will replace everything; it's whether human-AI collaboration actually produces better results than humans working alone. An October 2025 OpenAI study provides an answer: we've already passed a tipping point. Using blind human grading across dozens of white-collar industries, the research shows that having models attempt tasks repeatedly while humans review and edit produces significantly more value than humans doing the work solo.

The same pattern holds in coding. The lead developer for Claude Code acknowledged that creating Co-work wasn't a zero-intervention process: humans planned, designed, and iterated with the AI.
But here's the key finding: it's faster to have the AI draft, fail, and redraft, then have a human step in, than to do the work from scratch.

One concrete example shows this clearly. A test asking for a comparison chart of a football club's league positions across five seasons produced a visually impressive PowerPoint in minutes. But checking the data revealed errors: two of the listed seasons had incorrect league standings. The presentation looked professional but contained factual mistakes.

The takeaway isn't to dismiss these tools or to panic about them. It's that even imperfect AI-generated work can be edited quickly enough to beat doing the task yourself, and that's where the productivity gains are real.

## What Job Data Actually Shows

Despite headlines about mass layoffs, the data doesn't support an apocalypse narrative. A January 2026 report from Oxford Economics found no significant rise in unemployment in the US or other countries. New graduates do face slightly higher unemployment, but it falls within normal historical ranges.

The more honest picture is of companies using AI announcements to frame layoff messages for investors: attributing job cuts to AI rather than to weak demand or over-hiring creates a more positive narrative. Some sectors, such as customer service, have strong incentives to adopt the technology, but the broader labor-market impact remains limited.

## Why Models Are Brilliant and Broken

The inconsistency isn't mysterious; it reveals something fundamental about how these models work. Research papers from early 2026 identify three levels of understanding in large language models.

First: simple conceptual understanding, registering connections between different manifestations of an entity. Second: contingent understanding, knowing things are true only under specific circumstances. Third: principled understanding, deriving underlying rules that unify diverse facts.

These models don't aspire to perfect understanding.
They learn whatever connections work, whether deep or brittle. They can navigate complex codebases and find tiny bugs, yet one also randomly deleted 11 gigabytes of files from a user's desktop.

The core issue is that we lack a fully intuitive definition of what understanding means. Models can grasp addition algorithms and produce poems yet still fail basic logical tasks, like deducing that if Mary Stone's husband is Tom Smith, then Tom Smith's wife is Mary Stone. GPT-5.2 even struggles with counting the letters in the word "orange."

The inconsistency isn't a flaw awaiting a fix; it's a feature of how these systems work. They can reach high levels of understanding on some tasks while remaining brittle on others.

> "We don't have a fully intuitive definition of what understanding means, and that's why we struggle to ascribe it to machines."

## Bottom Line

The strongest argument in this piece is that the productivity tipping point has genuinely arrived: human-AI collaboration outperforms humans working alone. The biggest vulnerability is that AI's impact remains gated behind expensive subscriptions and technical barriers, limiting who actually benefits. For readers, the takeaway is neither panic nor dismissal: these tools are real productivity boosters, but they're not magic, and they make mistakes that require human verification.