Ethan Mollick doesn't just update a guide; he declares a fundamental shift in how artificial intelligence functions, moving from a conversational parlor trick to an autonomous worker. The most striking claim here isn't about which model is "smarter," but that the software wrapping the model—the "harness"—now matters more than the brain itself. For busy professionals, this reframes the entire value proposition of AI: you aren't paying for a chatbot anymore; you are paying for a digital employee that can execute complex, multi-step tasks without constant supervision.
The Three-Layer Architecture
Mollick argues that the old way of thinking—picking a model and chatting with it—is obsolete. He writes, "Until a few months ago, for the vast majority of people, 'using AI' meant talking to a chatbot in a back-and-forth conversation. But over the past few months, it has become practical to use AI as an agent: you can assign them to a task and they do them, using tools as appropriate." This distinction is critical. The author breaks the ecosystem down into three distinct layers: the Model (the intelligence), the App (the interface), and the Harness (the ability to use tools and take actions).
The analogy Mollick uses is particularly effective for a business audience. He explains that a harness is "what let the power of AI models do real work, like a horse harness takes the raw power of the horse and lets it pull a cart or plow." Without this harness, even the most advanced model is just a brilliant mind with no hands. This explains why the same underlying intelligence can produce wildly different results depending on the environment. As Mollick notes, "The exact same model, Claude Opus 4.6, asked the exact same question... in three different apps and harnesses. With no harness the information is out of date... using Claude Cowork, I get a sophisticated analysis and well-formatted head-to-head comparisons."
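The Model/Harness split can be made concrete with a minimal sketch of an agent loop. Everything here is hypothetical (the `stub_model`, `TOOLS` registry, and `harness` function are invented for illustration, not any vendor's actual API): the "model" only emits text describing what it wants, and the "harness" is the surrounding loop that actually executes tools and feeds results back.

```python
def stub_model(history):
    """Stand-in for an LLM: requests a tool once, then answers.

    A real model would generate this from the conversation; here we
    hard-code the behavior to keep the sketch self-contained.
    """
    if not any(msg["role"] == "tool" for msg in history):
        return {"type": "tool_call", "tool": "search",
                "args": {"query": "latest model specs"}}
    return {"type": "answer", "text": "Here is an up-to-date comparison."}

# The harness exposes concrete capabilities; the model can only name them.
TOOLS = {
    "search": lambda query: f"results for {query!r}",
}

def harness(model, task, max_steps=5):
    """The loop that turns a text generator into a worker:
    run the model, execute any tool it requests, feed the result back."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model(history)
        if action["type"] == "answer":
            return action["text"]
        result = TOOLS[action["tool"]](**action["args"])
        history.append({"role": "tool", "content": result})
    raise RuntimeError("step budget exhausted")

print(harness(stub_model, "Compare the top AI models."))
```

Strip the loop away and the model can only talk about searching; the harness is what lets it actually search, which is exactly the horse-and-plow distinction Mollick draws.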
The question 'which AI should I use?' has gotten harder to answer: it now depends on what you're trying to do with it.
This shift mirrors the rapid evolution seen in the early days of large language models: the gap between a basic chatbot and a capable coding assistant was once a matter of raw model quality, but now the "harness" is the primary differentiator. Critics might argue that this complexity creates a barrier to entry for non-technical users, but Mollick's analysis suggests that the alternative—using a model without a harness—is simply a recipe for failure in professional settings.
The Model Trap and the Cost of Capability
Mollick is blunt about the limitations of free tiers, a point often glossed over in marketing materials. He writes, "I wish I could tell you the free models currently available are as good as the paid models, but they are not. The free models are all optimized for chat, rather than accuracy, so they are very fast and often more fun to talk to, but much less accurate and capable." This is a crucial distinction for the executive reader. The "fun" of a chatbot is a feature, not a benefit, when you are trying to analyze a spreadsheet or draft a legal contract.
The article details how the major providers—OpenAI, Anthropic, and Google—have obscured the specific capabilities of their models behind confusing "auto" modes. Mollick points out that for OpenAI, "The issue is that GPT-5.2 is not one model, it is many... When you select GPT-5.2, what you are really getting is 'auto' mode, where the AI decides which model to use, often a less powerful one." He advises that for complex work, users must manually select the most advanced reasoning tiers, such as "GPT-5.2 Thinking Extended" or "GPT-5.2 Pro." This is a significant operational shift: the user must now act as an engineer, configuring the intelligence they deploy, rather than just a consumer asking a question.
The Harness Gap: Where the Real Work Happens
The most compelling part of Mollick's coverage is his assessment of the "harness" capabilities across the three major providers. While the underlying models are "remarkably close in overall capability," the tools they can access are not. Mollick observes, "OpenAI and Anthropic have clear leads over Google. Both Claude.ai and ChatGPT have the ability to write and execute code, give you files, do extensive research, and a lot more. Google's Gemini website is much less capable (even though its AI model is just as good)."
He illustrates this with a stark comparison: asking the same research question yields working spreadsheets and citations from ChatGPT and Claude, while Gemini fails to produce the documents or provide sources. This isn't a failure of the model's knowledge, but of the application's ability to let the model act on that knowledge. The author highlights that the "harness" is what transforms a text generator into a worker. "Claude Code has an even more extensive harness: it gives Claude 4.6 Opus a virtual computer, a web browser, a code terminal, and the ability to string these together to actually do stuff like researching, building, and testing your new website from scratch."
Mollick provides a fascinating case study of this power in action. He describes asking Claude Code to create a physical book project based on the original GPT-1 weights. "Over the course of an hour or so... it made 80 beautifully laid out volumes containing all of GPT-1, along with a guide to the math. It also came up with, and executed, covers for each volume... and launched it for me." This example underscores the shift from "chatting with AI" to "delegating projects to AI." The author notes that while these coding harnesses are powerful, they are also risky for amateurs, yet the emergence of tools like "Claude Cowork" signals a move toward safe, desktop-based agents for non-technical knowledge work.
AI that doesn't just talk to you about your work, but does your work.
This emerging category of "coworker" software, which runs locally and can manage files and browsers securely, represents the next frontier. Mollick notes that "Neither OpenAI or Google have a direct equivalent, at least this week," suggesting a temporary but significant competitive advantage for Anthropic in the agentic space. However, a counterargument worth considering is the potential for these autonomous agents to hallucinate or make costly errors when given full control over financial or operational systems, a risk that "default-deny networking" and virtual machines may mitigate but not eliminate.
Bottom Line
Ethan Mollick's guide succeeds by stripping away the hype of model benchmarks to focus on the practical reality of deployment: the tool matters more than the brain. The strongest part of the argument is the demonstration that without a sophisticated harness, even the smartest model is useless for complex, multi-step tasks. The biggest vulnerability for readers is the steep learning curve required to manually configure these advanced models and harnesses, a friction point that could slow adoption among less technical professionals. Watch for how quickly the "harness gap" closes, as Google and others scramble to match the autonomous capabilities of their rivals.