How OpenAI, Gemini, and Claude use agents to power deep research

Alex Xu doesn't just explain how AI research works; he reveals that the most powerful systems are no longer single brains, but coordinated swarms. While most coverage fixates on a model's parameter count, Xu shifts the focus to the architecture of collaboration, arguing that the future of intelligence lies in orchestration, not just raw processing power.

The Architecture of Collaboration

Xu begins by dismantling the myth of the solitary genius model. He writes, "Deep Research has become a standard capability across modern LLM platforms," but immediately clarifies that this is not a magic trick performed by one algorithm. Instead, he describes a "coordinated system that explores a wide landscape of information over 15 to 30 minutes." This distinction is vital for busy professionals: the value isn't in the speed of a single answer, but in the depth of a multi-step process that mimics human diligence.

The core of Xu's argument rests on the concept of the "orchestrator." He explains that a lead agent "takes responsibility for the overall research strategy," breaking a vague user request into a precise plan before delegating tasks. This mirrors the evolution seen in multi-agent system research, where the complexity shifts from training a single model to managing the handoffs between specialized services. Xu notes that the orchestrator "creates a plan for how to answer the question," which is then "broken into smaller pieces and delegated to multiple sub-agents." This is a significant departure from earlier chatbot interactions, where the model often hallucinated a path forward based on incomplete context. By forcing a planning phase, the system reduces error rates before execution even begins.

"The quality of the final report is directly tied to the quality of this plan. If the plan is incomplete or misinterprets the user's intent, the resulting research will miss key information or go in the wrong direction."

This framing is effective because it places the burden of success on the system's logic, not just its knowledge base. However, a counterargument worth considering is whether this added complexity introduces new failure points. If the orchestrator misinterprets the initial query, the entire swarm of sub-agents may efficiently pursue the wrong goal, wasting computational resources and user time.
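The planning-then-delegation step Xu describes can be sketched in a few lines. This is a minimal illustration, not any provider's implementation; in a real system an LLM would generate the decomposition, and all names here (`SubTask`, `build_plan`, the agent roles) are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class SubTask:
    """One delegated piece of the research plan."""
    description: str
    agent_role: str                         # e.g. "web_search", "data_analysis"
    depends_on: list = field(default_factory=list)

def build_plan(query: str) -> list[SubTask]:
    """Hypothetical orchestrator step: turn a vague request into
    concrete sub-tasks before any sub-agent runs.  Hard-coded here
    purely to show the shape of the output."""
    return [
        SubTask("Find candidate companies", "web_search"),
        SubTask("Collect funding data", "web_search"),
        SubTask("Rank and filter results", "data_analysis",
                depends_on=["Find candidate companies",
                            "Collect funding data"]),
    ]

plan = build_plan("list 100 companies working on AI agents in 2025")
```

The point of the structure is that dependencies are explicit before execution begins, which is exactly where a misinterpreted query would propagate: every sub-task inherits the orchestrator's reading of the request.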

Specialization and Parallel Execution

The most compelling section of Xu's analysis details how these systems leverage parallelism. He describes a "pool of agents tuned for particular functions," such as a web search agent specialized in query formation and a data analysis agent with access to a code interpreter. This specialization is reminiscent of the shift in reinforcement learning from human feedback (RLHF), where models were trained to align with human preferences; here, the architecture is trained to align with specific functional roles.

Xu highlights the efficiency of this approach: "One sub-agent might be researching market trends, another might be gathering historical financial data, and a third might be investigating competitor strategies, all in parallel." This is not merely a speed boost; it is a qualitative change in the scope of research. A single agent might get stuck in a loop of re-reading the same source, but a swarm can cross-reference multiple angles simultaneously. He notes that the orchestrator "keeps track of dependencies and triggers sub-agents when their inputs are ready," ensuring that the workflow remains logical despite the parallel execution.
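The dependency-aware fan-out Xu describes can be sketched with a thread pool. The task graph and names below are illustrative assumptions, not any platform's code; the point is the orchestrator's bookkeeping, where a task runs only once all of its inputs exist:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative task graph: each task lists the tasks it waits on.
TASKS = {
    "market_trends": [],
    "financial_data": [],
    "competitor_strategies": [],
    "synthesis": ["market_trends", "financial_data", "competitor_strategies"],
}

def run_task(name: str) -> str:
    # Stand-in for a sub-agent doing real research work.
    return f"result:{name}"

def execute(tasks: dict[str, list[str]]) -> dict[str, str]:
    """Run independent tasks in parallel; trigger a task only once
    all of its declared inputs are ready."""
    results: dict[str, str] = {}
    pending = dict(tasks)
    with ThreadPoolExecutor() as pool:
        while pending:
            ready = [n for n, deps in pending.items()
                     if all(d in results for d in deps)]
            futures = {n: pool.submit(run_task, n) for n in ready}
            for n, f in futures.items():
                results[n] = f.result()
                del pending[n]
    return results

results = execute(TASKS)
```

Here the three research tasks run concurrently in the first wave, and the synthesis task is only submitted after all three results land, mirroring the "triggers sub-agents when their inputs are ready" behavior.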

The tools these agents use are equally specific. Xu writes that agents "issue tool calls that the system executes on their behalf," drawing on search tools, browser tools that fetch full page content, and code interpreters for numerical tasks. This separation of concerns lets the language model focus on reasoning while the tools handle the heavy lifting of data retrieval and calculation. Critics might argue that relying on external tools introduces latency and new points of failure, such as broken links or API rate limits, which could stall the entire research process.

"By using specialized agents, the system can apply the best tool and approach to each part of the plan, which improves both the accuracy and efficiency of the overall research."
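The "tool calls the system executes on their behalf" pattern is, at its core, a dispatch table. The sketch below is an assumption about the shape of such a system, with stub tools standing in for real search, browsing, and sandboxed execution services:

```python
def search_tool(query: str) -> str:
    return f"top results for '{query}'"    # stub: real web search API

def browser_tool(url: str) -> str:
    return f"full page text of {url}"      # stub: real page fetch

def code_interpreter(snippet: str) -> str:
    # Stub only: real systems run code in a sandbox, never eval
    # untrusted model output directly.
    return str(eval(snippet))

TOOLS = {"search": search_tool, "browse": browser_tool, "run": code_interpreter}

def execute_tool_call(call: dict) -> str:
    """The agent emits a structured call; the system looks up the
    matching tool, runs it, and returns the result to the agent."""
    return TOOLS[call["tool"]](call["argument"])

out = execute_tool_call({"tool": "run", "argument": "2 + 2"})
```

The design choice worth noticing is that the agent never executes anything itself; it only produces structured requests, which is what makes the latency and rate-limit concerns above a system-level problem rather than a model-level one.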

The Synthesis Challenge

Perhaps the most critical insight Xu offers is on the final stage: synthesis. It is one thing to gather a hundred snippets of data; it is another to weave them into a coherent narrative. Xu describes a "synthesizer agent" that "organizes the information into sections, resolves overlaps, and builds a coherent narrative." He emphasizes that this stage is where the system must "ensure that every piece of information used later in the synthesis stage is traceable back to its source."

This focus on citations is a direct response to the historical problem of AI hallucination. Xu notes that the citations agent "reads through the synthesized report and makes sure that each statement is supported by the correct sources." This mechanism transforms the output from a creative writing exercise into a verifiable research document. He points out that different providers handle this differently, with some using a dedicated citations agent and others embedding the function within the orchestrator. The result is a system where the "final report is thoroughly backed by the underlying material."
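The verification pass Xu describes, where every statement must trace back to a source, can be sketched as a simple lookup-and-check. Real systems use an LLM to judge whether a source supports a claim; the substring check and the field names below are illustrative assumptions only:

```python
def verify_citations(report: list[dict], sources: dict[str, str]) -> list[str]:
    """Flag statements whose cited source is missing or whose
    source text does not contain the claimed key phrase."""
    problems = []
    for stmt in report:
        src = sources.get(stmt["source_id"])
        if src is None or stmt["key_phrase"] not in src:
            problems.append(stmt["text"])
    return problems

sources = {"s1": "Acme raised $50M in 2024 to build AI agents."}
report = [
    {"text": "Acme raised $50M.", "source_id": "s1", "key_phrase": "$50M"},
    {"text": "Acme has 500 staff.", "source_id": "s2", "key_phrase": "500"},
]
flagged = verify_citations(report, sources)
```

Even this toy version makes the limitation below concrete: the check confirms that a source exists and contains the claim, but says nothing about whether the source deserves trust.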

However, the reliance on automated citation raises questions about the nuance of source evaluation. While the system can check if a URL exists and extract text, it may struggle to assess the credibility of a source in the way a human researcher would, potentially giving equal weight to a peer-reviewed paper and a well-written blog post if the content appears similar. Xu acknowledges that agents must "constantly evaluate whether the information is relevant," but the depth of this evaluation remains a black box.

Bottom Line

Xu's piece succeeds by demystifying the "magic" of AI research, replacing it with a clear, logical architecture of planning, delegation, and synthesis. The strongest part of the argument is the emphasis on parallel execution and specialized tools, which fundamentally changes what is possible in automated research. The biggest vulnerability lies in the assumption that the orchestrator can always perfectly interpret human intent and that automated citation guarantees truth. As these systems become standard, the real challenge will not be generating the report, but trusting the logic that built it.

Deep Dives

Explore these related deep dives:

  • Reinforcement learning from human feedback

The article mentions that OpenAI's deep research agent uses reinforcement learning to train models for planning multi-step research tasks. RLHF is the specific technique that enables these models to improve decision-making through reward signals, which is fundamental to understanding how these agents learn to coordinate tool calls effectively.

  • Multi-agent system

    The entire article centers on multi-agent architectures where orchestrator agents coordinate sub-agents for research tasks. Understanding the formal computer science concept of multi-agent systems—including coordination protocols, distributed problem-solving, and agent communication—provides essential theoretical grounding for the practical implementations described.

  • Information retrieval

    Perplexity's iterative information retrieval loop and the web search agents across all platforms rely on IR principles. This foundational field covers how systems find relevant documents from large collections, relevance ranking, and query refinement—the technical substrate beneath all the deep research capabilities discussed.

Sources

How OpenAI, Gemini, and Claude use agents to power deep research


Disclaimer: The details in this post have been derived from the details shared online by OpenAI, Gemini, xAI, Perplexity, Microsoft, Qwen, and Anthropic Engineering Teams. All credit for the technical details goes to OpenAI, Gemini, xAI, Perplexity, Microsoft, Qwen, and Anthropic Engineering Teams. The links to the original articles and sources are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

Deep Research has become a standard capability across modern LLM platforms.

ChatGPT, Gemini, and Claude all support tasks that run for long periods of time and gather information from large portions of the public web.

A typical deep research request may involve dozens of searches, several rounds of filtering, and the careful assembly of a final, well-structured report. For example, a query like “list 100 companies working on AI agents in 2025” does not rely on a single search result. It activates a coordinated system that explores a wide landscape of information over 15 to 30 minutes before presenting a final answer.

This article explains how these systems work behind the scenes.

We will walk through the architecture that enables Deep Research, how different LLMs implement it, how agents coordinate with one another, and how the final report is synthesized and validated before being delivered to the user.

High-Level Architecture

Deep Research systems are built from AI agents that cooperate with each other. In this context, an AI agent is a service driven by an LLM that can accept goals, design workflows to achieve those goals, and interact with its environment through tools such as web search or code execution.

See the diagram below to understand the concept of an AI Agent:
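That definition, an LLM-driven service that accepts a goal and interacts with its environment through tools, reduces to a loop. This is a sketch under assumed names; `call_llm` stands in for a real model API and `run_tool` for the tool layer described earlier:

```python
def call_llm(goal: str, history: list) -> dict:
    # Stand-in for a real LLM call: decides the next action
    # based on the goal and what has been observed so far.
    if not history:
        return {"action": "tool", "tool": "search", "input": goal}
    return {"action": "finish", "answer": f"summary of {len(history)} results"}

def run_tool(name: str, arg: str) -> str:
    return f"{name} output for '{arg}'"    # stub tool execution

def agent(goal: str) -> str:
    """Minimal agent loop: plan, act via tools, observe, repeat
    until the model decides the goal is met."""
    history = []
    while True:
        step = call_llm(goal, history)
        if step["action"] == "finish":
            return step["answer"]
        history.append(run_tool(step["tool"], step["input"]))

answer = agent("list AI agent companies")
```

Everything described in the rest of the article, orchestrators, sub-agents, synthesizers, is built by giving this same loop different goals and different tool sets.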

At a high level, the architecture begins with the user request. The user’s query is sent into a multi-agent research system. Inside this system, there is usually an orchestrator or lead agent that takes responsibility for the overall research strategy.

The ...