Single Agent vs Multi-Agent: When Each Pattern Actually Wins


The agent architecture question comes up fast when you start building anything real. One agent that handles everything, or a team of specialized agents coordinated by an orchestrator? Both patterns have strong advocates. Both work. The difference is in what you're building and how much complexity you're willing to manage.

Here's what I've learned running both in production.


The single agent pattern

One agent. Loaded with whatever context and instructions the task requires. At any given moment it might be doing research, writing code, reviewing output, or sending a notification — wearing different hats, drawing on a shared context window.

The appeal is straightforward: everything the agent knows is in one place. The researcher and the builder are the same agent, so there's no handoff problem, no context loss when you pass a document from one agent to another, no coordination overhead. If the task requires reading something you found in step 1 to inform step 4, that's automatic — it's all in context.

Where single agent wins:

- Sequential tasks where later steps depend on earlier output — shared context makes the handoff free.
- Short interactive tasks, where coordination overhead would dominate the actual work.
- Debugging: there's one context window to inspect when something goes wrong.

Where it struggles:

- Genuinely parallel subtasks, which it can only run one at a time.
- Long-running work that outgrows a single context window.
- Security models that require tool isolation between roles.


The multi-agent pattern

Multiple agents, each with a defined scope, coordinated by an orchestrator that delegates work and assembles results.

A research agent that can access the web. A build agent that can write and execute code. A review agent that can read output and flag issues. The orchestrator — which might be another Claude session, a queue runner, or application code — assigns work, waits for results, and decides what happens next.
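The shape of that loop can be sketched in a few lines. This is a toy, not a real orchestrator — `research_agent`, `build_agent`, and `review_agent` are hypothetical stand-ins for sessions that would call an LLM API with different tools — but it shows the orchestrator's job: assign work, wait for results, decide what happens next.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real agents. In a real system each
# would be a separate session with its own tools and permissions.
def research_agent(task: str) -> str:
    return f"notes on {task}"

def build_agent(task: str) -> str:
    return f"code for {task}"

def review_agent(output: str) -> str:
    return "ok" if output else "flag: empty output"

def orchestrate(task: str) -> dict:
    """Assign work, wait for results, decide what happens next."""
    with ThreadPoolExecutor() as pool:
        # Research and build are delegated in parallel here only
        # because this toy task treats them as independent.
        notes = pool.submit(research_agent, task)
        code = pool.submit(build_agent, task)
        results = {"notes": notes.result(), "code": code.result()}
    # Review runs last because it depends on the build output.
    results["review"] = review_agent(results["code"])
    return results
```

Even in the toy version, the coordination logic — what runs in parallel, what waits, what happens on a flag — lives in the orchestrator, not in any agent.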

Where multi-agent wins:

- Genuinely parallel subtasks that can run concurrently.
- Tool isolation: each agent gets only the permissions its role needs.
- Long-running background work, where components can run on demand instead of keeping one long context alive.

Where it struggles:

- Orchestration overhead — tokens and logic spent on coordination rather than actual work.
- Format brittleness at every handoff between agents.
- A debugging surface that grows with the number of agents.


The hidden costs of multi-agent

The seductive thing about multi-agent architectures is that they sound efficient. Parallel execution, specialized tools, clean separation of concerns. All of that is real. But there's a cost side that doesn't get discussed as much.

Orchestration overhead. The orchestrator needs to understand the state of all running agents, parse their outputs, handle errors, and make decisions about what to delegate next. If your orchestrator is itself a Claude session, you're spending tokens on coordination that a single agent would spend on actual work.

Format brittleness. Single agents can handle ambiguity — they understand their own output. When agent A passes output to agent B, agent B has to parse it. Structured output formats help, but every handoff is a place where the format can drift or the contract can break.
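One mitigation is to validate the contract at every boundary so drift fails loudly instead of silently. A minimal sketch, with a hypothetical `Handoff` shape — the specific fields are illustrative, not from any real system:

```python
import json
from dataclasses import dataclass

# Hypothetical contract for output passed from agent A to agent B.
@dataclass
class Handoff:
    task_id: str
    status: str   # expected: "done" or "failed"
    payload: str

REQUIRED = {"task_id", "status", "payload"}

def parse_handoff(raw: str) -> Handoff:
    """Validate at the boundary: a drifted format raises immediately,
    instead of quietly corrupting downstream agent behavior."""
    data = json.loads(raw)
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"handoff missing fields: {sorted(missing)}")
    if data["status"] not in ("done", "failed"):
        raise ValueError(f"unknown status: {data['status']!r}")
    return Handoff(**{k: data[k] for k in REQUIRED})
```

The point isn't this particular schema — it's that every handoff gets a checked contract, so a format break surfaces at the boundary where it happened rather than three agents later.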

Debugging surface. A bug in a single-agent system has one context window to inspect. A bug in a multi-agent system might live in any of several agent contexts, or in the handoffs between them. The observability investment required scales with the number of agents.


The decision framework

A few questions that actually help:

Are the subtasks genuinely parallel, or just decomposable? If you can do research and writing in parallel, multi-agent saves time. If the writing depends on the research, you're not gaining true parallelism — you're just adding coordination.
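The difference is easy to see with simulated latency. In this sketch (hypothetical `research` and `write` functions standing in for agent calls), two independent research tasks overlap, but a dependent research-then-write chain takes the full sum regardless of how many agents you throw at it:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical subtasks with simulated latency.
def research(topic: str) -> str:
    time.sleep(0.2)
    return f"notes on {topic}"

def write(notes: str) -> str:
    time.sleep(0.2)
    return f"draft from {notes}"

# Independent subtasks overlap: wall time is ~0.2s, not 0.4s.
start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    a = pool.submit(research, "caching")
    b = pool.submit(research, "auth")
    a.result(), b.result()
parallel_time = time.perf_counter() - start

# A dependent chain gains nothing from delegation: write must wait.
start = time.perf_counter()
draft = write(research("caching"))
sequential_time = time.perf_counter() - start
```

If most of your edges look like the second case, splitting into agents buys you handoffs, not speed.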

Does tool isolation matter for your security model? If you need to guarantee that your research agent can't write to the file system, a separate agent with different permissions is the right call. If tool overlap is acceptable, the isolation benefit disappears.

How much debugging time are you willing to invest? Multi-agent systems need investment in observability, structured output contracts, and error handling. If you don't make that investment, you'll pay for it when something breaks.

What's the task duration? Long-running background tasks often benefit from multi-agent decomposition because you can run components on demand rather than keeping a single long context alive. Short interactive tasks rarely do.


What actually works

The pattern I've found most useful isn't at either extreme. One primary agent that handles the majority of a task, with occasional parallel delegation for subtasks that are genuinely independent and time-sensitive. The primary agent maintains context and continuity; specialists run on demand for bounded, well-defined work.

The orchestration layer is thin — a queue runner that picks up cards, tracks state in files, and sends notifications when things complete. Not a complex coordination system, just enough structure to run tasks reliably without you watching.
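A thin queue runner along those lines fits in one function. This is a sketch under assumptions, not my actual runner — cards as JSON files in a directory, state tracked by rewriting the file in place:

```python
import json
from pathlib import Path

# Hypothetical layout: each "card" is a JSON file in a queue
# directory with at least {"task": ..., "state": "pending"}.
QUEUE = Path("queue")

def pick_up_cards(run) -> list[str]:
    """Run each pending card once, record the result in the file."""
    completed = []
    for card_path in sorted(QUEUE.glob("*.json")):
        card = json.loads(card_path.read_text())
        if card["state"] != "pending":
            continue  # already done or failed; state lives in the file
        card["result"] = run(card["task"])
        card["state"] = "done"
        card_path.write_text(json.dumps(card))
        completed.append(card["task"])  # e.g. hook a notification here
    return completed
```

Because state is just files on disk, crash recovery is rerunning the loop, and "observability" is `cat queue/*.json` — which is most of the appeal.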

This isn't elegant architecture. It's the minimum that works reliably, built up from what actually failed in simpler setups.


I build with Claude every day and write about what it's actually like to ship AI-powered products. Subscribe at shoofly.dev/newsletter — building AI products in the real world, not what the press releases say.