Every AI conference talk this quarter is about multi-agent workflows. Agent swarms. Orchestration frameworks. Teams of specialized AI agents collaborating on complex tasks. The demos are impressive: one agent researches, another writes code, a third reviews it, a fourth deploys it. It looks like the future.
I've been running multi-agent workflows in production for two months while building Unmarkdown™, a SaaS product with a Next.js frontend, Supabase backend, Stripe billing, MCP server, Chrome extension, Obsidian plugin, and 875 tests across 59 files. I've used Claude Code Agent Teams, subagents with worktree isolation, and plain shell-script orchestration.
Here's the honest truth: 95% of tasks don't need multi-agent orchestration. A single well-configured Claude Code instance is faster, cheaper, and more predictable. But the other 5%? For those tasks, multiple agents are genuinely transformative. The trick is knowing which bucket your task falls into.
The hype cycle is real
The framework ecosystem has exploded. Claude Flow, LangGraph, CrewAI, AutoGen, Microsoft's Semantic Kernel, Amazon's multi-agent orchestrator. Each promises elegant coordination between specialized agents. The pitch is compelling: break complex tasks into subtasks, assign each to a specialist agent, coordinate results, ship faster.
The problem is that most development tasks aren't complex enough to justify the overhead. Writing a new API endpoint, fixing a CSS bug, adding a database migration, implementing a feature from a spec: these are sequential tasks that fit comfortably in one context window. Adding a second agent doesn't make them faster. It makes them slower, because now you have coordination overhead on top of the actual work.
I learned this the hard way. Early on, I tried using parallel agents for everything. Two agents working on the same feature from different angles. The result? Merge conflicts, inconsistent naming conventions, and duplicated work that took longer to reconcile than it would have taken one agent to do sequentially.
Why single-agent is usually better
After two months of experimentation, I've identified five reasons a single agent outperforms multi-agent setups for most tasks.
Simpler context management. One conversation, one context window. The agent sees everything it's done, every file it's read, every decision it's made. There's no information loss between agents, no summarization artifacts, no "Agent B doesn't know what Agent A discovered." Context is the most valuable resource in AI-assisted development, and splitting it across agents dilutes it.
No coordination overhead. When agents need to communicate, someone has to define the communication protocol. What information gets passed between agents? In what format? How does Agent B know Agent A is done? Every coordination point is a potential failure mode. With a single agent, there's nothing to coordinate. It just works on the next step.
Predictable behavior. I can watch a single agent work in real time. I see every file it reads, every edit it makes, every command it runs. With multi-agent setups, I'm monitoring multiple streams simultaneously, trying to piece together what's happening across agents. When something goes wrong (and it will), debugging a single conversation thread is straightforward. Debugging distributed agent behavior across three simultaneous conversations is genuinely painful.
Lower cost. Each agent has its own context window. If your task requires 50K tokens of context, a single agent uses 50K tokens. Three agents working on the same task might each need 30K tokens of overlapping context, totaling 90K tokens for the same outcome. The math rarely works in multi-agent's favor unless the subtasks are truly independent with no shared context.
Easier debugging. When the output is wrong, a single conversation thread tells you exactly where things went sideways. With multi-agent, the bug might be in Agent A's output, Agent B's interpretation of that output, the coordination layer between them, or the merge logic that combines their results. The debugging surface area scales multiplicatively with agent count.
The 5% where multi-agent is transformative
Despite all of that, there are tasks where multi-agent orchestration isn't just helpful; it's the only practical approach. These share a common trait: the work naturally decomposes into independent, parallelizable chunks where the time savings of parallelism vastly exceed the coordination cost.
Large-scale refactors
I recently renamed a core concept across the Unmarkdown™ codebase. The old name appeared in 50+ files: components, utilities, tests, API routes, database helpers, type definitions. With a single agent, this is a sequential slog. Read a file, update it, move to the next one. Thirty minutes minimum, and the agent's context window fills up halfway through.
With multi-agent, I used one coordinator agent to plan the changes (identify every file, determine the correct replacement in each context), then spawned subagents to execute the changes in parallel across different directories. The coordinator handled the plan; the subagents handled the execution. Total time: about eight minutes, including merge validation.
The key insight is that the subtasks were truly independent. Renaming a variable in src/components/ doesn't affect renaming the same variable in tests/. There's no coordination needed during execution, only before (planning) and after (validation).
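The plan/execute/validate split fits in a short shell script. This is a minimal sketch, not the actual workflow: it uses sed as a stand-in for what each subagent does (a real rename needs per-context judgment, which is exactly why the coordinator plans first), and it builds a throwaway demo tree so it runs anywhere.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo setup: a throwaway tree so the sketch runs anywhere.
root="$(mktemp -d)"
mkdir -p "$root/src" "$root/tests"
echo "render(OldName)" > "$root/src/app.ts"
echo "expect(OldName)" > "$root/tests/app.test.ts"

old="OldName" new="NewName"

# Execute: one background worker per directory. The chunks are
# independent, so they can run concurrently without coordinating.
pids=()
for dir in "$root/src" "$root/tests"; do
  (
    grep -rl "$old" "$dir" | while read -r f; do
      sed -i "s/$old/$new/g" "$f"   # GNU sed syntax for in-place edits
    done
  ) &
  pids+=("$!")
done
for pid in "${pids[@]}"; do wait "$pid"; done

# Validate: the rename is complete only when no occurrences remain.
if ! grep -rq "$old" "$root"; then echo "rename complete"; fi
```

The structure mirrors the real workflow: all coordination happens before (the directory list) and after (the validation grep), never during execution.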
Exploratory architecture decisions
This is my favorite use case. When I'm evaluating different approaches to a problem, sequential exploration biases me toward the first approach I try. I invest time in it, see it partially working, and resist abandoning it even when a better option exists. Sunk cost bias is real, even with AI-generated code.
With parallel agents, I can try three approaches simultaneously. "Agent A, implement this with server-side rendering. Agent B, try the client-side approach. Agent C, explore the edge function route." Each agent works in its own worktree, isolated from the others. When they're done, I compare the results side by side with no bias toward whichever finished first.
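Setting up the isolation is one git command per approach. The sketch below creates a throwaway repo so it runs anywhere; in practice you'd run git worktree add from your project repo and point each agent's working directory at its own checkout. The approach and branch names are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo setup: a throwaway repo so the sketch runs anywhere.
repo="$(mktemp -d)"
cd "$repo"
git init -q -b main
git -c user.email=demo@example.com -c user.name=demo \
  commit -q --allow-empty -m "init"

# One isolated worktree per approach: separate directory, separate
# branch, so parallel agents can't step on each other's files.
for approach in ssr client-side edge-function; do
  git worktree add -q -b "explore/$approach" "worktrees/$approach"
done

git worktree list   # the main checkout plus three exploration checkouts
```

Each agent is then launched with its worktree as its working directory; when the comparison is done, the losing branches are deleted with git worktree remove and nothing leaks into main.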
I used this approach when deciding how to handle the template CSS pipeline in Unmarkdown™. Three agents, three approaches, one clear winner that I might not have tried if I'd gone sequentially.
Cross-system integration work
Some tasks genuinely require simultaneous activity across different systems. One agent modifies the API, another runs integration tests against it in real time, a third monitors error logs for unexpected failures. This isn't about speed; it's about catching interaction effects that only surface when systems are running together.
I've used this pattern when shipping changes to the Stripe webhook handler. One agent makes the code change, another watches the PostHog event stream for billing state machine anomalies, a third runs the test suite. The parallel monitoring catches edge cases that sequential test-then-deploy misses.
Batch content generation
This is where I use multi-agent most often. When I needed to write 15 blog posts for the Unmarkdown content pipeline, a single agent writing them sequentially would take hours, and each post would bleed context from the previous one (leading to repetitive phrasing and structure). Instead, I spawned 15 subagents, each in its own worktree, each with a focused brief. Fifteen posts in 20 minutes, each with a distinct voice because no agent was contaminated by the others' output.
The same pattern works for documentation. One agent reads the codebase and extracts the API surface. Another writes the user-facing docs. A third validates that the docs match the actual behavior. Each agent is a specialist, and the combined output is better than one generalist trying to do all three tasks in sequence.
Multi-agent orchestration frameworks in 2026
If you decide multi-agent is right for your task, here are the main options as of early 2026.
Claude Code subagents with worktree isolation. This is what I use most. Claude Code can spawn subagents that run in isolated git worktrees. Each agent has its own working directory, its own branch, its own context. The parent agent coordinates. It's production-ready, requires no external framework, and the worktree isolation prevents agents from stepping on each other. I wrote a detailed walkthrough in Parallel Agents and Worktrees in Claude Code.
Claude Code Agent Teams (experimental). Built into Claude Code, Agent Teams lets you define roles (architect, implementer, reviewer) and have agents collaborate on a shared task. It handles communication between agents automatically. Still experimental, so expect rough edges, but the ergonomics are promising for tasks where agents need ongoing dialogue rather than independent execution.
Claude Flow. An open-source orchestration framework for Claude agents. It provides primitives for defining workflows: sequential steps, parallel execution, conditional branching. Good for repeated workflows where you want to encode the orchestration pattern once and run it many times. The overhead of setting up a Flow workflow isn't worth it for one-time tasks.
DIY shell orchestration. Don't underestimate a bash script that spawns multiple Claude Code instances. For batch operations with identical structure (like writing N blog posts with different topics), a simple loop with background processes works fine. No framework needed. I use this more than I'd like to admit.
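As a concrete sketch, the whole pattern fits in a dozen lines. AGENT_CMD below is a stand-in for however you invoke your agent CLI (the real Claude Code flags aren't shown here); it defaults to echo so the skeleton runs as-is.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for your agent CLI; the real invocation is up to you.
AGENT_CMD="${AGENT_CMD:-echo}"
outdir="$(mktemp -d)"

topics=("webhooks" "worktrees" "context-engineering")

# Fan out: one background process per topic, each writing to its own
# file so outputs never interleave.
pids=()
for topic in "${topics[@]}"; do
  "$AGENT_CMD" "Write a focused post about $topic" > "$outdir/$topic.md" &
  pids+=("$!")
done

# Fan in: wait on each PID individually so a failed agent's exit
# status propagates and fails the whole batch.
for pid in "${pids[@]}"; do
  wait "$pid"
done

echo "finished ${#topics[@]} jobs in $outdir"
```

Writing each output to its own file is the entire "coordination layer" here, which is the point: for identical, independent jobs, the operating system's process model is all the orchestration you need.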
The decision tree
When a new task comes in, I run through this mental checklist:
Can one agent handle this in under 30 minutes? If yes, use a single agent. The coordination overhead of multi-agent will cost more than the time saved. Most feature implementations, bug fixes, and code reviews fall here.
Does the task decompose into truly independent pieces? If the subtasks don't share state and don't need to communicate during execution, parallel agents will save time proportional to the number of agents. Batch operations, large refactors with a pre-planned approach, and exploratory architecture fit here.
Do the pieces need to communicate during execution? If Agent B's work depends on Agent A's intermediate output (not just its final output), you need a framework that supports inter-agent communication. Agent Teams or a custom orchestration layer. Be warned: this is where complexity explodes.
Is this a one-time task or a repeated workflow? Multi-agent setups have upfront costs: defining the workflow, configuring the agents, testing the coordination logic. For a one-time task, that cost is rarely justified. For a workflow you'll run weekly, the investment compounds.
Is the task large enough that parallelism matters? Three agents doing 5-minute tasks in parallel save 10 minutes total. That's real, but the coordination overhead might eat half of it. Three agents doing 30-minute tasks in parallel save an hour. That's clearly worth the overhead.
The coordination tax
This is the concept most multi-agent enthusiasts underestimate. Every interaction between agents has a cost.
Token consumption. When agents communicate, the messages consume tokens. A coordinator agent that reads summaries from three subagents is spending context budget on coordination rather than actual work. In my experience, coordination messages consume 15-25% of total token usage in multi-agent workflows. That's a significant tax.
Result merging. When parallel agents produce outputs that need to be combined, the merge step is rarely trivial. Code changes across different files might conflict at the import level. Documentation written by different agents might use inconsistent terminology. The merge agent needs enough context to resolve these issues, which means it needs to understand what each subagent did, which means more tokens and more time.
Debugging complexity. When the final output is wrong, where did the error originate? Was it the coordinator's plan? A subagent's execution? The merge logic? With a single agent, I read one conversation and find the mistake. With multi-agent, I'm reading four conversations, cross-referencing timestamps, and trying to reconstruct the information flow. I estimate debugging multi-agent workflows takes roughly 5x longer than debugging equivalent single-agent work.
Non-determinism. Agents don't produce identical outputs given identical inputs. When three agents work in parallel, the order of completion varies, the quality of individual outputs varies, and the final merged result varies. This makes multi-agent workflows harder to test, harder to reproduce, and harder to improve iteratively.
The coordination tax means multi-agent is only worth it when the raw time savings from parallelism are substantially larger than the overhead. My rule of thumb: if parallelism doesn't save at least 3x the coordination cost, use a single agent.
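The rule of thumb is just arithmetic, so here it is as one. The inputs are rough estimates, not measurements, and the function name is mine, not a standard tool.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Rule of thumb: parallelize only if the minutes saved are at least
# 3x the estimated coordination cost. All inputs are rough estimates.
worth_parallelizing() {
  local minutes_per_task=$1 agents=$2 coordination=$3
  # Time saved = sequential total minus the one parallel lane you
  # still have to wait for.
  local saved=$(( minutes_per_task * agents - minutes_per_task ))
  if (( saved >= 3 * coordination )); then
    echo "parallelize (save ${saved}m vs ${coordination}m overhead)"
  else
    echo "single agent (${saved}m saved vs ${coordination}m overhead)"
  fi
}

worth_parallelizing 5 3 5    # three 5-minute tasks, ~5m coordination
worth_parallelizing 30 3 15  # three 30-minute tasks, ~15m coordination
```

The two example calls are the scenarios from the decision tree above: small tasks fail the threshold even with modest overhead, and the 30-minute tasks clear it comfortably.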
My actual setup
Here's what I run day to day while building Unmarkdown™.
95% of work: single Claude Code instance. Feature development, bug fixes, code reviews, test writing, documentation updates. One agent, one context window, one conversation. I use the hub-and-spoke memory system to give the agent persistent context across sessions, and hooks to automate repetitive setup tasks. This handles nearly everything.
4% of work: subagents with worktree isolation. Batch blog post generation (like this one), large-scale refactors, parallel exploration of architectural approaches. I spawn subagents from the main Claude Code instance, each in its own worktree. The parent agent coordinates the plan and merges the results. No external framework, just Claude Code's built-in subagent capability.
1% of work: Agent Teams. Occasional large refactors where the subtasks have interdependencies, or cross-system integration work where agents need to communicate during execution. Agent Teams is still experimental, so I only reach for it when worktree-isolated subagents aren't sufficient.
The ratio tells the story. Multi-agent is powerful, but it's a specialized tool. Using it for everything is like using a chainsaw to cut bread. Technically possible, but you'll make a mess.
The output problem
Here's something nobody talks about in the multi-agent discourse: the output consolidation problem. Multi-agent workflows produce scattered artifacts. Each agent generates its own logs, summaries, code changes, and documentation fragments. When the work is done, you're left with outputs distributed across multiple conversations, worktrees, and branches.
For code, git handles the merge. But for everything else (documentation, changelogs, stakeholder updates, project summaries), you need to consolidate markdown from multiple sources into a single coherent document. This is tedious work that eats into the time you saved with parallelism.
This is actually where Unmarkdown™ fits into my own workflow. After a multi-agent session produces scattered markdown outputs, I consolidate them into a single document, apply a template for consistent formatting, and publish or share with stakeholders. One paste, one template, one published page. It's a small thing, but it removes the last friction point in the multi-agent workflow.
The honest take
Multi-agent orchestration is a real capability with genuine applications. Large refactors, parallel exploration, batch operations, and cross-system integration all benefit from it. The frameworks are maturing, and the ergonomics will keep improving.
But the industry is over-indexing on multi-agent as a default approach. Most development work is sequential, context-dependent, and benefits from a single agent's unified understanding of the problem. The coordination tax is real and often underestimated. If you're reaching for multi-agent because it sounds more sophisticated, stop and ask whether a single agent with good context engineering would do the job better.
Start with a single agent. Invest in giving it excellent context. Reach for multi-agent only when you have genuinely independent, parallelizable work where the time savings clearly justify the overhead.
That's the boring answer. It's also the correct one.
