
Subagent Patterns: When to Spawn vs Stay In-Context

21 min read

[Image: a flock of starlings forming an emergent coordination pattern across an open landscape, a metaphor for parallel subagent dispatch]

In June 2025, Cognition published “Don’t Build Multi-Agents.” Anthropic published the opposite case the next morning, complete with a benchmark showing a multi-agent system beat a single agent by 90.2% on their research eval (Anthropic Engineering, 2025). Both posts were right. The argument never resolved because the field talked past the question that actually blocks people sitting in front of Claude Code.

When should you spawn a subagent, and when should you stay in-context?

The cost of getting that wrong runs in both directions. Senior engineers ship working multi-agent setups that quietly burn 15x more tokens than a chat session for marginal gains (Anthropic Engineering, 2025). Others stay single-agent on tasks where parallel dispatch would have saved them an hour. The error mode isn’t only “people overuse subagents.” It’s also “people refuse to use them on the work they fit best.”

This post is the decision artifact: a five-question tree, the 2026 token math, three reproducible failure modes, and an architectural reframing that makes the spawn-vs-stay call automatic. It’s drawn from running this stack on shipped tooling, not from re-reading the docs.

Key Takeaways

  • Multi-agent dispatch is a token-fan-out mechanism, not magic. Token usage alone explains ~80% of the performance variance Anthropic published (Anthropic Engineering, 2025).
  • Five questions decide spawn vs stay: token weight, parallelism, state coupling, plan dependence, bounded scope.
  • Parallel is a research pattern. Serial is a coding pattern. Anthropic’s own caveat: multi-agent is “less effective for tightly interdependent tasks such as coding.”
  • Subagent boundaries are module boundaries. Cohesion high, coupling low, contract frozen at the prompt schema. The contract matters more than the count.

Why does the dispatch question keep getting skipped?

Most subagent content lives in three buckets and skips the operational call. Setup tutorials cover YAML frontmatter and where to put .claude/agents/*.md. Pattern catalogues survey orchestrator-worker, generator-verifier, agent teams, message bus. Position pieces argue Cognition versus Anthropic. None of those is the artifact a senior engineer needs in the moment they’re about to type “use a subagent.” Stack Overflow’s 2025 survey found 31% of developers use AI agents and 17.5% of professional users use them daily (Stack Overflow, 2025). The audience needs decision support, not advocacy.

There’s a structural reason the artifact is missing. Decision trees ossify quickly when a platform moves, and Claude Code keeps moving. The Task tool was renamed to Agent in v2.1.63 (Claude Code docs). Forked subagents shipped behind a flag in v2.1.117. Most authors prefer teaching principles to publishing heuristics that might rot in six months.

But “this might rot” isn’t a reason to skip the artifact. It’s a reason to date it. This post is dated April 2026 and built around 2026 prices. If the platform shifts under it, the questions still hold. The numbers under each branch are the part you’ll need to refresh.

How do you decide whether to spawn a subagent?

Five questions decide whether to spawn. Each one is a yes-or-no, and the combined answer pattern picks one of four outcomes. Anthropic’s own task-sizing rule bounds the fan-out: 1 agent and 3 to 10 tool calls for fact-finding, 2 to 4 subagents at 10 to 15 calls each for direct comparison, 10+ subagents for complex research (Anthropic Engineering, 2025). The truth table below is the operational version, the part Anthropic stops short of writing down.

The questions, in order:

  1. Token weight. Will this side task generate more than ~1,000 tokens of throwaway output the main agent doesn’t need to retain? (Y = yes, it generates a lot.)
  2. Parallelism. Are there at least three independent units of work? (Y = yes, three or more.)
  3. State coupling. Do those units share mutable state or write to overlapping files? (Y = yes, they’re coupled.)
  4. Plan dependence. Does the work change the main plan? (Y = changes the plan; N = only informs it.)
  5. Bounded scope. Is the work bounded with a clear stop condition? (Y = bounded; N = open-ended.)

| Row | Q1 Throwaway output | Q2 Independent units | Q3 Shared state | Q4 Changes plan | Q5 Bounded | Call |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Y | Y | N | N (informs) | Y | Spawn parallel subagents |
| 2 | Y | N | - | - | - | Stay in-context; compact often |
| 3 | Y | Y | Y | - | - | Sequential dispatch, or one forked child |
| 4 | - | - | - | Y | N | Agent team, or pair with a human |

(A "-" means the answer doesn’t change the call.)

Read each row left to right. The first three columns describe the work shape (throwaway output, independent units, shared state). The last two are the constraint shape (whether the work changes the plan and whether the scope is bounded). Row 1 is the canonical research-shaped workload: read-only Explore subagents on a refactor, comparison tasks, summarisation fan-out. Row 2 is the heavy-but-interleaved case where you can’t afford to lose the trace. Row 3 is parallel-shaped work whose units genuinely depend on each other; sequential dispatch or a single forked child preserves order without losing speed. Row 4 is genuine collaboration; reach for an agent team or stop and pair with a human.
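If you’d rather encode the tree than eyeball it, here is a minimal TypeScript sketch of the same table. The names are mine, not anything Claude Code exposes; treat it as a checklist in function form.

```typescript
type Dispatch =
  | "spawn-parallel"   // row 1: research-shaped fan-out
  | "stay-in-context"  // row 2: heavy but interleaved
  | "serial-or-fork"   // row 3: parallel-shaped, coupled units
  | "agent-team";      // row 4: genuine collaboration

interface WorkShape {
  throwaway: boolean;   // Q1: >~1,000 tokens the parent needn't retain
  independent: boolean; // Q2: three or more independent units
  coupled: boolean;     // Q3: shared mutable state or overlapping files
  changesPlan: boolean; // Q4: changes the plan, not just informs it
  bounded: boolean;     // Q5: clear stop condition
}

function decide(w: WorkShape): Dispatch {
  if (w.changesPlan && !w.bounded) return "agent-team";    // row 4
  if (w.independent && w.coupled) return "serial-or-fork"; // row 3
  if (w.throwaway && w.independent && !w.coupled && w.bounded)
    return "spawn-parallel";                               // row 1
  return "stay-in-context";  // row 2, and every ambiguous case
}
```

Defaulting the ambiguous cases to stay-in-context is deliberate: every failure mode later in this post lives on the spawn side.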

Anthropic’s sizing guidance for research workloads: the lead agent spawns 3 to 5 subagents in parallel rather than serially, and each subagent calls 3+ tools in parallel (Anthropic Engineering, 2025). Past 10 children on non-research work, the coordination tax exceeds the savings. Lean toward agent teams, or split into multiple dispatch rounds.

What does a subagent actually cost?

Standard agent loops burn ~4x chat tokens; multi-agent setups burn ~15x (Anthropic Engineering, 2025). At 2026 pricing, a 4-subagent fan-out on Sonnet 4.5 costs roughly the same as a single Opus 4.7 session with the full context. Below ~10,000 tokens of throwaway exploration, stay in-context. The break-even is closer than most posts suggest.

The price ladder, captured against the Claude API pricing page in April 2026:

  • Opus 4.7: $5 input, $25 output per million tokens.
  • Sonnet 4.5: $3 input, $15 output.
  • Haiku 4.5: $1 input, $5 output.
  • Cache reads: 0.1x base input across the family.
  • Managed Agents: $0.08 per session-hour on top of token rates (Anthropic, 2026).

A worked example. You’re about to refactor a 50,000-LOC TypeScript repo. Exploration generates ~40,000 tokens of greps and reads on the throwaway side. Pattern A is a single Opus 4.7 session with the full context: roughly $0.20 input + $0.50 output (~20,000 generated tokens), so ~$0.70 per task. Pattern B is an Opus 4.7 orchestrator plus 4 Explore subagents on Haiku 4.5 (read-only). Naive multi-agent burns ~15x baseline tokens, but on cheaper models, so the bill lands in roughly the same neighbourhood. With prompt-cache reuse on a forked subagent, a 180,000-token shared context bills like ~18,000 tokens on the first turn (Mejba Ahmed, 2026; corroborated by BuildThisNow, 2026). Fan-out wins when at least three children re-use the cache.
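The same arithmetic as a runnable sketch, using the April 2026 prices above. Only the prices and the ~$0.70 Pattern A figure come from sources quoted in this post; the Pattern B token counts are assumptions I picked to land near the published fan-out multipliers.

```typescript
// Prices from the April 2026 ladder above, in dollars per token.
const PRICE = {
  opus:  { input: 5 / 1e6,  output: 25 / 1e6 },
  haiku: { input: 1 / 1e6,  output: 5 / 1e6 },
};

// Pattern A: one Opus 4.7 session holds the full exploration.
const patternA =
  40_000 * PRICE.opus.input +  // ~40k tokens of greps and reads
  20_000 * PRICE.opus.output;  // ~20k generated tokens
// = 0.20 + 0.50 = ~$0.70

// Pattern B: Opus orchestrator + 4 read-only Haiku Explore children.
// Assumed sizes: 150k read tokens per child, 5k summary out each,
// 30k in / 10k out on the orchestrator (~11x baseline tokens overall).
const children =
  4 * 150_000 * PRICE.haiku.input + 4 * 5_000 * PRICE.haiku.output;
const orchestrator =
  30_000 * PRICE.opus.input + 10_000 * PRICE.opus.output;
const patternB = children + orchestrator;
// = 0.60 + 0.10 + 0.15 + 0.25 = ~$1.10; same neighbourhood, better wall-clock

console.log({ patternA, patternB });
```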

The 80% rule does most of the work in this section. Token usage alone explains ~80% of performance variance on the BrowseComp benchmark (Anthropic Engineering, 2025). Most “multi-agent magic” is just spending more tokens. The honest version of “spawn for the lift” is “spawn when the extra tokens land where they help.” That is the role of the decision tree above.

If you want visibility into what your dispatch decisions actually cost over time, watching your 5-hour usage window while subagents fan out makes the trade-off concrete in your own terminal. The token math doesn’t lie; you just have to look.

Which failure modes can you reproduce?

Naive parallel dispatch fails predictably. Subagents can’t see each other, so they make incompatible decisions. Coordination tax compounds with token bloat. The same model that intends five children sometimes emits one Agent call and hallucinates the other four (anthropics/claude-code#29181, 2026). Three reproducible failures are below; each ships with a one-sentence fix.

Failure 1: The Flappy Bird shape

Cognition’s own worked example: subagent-1 builds a Super Mario background, subagent-2 builds a non-physical bird, and neither sees the other (Cognition, 2025). The same shape reproduces on a real codebase. Spawn parallel agents to write a TypeScript interface and its consumer at the same time, and watch them disagree on field names, because each one decided from its own context what “looks idiomatic.” The compile fails on the first run.

The fix is one sentence: route any work that shares a contract through a single agent, or pass the contract as a frozen input to both children before dispatch. Subagents can’t coordinate after the fact. They can only consume a coordinated input.
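A minimal sketch of the frozen-contract version, with hypothetical names. The point is that the interface text is decided once in the parent and pasted verbatim into both children’s prompts; neither child gets to choose field names.

```typescript
// Parent decides the contract ONCE, before any dispatch.
const contract = `
export interface ScoreEvent {
  playerId: string;
  points: number;
  occurredAt: string; // ISO 8601
}`.trim();

// Both children consume the same frozen text.
const producerPrompt =
  `Implement the emitter for this exact interface. ` +
  `Do not rename or add fields.\n\n${contract}`;
const consumerPrompt =
  `Implement the consumer for this exact interface. ` +
  `Do not rename or add fields.\n\n${contract}`;
```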

Failure 2: The hallucinated fan-out

The parent intends 5 parallel Agent calls. It emits 1, fabricates the other 4 results, and writes a confident summary as if all five ran. No actual work was performed for four of the five children. Documented in issue #29181; reproducible with a high-fan-out prompt on busy sessions.

The fix is two-step. Name the fan-out explicitly in the prompt (“Issue these 5 tool calls in parallel: …”). Then run a verification subagent that reads the actual subagent transcripts before accepting any result. Trusting the orchestrator’s summary is the trap.
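A sketch of what naming the fan-out looks like in practice. The wording is illustrative, not a magic incantation; the load-bearing parts are the explicit count, the per-child assignments, and the instruction to admit a shortfall.

```text
Issue exactly 5 Agent tool calls in parallel, one per module below.
Do not write any summary until all 5 results have returned.

1. src/auth:    summarise public exports and their call sites
2. src/billing: same
3. src/search:  same
4. src/ingest:  same
5. src/shared:  same

Before synthesising, list the 5 subagent results you actually received.
If fewer than 5 returned, say so instead of filling the gaps.
```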

Failure 3: The four named modes

Anthropic’s own taxonomy of multi-agent failure (Anthropic, 2026):

  • Telephone game. Agents spend more tokens on coordination than on actual work.
  • Early victory. A verifier passes prematurely after one or two tests instead of the full check.
  • Problem-centric decomposition. You split work by type (one agent writes features, one writes tests, one reviews), so every agent needs the full problem context and the coordination overhead never drops. The boundary is wrong.
  • Tool proliferation. A subagent has 15 to 20+ tools and burns context understanding its options before it can pick one.

Each of these has context rot underneath it. Across 18 frontier models, every model degraded with increasing input length at every increment tested (Chroma Research, 2025). Long-context-as-rescue isn’t real. The model gets worse, not better, the more you stuff into the prompt that didn’t need to be there.

The fix runs across all four. Split work by context boundary, not problem type. Cap tools per subagent below 10. Never trust a single verifier. And don’t reach for parallel dispatch as a way to “just get more done”; reach for it when the tree above tells you to.

Parallel is a research pattern; serial is a coding pattern

Anthropic’s 90.2% multi-agent lift was on a research eval. They publish their own caveat in the same post: multi-agent systems “excel at problems that can be divided into parallel strands of research, but are less effective for tightly interdependent tasks such as coding” (Anthropic Engineering, 2025). Coding is interdependent. Default to serial.

What “research-shaped” actually means: divergent search across independent sources, summarisation fan-out where each child returns a compressed result, comparison tasks where each child evaluates a separate option. The shared property is that results compress on return; the parent doesn’t need the children’s intermediate state to verify the answer.

Coding workloads break that property in three ways. Type contracts couple files; tests couple to implementations; a refactor that touches one module touches three. The parent agent needs to verify behaviour, not just summarise findings. Compression on return loses the evidence needed for verification. The Claude Code docs themselves frame subagents as “focused tasks where only the result matters” (Claude Code docs, 2026). Coding rarely qualifies.

There’s an exception that proves the rule. Code review is the one coding workload that genuinely is research-shaped: independent review dimensions (security, style, coverage, type design) compress to a checklist on return. That’s the parallel-review pattern wired into Pylon in production. Specialised reviewers fan out, each looks at its slice, and the main agent synthesises a triaged report. It works because the dimensions are genuinely independent and the outputs compress. The same architecture on a refactor would melt.

Subagent boundaries are module boundaries

Treat a subagent the way you’d treat a service. Cohesion high, coupling low, contract frozen. The contract is the description, the input prompt schema, and the structured output. Tool-description improvements alone yielded a 40% decrease in task completion time in Anthropic’s internal experiments (Anthropic Engineering, 2025). Most multi-agent payoff is in the contract, not the count.

Cohesion: one responsibility per subagent. The description should fit in a single sentence. If it needs “and,” split. Tool proliferation is a cohesion failure: a subagent with 20 tools has too many responsibilities to choose between, and the model burns context just deciding which tool to reach for.
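In Claude Code terms, the cohesion rule cashes out in the subagent definition file. A sketch of a single-responsibility definition, using the .claude/agents/*.md frontmatter format mentioned earlier; the agent itself is hypothetical:

```markdown
---
name: security-reviewer
description: Reviews a diff for security issues only
tools: Read, Grep, Glob
---
You review code changes for security problems: injection, authn/authz
gaps, secrets in source, unsafe deserialisation. You do not comment on
style, coverage, or design. Return a severity-tagged list of findings.
```

One sentence of description with no “and”, three tools, one job. That is what “cohesion high” means at the file level.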

Coupling: through the parent, not between siblings. Subagents do not see each other’s traces. The Claude Code docs are explicit: “the verbose output stays in the subagent’s context while only the relevant summary returns to your main conversation” (Claude Code docs, 2026). Sibling-to-sibling state is structurally impossible. If two subagents need to coordinate, you’re designing an agent team, not a subagent dispatch. Different primitive, different mental model.

|  | Subagents | Agent teams |
| --- | --- | --- |
| Communication | Result returns to parent only | Peer-to-peer messaging |
| Context | Verbose stays local; summary returns | Each instance fully independent |
| Coordination | Parent orchestrates | Shared task list, self-coordinated |
| Best for | Focused tasks, single result matters | Genuinely collaborative work |
| Token cost | Lower | Higher |

Contract: frozen at dispatch. The description is the interface. The input prompt schema is the parameter list. The structured output is the return type. If you would never let two services silently disagree on a type, don’t let two subagents do it either. Pass the type. Validate the return. Treat dispatch as an RPC call, not a vibe-based delegation.
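One way to make “validate the return” literal, using zod as the example validator. The schema and wrapper are hypothetical; Claude Code does not validate subagent returns for you.

```typescript
import { z } from "zod";

// The structured output contract: the subagent's "return type".
const ReviewFinding = z.object({
  file: z.string(),
  severity: z.enum(["low", "medium", "high"]),
  summary: z.string().max(500), // force compression on return
});
const ReviewReport = z.array(ReviewFinding);

// Treat dispatch like an RPC: parse the return, reject on drift.
function acceptSubagentResult(raw: string) {
  return ReviewReport.parse(JSON.parse(raw)); // throws on contract mismatch
}
```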

This reframing is what makes the layered scaffolding around the model actually compose. The architectural layer above subagents is the project’s discipline; the architectural layer below is the model’s tooling. Subagents sit in the middle, and like any middle layer, they earn their keep through clean interfaces, not through clever internals.

A short field guide

Three concrete situations. The dispatch tree collapses to a one-sentence answer once you’ve run it on the same shapes a few times. New situations slot in by analogy.

Codebase exploration before a refactor: spawn parallel. High token weight (greps return noise the parent doesn’t need to retain). Independent units (each child looks at a different module). State-free (no writes). Informs the plan; doesn’t change it. Bounded (each child has a “summarise this module” stop condition). Use 3 to 5 Explore subagents on Haiku 4.5 for cost; orchestrator on the parent’s model. This is the canonical “yes / yes / no / inform / bounded” leaf, and it’s the one most engineers under-use.

Multi-file refactor with shared types: stay in-context, or fork. Low independence (types couple files). High state coupling (shared interfaces, shared tests). Plan-changing (each finding may invalidate the next step). Open-ended (you don’t know the stop condition until you’re nearly done). Stay in-context with frequent compaction, or use a single forked subagent if you need to A/B a subtree against the main thread. Parallel dispatch here is the trap most first-time multi-agent users fall into.

Code review before merge: spawn specialised subagents. The one coding workload that’s research-shaped. Independent dimensions (security, style, coverage, type design). State-free. Bounded scope (each reviewer reads its slice). Results compress to a checklist on return. The pattern is the same one @iceinvein/agent-skills and the parallel-review setup in Pylon both encode: specialisation by system prompt, fan-out at dispatch, synthesis at return.

The connection to context engineering is direct. The dispatch decision is an allocation question one level up from the per-surface allocation matrix for CLAUDE.md, skills, memory, and MCP. Both are the same discipline: budget where the work belongs, not where it’s easiest to write.

Frequently Asked Questions

When should I use subagents in Claude Code?

When the side task generates more than ~1,000 tokens of throwaway output, splits into 3+ independent units, doesn’t share state with the main work, only informs the plan, and has a clear stop condition. If any of those flips, stay in-context. The five-question tree earlier in this post is the operational version of that rule.

How much more do subagents cost?

Standard agent loops use ~4x chat tokens; multi-agent setups use ~15x (Anthropic Engineering, 2025). Forked subagents on Sonnet drop to roughly 10% of that on the first turn via prompt-cache reuse. At 2026 pricing, a 4-Sonnet fan-out is roughly equivalent in dollar cost to a single Opus session.

What’s the difference between subagents and agent teams?

Subagents report results back to a single parent; their verbose output stays in the subagent’s context and only the summary returns (Claude Code docs, 2026). Agent teams coordinate independent Claude instances that message each other directly. Subagents are cheaper; teams are for genuinely collaborative work.

Can Claude Code subagents run in parallel?

Yes. The parent issues multiple Agent tool calls in one tool-use block. A known footgun (issue #29181) is that the model sometimes emits one call and hallucinates the rest. Mitigate by naming the fan-out explicitly in the prompt and running a verification subagent that reads the actual transcripts before accepting any result.

Why do multi-agent systems fail more often than they should?

Four named modes (Anthropic, 2026): telephone game, early victory, problem-centric decomposition, tool proliferation. Plus Cognition’s structural critique: subagents can’t see each other’s traces, so independent decisions silently disagree (Cognition, 2025). Every failure mode you’ll meet is a variant of those five.

The Real Argument

Dispatch is a five-question call. Spawn when context pollution, parallelism, state isolation, plan-independence, and bounded scope all line up. Stay in-context otherwise. The 90.2% lift Anthropic published was on a research workload; coding is interdependent, and parallel dispatch’s coordination tax usually exceeds its wall-clock savings.

If you take one thing from this post, take this: the subagent question isn’t “does Claude Code support this?” It’s “does this work compress on return?” If yes, fan out. If no, stay. The model is the same in both cases. The work shape is everything.
