Subagent-Driven Development: How to Fan Out a Feature Build
The previous post’s truth table sent most coding work to “stay in-context.” This post is about the exception. There’s a class of implementation work that genuinely is research-shaped, where parallel dispatch wins. The class is smaller than the multi-agent essays make it sound and bigger than zero.
The methodology has a name: subagent-driven development. The idea is older than the term, but the workflow that makes it survive contact with a real codebase is recent. Anthropic ships its own caveat alongside the 90.2% multi-agent lift: multi-agent systems are “less effective for tightly interdependent tasks such as coding” (Anthropic Engineering, 2025). That caveat is the rule. SDD is the exception that survives the caveat by changing the work shape, not by ignoring it.
Most teams that try parallel feature work without a methodology hit the failure modes Anthropic and Cognition catalogued. Interface drift. Telephone game. Problem-centric decomposition. The work ships late or wrong. Then they give up and conclude “parallel doesn’t work for coding,” which is the wrong lesson. The right lesson is that parallel works on a narrow shape, and there’s a discipline that protects that shape.
Key Takeaways
- SDD only works on the narrow slice of implementation work that’s genuinely independent at the contract layer. Use the dispatch decision tree to identify it.
- The plan is the contract. Frozen interfaces, frozen tests, frozen acceptance criteria before any subagent runs.
- Wave dispatch beats free-for-all. Foundation wave, feature wave, review wave, atomic commits between each.
- Five failure modes are specific to parallel implementation: interface drift, test divergence, rebase storms, contract rot, synthesis exhaustion. Each maps to a wave-discipline fix.
When does implementation work qualify as research-shaped?
Anthropic published the constraint directly: multi-agent systems are “less effective for tightly interdependent tasks such as coding” (Anthropic Engineering, 2025). The caveat names the rule. The exception is implementation work where the units don’t share state, the contracts are frozen, and the results compress on return. That’s a real shape, just a narrow one.
The honest definition of “research-shaped implementation” is that the work is independent at the contract layer, even when the language is application code rather than research prose. A foundation subagent that lands a schema can return “schema landed at src/lib/settings.ts with shape Settings” and the parent doesn’t need its full transcript. A feature subagent that consumes that schema reads the file, doesn’t talk to the foundation child. That’s the shape. It lines up with the row 1 leaf of the five-question dispatch tree: Y / Y / N / inform / bounded.
What qualifies: independent routes under a stable shell; parallel migration scripts; documentation generation across many independent files; the parallel-review pattern at the merge gate. What doesn’t: any refactor; type-system changes; bug-fix work where the cause is unknown; anything touching shared infrastructure. The fan-out cap from the upstream piece still applies. The lead agent spawns 3 to 5 subagents in parallel rather than serially (Anthropic Engineering, 2025); past 5 children per wave, coordination tax exceeds savings, and the wave should split.
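Concretely, the compressed return from that row 1 leaf might look like this. A minimal TypeScript sketch: the `Settings` fields and the `SubagentReturn` shape are this sketch’s assumptions, not a prescribed format; only the path and interface name come from the example in the text.

```typescript
// Hypothetical frozen contract: the foundation subagent's only job is to
// make src/lib/settings.ts exist with exactly this exported shape.
export interface Settings {
  theme: "light" | "dark";
  locale: string;
  notifications: boolean;
}

// What comes back to the parent is a compressed result, not a transcript.
export interface SubagentReturn {
  task: string;          // slug from the plan
  artifact: string;      // path that sibling waves read by reference
  exportedShape: string; // name of the frozen interface that landed
}

export const foundationResult: SubagentReturn = {
  task: "settings-schema",
  artifact: "src/lib/settings.ts",
  exportedShape: "Settings",
};
```

The parent needs nothing else from this child: later waves open the file, not the transcript.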
What is subagent-driven development?
Subagent-driven development is the workflow that uses parallel subagents to implement features whose units are genuinely independent. It inverts Cognition’s “share full agent traces” critique (Cognition, 2025) by sharing context through the plan, not through the agents themselves. The plan is the contract. That single move is what makes the pattern survive a real codebase.
Three components make up the methodology. The first is plan as contract: a frozen list of independent tasks with frozen interfaces, schemas, and acceptance criteria. The second is wave dispatch: tasks dispatched in waves, where later waves wait for earlier waves to land and integrate. The third is integration check: a verification step between waves to confirm outputs cohere before the next dispatch. None of those three is novel on its own. The combination is what earns the name.
SDD is a sibling of patterns it’s often confused with. The orchestrator-worker pattern is single-dispatch; SDD is multi-wave, where each wave is its own orchestrator-worker round. Agent teams coordinate independent Claude instances that message each other directly (Claude Code docs, 2026); SDD subagents report to a single parent and never message each other. Spec-driven development is the input shape; SDD is the execution. SDD consumes whatever specification artifact the team already has and turns it into a wave plan.
The load-bearing move is “share via plan, not via traces.” Cognition’s structural critique is that subagents can’t see each other’s traces, so independent decisions silently disagree. SDD accepts that constraint and routes coordination through the plan instead. Tool-description improvements yielded a 40% decrease in task completion time in Anthropic’s internal experiments (Anthropic Engineering, 2025); the subagent prompt is the equivalent leverage point in SDD. The plan is the prompt. The whole methodology slots inside the layered scaffolding around the model: the plan is the project layer, the subagents are the model layer, the integration check is the verification layer.
How does wave dispatch work?
Wave dispatch fans out 3 to 5 subagents on independent tasks, waits for the wave to land, runs an integration check, then dispatches the next wave. Each wave commits atomically. Anthropic’s task-sizing rule bounds the fan-out: 1 agent for fact-finding, 2 to 4 for direct comparison, more only for genuinely complex research (Anthropic Engineering, 2025). The same caps apply to implementation waves; the work shape is similar even if the language is code.
The structure has four steps. Wave 0 is the plan: the main session writes a task list with frozen contracts. No code yet. Wave 1 is the foundation: parallel subagents create the shared primitives (types, schemas, fixtures, base routes), independent at the file level. Wave 2 is the features: parallel subagents implement the features that consume the foundation, each in its own module. Wave 3 is tests and review: parallel subagents write integration tests, then parallel review subagents check the merged result.
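The four steps can be sketched as a plan-plus-gate loop. Everything here is illustrative: the task slugs, the `Wave` shape, and the deliberately trivial gates are assumptions, and `dispatch` stands in for however the parent actually spawns a child.

```typescript
type Wave = {
  name: string;
  tasks: string[];                     // frozen task slugs, dispatched in parallel
  gate: (landed: string[]) => boolean; // integration check before the next wave
};

// Illustrative plan for the settings-page example; slugs are invented.
const plan: Wave[] = [
  { name: "foundation", tasks: ["schema", "fixture", "route-stub"], gate: l => l.length === 3 },
  { name: "features", tasks: ["settings-ui", "save-handler", "load-hook"], gate: l => l.length === 3 },
  { name: "review", tasks: ["lint-review", "type-review", "coverage-review"], gate: l => l.length === 3 },
];

// Sequential across waves, parallel within one. `dispatch` returns the
// artifact a child landed, or null if the child failed.
function runWaves(waves: Wave[], dispatch: (task: string) => string | null): string[] {
  const commits: string[] = [];
  for (const wave of waves) {
    const landed = wave.tasks.map(t => dispatch(t)).filter((a): a is string => a !== null);
    if (!wave.gate(landed)) {
      throw new Error(`wave "${wave.name}" failed its integration check`);
    }
    commits.push(`commit: ${wave.name}`); // atomic commit per wave
  }
  return commits;
}
```

The point the sketch encodes is the ordering guarantee: a wave’s gate runs before the next wave’s tasks exist at all.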
Atomic commits between waves are non-negotiable. Each wave’s outputs commit before the next wave runs. Rollbacks are clean; rebased history stays readable. The git boundary doubles as the integration boundary: if Wave 1 won’t commit, Wave 2 doesn’t dispatch. Standard agent loops use ~4x chat tokens; multi-agent setups use ~15x (Anthropic Engineering, 2025). Forked subagents drop to roughly 10% of that on the first turn via prompt-cache reuse (Mejba Ahmed, 2026). The wave wins when at least three children reuse the cache and the wave’s outputs compress on return.
How do you write a plan that survives parallel execution?
A plan that survives wave dispatch has three properties: frozen interfaces, frozen acceptance criteria, and frozen file boundaries. Subagents commit to the contract before they start. Anthropic’s caution against “problem-centric decomposition” (Anthropic, 2026) is the rule applied. Split by context boundary, not by work type. The four artifacts of an SDD-grade plan are the task list (the wave breakdown), the interface schema (a typed file the wave consumes by reference), the acceptance criteria (the green-light condition for each task), and the file boundary map (which task may write to which paths, with conflicts pre-empted at plan time).
Each task entry has six fields. Name and slug. Owner subagent type (Explore, Implementer, Tester, Reviewer). Frozen inputs (types, schemas, fixtures by reference). Frozen outputs (file paths, interface signatures, acceptance criteria). Tools allowed. Stop condition. The frozen fields are the contract. Subagents don’t get to renegotiate them mid-wave.
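The six fields map naturally onto a typed plan entry. A hedged sketch: the field names, the nesting, and the example task are this sketch’s invention for illustration, not a schema any tool enforces.

```typescript
type SubagentRole = "Explore" | "Implementer" | "Tester" | "Reviewer";

interface TaskEntry {
  name: string;                 // field 1: name...
  slug: string;                 // ...and slug
  owner: SubagentRole;          // field 2: owner subagent type
  frozenInputs: string[];       // field 3: types/schemas/fixtures, by reference
  frozenOutputs: {              // field 4: the contract the verifier checks
    paths: string[];
    signatures: string[];
    acceptance: string;
  };
  tools: string[];              // field 5: tools allowed
  stopCondition: string;        // field 6: when the child must stop
}

// Illustrative entry for the settings-schema task from the worked example.
const settingsSchemaTask: TaskEntry = {
  name: "Settings schema",
  slug: "settings-schema",
  owner: "Implementer",
  frozenInputs: ["docs/plan.md#interfaces"],
  frozenOutputs: {
    paths: ["src/lib/settings.ts"],
    signatures: ["export interface Settings"],
    acceptance: "tsc --noEmit passes; schema unit tests green",
  },
  tools: ["Read", "Write", "Bash"],
  stopCondition: "frozen outputs exist and the acceptance command exits 0",
};
```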
The frozen rule has teeth. No changes to a contract mid-wave. If the integration check finds the contract is wrong, replan between waves. Don’t patch in flight. Across 18 frontier models, every model degraded with increasing input length at every increment (Chroma Research, 2025); the plan is what compresses the wave’s intent so the subagents start with a small, focused context. The plan is the workflow analogue of allocation discipline at the surface level, one floor up. Both budget where the work belongs, not where it’s easiest to write. The plan is also the plan the user reads and signs off on. The same artifact that frees the model from drift is the one that lets a human verify the work before any tokens are spent. That alignment is the point.
A worked example: a feature in three waves
One representative feature, three waves, atomic commits between each. The example is a user-settings page with persistence, the kind of feature that ships in a normal week. Three foundation tasks land, three feature tasks consume them, three review tasks verify. The whole feature ships in roughly 90 minutes of wall-clock time, integration checks included; the cost call-out below uses 2026 pricing.
Wave 1, foundation. Three parallel Implementer subagents on Sonnet 4.6: a settings schema (zod or Drizzle, depending on stack), a form fixture (test data factory), and an API route stub (handler + types). Atomic commit. The integration check runs tsc --noEmit and the foundation unit tests; types compose; the wave is green. No feature subagent runs until that check is green.
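What the schema task’s frozen output might look like. A sketch with a hand-rolled runtime guard standing in for the zod schema, purely to keep the example dependency-free; in the stack the text describes, `parseSettings` would be `settingsSchema.parse` and `Settings` would be `z.infer<typeof settingsSchema>`.

```typescript
// Frozen output of the settings-schema task: src/lib/settings.ts (illustrative fields).
export interface Settings {
  theme: "light" | "dark";
  locale: string;
  notifications: boolean;
}

// Runtime guard the feature wave calls on anything crossing the API boundary.
export function parseSettings(input: unknown): Settings {
  const s = (input ?? {}) as Record<string, unknown>;
  const theme = s.theme === "light" || s.theme === "dark" ? s.theme : null;
  if (theme === null) throw new Error("invalid Settings.theme");
  if (typeof s.locale !== "string") throw new Error("invalid Settings.locale");
  if (typeof s.notifications !== "boolean") throw new Error("invalid Settings.notifications");
  return { theme, locale: s.locale, notifications: s.notifications };
}
```

Wave 2’s save handler and load hook import this file by path; neither ever sees the foundation child’s transcript.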
Wave 2, features. Three parallel Implementer subagents on Sonnet 4.6: the settings page UI consumes the schema, the save handler consumes the route stub, and the load hook consumes both. Each lives in its own module; the file-boundary map keeps them off each other’s paths. Atomic commit. The integration check runs the local dev server and the e2e harness. Green check, dispatch the next wave.
Wave 3, review. Three parallel Reviewer subagents on Opus 4.7: style and lint, type design and API ergonomics, test coverage and edge cases. Findings synthesise into a triaged report. The main agent applies the triaged fixes. Final commit. The pattern is the same one wired into the parallel-review setup at the merge gate, reused as the closing wave of an SDD feature build.
The cost call-out, against the Claude API pricing page in April 2026: Opus 4.7 at $5 input / $25 output per million tokens, Sonnet 4.6 at $3/$15, Haiku 4.5 at $1/$5, cache reads at 0.1x. Each implementation wave runs three subagents on Sonnet 4.6; the review wave runs three on Opus 4.7. The integration check runs on Opus 4.7 to read transcripts faithfully. With prompt-cache reuse on a forked subagent, the 180,000-token shared context costs ~18,000 tokens effective on the first turn (Mejba Ahmed, 2026). The whole feature lands at roughly the dollar cost of a single Opus 4.7 session that did the same work serially. The wall-clock time is the win, not the bill.
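The cache arithmetic from the paragraph above, spelled out. Prices are the April 2026 figures quoted in the text and should be treated as illustrative.

```typescript
// Prices per million tokens, as quoted in the text (April 2026).
const PRICE = {
  opus47: { input: 5, output: 25 },
  sonnet46: { input: 3, output: 15 },
  haiku45: { input: 1, output: 5 },
  cacheReadMultiplier: 0.1,
} as const;

// Effective input tokens when a forked subagent's shared context is
// served from the prompt cache at the cache-read multiplier.
function effectiveCachedTokens(
  sharedContext: number,
  multiplier: number = PRICE.cacheReadMultiplier,
): number {
  return sharedContext * multiplier;
}

const effective = effectiveCachedTokens(180_000);                    // ≈ 18,000 effective tokens
const firstTurnUSD = (effective / 1_000_000) * PRICE.sonnet46.input; // ≈ $0.054 per forked child
```

Multiply by three children per wave and the shared-context cost of an implementation wave stays in the cents, which is why the wall-clock time, not the bill, is the number to watch.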
What fails specifically in parallel implementation?
Five failure modes hit SDD specifically, beyond the four Anthropic named (Anthropic, 2026). Each maps to a wave-discipline practice that prevents it. Skipping the practice is what causes most “we tried parallel and it failed” stories. The model itself sometimes emits 1 of N intended Agent calls per message and hallucinates the rest (anthropics/claude-code#29181, 2026); wave discipline catches that at the integration check, not at the prompt.
The hallucinated fan-out underneath all five is its own failure with its own fix. The model intends N children, emits one, fabricates the other N-1. Wave discipline catches it because a wave that should have produced N artifacts but only produced one fails the integration check before the next wave dispatches. The fix isn’t a smarter prompt; it’s a structural check that compares produced artifacts against the plan’s frozen outputs. If the plan said “three files,” and one file landed, the wave fails and the orchestrator retries that single child rather than synthesising a confident summary that misrepresents reality.
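The structural check is small enough to write down. A sketch, assuming the plan’s frozen outputs are plain file paths; `existsSync` stands in for whatever probe the verifier actually runs.

```typescript
import { existsSync } from "node:fs";

interface WaveCheck {
  ok: boolean;
  missing: string[]; // frozen outputs with no artifact on disk
}

// Compare artifacts actually produced against the plan's frozen outputs.
// The `exists` parameter is injectable so the check is testable without a filesystem.
function checkWave(
  frozenOutputs: string[],
  exists: (path: string) => boolean = existsSync,
): WaveCheck {
  const missing = frozenOutputs.filter(p => !exists(p));
  return { ok: missing.length === 0, missing };
}
```

The `missing` list names exactly which children to re-dispatch; a hallucinated fan-out shows up here as a wave that claimed three artifacts and landed one.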
How do you verify a wave before the next one starts?
Integration checking is what stops Anthropic’s “early victory” failure mode (Anthropic, 2026). A dedicated verification subagent runs after each wave: types compile, tests pass, the wave’s outputs match the frozen plan. No wave proceeds without a green check. The check is the gate, not the encouragement.
What to verify is finite and listable. Types compile across the wave’s outputs. Tests pass at the appropriate scope (unit for foundation, integration for feature, e2e for review). API contracts match the plan-entry’s frozen outputs. File-boundary respect: no subagent wrote outside its declared paths. Each of those is a Bash one-liner the verification subagent runs and compares against the plan.
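File-boundary respect is the one check that is a pure function of the plan and the wave’s writes. A minimal sketch; the prefix rule is this sketch’s assumption about how a boundary map declares paths.

```typescript
// Every path a subagent wrote must fall under one of its declared paths
// from the plan's file-boundary map. Returns the paths that escaped.
function boundaryViolations(
  declared: string[], // path prefixes the task may write under
  written: string[],  // paths the child actually touched
): string[] {
  return written.filter(
    w => !declared.some(d => w === d || w.startsWith(d.endsWith("/") ? d : d + "/")),
  );
}
```

A non-empty result fails the wave before the next dispatch, the same way a type error would.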
How to verify is the structural part. A dedicated integration-check subagent reads the actual transcripts of the wave’s children, not just the orchestrator’s summary. The Claude Code docs are explicit that “the verbose output stays in the subagent’s context while only the relevant summary returns to the main conversation” (Claude Code docs, 2026); the integration check is the one place that bypasses that compression. The verifier opens the children’s transcripts, finds the artifact paths, and runs the listable checks against the actual filesystem.
Fail-forward versus retry is the last call. Retry when the contract was right and one subagent failed to honour it; re-dispatch that one task with the same plan entry. Fail forward when the contract was wrong; replan between waves and dispatch a fresh round. Don’t patch mid-wave. The cost of a replan is one wave; the cost of patching is contract rot in every later wave. The parallel-review pattern in production uses the same gate at the merge boundary, just at the end of the build instead of between waves.
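The retry-versus-replan call reduces to a two-input decision. A sketch; the boolean diagnosis is the simplifying assumption, since in practice the verdict comes from the verifier reading the children’s transcripts.

```typescript
type Verdict =
  | { action: "proceed" }                  // green check: dispatch the next wave
  | { action: "retry"; taskSlug: string }  // contract right, one child missed it
  | { action: "replan" };                  // contract wrong: replan between waves

function decide(contractHeld: boolean, failedTasks: string[]): Verdict {
  if (!contractHeld) return { action: "replan" };     // fail forward, never patch mid-wave
  if (failedTasks.length > 0) return { action: "retry", taskSlug: failedTasks[0] };
  return { action: "proceed" };
}
```

Note the ordering: a wrong contract triggers a replan even if children also failed, because retrying against a broken contract just reproduces the failure.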
When should you use SDD, and when should you skip it?
SDD is the right call when the work hits research-shaped implementation: independent at the contract layer, three or more units, bounded scope. Skip it for refactors, type-system changes, and any work where the units share state. Default to staying in-context unless the work clearly qualifies. Only 16.8% of professional AI agent users agree agents have improved team collaboration (Stack Overflow, 2025); SDD is the kind of discipline that puts a team inside that minority.
Use SDD on a greenfield feature with multiple independent routes. On parallel migration scripts. On a documentation-generation pass across many independent files. On the parallel-review pattern at the merge gate. Each of those compresses on return; each has a frozen contract a plan can name; each survives the dispatch tree’s row 1 leaf.
Skip SDD on any refactor. On any change to a shared type. On bug-fix work where the cause is unknown. On anything open-ended with no clear stop condition. The clean version of the rule is, “if you can’t write the wave’s frozen outputs in advance, you’re not running an SDD wave; you’re running a multi-agent disaster.”
The borderline cases are where the discipline earns its keep. Feature work that could be independent if the contracts were truly frozen often isn’t. The honest move is to write the plan and then check whether the contracts are actually frozen or just feel frozen. Run the five-question dispatch tree on the plan, not on the prompt. If the answers don’t line up, the plan is the problem, not the dispatch.
Frequently Asked Questions
What is subagent-driven development?
SDD is the workflow that uses parallel subagent dispatch to implement features whose units are genuinely independent at the contract layer. The plan is frozen before dispatch; subagents commit to the plan; integration checks run between waves. It’s distinct from agent teams (peer-to-peer) and from orchestrator-worker (single dispatch). The plan is the contract.
How is SDD different from agent teams?
Subagents in SDD report to a single parent and never message each other. Agent teams coordinate independent Claude instances that message each other directly (Claude Code docs, 2026). SDD is cheaper and easier to reason about because the coordination is one-way. Agent teams are the right call when the work genuinely needs sibling-to-sibling collaboration, which is rare on feature builds.
Can SDD parallelize a refactor?
Almost never. Refactors couple files through types and tests, which violates the “independent at the contract layer” requirement. Stay in-context with frequent compaction. The exception is a refactor that’s separable into independent renames or migrations; even then, validate the assumption with the dispatch decision tree before dispatching anything.
What goes in the plan?
Six fields per task: name, owner subagent type, frozen inputs, frozen outputs, tools allowed, stop condition. The frozen fields are the contract; subagents commit to them before they start. Anthropic’s caution against “problem-centric decomposition” (Anthropic, 2026) means split by context boundary, not by work type.
How much does SDD cost compared to a serial build?
Standard agent loops use ~4x chat tokens; multi-agent setups use ~15x (Anthropic Engineering, 2025). Forked subagents drop to roughly 10% of that on the first turn via prompt-cache reuse. At 2026 Sonnet 4.6 pricing, an SDD wave often lands at the same dollar cost as a single Opus 4.7 session, but in less wall-clock time when the contracts are genuinely frozen.
The Real Argument
SDD is the narrow exception to “parallel is research, serial is coding.” The exception is real, but it’s earned. The plan is the contract, and the contract is what makes parallel implementation possible. Without it, you’re not running SDD; you’re running the multi-agent disaster Cognition warned about, with extra steps.
If you take one thing from this post, take this: the question isn’t whether Claude Code can fan out feature work. It clearly can. The question is whether your plan is frozen tightly enough that three subagents could read it and arrive at the same shape independently. If yes, run a wave. If no, the plan is the problem, not the parallelism.
Pick one greenfield feature with three independent routes. Write the plan. Run one wave. Track the cost and the wall-clock time. Decide for your codebase whether SDD is worth the discipline tax. The answer will hold longer than the prices in this post will.
If it was useful, pass it along.