Model Routing

Route agents to the right model tier for cost efficiency without sacrificing quality.

Why Model Routing?
Model Tiers
Full Routing Table
Cost Savings
1. Example: 9-Issue Wave Execution
When to Use Each Tier
Override Mechanism
1. When to Override
2. Override Rules
Integration with waves-controller
Related Pages

Why Model Routing?

Not every agent needs the most expensive model. A board-auditor that checks labels on GitHub issues does not require the same reasoning depth as a heal-loop agent generating code fixes from contract semantics.

Model routing assigns each agent to the cheapest model tier that can reliably perform its task. The result is a 40-60% reduction in token cost with no measurable drop in output quality.

Inspired by forge by Ikenna N. Okpala, adapted from the TinyDancer model routing pattern by Mondweep Chakravorty.

Model Tiers

Tier	Model	Strengths	Cost
Haiku	claude-haiku	Fast, cheap, good at structured checks and validation	Lowest
Sonnet	claude-sonnet	Strong code generation, contract writing, test creation	Medium
Opus	claude-opus	Deep reasoning, complex fixes, architectural decisions	Highest

Full Routing Table

Haiku Tier (Lightweight Checks)

These agents perform structured validation, pattern matching, or simple report generation. They do not generate code or make architectural decisions.

Agent	Purpose	Why Haiku
board-auditor	Check issue labels, staleness, compliance	Simple field checks
contract-validator	Verify patterns exist/absent in files	Regex matching and reporting
journey-enforcer	Check journey coverage and status	Read YAML, compare to test results
test-runner	Execute tests, report pass/fail	Run commands, parse output
e2e-test-auditor	Find unreliable or missing E2E tests	Compare test files to contracts
ticket-closer	Close issues with completion summary	Template-based GitHub API calls

Sonnet Tier (Code & Contract Generation)

These agents generate code, contracts, or tests. They need good code generation but not deep architectural reasoning.

Agent	Purpose	Why Sonnet
waves-controller	Orchestrate 8-phase pipeline	Coordination logic, not deep reasoning
specflow-writer	Generate contracts from GitHub issues	YAML generation from structured input
specflow-uplifter	Add Specflow to existing codebases	Code analysis and contract inference
contract-generator	Generate YAML from plain text specs	Structured output generation
contract-test-generator	Generate test files from contracts	Code generation from YAML
dependency-mapper	Parse issue dependencies, build graph	Text parsing and graph construction
sprint-executor	Execute sprint issues	Coordination, delegates to other agents
migration-builder	Generate SQL migrations	SQL generation from requirements
frontend-builder	Generate React components	Code generation from specs
edge-function-builder	Generate serverless functions	Code generation from contracts
playwright-from-specflow	Generate Playwright tests from journeys	Test code generation from YAML
journey-tester	Create cross-feature E2E tests	Test design and code generation

Opus Tier (Deep Reasoning)

These agents require understanding of code semantics, architectural intent, and potential side effects.

Agent	Purpose	Why Opus
heal-loop	Autonomous fix loop for contract violations	Must reason about fix correctness and side effects

Cost Savings

Example: 9-Issue Wave Execution

Without model routing (all agents on Opus):

Phase	Agents Invoked	Tokens (est.)
Contracts	specflow-writer (x9)	~180K
Implementation	migration + frontend + edge (x9)	~540K
Testing	playwright + test-runner (x9)	~270K
Validation	contract-validator + journey-enforcer	~30K
Closure	ticket-closer (x9)	~45K
Total		~1,065K tokens

With model routing:

Phase	Tier	Tokens (est.)
Contracts (Sonnet)	Sonnet	~120K
Implementation (Sonnet)	Sonnet	~360K
Testing (Sonnet + Haiku)	Mixed	~150K
Validation (Haiku)	Haiku	~10K
Closure (Haiku)	Haiku	~15K
Total		~655K tokens

Savings: ~38% fewer tokens, plus Haiku/Sonnet tokens cost less per token.

Effective cost reduction: 40-60% depending on wave composition.

When to Use Each Tier

Use Haiku When…

The task is read and report (no code generation)
The output is structured and predictable (pass/fail, labels, counts)
The agent follows a template (close issue with summary, run tests and parse output)
Speed matters more than nuance

Use Sonnet When…

The task involves code generation (TypeScript, SQL, YAML)
The agent needs to understand requirements and produce artifacts
The output needs to be correct and idiomatic but follows established patterns
Most Specflow agents fall here

Use Opus When…

The task requires reasoning about consequences (will this fix break something else?)
The agent must understand intent beyond pattern matching
The fix involves multiple interacting constraints
Currently, only the heal-loop agent needs this level

Override Mechanism

Override the default model routing in .specflow/config.json:

{
  "model_routing": {
    "overrides": {
      "specflow-writer": "opus",
      "board-auditor": "sonnet"
    }
  }
}

When to Override

Scenario	Override
Complex domain with nuanced contracts	`specflow-writer` -> `opus`
Large codebase with subtle patterns	`contract-validator` -> `sonnet`
Simple project with few contracts	`waves-controller` -> `haiku`
Debugging agent behavior	Any agent -> `opus` (temporarily)

Override Rules

Overrides apply per-project, not globally
You can only override to a higher tier (upgrading), or back to the default tier
Downgrading below the recommended tier is allowed but may reduce output quality
Override settings are checked into version control alongside your contracts

Integration with waves-controller

The waves-controller reads model routing configuration when spawning agents:

waves-controller (Sonnet):
  → spawn specflow-writer (Sonnet)
  → spawn migration-builder (Sonnet)
  → spawn frontend-builder (Sonnet)
  → spawn contract-validator (Haiku)
  → spawn test-runner (Haiku)
  → spawn heal-loop (Opus)      ← Only if violations found
  → spawn ticket-closer (Haiku)

The waves-controller itself runs on Sonnet. It does not need Opus because its job is coordination and delegation, not deep reasoning.

Agent Reference – All 23+ agents and their roles
Self-Healing Fix Loops – The Opus-tier heal-loop agent
DPAO Methodology – How parallel execution works
Acknowledgments – Attribution for TinyDancer and forge

View on GitHub