scafld is spec-driven orchestration for AI coding agents. It enforces a constraint that should be obvious: think before you type.
Every non-trivial task becomes a living Markdown spec before a single line of code changes. The spec defines what will change, in what order, with what acceptance criteria, and how to roll it back if it breaks. A human reviews and approves the spec. Only then does the agent execute, phase by phase, with every transition derived from the spec and session ledger instead of from chat memory.
That is the product shape: the agent is replaceable, but the protocol is not. Plans survive context loss; the session ledger preserves the receipts. Review is not self-assessment; it is a separate adversarial gate before completion.
This is the same separation of planning from execution that every serious engineering discipline has always required, applied to AI coding agents. The spec is a Markdown file with frontmatter. The runtime is a single native Go binary. The prompts are Markdown. Any agent that can read files and execute shell commands can run the full workflow: Claude Code, Cursor, Copilot, Windsurf, whatever.
Install
# native Go install
go install github.com/nilstate/scafld/v2/cmd/scafld@latest
# native binary via npm wrapper
npm install -g scafld
# native binary via pipx wrapper
pipx install scafld
# direct download
curl -fsSL https://0state.com/scafld/install | sh
The Go binary is the product. The npm and PyPI packages are distribution shims that fetch the matching native binary from GitHub releases. Go package documentation is indexed at pkg.go.dev.
The spec as contract
The spec is a contract between what was requested and what gets delivered.
Before any code changes, there must be a machine-readable Markdown artifact that defines precisely what will change. The human approves the plan, not the outcome. Once approved, the agent operates autonomously within those bounds. Another agent, or a human, can pick up the same spec and session and derive the same state, next command, and review gate. Prompts are ephemeral. Specs are artifacts.
A spec declares:
- Task definition - objectives, scope boundaries (in/out), assumptions, size, risk level
- Context - packages affected, files impacted, architectural invariants that must be preserved
- Touchpoints - every system, module, or adapter the change will touch
- Risks - what could go wrong, impact level, mitigation plan
- Phases - ordered execution steps, each with file-level change declarations and acceptance criteria
- Rollback - per-phase undo commands so failure is recoverable, not catastrophic
- Definition of done - explicit checklist items that get checked off during execution
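To make that list concrete, here is a minimal Python sketch of the same shape. The class names, field names, and the `missing_fields` helper are hypothetical illustrations; scafld's real spec is a Markdown file with frontmatter, and its schema may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Phase:
    name: str
    changes: list[str]              # file-level change declarations
    acceptance_criteria: list[str]  # checks that must pass for the phase
    rollback: list[str]             # per-phase undo commands


@dataclass
class Spec:
    task_id: str
    objectives: list[str]
    in_scope: list[str]
    out_of_scope: list[str]
    risk_level: str                 # e.g. "low", "medium", "high"
    invariants: list[str]
    touchpoints: list[str]
    phases: list[Phase] = field(default_factory=list)
    definition_of_done: list[str] = field(default_factory=list)


def missing_fields(spec: Spec) -> list[str]:
    """Return the gaps that would make this sketch of a spec non-executable."""
    gaps = []
    if not spec.objectives:
        gaps.append("objectives")
    if not spec.phases:
        gaps.append("phases")
    for phase in spec.phases:
        if not phase.acceptance_criteria:
            gaps.append(f"phase {phase.name!r}: acceptance criteria")
        if not phase.rollback:
            gaps.append(f"phase {phase.name!r}: rollback")
    if not spec.definition_of_done:
        gaps.append("definition of done")
    return gaps
```

A spec with no gaps is one another agent could pick up without guessing. A non-empty list is what a reviewer would push back on before approving.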
Planning mode
Planning is a structured exploration cycle that the agent runs conversationally with the developer:
- THOUGHT - interpret the request in repo terms, identify unknowns
- ACTION - search the codebase, read files, check diffs to answer those unknowns
- OBSERVATION - capture what was learned: files, invariants, risks, dependencies
- THOUGHT - update the spec, ask clarifying questions when information is missing
- REPEAT until all required fields are filled and assumptions are explicit
The agent is in read-only mode during planning. It can explore anything but change nothing outside .scafld/specs/. If planning gets blocked on missing information, the spec saves with planning notes and the agent tells you exactly what it needs.
This is how you get specs that are executable by another agent without any additional back-and-forth. The planning loop does the work upfront so execution doesn’t have to guess.
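The loop above can be sketched in a few lines of Python. This is an illustration only: the `REQUIRED` field names and the `explore` callback are assumptions standing in for the agent's read-only repo exploration, not part of scafld.

```python
from typing import Callable, Optional

# Hypothetical set of fields a draft must fill before planning is done.
REQUIRED = ("objectives", "scope", "phases", "rollback")


def planning_loop(
    draft: dict,
    explore: Callable[[str], Optional[str]],
    max_rounds: int = 10,
) -> tuple[dict, list[str]]:
    """Fill required fields by exploration; return (draft, open questions)."""
    for _ in range(max_rounds):
        # THOUGHT: which required fields are still unknown?
        unknowns = [key for key in REQUIRED if not draft.get(key)]
        if not unknowns:
            return draft, []
        progressed = False
        for key in unknowns:
            finding = explore(key)       # ACTION: read-only lookup
            if finding is not None:      # OBSERVATION: record what was learned
                draft[key] = finding
                progressed = True
        if not progressed:
            # Blocked: save the draft and say exactly what is missing.
            return draft, unknowns
    return draft, [key for key in REQUIRED if not draft.get(key)]
```

The second return value is the list of questions the agent would hand back to the developer when exploration alone cannot fill the gap.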
Lifecycle and execution
  ┌─── request changes ───┐
  ▾                       │
draft ──▸ harden ──▸ approved ──▸ active ──▸ review ──▸ completed
        (optional)      │           │          │
                      HUMAN    phase loop  adversarial
                      GATE  ┌───────────────┐ review
                            │ apply changes │   │
                            │ run criteria  │ failed ──▸ fix
                            │ record result │   │         │
                            └───────────────┘   └─────────┘
                                    │
                                 failed ──▸ rollback ──▸ resume
The filesystem is the state machine. Specs physically move between directories as they progress:
.scafld/specs/
  drafts/             planning in progress
  active/             approved and currently executing
  archive/YYYY-MM/    completed, failed, or cancelled
Each transition is enforced by the CLI. You can’t skip the approval gate. You can’t execute a draft.
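A directory-backed state machine of this kind can be sketched as follows. The transition table, the `<task>.md` file naming, and the function names are assumptions for illustration; the real CLI's rules and layout may differ.

```python
import shutil
from pathlib import Path

# Hypothetical transition table: which directory moves are legal.
ALLOWED = {
    ("drafts", "active"),    # approve
    ("active", "archive"),   # complete, fail, or cancel
}


def locate(root: Path, task: str) -> str:
    """Return the state directory that currently holds the task's spec."""
    for state in ("drafts", "active", "archive"):
        state_dir = root / state
        if state_dir.is_dir() and any(state_dir.rglob(f"{task}.md")):
            return state
    raise FileNotFoundError(task)


def transition(root: Path, task: str, target: str, month: str = "") -> Path:
    """Move the spec file to the target state, refusing illegal jumps."""
    current = locate(root, task)
    if (current, target) not in ALLOWED:
        raise ValueError(f"illegal transition: {current} -> {target}")
    src = next((root / current).rglob(f"{task}.md"))
    dest_dir = root / target / month if target == "archive" else root / target
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(src), str(dest_dir / src.name)))
```

Because the state is the file's location, a draft cannot reach the archive as completed work without first passing through `active/`.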
Execution is phase-by-phase. Each phase reads the spec, applies changes, runs every acceptance criterion, and records pass/fail results with timestamps directly into the session ledger. The spec re-renders from the session, not from the agent’s claims. If a criterion fails, the phase rolls back independently; completed phases stay intact. If execution gets interrupted, the resume protocol picks up from the first pending or failed phase, not from scratch.
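As a rough sketch of that loop, assume acceptance criteria are shell commands and the ledger is an in-memory list. The `criterion_attempt` entry type is named in this document; `phase_passed`, `phase_failed`, and the dictionary keys are hypothetical.

```python
import subprocess
from datetime import datetime, timezone


def run_phases(phases: list[dict], ledger: list[dict]) -> str:
    """Run phases in order; return the name of the failed phase, or ""."""
    passed = {e["phase"] for e in ledger if e["type"] == "phase_passed"}
    for phase in phases:
        if phase["name"] in passed:
            continue  # resume: completed phases stay intact
        for cmd in phase["criteria"]:
            result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
            ledger.append({
                "type": "criterion_attempt",
                "phase": phase["name"],
                "command": cmd,
                "exit_code": result.returncode,
                "at": datetime.now(timezone.utc).isoformat(),
            })
            if result.returncode != 0:
                # Roll back only this phase, then stop.
                for undo in phase.get("rollback", []):
                    subprocess.run(undo, shell=True)
                ledger.append({"type": "phase_failed", "phase": phase["name"]})
                return phase["name"]
        ledger.append({"type": "phase_passed", "phase": phase["name"]})
    return ""
```

Calling `run_phases` again with the same ledger skips every phase already recorded as passed, which is the resume behaviour described above.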
Session as truth
The session ledger is the durable evidence layer. Every meaningful event during a build appends a typed entry to .scafld/runs/<task>/session.json: criterion attempts, phase transitions, provider invocations, review verdicts, override decisions. The Markdown spec re-renders from the session; the session is never read from the spec.
This inversion matters. When the agent says “phase 2 passed,” the spec says nothing until the session has a corresponding criterion_attempt entry with the recorded exit code and evidence. The agent cannot lie about what happened because the lie would have to be written to the ledger.
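The inversion can be shown with a small sketch in which status is derived only from recorded attempts. The JSON-list file layout and the `criterion` and `exit_code` keys are assumptions; only the `criterion_attempt` entry type comes from the description above.

```python
import json
from pathlib import Path


def append_entry(session_path: Path, entry: dict) -> None:
    """Append one typed entry to the session file (a JSON list in this sketch)."""
    entries = json.loads(session_path.read_text()) if session_path.exists() else []
    entries.append(entry)
    session_path.write_text(json.dumps(entries, indent=2))


def render_phase_status(session_path: Path, phase: str) -> str:
    """Derive a phase's status from recorded criterion attempts only."""
    entries = json.loads(session_path.read_text()) if session_path.exists() else []
    latest = {}  # criterion -> exit code of its most recent attempt
    for e in entries:
        if e.get("type") == "criterion_attempt" and e.get("phase") == phase:
            latest[e["criterion"]] = e["exit_code"]
    if not latest:
        return "pending"  # no evidence, whatever the agent claims
    return "passed" if all(code == 0 for code in latest.values()) else "failed"
```

Nothing in `render_phase_status` accepts a claim as input. The only way to change the rendered status is to append evidence.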
Harden
Between planning and approval, scafld harden <task-id> interrogates the draft one question at a time. The agent walks down the design tree and resolves upstream decisions before downstream ones, so questions are never wasted on premises that may shift. If the answer is already in the repo, the agent should inspect the code instead of asking. Each recorded question carries a grounded_in value pointing to the spec gap, verified code location, or archived precedent that made the question worth asking. Invented citations are forbidden.
This is optional and operator-driven. scafld approve does not consult harden status. Run it on high-risk or ambiguous specs; skip it on trivial ones.
Validation
Not all changes deserve the same level of verification. scafld scales validation in proportion to risk:
- Light (micro/small, low risk) - compile check + acceptance criteria only
- Standard (medium risk) - adds targeted tests per phase, plus the full test suite, linter, typecheck, and security scan before commit
- Strict (high risk) - everything in standard, plus boundary checks per phase to catch cross-module side effects
The profile derives from the task’s risk level or can be set explicitly.
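One way to picture the derivation is a lookup with an explicit override. The check names and the `select_profile` function are hypothetical labels for the profiles listed above, not scafld identifiers.

```python
from typing import Optional

PROFILE_BY_RISK = {"low": "light", "medium": "standard", "high": "strict"}

CHECKS = {
    "light": ["compile", "acceptance_criteria"],
    "standard": [
        "compile", "acceptance_criteria", "targeted_tests",
        "full_test_suite", "linter", "typecheck", "security_scan",
    ],
}
# Strict is everything in standard plus per-phase boundary checks.
CHECKS["strict"] = CHECKS["standard"] + ["boundary_checks"]


def select_profile(risk_level: str, override: Optional[str] = None) -> str:
    """Pick a validation profile from risk, unless one is set explicitly."""
    profile = override or PROFILE_BY_RISK.get(risk_level)
    if profile not in CHECKS:
        raise ValueError(f"unknown profile for risk={risk_level!r}, override={override!r}")
    return profile
```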
Scope discipline
Scope is part of the deterministic contract. The spec declares the files, packages, boundaries, and invariants the agent is allowed to touch. Build evidence records what actually ran. Review challenges whether the diff stayed inside the approved shape.
That makes scope drift visible at the gate that matters: the task cannot complete just because the implementation agent says it stayed on course. The next accepted state must be derivable from the spec, the session, and a passing adversarial review.
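A reviewer-side scope check can be as simple as comparing the changed files against the declared patterns. This sketch assumes scope is expressed as glob patterns, which is an illustration rather than scafld's actual format.

```python
from fnmatch import fnmatch


def out_of_scope(changed_files: list[str], allowed_patterns: list[str]) -> list[str]:
    """Return changed files that match none of the spec's declared patterns."""
    return [
        path for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in allowed_patterns)
    ]
```

In practice the changed-file list could come from `git diff --name-only`. Any non-empty result is drift that the review gate should surface, whatever the implementing agent reports.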
Adversarial review
Ask an AI “how did you do?” and it says great. Always 8 or 9 out of 10.
Ask it “what’s wrong with this?” and it actually finds things. Real things: a missing null check on line 47, a caller that assumes a parameter that just changed shape, a hardcoded value that should come from config. The same model that rubber-stamps its own work will genuinely tear it apart when you frame the task as critique instead of self-assessment.
scafld structures this. After execution, scafld review runs automated passes, scaffolds a machine-validated review artifact, and records review provenance. A separate adversarial provider (codex or claude, isolated from the executor) reads the resulting diff through three lenses:
- Regression hunt - for each modified file, check all callers and importers. What breaks?
- Convention violations - read the project’s documented rules. What did you violate?
- Defect scan - hardcoded values, off-by-one errors, missing boundary checks, race conditions, copy-paste bugs, unhandled error paths.
Every finding cites a file and line number. Findings are blocking (must fix) or non-blocking (should fix). The review produces a verdict.
The review provider runs in a sandboxed read-only environment with a workspace mutation guard; if the provider tries to modify files mid-review, the result is discarded and the review fails closed. The structured-output schema is enforced at the provider boundary so packets must be well-formed before they reach the gate.
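A mutation guard of this kind can be approximated by hashing the workspace before and after the provider runs. This is a sketch of the fail-closed idea only; the function names and the result dictionary are hypothetical, and the real guard may work differently.

```python
import hashlib
from pathlib import Path
from typing import Callable


def snapshot(workspace: Path) -> dict[str, str]:
    """Map each file's relative path to a SHA-256 digest of its contents."""
    return {
        str(p.relative_to(workspace)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(workspace.rglob("*"))
        if p.is_file()
    }


def guarded_review(workspace: Path, run_provider: Callable[[Path], dict]) -> dict:
    """Run the reviewer, discarding its result if the workspace changed."""
    before = snapshot(workspace)
    result = run_provider(workspace)
    if snapshot(workspace) != before:
        # Fail closed: a reviewer that writes files cannot produce a pass.
        return {"verdict": "fail", "reason": "workspace mutated during review"}
    return result
```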
scafld complete reads the latest review round, validates its structure, records the verdict into the spec, and archives. If the round is missing, malformed, incomplete, or failed, it refuses. The only bypass is an exceptional human-reviewed review event with an audited reason and interactive confirmation; record that evidence with scafld review --human-reviewed, then run scafld complete.
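The refusal logic reads naturally as a small decision function over the latest review round. The keys used here (`verdict`, `findings`, `blocking`, `human_reviewed`, `reason`) are assumed names for illustration, not the actual review artifact schema.

```python
REQUIRED_KEYS = {"verdict", "findings"}  # hypothetical minimum shape


def completion_gate(rounds: list[dict]) -> tuple[bool, str]:
    """Decide whether a task may complete, from the latest review round only."""
    if not rounds:
        return False, "missing review round"
    latest = rounds[-1]
    if latest.get("human_reviewed"):
        # Exceptional override: valid only with an audited reason.
        if latest.get("reason"):
            return True, "human-reviewed override"
        return False, "override lacks an audited reason"
    if not REQUIRED_KEYS <= latest.keys():
        return False, "malformed or incomplete review round"
    if any(f.get("blocking") for f in latest["findings"]):
        return False, "blocking findings remain"
    if latest["verdict"] != "pass":
        return False, "review failed"
    return True, "review passed"
```

Only the last round counts, so a failed round followed by a fix and a passing round lets the task complete, while an earlier pass cannot cover for a later failure.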
This works because critique is cognitively easier than creation. When building, the agent optimises for completion. When reviewing, it optimises for finding flaws. Splitting the two roles is what gives the review gate its teeth. Self-assessment can be recorded as evidence, but it never satisfies the completion gate.
Guardrails
Safety controls
Some actions require human approval regardless of the spec: schema migrations, public API changes, data deletion, production deployments. These are defined in .scafld/config.yaml and enforced during execution.
scafld also automatically prevents common security violations: hardcoded secrets, unbounded queries, SQL injection patterns, XSS vulnerabilities. The security scan runs as part of the standard and strict validation profiles.
Invariants
Non-negotiable architectural rules the agent cannot violate regardless of the task:
- Domain boundaries - services stay in their layers, no circular dependencies
- No legacy fallbacks - no dual-reads, dual-writes, or runtime shims. Migrate immediately with a one-off script
- Public API stability - HTTP contracts and event schemas don’t change without explicit approval
- Config from environment - never hardcoded
- No test logic in production - fixtures and mocks stay in test files
These are customisable per project. You define your own invariants in AGENTS.md and reference them by name in .scafld/config.yaml. Every spec declares which invariants it must preserve.
If the task requires violating an invariant, the agent pauses and asks. Non-optional.
Workspace support
For projects with multiple codebases (an API, a frontend, an SDK, an MCP server), the workspace pattern gives the agent visibility across all of them from a single root. Create a root repo, add your codebases as git submodules, run scafld init. The root holds the orchestration layer and the agent sees the whole picture.
scafld init               Scaffold workspace (extract bundled core, create directories)
scafld plan <task>        Create a spec (scaffold in drafts/)
scafld harden <task>      Optional: interrogate the draft with grounded questions
scafld validate <task>    Check spec against the contract
scafld approve <task>     Human approval gate (drafts/ -> active/)
scafld build <task>       Execute the phase loop, record evidence in session
scafld exec <task>        Run a specific criterion or phase
scafld review <task>      Run automated passes + adversarial provider review
scafld review <task> --human-reviewed --reason "manual audit"
                          Exceptional audited override when the review gate is blocked
scafld complete <task>    Read review, record verdict, archive (requires passing review)
scafld fail <task>        Archive as failed
scafld cancel <task>      Archive as cancelled
scafld status <task>      Review spec details and progress
scafld list [filter]      List specs by state
scafld report             Aggregate stats across all specs
scafld handoff <task>     Render compact context for the next model voice
scafld update             Refresh managed core files from the embedded bundle
scafld report aggregates task state, pass rates, size/risk distributions, and monthly activity across your spec history. Completed specs archive to .scafld/specs/archive/ with the session ledger and diagnostics that produced the final state. When something breaks in production six months from now, you trace back to the deterministic contract that approved it and the evidence that moved it through the gate.