CLI Reference — scafld

scafld is intentionally small. The binary exposes the same command surface to humans, agents, wrappers, and package launchers:

scafld init
scafld config
scafld plan <task-id>
scafld harden <task-id>
scafld validate <task-id>
scafld approve <task-id>
scafld build <task-id>
scafld review <task-id>
scafld complete <task-id>
scafld fail <task-id>
scafld cancel <task-id>
scafld status <task-id>
scafld list
scafld report
scafld handoff <task-id>
scafld update

Global flags:

--root PATH: operate on a specific workspace root.
--json: emit a stable JSON envelope when the command supports it.
--version: print the binary version.

JSON Mode

Automation-relevant commands emit one envelope:

{
  "ok": true,
  "command": "build",
  "result": {
    "task_id": "add-auth",
    "status": "active",
    "phase": "phase1",
    "passed": 0,
    "failed": 0,
    "next": "scafld handoff add-auth"
  }
}

Failures use the same shape with ok: false and an error object carrying code, message, and exit_code.

The envelope and every command result use snake_case JSON keys. The Markdown spec schema, session ledger, and CLI automation output therefore share one public casing convention.
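
A wrapper can treat the envelope described above as a small tagged union. The following is a minimal sketch, not scafld's own client code: ScafldError and parse_envelope are hypothetical names, and it assumes the failure object is stored under an "error" key.

```python
import json


class ScafldError(Exception):
    """Raised when an envelope reports ok: false (hypothetical helper)."""

    def __init__(self, code, message, exit_code):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.exit_code = exit_code


def parse_envelope(text):
    """Return the result object of a success envelope, or raise ScafldError."""
    envelope = json.loads(text)
    if envelope.get("ok"):
        return envelope["result"]
    # Assumption: the error object lives under an "error" key.
    error = envelope.get("error", {})
    raise ScafldError(
        error.get("code", "unknown"),
        error.get("message", ""),
        error.get("exit_code", 1),
    )
```

Because every key is snake_case, the parsed result can be read directly, for example parse_envelope(text)["task_id"].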

Exit codes:

0: success
1: generic runtime failure
2: invalid command or flag
3: validation or acceptance failure
4: review gate failure
5: cancelled context
6: workspace discovery or bootstrap failure
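
Automation that shells out to scafld may want names for these codes. This sketch mirrors the table above; the member names and the describe_exit helper are illustrative choices, not identifiers from scafld.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes from the table above; member names are illustrative."""

    SUCCESS = 0
    RUNTIME_FAILURE = 1
    INVALID_USAGE = 2
    ACCEPTANCE_FAILURE = 3
    REVIEW_GATE_FAILURE = 4
    CANCELLED = 5
    WORKSPACE_FAILURE = 6


def describe_exit(returncode):
    """Map a process return code to a label, tolerating unknown codes."""
    try:
        return ExitCode(returncode).name.lower()
    except ValueError:
        return f"unknown({returncode})"
```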

init

scafld init [--root PATH] [--json]

Bootstraps .scafld/ in the workspace. It installs project-owned config and prompt files, creates spec/run directories, and installs managed core assets under .scafld/core/.

init is deterministic. It does not ask an agent to infer project policy.

config

scafld config [--root PATH] [--json]

Scans the workspace in read-only mode and writes .scafld/config.proposed.yaml. The proposal contains cited evidence, agent instructions, suggested invariant IDs, discovered validation commands, and open questions.

config does not mutate .scafld/config.yaml. The operator or agent must open the cited sources and copy only verified runtime policy into the real config. Agent guidance belongs in AGENTS.md, CLAUDE.md, .claude/rules, or project prompts rather than unsupported config fields.

update

scafld update [--root PATH] [--json]

Refreshes managed .scafld/core/ files from the bundled runtime. It also refreshes .scafld/prompts/* copies that are still known defaults, while skipping customized project prompts. It refreshes root agent docs and renders generated .scafld/config.yaml into the current strict runtime shape. Specs, runs, reviews, and local config are preserved.

plan

scafld plan <task-id> [--title TITLE] [--summary TEXT] [--size SIZE] [--risk RISK] [--command CMD] [--json]

Creates .scafld/specs/drafts/<task-id>.md. --command seeds the first executable acceptance criterion. Existing drafts are not overwritten.
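
The no-overwrite guarantee can be pictured with exclusive file creation. This is only a sketch of the behavior, assuming nothing about how scafld implements it; create_draft is a hypothetical helper.

```python
from pathlib import Path


def create_draft(root, task_id, body):
    """Create the draft spec, refusing to overwrite an existing one."""
    path = Path(root) / ".scafld" / "specs" / "drafts" / f"{task_id}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" raises FileExistsError if the draft is already present.
    with open(path, "x", encoding="utf-8") as handle:
        handle.write(body)
    return path
```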

harden

scafld harden <task-id> [--provider auto|codex|claude|gemini|command|local] [--json]
scafld harden <task-id> --mark-passed [--json]

Hardening is the pre-build adversarial pass. It attacks the draft before approval: product goal, authority, ownership boundaries, halfway failure repair, hidden cutovers, testable invariants, golden examples, and recovery commands.

Without flags, harden appends a round, sets harden_status: in_progress, and prints the active prompt from .scafld/prompts/harden.md, falling back to .scafld/core/prompts/harden.md and then the built-in prompt.
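
The prompt lookup order above is a plain first-match fallback chain. A minimal sketch, assuming a hypothetical resolve_harden_prompt helper and a placeholder string standing in for the built-in prompt:

```python
from pathlib import Path

BUILT_IN_PROMPT = "built-in harden prompt"  # placeholder, not the real prompt text


def resolve_harden_prompt(root):
    """Return the first prompt found along the documented fallback chain."""
    candidates = [
        Path(root) / ".scafld" / "prompts" / "harden.md",  # project-owned copy
        Path(root) / ".scafld" / "core" / "prompts" / "harden.md",  # managed core copy
    ]
    for path in candidates:
        if path.is_file():
            return path.read_text(encoding="utf-8")
    return BUILT_IN_PROMPT
```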

With --mark-passed, it verifies the latest round's harden checks and Grounded in citations, closes the round, and sets harden_status: passed. Missing checks, checks that did not pass, open approval-blocking issues, and unresolved citations keep the round open. Advisory issues stay recorded but do not block approval.

With --provider, scafld delegates the harden round to a separate read-only provider. The provider must submit one strict HardenDossier through the structured submit channel. A pass verdict closes hardening; a needs_revision verdict records the checks that did not pass and any open approval-blocking issues in the draft for the implementation agent to resolve before approval. Non-blocking advisory issues remain in the harden round as evidence, not as forced rework. Provider transport failures and invalid dossiers are recorded as harden_status: error.

Accepted citation shapes are Grounded in: spec_gap:<field>, Grounded in: code:<path>:<line>, and Grounded in: archive:<task-id>. Code citations must use an existing workspace-relative path and a real line number. Line ranges are rejected; cite the single line that anchors the evidence.
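
An agent can pre-check citation shapes before running --mark-passed. The pattern below is a sketch written from the three shapes listed above; it is not scafld's parser, and it checks shape only, so it cannot confirm that a path exists or that a line number is real.

```python
import re

# One alternative per accepted shape; a line range such as 42-50 does not match.
CITATION = re.compile(
    r"^Grounded in: (?:"
    r"spec_gap:(?P<field>\S+)"
    r"|code:(?P<path>[^:\s]+):(?P<line>[1-9]\d*)"
    r"|archive:(?P<task_id>\S+)"
    r")$"
)


def citation_shape_ok(text):
    """Return True when text matches one of the three accepted citation shapes."""
    return CITATION.match(text.strip()) is not None
```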

Required checks are Path audit, Command audit, Scope/migration audit, Acceptance timing audit, Rollback/repair audit, and Design challenge. Each check must record Result: passed or Result: not_applicable plus evidence. Issues: none is valid only after those checks have evidence. The design challenge is not a style preference: it must challenge why the plan exists, whether it solves the underlying problem, and whether it is a short-sighted bandaid or future bloat.
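
The closing rule for a round can be sketched as a completeness check over the six required checks. The in-memory mapping used here is a hypothetical shape for illustration, not the Markdown spec format.

```python
REQUIRED_CHECKS = (
    "Path audit",
    "Command audit",
    "Scope/migration audit",
    "Acceptance timing audit",
    "Rollback/repair audit",
    "Design challenge",
)
ACCEPTED_RESULTS = {"passed", "not_applicable"}


def blocking_checks(round_checks):
    """Return required checks that are missing, unaccepted, or lack evidence.

    round_checks maps a check name to {"result": str, "evidence": str};
    that mapping is a hypothetical shape used only for this sketch.
    """
    blocking = []
    for name in REQUIRED_CHECKS:
        check = round_checks.get(name)
        if (
            check is None
            or check.get("result") not in ACCEPTED_RESULTS
            or not check.get("evidence", "").strip()
        ):
            blocking.append(name)
    return blocking
```

An empty return value corresponds to a round that is eligible to close, at which point Issues: none becomes a valid statement.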

validate

scafld validate <task-id> [--json]

Parses the Markdown spec into the normalized model and rejects malformed lifecycle state, phase identity, harden state, duplicate criteria, or non-executable acceptance criteria.

approve

scafld approve <task-id> [--json]

Records approval in the session ledger, then moves the draft spec to .scafld/specs/approved/. Approval is explicit operator action; it is not implied by hardening.

build

scafld build <task-id> [--json]

Runs the governed implementation loop. From approved, it activates the task, captures the workspace baseline, opens the first phase, and points the agent at scafld handoff <task-id>. It does not run a phase's acceptance commands before that phase has been implemented.

From active or blocked, build records evidence for the current phase. If the phase passes, it opens the next phase. If the final phase and global acceptance pass, it moves the task to review. Drafts, terminal specs, and already-ready review specs are rejected.

Acceptance commands inherit the process environment plus execution overrides from .scafld/config.yaml and .scafld/config.local.yaml. Use that config for repo-wide toolchain setup such as rbenv shims instead of relying on interactive shell startup. Acceptance commands default to a 300-second absolute timeout; raise execution.absolute_timeout_seconds for legitimate slow project tools. Set execution.idle_timeout_seconds only when an idle-output watchdog is useful for the project.
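
The environment and absolute-timeout behavior described above can be approximated as follows. This is a sketch under assumptions: run_acceptance is a hypothetical helper, commands are passed as an argument list rather than however scafld actually invokes them, and the idle-output watchdog is not modeled.

```python
import os
import subprocess


def run_acceptance(argv, env_overrides=None, absolute_timeout_seconds=300):
    """Run one command with merged environment; return (passed, detail)."""
    # Process environment first, then overrides, so overrides win on conflict.
    env = {**os.environ, **(env_overrides or {})}
    try:
        completed = subprocess.run(
            argv,
            env=env,
            capture_output=True,
            text=True,
            timeout=absolute_timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {absolute_timeout_seconds}s"
    return completed.returncode == 0, completed.stdout.strip()
```

Putting toolchain paths in the override mapping, rather than in shell startup files, keeps the run reproducible for non-interactive agents.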

Phase acceptance runs in order. If a phase blocks, later phase commands are not run and the next command becomes scafld handoff <task-id> so the repair agent gets the failed criterion, command, and evidence instead of a vague blocked status.
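
The stop-at-first-block rule can be sketched as an ordered loop. For brevity this collapses what a real workflow does across several build invocations into one call; run_phases, the phase tuples, and the returned dictionary are hypothetical shapes.

```python
def run_phases(task_id, phases, run_command):
    """Run phase commands in order and stop at the first failing criterion.

    phases: list of (phase_name, [command, ...]) pairs.
    run_command: callable returning True when a command passes.
    """
    for phase_name, commands in phases:
        for command in commands:
            if not run_command(command):
                # Later commands and later phases are never reached.
                return {
                    "status": "blocked",
                    "phase": phase_name,
                    "failed_command": command,
                    "next": f"scafld handoff {task_id}",
                }
    return {"status": "review"}
```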

review

scafld review <task-id> [--provider auto|codex|claude|gemini|command|local] [--provider-command CMD] [--provider-binary PATH] [--model MODEL] [--review-depth DEPTH] [--max-findings N] [--min-attack-angles N] [--review-scope PATH[,PATH...]] [--print-context] [--human-reviewed --reason TEXT] [--json]

review is the adversarial completion gate. Defaults come from .scafld/config.yaml under review.external. Fresh workspaces use provider: auto, which prefers the other installed agent when the current host is detected, can use Gemini as another external challenger, then falls back if needed. Without a detected host, the default order is codex, then claude, then gemini. If no external provider is available, the command fails and tells the operator to install a provider, use --provider command, or explicitly choose --provider local for development smoke tests. Local verdicts cannot satisfy complete.
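
One way to read the auto selection rule is as an ordered candidate list. The sketch below is an interpretation, not scafld's resolver: resolve_auto_provider is a hypothetical name, installed providers are passed in rather than detected, and it assumes "falls back" means the host's own provider is tried last.

```python
DEFAULT_ORDER = ("codex", "claude", "gemini")


def resolve_auto_provider(installed, host=None):
    """Pick an external challenger, preferring one that is not the current host."""
    # Every non-host provider is tried first, in the default order.
    candidates = [name for name in DEFAULT_ORDER if name != host]
    # Assumption: the fallback is the host's own provider, tried last.
    if host in DEFAULT_ORDER:
        candidates.append(host)
    for name in candidates:
        if name in installed:
            return name
    raise RuntimeError(
        "no external provider available: install one, use --provider command, "
        "or choose --provider local for smoke tests"
    )
```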

Provider modes:

auto: choose an installed external challenger.
codex: run Codex in read-only ephemeral mode with user config and execpolicy rules disabled for the review subprocess.
claude: run Claude with session persistence, slash commands, and browser integration disabled; built-in tools are restricted to Read, Grep, and Glob, and the verdict must be submitted through the submit_review MCP tool.
gemini: run Gemini CLI in plan/read-only mode with a temporary scafld MCP settings file; the verdict must be submitted through the submit_review MCP tool and final text is ignored.
command: run a custom reviewer command; requires --provider-command.
local: deterministic pass-through provider for development and tests only; its verdict cannot satisfy complete.

--human-reviewed is not a provider mode: it records an audited operator review instead of invoking a provider. --reason is required and is stored in the session ledger.

Provider-specific model defaults come from review.external.codex.model, review.external.claude.model, and review.external.gemini.model. --provider, --provider-command, --provider-binary, and --model override config for one invocation.

--review-depth, --max-findings, and --min-attack-angles override the review dossier budget for one run. For small diffs, keep the same gate but request a cheaper blocker-focused review:

scafld review <task-id> --review-depth light --max-findings 4 --min-attack-angles 3

--print-context prints the exact deterministic review-context packet without invoking a provider. Use it when an agent needs to see why a reviewer is under-informed or why a gate is likely to block before spending a model run.

scafld derives review scope from spec packages, impacted files, and phase changes. Use --review-scope only when a dirty monorepo or workspace needs an explicit path boundary:

scafld review email-contracts --review-scope api
scafld review email-contracts --review-scope api,cli/packages/mcp

The approval baseline is captured before task execution. Review compares the current workspace to that baseline, reports task-scoped changes to the provider, and includes changes outside declared scope as ambient workspace drift. Unchanged baseline dirt and ambient drift are context, not findings by themselves. Task-relevant files changed during review still fail closed; unrelated workspace churn does not discard a valid review.
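
The split between task-scoped changes and ambient drift amounts to a path-prefix classification. A minimal sketch, assuming changed paths and scope roots are workspace-relative POSIX paths; classify_changes is a hypothetical helper and does not reproduce how scafld derives scope.

```python
from pathlib import PurePosixPath


def classify_changes(changed_paths, scope_roots):
    """Split changed paths into task-scoped changes and ambient drift."""
    roots = [PurePosixPath(root) for root in scope_roots]
    task_scoped, ambient = [], []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # Compare whole path components so "apiary/" is not treated as "api/".
        in_scope = any(path == root or root in path.parents for root in roots)
        (task_scoped if in_scope else ambient).append(raw)
    return task_scoped, ambient
```

With scope roots api and cli/packages/mcp, a change under docs/ is reported as ambient drift rather than as a finding.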

The provider returns a ReviewDossier. scafld validates it, rejects workspace mutation in the review-relevant surface, writes the review event to the session ledger, and projects the verdict back into the spec. A human-reviewed override writes a review_override event before the passing review event. complete will not archive the task unless the review verdict is pass.

On review failure, the text output prints the findings and next repair command. The same findings appear in scafld status, scafld handoff, the session review entry, and the spec ## Review section.

complete

scafld complete <task-id> [--json]

Archives completed work only after the latest session review event has a pass verdict from codex, claude, gemini, command, or an audited human review.
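
The completion gate can be expressed as a predicate over the latest review event. The event dictionary below is a hypothetical shape for illustration; the real session ledger format is not shown in this reference.

```python
SATISFYING_PROVIDERS = {"codex", "claude", "gemini", "command"}


def can_complete(latest_review):
    """Return True when the latest review event may unlock archiving.

    latest_review is a hypothetical dict such as
    {"verdict": "pass", "provider": "codex", "human_reviewed": False}.
    """
    if latest_review is None or latest_review.get("verdict") != "pass":
        return False
    if latest_review.get("human_reviewed"):
        return True
    # A pass from the local provider is deliberately not enough.
    return latest_review.get("provider") in SATISFYING_PROVIDERS
```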

fail

scafld fail <task-id> [--reason TEXT] [--json]

Records the failure in the session ledger, then archives the spec.

cancel

scafld cancel <task-id> [--reason TEXT] [--json]

Records the cancellation in the session ledger, then archives the spec.

status

scafld status <task-id> [--json]

Shows lifecycle status, the next allowed follow-up command, and latest review findings when present.

list

scafld list [--json]

Lists all known specs by task id, status, and title.

report

scafld report [--json]

Aggregates workspace spec counts and session-derived product metrics:

first_attempt_pass_rate: tasks whose first completed build moved straight to review.
recovery_convergence_rate: blocked first attempts that later recovered to review.
challenge_override_rate: challenged tasks completed without a later passing review from codex, claude, gemini, or command. This should normally stay at 0.
review_pass_rate: accepted review verdicts over all review verdicts.
review_dossier_coverage: review events that stored a valid ReviewDossier.
review_findings_total: findings accepted across all valid dossiers.
review_open_blockers_total: findings that still blocked completion when recorded.
review_attack_angles_total: attack-log entries accepted across dossiers.
workspace_baseline_coverage: sessions with an approval/build baseline.

Human output keeps the same numbers compact:

total specs: 12
by status:
- review: 1
- completed: 9
metrics:
- first_attempt_pass_rate: 66.7% (8/12)
- recovery_convergence_rate: 75.0% (3/4)
- review_pass_rate: 80.0% (8/10)
- review_dossier_coverage: 100.0% (10/10)
- review_findings_total: 14
- review_open_blockers_total: 3
- review_attack_angles_total: 42
- review_mode_distribution:
  - discover: 7
  - verify: 3
- challenge_override_rate: 0.0% (0/2)
- workspace_baseline_coverage: 100.0% (12/12)
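
The rate lines in the human output follow a "percentage (numerator/denominator)" pattern. A sketch of that formatting, with format_rate as a hypothetical helper; how scafld renders an empty denominator is not documented here, so that branch is an assumption.

```python
def format_rate(numerator, denominator):
    """Render a ratio in the 'NN.N% (n/d)' style used by the report."""
    if denominator == 0:
        return "n/a (0/0)"  # assumption: display for an empty denominator
    return f"{100 * numerator / denominator:.1f}% ({numerator}/{denominator})"
```

For example, format_rate(8, 12) yields 66.7% (8/12), matching the first_attempt_pass_rate line above.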

handoff

scafld handoff <task-id>

Renders model-facing context from the current spec and session state. Handoffs include failed or pending acceptance criteria while a task is blocked, and latest review findings when present. They are transport, not source of truth.