Execution — scafld

Execution is deliberately less clever than the spec. It runs explicit commands, records evidence, and projects current state back into the Markdown file.

draft -> harden -> approve -> build -> review -> complete

Build

scafld build <task-id>

build is the governed implementation loop. From approved, it activates the task, captures the workspace baseline, opens the first phase, and prints the allowed handoff. It does not run acceptance for files that do not exist yet.

After the agent implements the opened phase, run scafld build <task-id> again. That second call records evidence for the current phase, then either blocks, opens the next phase, or moves the task to review after final acceptance.

Acceptance commands run in a non-login shell. scafld does not read interactive shell startup files. Instead it builds the command environment from checked-in project evidence, then overlays .scafld/config.yaml and .scafld/config.local.yaml.

When repo toolchain files declare language versions, scafld automatically prepends common version-manager shims. .tool-versions enables asdf/mise shims, mise.toml enables mise shims, and language-specific files such as .ruby-version, .python-version, .node-version, .go-version, and .java-version enable the matching common shim directory. Declare project-specific overrides in config:

Checked-in file	Auto-prepended shims
`.tool-versions`	`$HOME/.asdf/shims`, `$HOME/.local/share/mise/shims`, `$HOME/.mise/shims`
`mise.toml`, `.mise.toml`	`$HOME/.local/share/mise/shims`, `$HOME/.mise/shims`
`.ruby-version`	`$HOME/.rbenv/shims`
`.python-version`	`$HOME/.pyenv/shims`
`.node-version`, `.nvmrc`	`$HOME/.nodenv/shims`
`.go-version`	`$HOME/.goenv/shims`
`.java-version`	`$HOME/.jenv/shims`

execution:
  absolute_timeout_seconds: 300
  idle_timeout_seconds: 0
  path_prepend:
    - "$HOME/.rbenv/shims"
    - "$HOME/.rbenv/bin"
  env:
    BUNDLE_GEMFILE: "api/Gemfile"

This makes validation deterministic. If a command needs rbenv, asdf, pnpm, or another shim, the dependency is visible in the workspace contract instead of hidden in a developer's interactive shell startup. Explicit config paths are placed before auto-detected shims.

Each acceptance command has an absolute timeout. The default is 300 seconds. Raise execution.absolute_timeout_seconds for legitimate slow project tools such as cold typechecks. execution.idle_timeout_seconds defaults to 0 and only applies when explicitly set.

Approval captures the workspace baseline before task execution starts. Review uses that baseline later to separate task changes from unrelated pre-existing dirty state.

Build is phase-sequenced. scafld opens exactly one phase at a time. A phase is active while the agent is implementing it, completed only after its acceptance evidence passes, and blocked only after attempted evidence fails. Later phases do not get evidence while an earlier phase is active or blocked.

The normal agent loop is:

scafld build <task-id>   # open current phase
scafld handoff <task-id> # read what to implement
# implement the phase
scafld build <task-id>   # record phase evidence and advance

If all criteria pass, the task moves to review and the allowed follow-up is:

scafld review <task-id>

If attempted evidence fails or cannot be evaluated, the task becomes blocked; use scafld handoff <task-id> to get the failed criteria, commands, and reasons for the repair agent. Pending future phase criteria are not blockers.

Evidence Flow

For each criterion, scafld records:

command
exit code
matcher result
short output snippet
criterion id
phase id

The session ledger is written before the spec projection. That ordering is what lets scafld rebuild projected state from evidence if a write fails halfway.

Browser Evidence

Frontend work can use a typed browser criterion:

Acceptance:
- [ ] `ui-smoke` browser - Dashboard renders without browser errors
  - Command: `pnpm run scafld:browser -- dashboard`
  - Expected kind: `browser_evidence`

scafld does not own the browser runner, dev server, or authentication flow. The criterion command owns those project-specific details and writes one JSON evidence object to stdout. A project script can wrap Playwright, Cypress, or a browserless check and translate its result into this shape:

{
  "url": "http://localhost:3000/dashboard",
  "viewport": "1440x900",
  "auth": { "mode": "storage_state", "artifact": ".auth/user.json" },
  "screenshots": [{ "path": ".scafld/runs/task/dashboard.png" }],
  "console_errors": [],
  "network_errors": []
}

The command must exit with code 0, include url, viewport, console_errors, network_errors, and at least one screenshot, trace, video, or artifact path. Any recorded console or network error fails the criterion.

Auth stays project-owned. Use Playwright storage state, a fixture login command, or a test-only account according to the project's rules, and record only the redacted mode/artifact in evidence. Do not write tokens, passwords, cookies, or headers into the evidence JSON.

If a Playwright-shaped browser criterion exits before producing evidence because Playwright or its browser binaries are missing, scafld adds install guidance to the criterion failure reason. The fix remains project-owned: install dependencies, install browser binaries with the package manager used by the repo, then rerun the same criterion command.

Handoffs

scafld handoff <task-id>

The handoff is a compact model-facing summary of title, status, allowed next command, blocked acceptance evidence, and latest review findings when present. Handoff is transport only; it is never read back to compute state.

Process Safety

The process runner supports command stdin, timeout, idle timeout, process-group termination, and diagnostic capture. Long-running provider review uses the same runner surface as acceptance execution.