Architecture
This document is a map of how agent6 runs end-to-end. The diagrams
are mermaid (mermaid fenced blocks render natively on GitHub). For
per-file conventions and stability rules see AGENTS.md.
For the security model (threat model, defense layers, sandbox profiles),
see security.md.
Layering
Boundaries are enforced by tach (see tach.toml). Workflows never import each other; agents never import workflows or the CLI. Crossing a boundary is almost always a sign of the wrong design.
- cli (`src/agent6/cli/`): argument parsing, optional TUI spawn, top-level dispatch. Picks a workflow. Config is resolved by `config_layer.py` (built-in secure defaults < global `~/.config/agent6/config.toml` < per-repo config < `--config FILE`), with paths + sudo/root resolution in `paths.py` and API keys in `secrets.py`. Per-repo state (config and run state together) lives out of the workspace under `$XDG_STATE_HOME/agent6/<repo-id>/`; the base is settable via the global-only `[agent6].state_dir` or the `AGENT6_STATE_HOME` env var. Roles: `worker` drives `run`/`resume`, `planner` drives `plan` (falls back to `worker`), `reviewer` drives `review` + the in-loop critic.
- workflows (`src/agent6/workflows/`): two exist, `loop` (the agent loop driving `agent6 run` / `agent6 resume`) and `review` (the read-only review pass driving `agent6 review`).
- agents (`src/agent6/agents/`): single-turn LLM call shapes. The only one is `code_review`; the agent loop makes its own provider calls inline.
- tools (`src/agent6/tools/`): the fixed tool surface the LLM sees, plus dispatch.
- sandbox (`src/agent6/sandbox/`): Landlock on the agent process, `agent6-jail` for children.
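The config precedence chain in the cli bullet can be pictured as a fold over layers, lowest priority first. This is a minimal sketch, not the code in `config_layer.py`: `merge_layers` is a hypothetical helper, the deep-merge behaviour is an assumption, and the example keys and values (including the `"per-step"` spelling) are illustrative.

```python
from typing import Any


def merge_layers(*layers: dict[str, Any]) -> dict[str, Any]:
    """Fold config layers left to right; later layers win, nested tables merge."""
    merged: dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are tables: merge them key by key.
                merged[key] = merge_layers(merged[key], value)
            else:
                # Scalars (or a table replacing a scalar): the later layer wins.
                merged[key] = value
    return merged


# Lowest priority first, mirroring the chain described above.
# All values below are illustrative assumptions.
builtin_defaults = {"git": {"commit_strategy": "per-step"}, "run_commands": "ask"}
global_config = {"git": {"commit_strategy": "squash"}}
repo_config = {"context": {"drop_at_chars": 200_000}}
file_config = {"git": {"commit_strategy": "stage"}}  # stands in for --config FILE

resolved = merge_layers(builtin_defaults, global_config, repo_config, file_config)
```

Here the `--config FILE` layer decides `git.commit_strategy`, while `run_commands` keeps its built-in default because no higher layer sets it.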
Workflow: run
This is the agent. One provider, one model, one message history. The model drives by calling tools; the workflow dispatches tools, snapshots state, and tracks budget.
```mermaid
stateDiagram-v2
    [*] --> snapshot
    snapshot --> llm_call
    llm_call --> dispatch: model emits tool calls
    llm_call --> [*]: budget exhausted
    dispatch --> snapshot: non-terminal tool
    dispatch --> commit: run_verify_command (exit 0)
    commit --> snapshot
    dispatch --> [*]: finish_run
```
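The state diagram can be read as a single `while` loop. The sketch below is a toy model under stated assumptions: `ToolCall`, `LoopState` and `run_loop` are hypothetical names, the model is replaced by a scripted callback, and a step cap stands in for the token budget. It is not the code in `workflows/loop.py`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolCall:
    name: str
    exit_code: int = 0


@dataclass
class LoopState:
    step: int = 0
    commits: int = 0
    snapshots: list[int] = field(default_factory=list)
    ended_by: str = ""


def run_loop(next_call: Callable[[int], ToolCall], max_steps: int) -> LoopState:
    state = LoopState()
    while True:
        state.snapshots.append(state.step)  # snapshot before every LLM call
        if state.step >= max_steps:  # stand-in for "budget exhausted"
            state.ended_by = "budget"
            return state
        call = next_call(state.step)  # llm_call, then dispatch
        state.step += 1
        if call.name == "finish_run":  # the only terminal tool
            state.ended_by = "finish_run"
            return state
        if call.name == "run_verify_command" and call.exit_code == 0:
            state.commits += 1  # commit, then back to snapshot


scripted = [
    ToolCall("apply_edit"),
    ToolCall("run_verify_command", exit_code=0),
    ToolCall("run_verify_command", exit_code=1),
    ToolCall("finish_run"),
]
finished = run_loop(lambda i: scripted[i], max_steps=10)
starved = run_loop(lambda i: ToolCall("read"), max_steps=2)
```

Only the verify call that exits 0 produces a commit; the failing one simply loops back to the next snapshot.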
Notes:
- One LLM, one history, one loop. No planner→worker handoff, no critic step, no separate reviewer agent. Multi-step work is the model calling the next tool in the same conversation.
- Snapshot before every LLM call. A `snapshots/<step>.json` is written to the run directory (`<state-dir>/<repo-id>/runs/<run-id>/`, out of the workspace) before each provider request. `agent6 resume <run-id>` rehydrates from the latest snapshot; combined with the per-tool transcripts under `transcripts/`, any interrupted run can be replayed deterministically up to the model call that comes next.
- Per-step commits fire when `run_verify_command` returns 0, via `git_ops.py` from outside the jail. Per-step is the default; the `git.commit_strategy` knob also allows `squash` (one commit at run end), `stage` (stage but never commit), and `none`.
- DAG-as-scaffold. `add_task` / `update_task` / `set_cursor` / `list_tasks` write to a curator-owned side store: the worker's task breakdown. They do not pick which tool runs next, but agent6 reads the DAG to keep a small or weak model focused on a long task. Each turn it surfaces the current task -- the cursor when it still points at an open subtask, else the first dependency-satisfied pending subtask -- into the prompt, advances the cursor as tasks pass, and marks the surfaced task `in_progress`. It also refuses `finish_run` while the worker's own subtasks are still open (capped, so a task it cannot close cannot stall the run forever). The surfaced banner survives tier-1 elision and is re-injected after each tier-2 restart, so the worker always sees its current task without it being re-appended every turn. If the focus task holds for many turns with no forward motion (a weak model grinding one task without concluding or decomposing it), a nudge offers to split / pass / skip it -- re-firing periodically up to a small cap (a weak model was seen ignoring a single nudge); any progress resets the counter, so a healthy run never sees it.
- Context compaction. Long runs are kept inside the model's context window in two tiers (thresholds in `[context]`): at `drop_at_chars` the oldest tool_results are replaced by a short "re-call if needed" placeholder; at `summarise_at_chars` the elided history is summarised by the `reviewer` model and the conversation restarts from (task + summary). The curator-owned task DAG survives the restart: agent6 re-surfaces the current task into the fresh context (above), so the worker resumes the right task instead of starting over. At that tier-2 restart agent6 also asks the summariser which tracked tasks the transcript shows finished and what new work it found, then marks the finished ones `passed` and queues the new ones in the DAG -- so task state stays accurate even though weak models rarely call `update_task` themselves.
- `finish_run(summary)` is the only terminal tool. Calling it emits a `run.end` event and returns control to the CLI.
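The "current task" rule from the DAG-as-scaffold note (cursor first, else the first dependency-satisfied pending subtask) can be sketched as a small pure function. The `Task` shape, the `OPEN` set and `current_task` are hypothetical; only the status names `pending`, `in_progress` and `passed` come from the text above, and the curator's real data model may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    id: str
    status: str = "pending"
    deps: list[str] = field(default_factory=list)


OPEN = {"pending", "in_progress"}  # assumed definition of an "open" subtask


def current_task(tasks: dict[str, Task], cursor: Optional[str]) -> Optional[Task]:
    # 1. The cursor wins while it still points at an open subtask.
    if cursor in tasks and tasks[cursor].status in OPEN:
        return tasks[cursor]
    # 2. Otherwise take the first pending task whose dependencies all passed.
    #    Dicts preserve insertion order, so "first" means first added.
    for task in tasks.values():
        deps_passed = all(tasks[d].status == "passed" for d in task.deps if d in tasks)
        if task.status == "pending" and deps_passed:
            return task
    return None


tasks = {
    "a": Task("a", status="passed"),
    "b": Task("b", deps=["a"]),
    "c": Task("c", deps=["b"]),
}
stale_cursor = current_task(tasks, "a")  # cursor points at a passed task
live_cursor = current_task(tasks, "c")  # cursor points at an open task
no_cursor = current_task(tasks, None)
```

With a stale or missing cursor the function falls through to `b`, the first pending task whose dependency has passed; a cursor on an open task is honoured even when that task's dependencies are not yet done, which is one reading of the rule above.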
Workflow: review
A single read-only pass (src/agent6/workflows/review.py)
over a diff (working tree, branch-vs-base, or arbitrary range) using
the agents/code_review.py agent. Produces structured findings; no
edits, no commits, no run_command.
```mermaid
stateDiagram-v2
    [*] --> collect_diff
    collect_diff --> code_review
    code_review --> [*]
```
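For the `collect_diff` step, the three diff sources map naturally onto `git diff` invocations. The sketch below only builds the argv; the mode names, the `diff_argv` helper and the default base branch are assumptions for illustration, and `review.py` may collect its diff differently.

```python
def diff_argv(mode: str, base: str = "main", rev_range: str = "") -> list[str]:
    """Build a read-only `git diff` command line for one review mode."""
    if mode == "working-tree":
        # Staged and unstaged changes relative to the last commit.
        return ["git", "diff", "HEAD"]
    if mode == "branch":
        # Three-dot form: changes on HEAD since its merge base with `base`.
        return ["git", "diff", f"{base}...HEAD"]
    if mode == "range":
        if not rev_range:
            raise ValueError("range mode needs an explicit revision range")
        return ["git", "diff", rev_range]
    raise ValueError(f"unknown review mode: {mode!r}")


working_tree_cmd = diff_argv("working-tree")
branch_cmd = diff_argv("branch", base="develop")
range_cmd = diff_argv("range", rev_range="v1.0..v1.1")
```

Each argv is a plain read of repository state, which matches the review pass making no edits and no commits.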
Enforcement layering
security.md details which guarantee each layer provides. As a diagram:
```mermaid
flowchart TD
    LLM[LLM choice of tool] --> Tools[tools/dispatch.py]
    Tools -->|apply_edit, apply_patch, read, list, grep, outline| FS[(workspace fs)]
    Tools -->|run_verify_command, run_metric_command, run_command| Jail[agent6-jail]
    Jail --> NS[user/mount/pid/ipc/uts/net NS]
    Jail --> Pivot[pivot_root into minimal rootfs]
    Jail --> ROBinds[strict only: RO bind .git]
    Jail --> Land[Landlock V1 rules]
    Jail --> Sec[seccomp filter]
    Jail --> Caps[capset 0 + NO_NEW_PRIVS]
    Land -.-> Child[child process]
    Sec -.-> Child
    ROBinds -.-> Child
    Caps -.-> Child
    Workflow[workflow git_ops.py] -->|outside jail| Git[(.git)]
    Workflow -. blocks .-> Push[push / --force / reset --hard]
```
- `git_ops.py` runs outside the jail (the agent's own process), so the RO bind of `.git` does not stop the workflow from committing. It stops the worker.
- `protect_git` is strict-only. On strict the jail read-only bind-remounts `.git` on top of the workspace mount. The hardened profile (no mount namespace to carve with) grants blanket read-write on the repo cwd, so `.git` is writable by jailed commands there. Carving `.git` read-only on hardened would also deny new top-level entries and break toolchains like cargo/pytest that create `target/` or `.pytest_cache/`. The writable `.git` on hardened is acceptable: it is gated by `run_commands` (default `ask`), recoverable (branch-per-run, commits go through `git_ops`), and the surrounding container is the blast radius.
- Run state is safe from jailed commands because it lives out of the workspace (`<state-dir>/<repo-id>/`), unreachable from the repo cwd that jailed commands run on.
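The last point rests on one invariant: the state directory is neither the workspace nor anywhere beneath it. A sketch of how that invariant could be checked follows; `is_outside_workspace` is a hypothetical helper, and whether agent6 performs such a check at startup is not stated in this document.

```python
from pathlib import Path


def is_outside_workspace(state_dir: Path, workspace: Path) -> bool:
    """True when state_dir is neither the workspace nor nested inside it."""
    # resolve() normalises ".." segments and symlinks before comparing.
    state = state_dir.resolve()
    root = workspace.resolve()
    return state != root and root not in state.parents


# Example paths are made up for illustration.
outside = is_outside_workspace(Path("/srv/state/agent6/repo1"), Path("/srv/work/repo1"))
nested = is_outside_workspace(Path("/srv/work/repo1/.agent6"), Path("/srv/work/repo1"))
same = is_outside_workspace(Path("/srv/work/repo1"), Path("/srv/work/repo1"))
```

Resolving both paths first matters: a state dir written as `repo1/../state` is outside the repo even though its spelling starts inside it.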
Curator subprocess
The task graph is owned by a separate graph-curator subprocess
(python -m agent6.graph.server). The
main agent process writes the rest of the run state (resume snapshot,
event log, transcripts) in-process.
```mermaid
flowchart LR
    Agent[agent6 run<br/>main process] -->|UDS JSON IPC| Curator[graph-curator<br/>subprocess]
    Curator -->|task graph| Graph[(graph.jsonl, graph/*.md, graph snapshots)]
    Agent -->|in-process| Rest[(loop_state.json, logs.jsonl, transcripts)]
```
The agent talks to the curator over a Unix domain socket. The curator
validates every IPC frame against a pydantic schema before applying it,
so the on-disk graph stays consistent. What keeps the whole run
directory safe from jailed commands is its location: it lives out of the
workspace (<state-dir>/<repo-id>/), unreachable from the repo cwd that
jailed commands run on.
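The validate-before-apply step can be illustrated with the standard library alone. In this sketch a hand-written `validate_frame` stands in for the pydantic schema, and the wire format (newline-delimited JSON with `op` and `args` keys) is an assumption; the real frame layout is not specified here. `socket.socketpair()` gives a connected Unix-domain pair on Linux, so no socket file is needed.

```python
import json
import socket

# Assumed: IPC ops share their names with the task tools the worker calls.
ALLOWED_OPS = {"add_task", "update_task", "set_cursor", "list_tasks"}


def send_frame(sock: socket.socket, frame: dict) -> None:
    # json.dumps escapes newlines, so "\n" is a safe frame delimiter.
    sock.sendall(json.dumps(frame).encode() + b"\n")


def recv_line(sock: socket.socket) -> bytes:
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("peer closed before end of frame")
        buf += chunk
    return bytes(buf[:-1])


def validate_frame(raw: bytes) -> dict:
    """Stdlib stand-in for schema validation: reject before touching the graph."""
    frame = json.loads(raw)
    if not isinstance(frame, dict):
        raise ValueError("frame must be a JSON object")
    if frame.get("op") not in ALLOWED_OPS:
        raise ValueError(f"unknown op: {frame.get('op')!r}")
    if not isinstance(frame.get("args"), dict):
        raise ValueError("args must be an object")
    return frame


agent_end, curator_end = socket.socketpair()
send_frame(agent_end, {"op": "add_task", "args": {"title": "write tests"}})
accepted = validate_frame(recv_line(curator_end))

send_frame(agent_end, {"op": "drop_graph", "args": {}})
try:
    validate_frame(recv_line(curator_end))
    rejected = False
except ValueError:
    rejected = True

agent_end.close()
curator_end.close()
```

The second frame never reaches any graph-mutating code, which is the property the curator relies on to keep the on-disk graph consistent.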
Run state on disk
Each run's directory <state-dir>/<repo-id>/runs/<run-id>/ holds:
- `graph.jsonl`: append-only journal of every task-graph mutation (curator-owned).
- `graph/*.md`: one markdown file per task node, rewritten atomically (curator-owned).
- `logs.jsonl`: the structured event stream (below), written by the main process.
- `loop_state.json`: the latest resume snapshot that drives `agent6 resume`, written by the main process before each LLM call and at iteration end.
- `checkpoints/<NNNN>.json`: append-only per-turn snapshots (NNNN = zero-padded `next_iteration`), each the same payload as `loop_state.json` plus the workspace `head_sha` and curator `graph_version` at that turn. `agent6 fork --at-turn N` rolls a run back to turn N by cloning the matching checkpoint into a new run. Kept in full (a run is dozens of turns); written by the main process alongside `loop_state.json`.
- `transcripts/`: full provider request/response pairs for replay, written by the main process.
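A sketch of the checkpoint naming and the append-only property, assuming four-digit padding (as in `0000.json`) and a JSON payload. `checkpoint_path` and `write_checkpoint` are hypothetical helpers, and the `messages` key in the example state is made up.

```python
import json
import tempfile
from pathlib import Path


def checkpoint_path(run_dir: Path, next_iteration: int) -> Path:
    # NNNN is next_iteration zero-padded to four digits.
    return run_dir / "checkpoints" / f"{next_iteration:04d}.json"


def write_checkpoint(run_dir: Path, loop_state: dict, head_sha: str, graph_version: int) -> Path:
    # Same payload as loop_state.json, plus the two per-turn markers.
    payload = {**loop_state, "head_sha": head_sha, "graph_version": graph_version}
    path = checkpoint_path(run_dir, loop_state["next_iteration"])
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" raises FileExistsError instead of overwriting an existing turn.
    with path.open("x") as fh:
        json.dump(payload, fh)
    return path


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    state = {"next_iteration": 7, "messages": []}
    written = write_checkpoint(run_dir, state, head_sha="abc123", graph_version=3)
    written_name = written.name
    stored = json.loads(written.read_text())
    try:
        write_checkpoint(run_dir, state, head_sha="def456", graph_version=4)
        overwrote = True
    except FileExistsError:
        overwrote = False
```

Exclusive-create mode is one simple way to make "append-only" a property of the write itself rather than a convention.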
A fork (agent6 fork <src>) clones a source run's state, as of a checkpoint,
into a NEW run dir with a new id: it copies the checkpoint as the new run's
loop_state.json + seed checkpoints/0000.json, copies the curator DAG
(graph/, graph.jsonl, cursor.json) verbatim, writes a manifest with
parent_run_id / forked_from_turn / forked_from_sha, and cuts
agent6/<new> at the turn's sha (additive git branch, the operator's
checkout is untouched). The source run is never mutated. One fork edge per line
lands in a per-repo lineage.jsonl at the state-dir root. Past-turn DAG replay
(reconstructing the graph at an older graph_version) is deferred; a fork copies
the source's current DAG.
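The file-level part of a fork can be sketched as follows. This covers only the copies and the manifest; the git branch and the `lineage.jsonl` entry are left out. `fork_run` and the `manifest.json` filename are assumptions, and the real command may order or name things differently.

```python
import json
import shutil
import tempfile
from pathlib import Path


def fork_run(runs_root: Path, src_id: str, new_id: str, turn: int) -> Path:
    src, dst = runs_root / src_id, runs_root / new_id
    checkpoint = src / "checkpoints" / f"{turn:04d}.json"
    state = json.loads(checkpoint.read_text())

    dst.mkdir(parents=True)  # raises FileExistsError if the new run id is taken
    (dst / "checkpoints").mkdir()
    # The chosen checkpoint becomes both the resume snapshot and the seed.
    shutil.copyfile(checkpoint, dst / "loop_state.json")
    shutil.copyfile(checkpoint, dst / "checkpoints" / "0000.json")

    # Copy the curator DAG verbatim (the source's current DAG, not a past one).
    for name in ("graph.jsonl", "cursor.json"):
        if (src / name).exists():
            shutil.copyfile(src / name, dst / name)
    if (src / "graph").is_dir():
        shutil.copytree(src / "graph", dst / "graph")

    manifest = {
        "parent_run_id": src_id,
        "forked_from_turn": turn,
        "forked_from_sha": state.get("head_sha"),
    }
    (dst / "manifest.json").write_text(json.dumps(manifest))
    return dst


with tempfile.TemporaryDirectory() as tmp:
    runs = Path(tmp)
    src_checkpoints = runs / "run-a" / "checkpoints"
    src_checkpoints.mkdir(parents=True)
    (src_checkpoints / "0003.json").write_text(
        json.dumps({"next_iteration": 3, "head_sha": "abc123"})
    )
    (runs / "run-a" / "graph.jsonl").write_text('{"op": "add_task"}\n')

    forked = fork_run(runs, "run-a", "run-b", turn=3)
    manifest = json.loads((forked / "manifest.json").read_text())
    seed_exists = (forked / "checkpoints" / "0000.json").exists()
    graph_copied = (forked / "graph.jsonl").read_text() == '{"op": "add_task"}\n'
    source_untouched = sorted(p.name for p in src_checkpoints.iterdir()) == ["0003.json"]
```

Every write targets the new run directory, so the source run is only ever read.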
The logs.jsonl vocabulary is small and stable: the data contract for
any external viewer (the fold to UI state lives in
src/agent6/ui/state.py as a pure function):
| Event | Notable fields |
|---|---|
| `run.start` | `user_task` |
| `tool.call` / `.result` | `name`, `args` (preview), `ok`, `summary`; emitted as a pair for EVERY dispatched tool, including ones a guard rejects (`ok=false`, trusted reason), so no call is unaccounted for. Execution tools (`run_command`/`run_metric_command`) also carry capped `stdout_tail`/`stderr_tail` like `verify.end` |
| `verify.start` / `.end` | `cmd`, `exit_code`, `duration_s`, `*_tail` |
| `loop.verify_inferred` | `command` (argv, `[]` if none), `source` (`agents_md`/`manifest`/`llm`/`none`) |
| `role.call` / `.result` | `role`, `model`, `tokens_in`, `tokens_out` |
| `role.text_delta` | streamed assistant text chunk |
| `role.thinking_delta` | streamed reasoning chunk (TUI "thinking" pane) |
| `run.steer_requested` | `source` (`"sigint"`): mid-run Ctrl-C |
| `budget.update` | totals + caps for input/output tokens |
| `approval.prompt` / `.answer` | `id`, `prompt`, `approved`, `source` (`tui`/`stdin`) |
| `loop.*` | agent progress: `loop.auto_commit`, `loop.compact.*`, `loop.critic.*`, `loop.metric.*`, `loop.steer.*` |
| `loop.budget` | per-iteration usage heartbeat: `iteration`, `input_tokens`, `output_tokens`, `cache_read_tokens`, `cost_usd` (read by `agent6 runs show`) |
| `loop.review.*` | adversarial review panel: `loop.review.start` (`trigger`, `seats`), `loop.review.seat` (`seat`, `model`, `verdict`, `findings`), `loop.review.panel` (`blocked`, `raw_blocked`, `decision`, `n_block`, `disarmed`), `loop.review.skipped` |
| `run.end` | `summary` |
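Because the fold to UI state is a pure function, an external viewer can rebuild its state by replaying `logs.jsonl` from the top. The sketch below shows the shape of such a fold; it is not the code in `src/agent6/ui/state.py`. The `UiState` fields and the `event` envelope key are assumptions, while the field names `user_task`, `ok` and `summary` come from the table.

```python
import json
from dataclasses import dataclass, replace
from typing import Iterable


@dataclass(frozen=True)
class UiState:
    task: str = ""
    tool_calls: int = 0
    failed_tools: int = 0
    summary: str = ""
    done: bool = False


def fold(state: UiState, event: dict) -> UiState:
    """Pure reducer: returns a new state, never mutates the old one."""
    kind = event.get("event")  # assumed name of the envelope key
    if kind == "run.start":
        return replace(state, task=event.get("user_task", ""))
    if kind == "tool.result":
        failed = 0 if event.get("ok") else 1
        return replace(
            state,
            tool_calls=state.tool_calls + 1,
            failed_tools=state.failed_tools + failed,
        )
    if kind == "run.end":
        return replace(state, summary=event.get("summary", ""), done=True)
    return state  # events this viewer does not track leave the state unchanged


def fold_log(lines: Iterable[str]) -> UiState:
    state = UiState()
    for line in lines:
        if line.strip():
            state = fold(state, json.loads(line))
    return state


log_lines = [
    '{"event": "run.start", "user_task": "fix the flaky test"}',
    '{"event": "tool.result", "name": "grep", "ok": true}',
    '{"event": "tool.result", "name": "run_command", "ok": false}',
    '{"event": "role.text_delta", "text": "..."}',
    '{"event": "run.end", "summary": "test fixed"}',
]
final_state = fold_log(log_lines)
```

Ignoring unknown events keeps a viewer working when new `loop.*` events appear in the stream.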
A run_command approval is published as approval.prompt; the dashboard
TUI shows an Allow/Deny modal and writes approvals/<id>.answer, which the
workflow reads (falling back to a stdin prompt with no TUI), then records
approval.answer. The task DAG is not in this stream; it is
curator-owned and lives in graph.jsonl (read via agent6 runs
graph).
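The answer-file handshake can be sketched as a writer (the TUI side) and a polling reader (the workflow side). Everything beyond the `approvals/<id>.answer` path is an assumption: the JSON body, the atomic-rename write, the polling interval, and the helper names are illustrative only.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Optional


def answer_path(run_dir: Path, approval_id: str) -> Path:
    return run_dir / "approvals" / f"{approval_id}.answer"


def write_answer(run_dir: Path, approval_id: str, approved: bool) -> None:
    """TUI side: publish the operator's decision."""
    path = answer_path(run_dir, approval_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"approved": approved}))
    tmp.replace(path)  # rename into place so the reader never sees a partial file


def wait_for_answer(
    run_dir: Path, approval_id: str, timeout_s: float = 1.0, poll_s: float = 0.05
) -> Optional[bool]:
    """Workflow side: poll for the answer; None means fall back to stdin."""
    path = answer_path(run_dir, approval_id)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if path.exists():
            return bool(json.loads(path.read_text())["approved"])
        time.sleep(poll_s)
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    run_dir = Path(tmp_dir)
    unanswered = wait_for_answer(run_dir, "a1", timeout_s=0.1)
    write_answer(run_dir, "a1", approved=True)
    answered = wait_for_answer(run_dir, "a1")
```

A `None` result corresponds to the no-TUI case, where the workflow would prompt on stdin before recording `approval.answer`.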
Where things live
| Concern | File / dir |
|---|---|
| Config schema | src/agent6/config.py |
| Tool surface | src/agent6/tools/schema.py |
| Tool dispatch | src/agent6/tools/dispatch.py |
| Agent loop | src/agent6/workflows/loop.py |
| Review workflow | src/agent6/workflows/review.py |
| Code-review agent | src/agent6/agents/code_review.py |
| Jail launcher (Python wrapper) | src/agent6/sandbox/jail.py |
| Jail launcher (Rust binary) | src/agent6/jail/src/main.rs |
| Git policy | src/agent6/git_ops.py |
| Provider clients | src/agent6/providers/ |
| Knowledge graph (curator) | src/agent6/graph/ |
| Event log + UI fold | src/agent6/events.py, src/agent6/ui/ |
| Run state on disk | <state-dir>/<repo-id>/runs/<run-id>/ (out of the workspace) |
Pre-1.0 stability
See AGENTS.md. Until 1.0 every public shape (config TOML, IPC frames, on-disk graph, CLI flags, transcript layout) is liquid; we break cleanly rather than carry shims.