Operations

/cmux-agents

Spawn AI workers in cmux via MCP/repoGolem. Triggers: visible workers, terminal agents, orchestration, audits.

$ golems-cli skills install cmux-agents

Good

100% best pass rate

196 assertions

43 evals

1 workflow

Updated 5 days ago

Orchestration layer for AI agents in cmux panes. Low-level pane operations (splits, reads, sends) use cmux MCP tools — this skill handles the workflow on top.

Lead/Orchestrator Monitor Law

Monitor/cron/loop-payload rules → see cron-payload-discipline (canonical). cmux-agents owns the worker-engine lifecycle below; do not duplicate the canonical monitor law here.

ROLE GEOMETRY — leads/orchestrators land LEFT, workers land RIGHT (automatic)

Why this exists (2026-06-14): the cmux skill/MCP setup kept confusing agents about geometry — who goes where, and how to avoid hand-placing panes. The cmuxlayer layout policy (PR #156) now enforces a two-column invariant by ROLE, so you should stop hand-managing left/right and just pass the role.

Leads / orchestrators → LEFT column. Role orchestrator. Default inferred from a *Claude launcher title.
One lead pane per workspace. Additional leads, successor leads, and promoted coordinators must become tabs in the existing left orchestrator pane unless the user explicitly asks for a separate lead pane/workspace. A managed successor lead that creates a new left split or third column is successor_lead_wrong_pane / topology_three_or_more_columns; immediately move it into the existing lead pane, verify column_count:2, and keep the failure in the collab/report.
Workers (Codex/Cursor implement+gather) → RIGHT column. Role worker. Default inferred from *Codex / *Cursor titles.
Role is semantic, not decorative. Use role:"orchestrator" only for a lead/orc seat that coordinates, routes, verifies, and reports. Use role:"worker" for implementation, gathering, audit, or phase execution seats. A Codex lead can be an orchestrator; ordinary Codex/Cursor phase agents are workers.
Codex orchestrator is valid when managed. non_claude_orchestrator must not be treated as a health failure by itself. A Codex lead is production-valid when it has managed agent_id, role:"orchestrator", left-column placement, resumability/session health or explicit non-resumable health, and a Codex adapter guard (inbox:live, owned wait_for(done), or watcher:<pid/task-id>). The failure is an unmanaged/unmonitored Codex orchestrator, not "Codex is lead."
Promoted leads must be relaunched or registered as leads. Renaming a worker/right-pane surface to "LEAD" does not make it a lead. Any promoted lead must have a managed agent_id, role:"orchestrator", and left-column placement. If it lacks any of those, mark lead_unmanaged / role_topology_mismatch and recover/register or replace it with a managed lead before production work continues.
Default workspace is the parent agent workspace. If the caller does not explicitly pass workspace, managed spawn_agent should place the new worker in the parent/caller agent's workspace, not the human's currently focused workspace. Passing no workspace must not create or select a surprise workspace. If cmuxlayer must focus the parent workspace to make a terminal initialize, it must focus the parent workspace, create the pane, wait until the pane/surface is initialized and available, then restore the user's prior focus. If cmuxlayer violates this, move the surface back, resync, and report a health failure.
Pass the role, don't compute the column. spawn_agent, new_split({role}), and spawn_in_workspace place by role automatically. Once two columns exist, additional workers dock as TABS in the rightmost worker pane — never a third column.
Topology drift is a health failure. Unexpected three-column layouts, a worker in the lead column, an orchestrator that creates a surprising split, or a successor lead tab that lands outside the existing lead pane must be called out in the collab/status report and routed to cmuxlayer. Do not summarize the fleet as "fine" when geometry is visibly wrong.
spawn_in_workspace is the clean team stand-up: one call creates the workspace and lays out commanders-LEFT / workers-RIGHT atomically. Prefer it over a sequence of spawn_agent calls when standing up a multi-agent team.
Auto-focus is handled for you (PR #156). Workspace-targeted splits focus the target before splitting and restore your focus after render. You no longer hand-run focus-pane around every split — but focus:true is still REQUIRED on role-based new_split (role placement rejects focus:false).
Engine routing rules → see agent-routing (canonical). cmux-agents only maps the chosen role to pane placement.

Default Lifecycle (2026-04-19)

Visible worker peers now use the agent-based cmux MCP tools by default:

mcp__cmux__spawn_agent({repo, cli, prompt, workspace?, parent_agent_id?})
Capture the returned agent_id immediately and write it into your collab / AGENT_REGISTRY.
mcp__cmux__wait_for({agent_id, target_state, timeout_ms}) for lifecycle gates. For autonomous file-backed workers, pass report_path and done_marker with target_state:"done" so cmux can promote terminal report evidence instead of hanging on stale registry state.
mcp__cmux__send_to({agent_id, text, press_enter}) (or dispatch_to_agent) for follow-ups.
mcp__cmux__get_agent_state({agent_id}) for a combined registry + parsed-screen snapshot.
mcp__cmux__inbox_check({agent_id}) for subscription/inbox health.
mcp__cmux__list_agents(...) or mcp__cmux__my_agents(...) whenever topology changes.
mcp__cmux__stop_agent({agent_id}) to stop or recycle a worker.

Reference agents by agent_id, not surface index. Surface IDs drift after crashes, respawns, and pane reuse; agent_id is the durable handle. Surface refs still matter for raw inspection (read_screen) and non-agent panes (new_surface, browser tabs), but not for the default worker lifecycle.

Existing Agent Reuse / Goal Supersession

Before spawning, run list_agents / my_agents when the user references an existing pane/agent or when the same repo/workspace/role lane may already exist.

Reuse the existing managed agent unless it is dead, unhealthy, or the user explicitly asks for a replacement.
Do not spawn a duplicate just to get a cleaner prompt.
If the current agent has a stale or too-narrow mission, supersede it with one explicit file-backed goal follow-up.
Preserve the user's full delegation intent in the goal file. Do not shrink a broad mission to the next local blocker unless the user asks for a narrow task.

Supersession pattern:

1. Write /abs/path/goal.md with full mission, constraints, success criteria, report path, DONE marker, and green/no-green decision.
2. Deliver the goal using the receiver's adapter syntax, not a universal `/goal` assumption:
   - Codex: `/goal Read and execute this goal file until complete: /abs/path/goal.md` only after verifying this Codex harness accepts `/goal`; otherwise use a plain file-contract message.
   - Cursor: use the current Cursor agent goal command only when verified; if it renders a duplicated footer/status prompt, verify from screen/file progress before sending again.
   - Gemini/Antigravity: plain file-contract instruction, e.g. `Read and execute this goal file until complete: /abs/path/goal.md`; do not prefix `/goal` unless that harness version explicitly documents it.
   - Claude: native goal/monitor flow when available.
3. Read the pane within 15 seconds and verify the harness accepted the
   supersession according to that harness's adapter contract.
4. Record goal_file, report path, DONE marker, delivery path, supersession
   evidence, and any command-state caveat in the collab Agent row.
5. Monitor the report file/DONE marker first; use pane reads only for delivery failure or health disputes.

If cmuxlayer exposes supersede_agent_goal, use it instead of a raw follow-up so registry task_summary and goal_file stay truthful.

Harness Command Semantics Map

Do not turn every live TUI surprise into a bespoke rule. For command-driven agent control, maintain an adapter-level command semantics map and derive specific rules from it. The map should cover Codex, Claude, Cursor, cmuxlayer, and repoGolem launchers for:

goal replace/resume and confirmation menus;
interrupt/pause/resume behavior;
monitor, watcher, inbox, and completion wake paths;
subagent/worker spawn and managed agent_id registration;
worker close/archive semantics;
MCP reconnect/restart and successor handoff;
manual launcher fallback and prompt submission verification.

When a command behaves unexpectedly, record the transcript as evidence, update the adapter map or an eval, and only add a local workaround when it is needed to unblock the current lane. Example evidence: Codex /goal replacement may stop at a confirmation menu or leave the old objective paused after interruption; the adapter rule is "verify the command reached the intended state," not "assume /goal sends are atomic."

Managed Health Gate

After every managed spawn, record a registry row with:

| Claimed Name | Agent ID | Surface | Workspace | Role | cli_session_id | Resumable | Inbox | Topology | Status |

Treat these as HEALTH FAILURE conditions, not warnings:

agent_id missing, unknown, or auto-* discovered instead of a managed id.
Lead/orchestrator pane has no managed agent_id, is an orphan/raw surface, or was produced by an interrupted spawn.
cli_session_id:null / non-resumable on a long-running or crash_recover:true agent.
inbox_check.monitor_alive:false, stale pending messages, or dispatch requiring a fallback nudge.
get_agent_state says ready/idle while the raw pane is working, wedged, or has unsubmitted text.
A boot prompt or task prompt is visibly typed in the agent composer but was not submitted. This is boot_prompt_typed_not_submitted; send Return only as an explicit recovery action, then record the delivery failure even if the worker later succeeds.
Role/topology mismatch: worker in lead column, lead placed as worker, unexpected third column, surprise workspace creation, or unexplained split drift.
Registry/surface mismatch: get_agent_state.workspace_id or surface route disagrees with list_surfaces after a move/resync.
Repo/cwd ambiguity: a pane running from a generic directory (for example ~/Gits) gets bad auto labels or repo ownership. Treat this as health evidence and relaunch from the intended repo through managed spawn.

If any fail, append the failure to the collab before continuing. Do not hide it behind a normal status line.

Lead Goal Contract

A lead/orchestrator goal is not a worker checklist. Its goal file must say:

Spawn/delegate Cursor gatherers, Codex implementers, or cmux workers as needed.
Maintain the collab Agent table and health gates.
Synthesize, verify, and decide; do not personally grind the whole lane except tiny glue or urgent unblockers.
Close workers after their reports/DONE markers or blocked handoffs are captured, unless the Agent row records KEPT_OPEN:<reason> with owner and next check for active review, successor orientation, live process state, or paired continuation. Unexplained completed panes are done_worker_left_open.

If a supposed lead receives a worker-style /goal that makes it execute the full lane directly, rewrite/supersede the goal before it starts burning lead context.

Full SKILL.md source — includes LLM directives, anti-patterns, and technical instructions stripped from the Overview tab.

Orchestration layer for AI agents in cmux panes. Low-level pane operations (splits, reads, sends) use cmux MCP tools — this skill handles the workflow on top.

Lead/Orchestrator Monitor Law

Monitor/cron/loop-payload rules → see cron-payload-discipline (canonical). cmux-agents owns the worker-engine lifecycle below; do not duplicate the canonical monitor law here.

ROLE GEOMETRY — leads/orchestrators land LEFT, workers land RIGHT (automatic)

Why this exists (2026-06-14): the cmux skill/MCP setup kept confusing agents about geometry — who goes where, and how to avoid hand-placing panes. The cmuxlayer layout policy (PR #156) now enforces a two-column invariant by ROLE, so you should stop hand-managing left/right and just pass the role.

Leads / orchestrators → LEFT column. Role orchestrator. Default inferred from a *Claude launcher title.
One lead pane per workspace. Additional leads, successor leads, and promoted coordinators must become tabs in the existing left orchestrator pane unless the user explicitly asks for a separate lead pane/workspace. A managed successor lead that creates a new left split or third column is successor_lead_wrong_pane / topology_three_or_more_columns; immediately move it into the existing lead pane, verify column_count:2, and keep the failure in the collab/report.
Workers (Codex/Cursor implement+gather) → RIGHT column. Role worker. Default inferred from *Codex / *Cursor titles.
Role is semantic, not decorative. Use role:"orchestrator" only for a lead/orc seat that coordinates, routes, verifies, and reports. Use role:"worker" for implementation, gathering, audit, or phase execution seats. A Codex lead can be an orchestrator; ordinary Codex/Cursor phase agents are workers.
Codex orchestrator is valid when managed. non_claude_orchestrator must not be treated as a health failure by itself. A Codex lead is production-valid when it has managed agent_id, role:"orchestrator", left-column placement, resumability/session health or explicit non-resumable health, and a Codex adapter guard (inbox:live, owned wait_for(done), or watcher:<pid/task-id>). The failure is an unmanaged/unmonitored Codex orchestrator, not "Codex is lead."
Promoted leads must be relaunched or registered as leads. Renaming a worker/right-pane surface to "LEAD" does not make it a lead. Any promoted lead must have a managed agent_id, role:"orchestrator", and left-column placement. If it lacks any of those, mark lead_unmanaged / role_topology_mismatch and recover/register or replace it with a managed lead before production work continues.
Default workspace is the parent agent workspace. If the caller does not explicitly pass workspace, managed spawn_agent should place the new worker in the parent/caller agent's workspace, not the human's currently focused workspace. Passing no workspace must not create or select a surprise workspace. If cmuxlayer must focus the parent workspace to make a terminal initialize, it must focus the parent workspace, create the pane, wait until the pane/surface is initialized and available, then restore the user's prior focus. If cmuxlayer violates this, move the surface back, resync, and report a health failure.
Pass the role, don't compute the column. spawn_agent, new_split({role}), and spawn_in_workspace place by role automatically. Once two columns exist, additional workers dock as TABS in the rightmost worker pane — never a third column.
Topology drift is a health failure. Unexpected three-column layouts, a worker in the lead column, an orchestrator that creates a surprising split, or a successor lead tab that lands outside the existing lead pane must be called out in the collab/status report and routed to cmuxlayer. Do not summarize the fleet as "fine" when geometry is visibly wrong.
spawn_in_workspace is the clean team stand-up: one call creates the workspace and lays out commanders-LEFT / workers-RIGHT atomically. Prefer it over a sequence of spawn_agent calls when standing up a multi-agent team.
Auto-focus is handled for you (PR #156). Workspace-targeted splits focus the target before splitting and restore your focus after render. You no longer hand-run focus-pane around every split — but focus:true is still REQUIRED on role-based new_split (role placement rejects focus:false).
Engine routing rules → see agent-routing (canonical). cmux-agents only maps the chosen role to pane placement.

Default Lifecycle (2026-04-19)

Visible worker peers now use the agent-based cmux MCP tools by default:

mcp__cmux__spawn_agent({repo, cli, prompt, workspace?, parent_agent_id?})
Capture the returned agent_id immediately and write it into your collab / AGENT_REGISTRY.
mcp__cmux__wait_for({agent_id, target_state, timeout_ms}) for lifecycle gates. For autonomous file-backed workers, pass report_path and done_marker with target_state:"done" so cmux can promote terminal report evidence instead of hanging on stale registry state.
mcp__cmux__send_to({agent_id, text, press_enter}) (or dispatch_to_agent) for follow-ups.
mcp__cmux__get_agent_state({agent_id}) for a combined registry + parsed-screen snapshot.
mcp__cmux__inbox_check({agent_id}) for subscription/inbox health.
mcp__cmux__list_agents(...) or mcp__cmux__my_agents(...) whenever topology changes.
mcp__cmux__stop_agent({agent_id}) to stop or recycle a worker.

Existing Agent Reuse / Goal Supersession

Before spawning, run list_agents / my_agents when the user references an existing pane/agent or when the same repo/workspace/role lane may already exist.

Reuse the existing managed agent unless it is dead, unhealthy, or the user explicitly asks for a replacement.
Do not spawn a duplicate just to get a cleaner prompt.
If the current agent has a stale or too-narrow mission, supersede it with one explicit file-backed goal follow-up.
Preserve the user's full delegation intent in the goal file. Do not shrink a broad mission to the next local blocker unless the user asks for a narrow task.

Supersession pattern:

1. Write /abs/path/goal.md with full mission, constraints, success criteria, report path, DONE marker, and green/no-green decision.
2. Deliver the goal using the receiver's adapter syntax, not a universal `/goal` assumption:
   - Codex: `/goal Read and execute this goal file until complete: /abs/path/goal.md` only after verifying this Codex harness accepts `/goal`; otherwise use a plain file-contract message.
   - Cursor: use the current Cursor agent goal command only when verified; if it renders a duplicated footer/status prompt, verify from screen/file progress before sending again.
   - Gemini/Antigravity: plain file-contract instruction, e.g. `Read and execute this goal file until complete: /abs/path/goal.md`; do not prefix `/goal` unless that harness version explicitly documents it.
   - Claude: native goal/monitor flow when available.
3. Read the pane within 15 seconds and verify the harness accepted the
   supersession according to that harness's adapter contract.
4. Record goal_file, report path, DONE marker, delivery path, supersession
   evidence, and any command-state caveat in the collab Agent row.
5. Monitor the report file/DONE marker first; use pane reads only for delivery failure or health disputes.

If cmuxlayer exposes supersede_agent_goal, use it instead of a raw follow-up so registry task_summary and goal_file stay truthful.

Harness Command Semantics Map

goal replace/resume and confirmation menus;
interrupt/pause/resume behavior;
monitor, watcher, inbox, and completion wake paths;
subagent/worker spawn and managed agent_id registration;
worker close/archive semantics;
MCP reconnect/restart and successor handoff;
manual launcher fallback and prompt submission verification.

Managed Health Gate

After every managed spawn, record a registry row with:

| Claimed Name | Agent ID | Surface | Workspace | Role | cli_session_id | Resumable | Inbox | Topology | Status |

Treat these as HEALTH FAILURE conditions, not warnings:

agent_id missing, unknown, or auto-* discovered instead of a managed id.
Lead/orchestrator pane has no managed agent_id, is an orphan/raw surface, or was produced by an interrupted spawn.
cli_session_id:null / non-resumable on a long-running or crash_recover:true agent.
inbox_check.monitor_alive:false, stale pending messages, or dispatch requiring a fallback nudge.
get_agent_state says ready/idle while the raw pane is working, wedged, or has unsubmitted text.
A boot prompt or task prompt is visibly typed in the agent composer but was not submitted. This is boot_prompt_typed_not_submitted; send Return only as an explicit recovery action, then record the delivery failure even if the worker later succeeds.
Role/topology mismatch: worker in lead column, lead placed as worker, unexpected third column, surprise workspace creation, or unexplained split drift.
Registry/surface mismatch: get_agent_state.workspace_id or surface route disagrees with list_surfaces after a move/resync.
Repo/cwd ambiguity: a pane running from a generic directory (for example ~/Gits) gets bad auto labels or repo ownership. Treat this as health evidence and relaunch from the intended repo through managed spawn.

If any fail, append the failure to the collab before continuing. Do not hide it behind a normal status line.

Lead Goal Contract

A lead/orchestrator goal is not a worker checklist. Its goal file must say:

Spawn/delegate Cursor gatherers, Codex implementers, or cmux workers as needed.
Maintain the collab Agent table and health gates.
Synthesize, verify, and decide; do not personally grind the whole lane except tiny glue or urgent unblockers.
Close workers after their reports/DONE markers or blocked handoffs are captured, unless the Agent row records KEPT_OPEN:<reason> with owner and next check for active review, successor orientation, live process state, or paired continuation. Unexplained completed panes are done_worker_left_open.

If a supposed lead receives a worker-style /goal that makes it execute the full lane directly, rewrite/supersede the goal before it starts burning lead context.

repoGolem Launcher Contract

Default path is still spawn_agent. When FR-01, non-agent terminals, or recovery flows force a manual launcher, keep this file as a pointer only.

Launcher/-s/-m rules → see repogolem (canonical; generated from ralph at/launchers.at). Engine routing rules → see agent-routing (canonical).

Worker vs Audit Placement

Intent	Placement	Why
Worker: code, implementation, collab	Split in current workspace	Visible beside the lead session; easy to monitor and nudge
Audit/research: read-only analysis, perspective, discovery	Separate named workspace	Keeps the working view uncluttered and easy to find in the sidebar

Name audit/research workspaces clearly, e.g. Audit: golems or Research: auth. Use same-workspace down splits only when the user explicitly asks to keep the audit visible next to the lead.

Current Caveats

FR-01 — hyphenated repo names can mis-resolve to the wrong launcher

Known live failure: spawn_agent({repo:"skill-creator"}) can guess skill-creatorClaude instead of the real repoGolem launcher skillcreatorClaude. The immediate workaround lives in /repogolem; verify the launcher there before relying on spawn_agent for a new hyphenated repo.

Hard regression example from 2026-06-27: a skill-creator Codex worker launch resolved to skill-creatorCodex -s -m gpt-5.5. Both parts are wrong:

launcher names strip hyphens / use the registry launcher alias, so this lane must be skillcreatorCodex -s (or the verified repoGolem alias), not skill-creatorCodex;
visible pane agent sessions must not pass -m / --model; the bare launcher owns the model policy.

If managed cmux spawn leaks either bug, do not adopt the pane as production. Close the failed shell pane, record the cmuxlayer health regression, and use the verified bare repoGolem launcher manually only as a temporary fallback.

FR-06 — state parser ambiguity (`booting` vs idle-with-spinner)

wait_for and get_agent_state depend on cmux's parser. In rare cases the registry still says booting while read_screen shows a perfectly usable prompt. Codex ready prompts can appear visible near the top or middle of the pane after boot, not only bottom-aligned on the last prompt line. If wait_for or boot-prompt delivery times out but the raw pane clearly shows the Codex › prompt anywhere visible, treat that as parser drift: record cmuxlayer health evidence, try managed delivery through the agent_id, and use one surface-level fallback send only if you must unblock delivery. A timed-out boot parser must not leave a raw/orphan production surface; recover/register the managed agent or replace/clean it up.

First-run Touch ID note

First launch of a newly signed launcher binary can still trigger a one-time Touch ID prompt. Treat this as a first-run caveat, not a primary orchestration strategy.

Completion Signals — file > wait_for > get_agent_state > read_screen

Why this exists: surface-based polling burned hundreds of read_screen calls and still missed real state transitions. The new default is event-driven waits on stable agent_ids, with raw screen reads reserved for ambiguity or output extraction.

Ranked reliability of completion signals (use the highest that fits the job):

Signal	Reliability	When to use
1. Output file with DONE marker	Ground truth. File either exists + contains marker, or not.	Every multi-minute autonomous worker task.
2. `wait_for({agent_id, target_state:"done", report_path, done_marker})`	Default event-driven lifecycle gate for file-backed workers.	Standard worker completion when the goal contract defines a report path and exact DONE marker.
3. `get_agent_state({agent_id})`	Good current snapshot of registry state + parsed screen.	Quick checks without a full raw read.
4. `read_agent_output` / `read_screen`	Best adjudicator when parser and pane disagree.	FR-06, output extraction, or manual troubleshooting.
5. `list_agents` / `my_agents`	Discovery only.	"What is alive?" not "is this task complete?"

Visual Proof / Screenshots

When Etan asks to see something, deliver a Computer Use screenshot. read_screen is text inspection, not a screenshot. After interactive probes (typing into a pane, reconnecting MCPs, choosing a model/menu option), screenshot proactively when the result is visual or user-facing.

Before pressing Enter in any TUI menu, verify the highlighted row first. Use a Computer Use screenshot when the user needs to see the state; use read_screen only when the terminal text clearly exposes the selected row. If the selection is ambiguous, stop and inspect instead of pressing Enter blindly.

UI work: the RUNNING APP on Etan's screen is the primary deliverable (2026-06-07). Delivery means the app itself is LAUNCHED and visible on his screen, verified by a screenshot Read, with the exact relaunch command posted. open -a Helium <file> without verification is NOT delivery — live catch: "doesn't open, probably Helium, nope nothing." HTML dashboards are a secondary surface for UI work, never the primary deliverable.

The file-based completion pattern (copy this)

Every autonomous worker task must end with a file write and a DONE marker:

# In the worker's prompt:
Write your report to /path/to/output/batch-WORKER.md. The last line of the
file must be exactly: DONE_WORKER_NAME

Orchestrator side, poll the file(s) until all expected outputs exist and each contains its DONE marker:

# run_in_background: true
until [ -f "$PLAN/batches/batch-M1.md" ] && grep -q DONE_MINER_M1 "$PLAN/batches/batch-M1.md" 2>/dev/null \
      && [ -f "$PLAN/batches/batch-M2.md" ] && grep -q DONE_MINER_M2 "$PLAN/batches/batch-M2.md" 2>/dev/null \
      ; do
  sleep 30
done
echo ALL_MINERS_DONE

When the background command completes, you know every miner finished AND wrote a real file (not a partial crash). This survives cmux state-sync bugs, splash-screen false-idles, and pane freezes.

Completion notification / lead wake requirement

The DONE marker is completion ground truth, but it is not by itself a wake path. When a worker reaches TASK_DONE, writes the contracted report DONE marker, or marks a /goal complete, the owning lead must be notified through one of:

wait_for({agent_id, target_state:"done", report_path, done_marker}) returning and waking the caller;
live inbox or adapter subscription delivery to the lead;
explicit cmux notify/workspace notification tied to the worker and report;
the temporary watcher detecting the DONE marker within its cadence and nudging the lead.

Do not require every worker to post into a collab file. The normal state path is goal file -> report file -> exact DONE marker. The normal event path is the active adapter guard (inbox, wait_for, native monitor, notify, or watcher) waking the owning lead with the report path and DONE marker. A collab post is useful history, not the only trigger.

If the worker is visibly TASK_DONE / Goal achieved but the lead is not notified, record completion_notification_missing as cmux health failure alongside missing cli_session_id, non-resumability, dead inbox monitor, or registry/screen disagreement. Do not present the lane as operationally healthy until the lead has harvested the report, acknowledged the DONE marker, and either closed/archived the worker or recorded why it remains open.

For Codex leads, wait_for(done) without a live event path is not enough to go AFK. Codex has no native async Monitor. Before the lead delegates and returns to other work, it must have either live inbox/wait coverage or a recorded background watcher/loop that watches the registered collab/goal/report/DONE paths and dispatches an inbox/notify event to the owning lead. If no watcher PID/task id, wait receipt, or live inbox is recorded, the lane is blocked:health even if the worker is busy.

FR-06 fallback when parser and screen disagree

If wait_for times out or send_to_agent rejects with current state: booting, do this in order:

get_agent_state({agent_id})
read_screen(surface: "...", lines: 20) on the linked surface
If the raw pane shows a prompt, trust the pane and file a cmuxlayer bug
If work is blocked, use one surface-level send_input + send_key("return") fallback, then return to the agent_id flow as soon as the registry catches up

The rule is not "surface sends are normal again." The rule is: agent_id is the source of truth; raw pane sends are an escape hatch for parser drift.

STOP — Before Anything Else

1. Spawn workers with mcp__cmux__spawn_agent. No hand-rolled repoGolem typing unless FR-01 forces a temporary launcher fallback.

2. Store the agent_id immediately. Every follow-up, wait, stop, and handoff should key off the durable id.

3. Re-discover live workers with list_agents / my_agents after crashes or layout changes. Do not trust a remembered surface number.

4. AGENT_REGISTRY — maintain after CLAUDE_COUNTER in every response with active agents:

AGENT_REGISTRY:
| Agent ID | Surface | Repo | Task | Status | Last Check |
|----------|---------|------|------|--------|------------|
| agent:abc123 | surface:153 | golems | Digest failures | WORKING | 12:35 |

Add on spawn. Update on check. Remove on kill.

MCP Primitives (use these for low-level ops)

Operation	MCP Tool	Notes
Spawn worker	`mcp__cmux__spawn_agent`	Default for visible Claude/Codex/Cursor/Gemini/Kiro peers
Send follow-up	`mcp__cmux__send_to_agent`	Replaces `send_input` + `send_key` for normal agent messaging
Wait for state	`mcp__cmux__wait_for`	Replaces client-side poll loops
Discover workers	`mcp__cmux__list_agents` / `mcp__cmux__my_agents`	Use after topology changes; `agent_id` survives surface drift
Inspect state	`mcp__cmux__get_agent_state`	Registry + parsed screen snapshot
Stop worker	`mcp__cmux__stop_agent`	Replaces closing a pane to kill a worker
Read raw pane	`mcp__cmux__read_screen`	Fallback for FR-06 or output extraction
Extract marker output	`mcp__cmux__read_agent_output`	Preferred when you set `---OUTPUT_START---` / `---OUTPUT_END---`
Create non-agent tab	`mcp__cmux__new_surface`	Browsers, shell scratch pads, log tails
Rename / status / progress	`mcp__cmux__rename_tab`, `mcp__cmux__set_status`, `mcp__cmux__set_progress`	UI polish for the visible pane
Open browser	`mcp__cmux__browser_surface`	Non-terminal surfaces
Surface-level escape hatch	`send_input`, `send_key`	Only when parser drift blocks the agent channel

Use spawn_agent for full worker lifecycle. Keep raw surface tools for non-agent panes and FR-06 recovery.

Response Schemas (PR #76, 2026-04-17 — `list_surfaces`)

Why this shipped: An earlier mining sweep of 10 orcClaude sessions found list_surfaces burned 117,000 tokens across 49 calls (avg 2,387 / max 8,333 per call). Root causes: a duplicate bug (surfaces in workspace:N appeared N times in surfaces[]) and per-entry bloat (screen_preview 51%, UUIDs 15%, full remote blob 11% even for local-only workspaces). PR #76 fixed both: deduped surfaces[] unconditionally, and condensed the default response with opt-in backward compatibility via verbose: true. Default payload is ~89% smaller than the pre-PR shape.

Default response (no verbose):

{
  "ok": true,
  "workspaces": [
    {
      "ref": "workspace:1",
      "title": "🎯 orcClaude",
      "current_directory": "/Users/etanheyman/Gits/brainlayer",
      "remote_state": "local"
    }
  ],
  "surfaces": [
    {
      "ref": "surface:35",
      "title": "orcClaude",
      "type": "terminal",
      "workspace_ref": "workspace:1"
    }
  ]
}

remote_state values:

"local" — no SSH/proxy/daemon hints; ordinary macOS workspace. The common case.
"connected" — SSH session is up and connected.
"disconnected" — remote configured but currently offline.
"unavailable" — remote partially configured but daemon/proxy state is missing.

Pass verbose: true to restore the full historical schema — workspace id (UUID), index, pinned, selected, listening_ports, full remote blob (daemon / proxy / heartbeat / ports), plus every per-surface field (id, pane_id, index_in_pane, selected_in_pane, focused, window_ref, pane_ref). Dedup is still applied in verbose mode — it's a correctness fix, not opt-in.

When to pass verbose: true:

Debugging SSH workspace state (need daemon / proxy / heartbeat detail).
Port-forwarding workflows (need listening_ports / forwarded_ports).
UI layout reasoning that needs focused / selected_in_pane / index.
Migration testing against code that still reads old fields.

When the default is enough (99% of agent work):

Routing: "which surface is worker X on?" → need ref + title + workspace_ref.
Layout checks: "are there 2 panes in this workspace?" → need ref count.
Identifying a target for send_input / read_screen → only need surface:N ref.
Status reports: "list all active workers" → ref + title suffice.

Pre-PR gotcha now fixed: workspace_ref is no longer echoed at the top level of the response unless you explicitly passed workspace: "workspace:N" as a filter. Callers relying on the top-level echo must either pass a filter or read it per-surface.

spawn_agent crash_recover + session auto-capture (PR #77, 2026-04-17)

Why this shipped: When a worker's PTY died unexpectedly (shell exit, mac crash, cmux daemon restart), the agent record was orphaned with no way to resume. PR #77 adds an opt-in recovery loop driven by boot-time session-ID capture.

New crash_recover: boolean param on spawn_agent (default false). When true:

Boot capture window — for the first 30 seconds after spawn, the engine scans up to 80 lines of terminal output on each sweep tick for a UUID pattern ([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}). This is what every major CLI prints as its session header (claude --session-id, codex session, cursor agent --session). First UUID seen wins — the engine does not currently match against CLI-specific context markers, so any earlier UUID in boot output (e.g., a log line printing a workspace UUID, a trace ID, or a dependency version string) can steal the slot. Engine persists the first-seen UUID as cli_session_id on the agent record.
Crash detection — when the PTY dies with a recoverable error (surface disappeared, shell exited unexpectedly) AND crash_recover=true AND user_killed !== true AND cli_session_id is set, the next sweep attempts respawn.
Respawn path — new surface created, same launcher with the per-CLI resume command appended: <repo>Claude -s --resume <id> / <repo>Codex -s resume <id> / cursor agent --session <id>. Up to MAX_RESPAWN_ATTEMPTS = 10 attempts before giving up.
user_killed guard — if the user explicitly killed the agent via stop_agent (userInitiated=true, the default), crash_recover will NOT respawn it. Context-limit evictions pass userInitiated=false, keeping the agent eligible for recovery.

When to pass crash_recover: true:

Long-running Codex workers on PR loops (>30 min expected).
Cursor audit workers that need to survive a cmux daemon restart.
Claude workers executing multi-hour sprints where mac-crash resilience matters.
ANY worker in an overnight / autonomous context where you won't be watching.

When NOT to pass it:

Short-lived helpers (<5 min) where a crash means "task failed, start over".
Workers in ambiguous session-ID formats (older CLIs without UUID headers).
Debugging sessions where a crash IS the signal to investigate.

Caveat — the session-ID heuristic is regex-based. If a worker's early output happens to contain a UUID that isn't the CLI session (e.g., a log line printing a workspace UUID, a trace ID, a dependency version), the wrong ID gets captured and recovery will silently fail later.

Verification procedure for critical spawns:

Within 60 seconds of spawn: get_agent_state(agent_id: "...") → check cli_session_id is not null.
Cross-check against the CLI's own session display via read_screen:
- Claude: look for Session: or --session-id in the header.
- Codex: look for session in the splash / first turn.
- Cursor: look for --session in the boot log.
If cli_session_id does not match the on-screen session — or if it's null after 60s (capture window expired) — call stop_agent(userInitiated: false) to keep recovery eligibility, then respawn fresh and re-verify.
For autonomous / overnight workers: fold this check into your spawn script and fail loudly on mismatch. A wrong cli_session_id means crash_recover will respawn with the wrong --resume argument and the new session won't have your context.

Related new agent-record fields (visible via get_agent_state / list_agents):

cli_session_id — captured UUID, or null if capture window expired before a match.
respawn_attempts — count of recovery attempts made (0 to 10).
user_killed — true if user explicitly stopped via stop_agent.
workspace_id — owning workspace UUID (persisted for cross-session recovery).

Distinct from the -c continue flag on repoGolem launchers. -c resumes an exited session that the user re-launches manually. crash_recover handles unexpected PTY death mid-task, automatically. Use -c when you stop a worker and want to come back to it later; use crash_recover: true when you want the engine to auto-recover without your intervention.

Spawning Agents — launcher policy pointer (2026-06-05)

Launcher/-s/-m rules → see repogolem (canonical; generated from ralph at/launchers.at). cmux-specific guardrail: do not pass model to spawn_agent; spawn examples below deliberately omit model overrides.

Spawning Agents (MANDATORY: use `spawn_agent`)

mcp__cmux__spawn_agent({
  repo: "golems",
  cli: "codex",
  prompt: "Fix search ranking"
})
→ { agent_id, ... }
 
mcp__cmux__wait_for({ agent_id, target_state: "ready", timeout_ms: 120000 })
mcp__cmux__send_to_agent({ agent_id, text: "Narrow scope to ranking only", press_enter: true })
mcp__cmux__wait_for({
  agent_id,
  target_state: "done",
  timeout_ms: 1800000,
  report_path: "/abs/path/report.md",
  done_marker: "DONE_EXACT_MARKER"
})

Options: repo, cli, prompt, workspace, parent_agent_id

Launcher/-s/-m rules → see repogolem (canonical; generated from ralph at/launchers.at).

Examples:

spawn_agent({ repo: "golems", cli: "claude", prompt: "Survey search regressions" })
spawn_agent({ repo: "golems", cli: "cursor", prompt: "Audit code quality and write findings to /tmp/audit.md" })
spawn_agent({ repo: "orchestrator", cli: "gemini", prompt: "Survey patterns and summarize in 5 bullets" })
spawn_agent({ repo: "golems", cli: "kiro", prompt: "Draft a concise troubleshooting note" })

Hyphenated repo names: if the repo contains -, verify the real repoGolem launcher in /repogolem first (FR-01). The doc warning is there because cmuxlayer can still mis-resolve the launcher name.

Auto-workspace-categorization by launcher root

When spawning a visible pane via mcp__cmux__new_split + sending a launcher command, the pane MUST land in the workspace whose name contains (case-insensitive substring) the launcher's repo root. Examples:

coachClaude -s → workspace whose title contains "Coach"
orcClaude -s → workspace whose title contains "orc" (or workspace:2 by default)
voicelayerCodex -s → workspace whose title contains "voicelayer"
brainlayerCodex -s → workspace whose title contains "brainlayer" or "orc-buddy"

Lookup protocol BEFORE new_split:

Identify the launcher (coachClaude, orcClaude, etc.).
Extract the repo root (coach, orc, voicelayer, brainlayer).
Call mcp__cmux__list_surfaces() to enumerate live workspaces.
Pick the workspace whose title contains the repo-root substring (case-insensitive).
Pass that workspace ref to new_split.
After spawn, verify via list_surfaces that the new surface ended up in the intended workspace. If not, IMMEDIATELY call mcp__cmux__move_surface(surface, workspace) to fix.

Why post-spawn verify is mandatory: the current new_split MCP call's workspace arg is advisory — actual placement may follow current focus, not the arg. Until cmux MCP enforces the arg, agents MUST verify + move post-spawn. Skipping the verify step is how the live 2026-05-17 incident happened.

Evidence: Live 2026-05-17 ~02:32 IDT — skillCreator s:3 spawned 8 Batch D eval panes via new_split(workspace:"workspace:1"); 4 coach-launcher panes landed in workspace:2 anyway. orc had to manually move_surface to recover. Cost: ~5 minutes of orc context burning on layout repair instead of dispatch.

spawn_agent({workspace}) already routes correctly in most cases — this rule specifically covers the lower-level new_split path used when you need a non-agent terminal or a launcher invocation that isn't a cli enum value.

NEVER Use These Commands (Common Mistakes)

Root cause (April 5 2026): mehayomClaude used cursor-cli in a cmux pane — command not found. Wasted a surface and debugging time.

WRONG	RIGHT	Why
`cursor-cli "prompt"`	`cursor agent --trust "prompt"`	`cursor-cli` does not exist as a command
`cursor --print --output-format text`	`cursor agent --output-format text`	`--print` is not a cursor flag
`cursor agent --output-format` (in cmux)	`cursor agent --trust` (in cmux)	Interactive cmux agents need `--trust` for permissions, not piped/batch output

Launcher/-s/-m and headless/print rules → see repogolem (canonical; generated from ralph at/launchers.at).

For visible Cursor agents: Use spawn_agent({cli:"cursor"}) from the parent orchestrator. These workers should boot as addressable agent_ids, not as manually typed launchers. For batch/piped output: Use cursor agent --output-format text --model "<cursor-max-mode-model>" "PROMPT" directly. Substitute Cursor's current Max Mode model ID (verify via cursor agent --help or the Cursor changelog — IDs change between versions). Default to no --model flag (Auto) per /agent-routing AP3. For repoGolem launchers: see repogolem (canonical; generated from ralph at/launchers.at).

Cursor `/auto-run` fallback

Default path: put the task in spawn_agent.prompt and talk to the worker via send_to_agent({agent_id,...}).

If a live Cursor pane still stalls on approvals, send /auto-run as a follow-up first:

wait_for({ agent_id, target_state: "ready", timeout_ms: 120000 })
send_to_agent({ agent_id, text: "/auto-run", press_enter: true })
send_to_agent({ agent_id, text: "<task prompt>", press_enter: true })

If FR-06 blocks send_to_agent, confirm the raw pane with read_screen and use one surface-level fallback send.

Task Routing

Engine routing rules → see agent-routing (canonical). cmux-agents only maps the selected engine to visible panes and lifecycle controls.

Agent Selection Guide

Engine routing rules → see agent-routing (canonical). Launcher/-s/-m rules → see repogolem (canonical; generated from ralph at/launchers.at).

Full CLI syntax and capabilities: adapters/ directory + adapters/capabilities.yaml.

Post-Restart Truth-vs-Display (2026-06-06)

After ANY cmux restart (daemon bounce, Mac wake, manual relaunch), every lead follows the proven checkpoint pattern:

checkpoint (record agent_ids + last-known state)
  → restart event
  → VERIFY (list_agents + read_screen per worker)
  → report liveness FROM EVIDENCE ONLY

Rules:

list_agents / registry alone is not enough — pair with read_screen scrollback on each worker you report as alive.
Pre-restart claims ("Codex s:63 still working") are stale until re-verified. Operator-direct catch: "I don't see any worker of yours working still."
Status reports to collab/Etan must cite what read_screen showed, not what you remembered before the restart.
Claims-lag window is real (fleet may repopulate over minutes) — say "unverified post-restart" until VERIFY completes.

Resume-Freeze Doctrine (2026-06-07)

Why this exists: a -c resumed session does NOT reliably continue pending duties from the pre-death turn. One resumed seat froze TWICE on the same standing ritual in a single window — verify/re-arm/completion-line never executed and the seat sat ~95 min unmonitored; on the next cycle the completion line landed 96 min late.

Rules:

Treat a resumed session as a fresh boot. Re-derive its duties from durable artifacts (collab file, duty checklist, output files) — never trust the dying session's stated intentions to carry forward.
Standing orders that must survive a restart are encoded as durable machinery, not agent to-do intentions. The proven watch-v6 pattern: the monitor itself fires the action (nohup detached) AND writes the collab announce — removing the agent as the failure point. launchd jobs and Monitors that act + announce are the durable forms.
The orchestrator verifies the resumed seat executed its duty list (collab checklist) — never assumes the resume carried the duties forward.

MCP / Daemon Recovery Authority

Agents must not idle because an MCP, daemon, or socket needs a restart. If the delegated mission includes making a lane green, production-ready, deployed, or merged, it includes authority to perform in-scope recoverable operations:

restart/reload BrainLayer, cmuxlayer, VoiceLayer, app daemons, or MCP servers;
rebuild/restart local LaunchAgents or socket services needed for verification;
flush/replay queues or rebuild derived indexes when the action does not delete source/user data and the goal covers that subsystem;
spawn/resume a successor if the current agent cannot reconnect after restart.

Codex-specific rule: Codex cannot hot-reconnect MCP servers after startup. If a Codex worker loses MCP access or needs a new MCP config, it must checkpoint and continue through a fresh managed Codex session:

Write a handoff file with branch, commit/PR, goal file, report path, DONE marker, MCP/service state, and exact next command.
Restart/reload the in-scope service if required.
spawn_agent or resume with crash_recover:true when available, passing the same goal/handoff file.
Verify the successor has the needed MCP/tools, then continue. Do not wait for Etan to notice.

Human approval is needed only for irreversible data deletion, credential/account changes, paid external actions, force-push/history rewrite, or destructive git cleanup of unowned work. A safe restart/reload/re-index is not a blocker.

Worker-Brief Authoring Contract (2026-06-06)

Every dispatch brief MUST satisfy all five before send (see workflows/prompt-audit.md §8):

Absolute verified paths — inline the full path to every artifact the worker must read, especially the collab file. Never "collab file above" with no path; workers lose orientation without it.
Truthful environment — state what exists (worktree path, branch, deps, Playwright, .mcp.json). If the worker must create something, say "create it yourself" with the exact command — never imply a promised path/deps exist when they don't.
Live data contracts — before describing schemas, field names, or queue shapes in the brief, verify them against the live artifact (Read / grep / query). Five brief-content drift incidents in one evening came from stale or invented data shapes.
Full delegation preservation — if the user gave a broad mission, the goal file carries the whole mission and success criteria. Do not narrow it to the next blocker unless the user explicitly narrows scope.
File-backed completion — include goal file, report path, exact DONE marker, and green/no-green decision criteria.

Pre-Spawn Prompt Checklist

Every agent task prompt MUST include:

Max output length ("2-3 sentences" / "under 200 words")
Output format ("use this template: ...")
Audience — internal or client-facing (agents leak investigation details if not told)
What NOT to include — internal deliberation, false alarms
Done signal (for autonomous workers, the report file's last line is the exact DONE marker; pane text is secondary)
Response markers — "Wrap your final deliverable in ---RESPONSE_START--- and ---RESPONSE_END--- markers"

Full audit: workflows/prompt-audit.md

Prompt Size Ceiling — `send_input` freezes surfaces above ~2000 chars

Why this exists: A 72-hour JSONL sweep (2026-04-12 → 2026-04-15) across 5 Claude Code orchestrator sessions found that send_input calls over 2,000 characters correlated strongly with surface freezes. In one Mehayom-app session: 868 total send_input calls, 13 exceeded 2,000 chars (max 4,382), and those 13 large calls preceded 6 frozen / surface-unresponsive incidents. In a control session (coach repo) every send_input stayed under 1,900 chars and there were zero freezes. The coach control case is the proof: discipline eliminates the symptom.

Direct evidence (from the Mehayom-app session):

"You're right — the cmux surface was frozen, my send_input commands weren't taking effect (sent 4 commands, none showed up). I burned ~3 minutes trying before deciding to implement directly rather than waste more time debugging cmux."

The rule (hard cap + warn)

Payload length	Action
< 1,500 chars	Send directly. Safe zone.
1,500–1,800	Warn threshold. Trim if possible, then send.
≥ 1,800	HARD CAP. Do NOT `send_input` directly. Use the file-based handoff pattern below.

1,800 chars gives a ~17% safety margin under the smallest observed freeze point (~2,100 chars). Above the cap, surfaces freeze silently — send_input still returns ok:true but the target pane never sees the input.

File-based handoff pattern (for prompts > 1,800 chars)

Three steps. The first one is a shell command you run via the Bash tool. The next two are cmux MCP calls.

Step 1 — write the long prompt to disk via Bash (or the Write tool):

printf '%s' "$LONG_PROMPT" > /Users/etanheyman/Gits/orchestrator/collab/surface-N-$(date +%s).md

Prefer the Write tool when the prompt is already in-context — it avoids shell quoting pitfalls on multi-line prompts.

Step 2 — type the cat command into the target surface, then press Return:

send_input(surface: "surface:N", text: "cat /Users/etanheyman/Gits/orchestrator/collab/surface-N-<stamp>.md")
send_key(surface: "surface:N", key: "Return")

send_input on its own only TYPES the text — without send_key("Return") the command sits on the shell line unexecuted. This is the same execution contract used everywhere else in this skill.

Step 3 — verify the handoff landed: after a few seconds, read_screen the surface and confirm you see the file contents (or the agent's response to them). If the pane is frozen despite the short pointer, the file is still safe on disk — spawn a fresh surface and re-run Step 2 against the new surface.

Why the file pattern beats splitting the prompt into multiple sends:

Splitting into N sub-2000 chunks still hits the freeze path if any chunk pushes the pane state over some cumulative limit, and loses atomicity (the agent sees a partial prompt).
The file is independent of the pane's mutable state. If the pane freezes, spawn a new one and re-run the cat — the prompt is preserved.
File handoff also survives pane crashes, restarts, and compactions.

On a failed send_input: check the surface before retrying

A retry without root-cause investigation is almost always wrong. After any send_input that appears to have been lost (no output, no activity, retry instinct kicking in):

read_screen(surface: "surface:N", lines: 10) — is the pane frozen, scrolled back, or just slow?
If frozen → follow Step 3 of Surface Health Check (close, respawn, salvage).
If the payload you just sent was near or above the 1,800-char cap → that was the root cause. Don't retry at the same size; switch to file-based handoff.
If unsure → list_agents / my_agents to confirm the worker still exists.

Do not fire a second send_input of the same large payload hoping it "works this time." It won't, and you'll burn the same 3 minutes the earlier session logged.

Boot + deliver: FOCUS-FIRST (the reliable bundle for "send the prompt once booted")

Root cause (Etan, recurring — 2026-05-30): a cmux pane/agent does NOT initiate until its workspace/pane is FOCUSED. So boot_prompt_path and a bare wait_for({target_state:"ready"}) on an unfocused pane NEVER RESOLVE — they hang on a ready-state that can't arrive because the agent hasn't started. Etan (paraphrased): "It never resolves — focus the pane for 3 seconds, then read, then if ready send the prompt." Do not lean on boot_prompt_path / blind long wait_for to deliver a boot prompt. Focus is the missing precondition.

The reliable bundle (proven spawning orc-gen-12 on 1M, 2026-05-30):

1. CREATE the pane:   new_split({direction, role, workspace, focus:true})  → surface
                      (focus:true is REQUIRED — role-based tab placement rejects focus:false)
2. LAUNCH (no boot_prompt_path):  send_command({surface, command:"<repo>Claude -s"})
                      (launcher/`-s`/`-m` policy lives in `/repogolem`)
3. FOCUS so it initiates:  select_workspace({workspace})  +  `cmux focus-pane --pane <pane>`
4. WAIT ~3s, then READ:  read_screen({surface, parsed_only:true})
                      → confirm status ready/idle/working
5. IF READY, deliver:  send_command({surface, command:"<the prompt>"})  — pane is focused, it lands.

Launcher/-s/-m verification rules → see repogolem (canonical; generated from ralph at/launchers.at).
boot_prompt_path is for already-focused/foreground spawns only. When you're driving from another pane, it will hang — use this focus-first bundle instead.
Upstream fix (route to cmuxLayer-LEAD): boot_prompt_path / wait_for should auto-focus the target pane before waiting for readiness — then this 5-step bundle collapses back into a single reliable primitive. Until then, focus-first is mandatory.

Composer-Wedge Runtime Doctrine (2026-06-06)

Why this exists: boot_prompt_delivered, submit_verified, working-status, and token_count have all returned false positives — including submit_verified:true on a Claude pane with unsubmitted text still sitting after the › prompt marker. Doc edits alone did not stop this class: seven observer-window catches occurred all post-#478-merge (tallies vary 3/4/5/7 across observers — no canonical ledger). This section is mitigation doctrine; the cure is the cmuxlayer composer-is-empty code fix (dispatched separately as D6).

Untrusted submit signals — treat ALL of these as hints only, never ground truth:

boot_prompt_delivered
submit_verified
parsed working / idle / ready status from registry
token_count (including phantom climbing counts on never-started sessions)

The only reliable submit check: prompt text visible in SCROLLBACK above the working line. Text still sitting after the › prompt marker = unsubmitted, regardless of what any delivery field says.

Mandatory post-dispatch read: read_screen (or get_agent_state + scrollback read on dispute) ≤15 seconds after EVERY dispatch — boot prompt, follow-up, or send_to_agent.

Anomaly triggers — force a full scrollback read:

'Queued follow-up inputs' visible on a worker that registry says is working
Token count unchanged across two reads after a send
Cost/token meter not moving while status says working

Idle Codex sends: when the pane is genuinely idle (at › / $ prompt), use send_command (atomic) or send_input + verified status flip — never fire-and-forget without the ≤15s read.

Surface Health Check (MANDATORY — before ANY prompt delivery)

Why this exists: Frozen/dead surfaces are a top-3 frustration source. Agents send prompts to unresponsive surfaces, losing work and burning ~3 minutes per incident on manual fallback. send_input returns ok:true even on frozen terminals.

Step 1 — After spawning a worker: wait_for({agent_id, target_state:"ready", timeout_ms:120000}) is the default health gate. If it times out, inspect the raw pane with read_screen. If the Codex prompt is visible anywhere in the pane, including top-aligned after the boot panel, treat it as parser drift and deliver through the managed agent if possible; do not respawn solely because the parser still says booting. Two raw-screen failures with no usable prompt = dead worker — stop it and respawn. If wait_for hangs to timeout and the pane looks un-booted, the pane is probably UNFOCUSED — see "Boot + deliver: FOCUS-FIRST" above; focus it, then re-check.

# After spawn_agent → got agent_id + linked surface
wait_for({ agent_id: "agent:abc123", target_state: "ready", timeout_ms: 120000 })
# ✅ Ready/working state = worker booted
# ❌ Timeout: read_screen(surface: "...", lines: 10) to check for FR-06 parser drift

Step 2 — Before sending ANY prompt to an existing surface: Always read_screen first to confirm the surface is responsive. Never fire-and-forget a send_input without checking state first.

# Before sending prompt to surface:N
read_screen(surface: "surface:N", lines: 5)
# ✅ Responsive: shows shell prompt, cursor output, or agent activity
# ❌ Unresponsive: blank screen, no change from last check, or error output

Step 3 — Unresponsive surface recovery: If a surface fails 2 consecutive read_screen checks (blank, frozen, or no shell prompt):

read_screen(surface: "surface:N", lines: 80, scrollback: true) — salvage any partial work
brain_store any salvaged progress
stop_agent({agent_id})
spawn_agent({...same task...}) — create a fresh worker
Re-verify with Step 1 before sending the prompt

Step 4 — Safe prompt delivery (special character escaping): Never send special characters (backticks, quotes, markdown formatting) directly via send_input. They get interpreted by the shell and corrupt the prompt.

# WRONG — backticks and quotes break in send_input:
send_input(surface: "surface:N", text: "Fix the `processQueue` function")
 
# RIGHT — use heredoc pattern:
send_input(surface: "surface:N", text: "cat <<'PROMPT_EOF' | clipboard\nFix the processQueue function\nPROMPT_EOF")
 
# RIGHT — escape special characters:
send_input(surface: "surface:N", text: "Fix the \\`processQueue\\` function")
 
# SAFEST — write prompt to a temp file, then cat it:
# 1. Write prompt to /tmp/agent-prompt-N.txt via Bash
# 2. send_input: "cat /tmp/agent-prompt-N.txt"

🚨 THE @-MENTION FILE-PICKER TRAP (real bug, 2026-06-14): never put a bare @word in send_input/send_command text destined for an interactive agent composer (Claude Code, Codex, Cursor TUIs). The receiving composer interprets @ as its file-reference trigger and pops a file-picker overlay — the rest of your message gets swallowed/mangled and the agent never sees the real prompt. This is delivery corruption that ok:true will NOT report.

# WRONG — "@narration-lead" fires the receiver's file-picker, mangles the send:
send_to_agent(agent_id: "...", text: "@narration-lead please pick up the dashboard work")
 
# RIGHT — drop the leading @ (claimed-name addressing is for COLLAB-FILE posts, not pane sends):
send_to_agent(agent_id: "...", text: "narration-lead: please pick up the dashboard work")
 
# RIGHT — if a literal @ is unavoidable, route via the file-based handoff (cat a file) so the
# composer ingests it as file contents, not live keystrokes through the @ trigger.

Rule of thumb: @<name> belongs in the collab .md (where monitors match it), NOT in keystrokes typed into another agent's composer. Pane-to-pane addressing uses the bare name or the [FROM=… TO=…] envelope body — never a leading @.

Checklist (run mentally before every send_input):

Did I read_screen this surface in the last 60 seconds?
Does it show a shell prompt or active agent?
Does my prompt contain backticks, quotes, or markdown? → escape or heredoc
Is this a fresh surface? → did I wait 3s and verify the prompt?

Parsing Agent Output

When reading agent output via read_screen, search for ---RESPONSE_START--- to find the structured response. Everything between START and END is the deliverable — ignore terminal noise, tool calls, and deliberation outside the markers.

# Pattern: read enough scrollback to capture the full response
read_screen(surface: "surface:N", lines: 80, scrollback: true)
# Then look for ---RESPONSE_START--- ... ---RESPONSE_END--- in the output

If markers are missing, fall back to reading the last 50 lines + done signal. But if you wrote the prompt correctly (checklist above), markers will be there.

Monitoring Protocol

OUTBOUND worker monitoring lives here. Inbound lead/orchestrator monitor, cron, and loop-payload rules → see cron-payload-discipline (canonical).

"I'll monitor them" is NOT monitoring. Monitoring = explicit waits on explicit agent_ids.

This applies at EVERY tier: each LEAD that dispatches a worker owns a monitor loop on that worker. orc's fleet monitor is a backstop for lead-busy/codex-idle inversion, not a substitute (gen-10 weave #26).

After spawning:

Update AGENT_REGISTRY
wait_for({agent_id, target_state:"ready"|"working", timeout_ms:120000}) to verify boot
get_agent_state({agent_id}) + inbox_check({agent_id}) to verify session/resume and subscription health
Verify placement with list_surfaces({workspace, verbose:true}): same workspace as caller unless explicitly overridden; role:"worker" in the right worker pane/column; role:"orchestrator" in the existing left lead pane as a tab when a lead pane already exists; no surprise extra workspace, extra left split, or third column.
Verify prompt submission with a raw pane read within 15 seconds. If the full goal text is visible after the › composer prompt and the agent is idle/ready, the prompt was typed but not submitted. Press Return only to recover, then record boot_prompt_typed_not_submitted.
For long tasks, require an output file + DONE marker and wait on wait_for({agent_id, target_state:"done", report_path, done_marker})
Verify the lead wake path: a DONE marker or Goal achieved must trigger wait_for, inbox/subscription, cmux notification, or temporary watcher nudge to the owning lead. If not, record completion_notification_missing.
After any topology change or crash, re-run list_agents / my_agents before sending follow-ups — surface ids drift
If wait_for or get_agent_state disagrees with the visible pane or list_surfaces, use read_screen/list_surfaces to adjudicate FR-06 and report the registry mismatch
When the agent finishes, read output IMMEDIATELY — don't wait for the user to ask
After system events (Mac wake, BrainBar restart, network change, ANY cmux restart): inspect active workers via list_agents + read_screen on every worker you intend to report on. Agents can lose MCP silently. Never carry pre-restart liveness claims forward — see Post-Restart Truth-vs-Display below.

Lead context conservation: after a file-backed goal is delivered, stop high-frequency pane polling and status narration. Monitor the report path, DONE marker, and low-frequency health checks. If the file exists with the marker, call file-backed wait_for({agent_id,target_state:"done",report_path,done_marker}), then read the file and synthesize. If file-backed wait still disagrees with the report marker, record cmux health evidence and continue from the file contract; do not wait for a pane response just because cmux state has not flipped to done.

If a monitor finds a worker saying "I cannot continue", "need permission", "MCP disconnected", "cannot reconnect", or "waiting for user" for a recoverable operation, the monitor must intervene within one cycle: route to /pr-loop, restart/reload the in-scope service, or spawn/resume a successor with a file-backed handoff. Leaving the worker idle is a monitor failure.

Inbox / Collab Subscription / Nudge Contract

The primary wake transport is adapter-owned: Claude has native monitor/loop; Codex needs inbox/notification or a background watcher. Collab files, goal files, and report files are durable state; they do not wake a sleeping Codex by themselves.

Any event that should wake another agent or lead must be represented as an inbox/subscription/notify event keyed by agent_id:

collab mention (@claimed-name, BLOCKED, ### claimed-name);
worker report path created or updated;
exact report DONE marker observed;
visible TASK_DONE / Goal achieved with a known report path;
health failure (blocked:health, stale inbox, registry/screen disagreement, missing cli_session_id, non-resumability).

A collab mention is deliverable only when one of these is true:

native monitor/subscription is active for the agent.
The agent inbox monitor is alive (inbox_check.monitor_alive:true).
A BrainLayer subscription/channel is active and acknowledged.
A temporary watcher is explicitly running.

Temporary watcher rule: for investigation or overnight stopgaps, run a watcher for at most 5-minute cadence that scans the collab plus registered goal/report paths for mentions, BLOCKED, report creation, and DONE markers, then calls dispatch_to_agent({agent_id, nudge:"auto"}) for the target or owning lead. Report the watcher path/PID/task id in the collab. This is fallback plumbing, not the final product; stale/absent watcher = health failure.

The watcher is owned by the lead, not by Etan. If the lead asks "why did it not notify me?" and no native monitor, inbox heartbeat, watcher PID/task id, or wait_for(done) receipt exists, the lead failed to arm its guard before delegating.

Worker utilization check: routing violations → see agent-routing (canonical). Worker surface crash/closure remains a cmux-agents lifecycle issue: respawn immediately on a new surface and resend the task with recovered context.

Anti-patterns:

Fire and forget — spawning without wait_for. You WILL miss failures.
Checking only when user asks — agent may have been stuck 20 min.
Not reading output files — agent finished, you never read results.
Using Task tool when user says "cmux agents" — Task agents are invisible. cmux agents are VISIBLE. The user wants to SEE them.
Sending task prompt to Cursor without /auto-run first — worker silently stalls on permission prompts.
Sending follow-ups to remembered surface numbers instead of agent_ids — surface drift will eventually burn you.
Promoting an orphan/raw surface by renaming it LEAD — no managed agent_id means no production lead.
Using send_input as the normal path for a lead. Raw surface sends are an escape hatch only; after one is used, immediately recover/register or replace with a managed agent_id.
Letting interrupted/aborted spawns become production topology. Clean them up or relaunch managed.
Renaming an unrelated coordinator pane into another lane's lead. Lane ownership follows the collab Agent row and managed agent_id, not a tab title.
Engine-routing violations → see agent-routing (canonical).

Envelope-vs-Delivery Pairing (MANDATORY)

Why this exists: orphan envelopes — [FROM=X TO=Y TYPE=Z] blocks written to the author's OWN pane and never actually delivered to Y — are a top friction source in multi-agent collabs. 64 orphan envelopes were observed in one Codex session (wave3-codex-bulk Block B). Verbatim user friction: "Why do you have these messages in your chat but no one enters them?" Recurred across Wave 1-3 (3+ logged occurrences).

The rule

Any [FROM=<self> TO=<target> TYPE=<status|mission|task_done|ack>] envelope block you emit to your own pane output MUST be paired with a mcp__cmux__send_to_agent({agent_id: <target>, text: ...}) or mcp__cmux__send_to(...) call in the SAME turn.

Rule of thumb: if you wrote TO=X in plaintext, you wrote it FOR X — so deliver it to X. An envelope in your own pane that wasn't sent is a message in a bottle, not communication.

Same-turn pairing pattern (copy this)

# 1. Compose the envelope you want X to see:
envelope = """
[FROM=orcClaude TO=coachClaude TYPE=task_done]
Audit finished, 14 findings, no commits.
[/FROM=orcClaude TO=coachClaude TYPE=task_done]
"""
 
# 2. SAME TURN — actually deliver it:
mcp__cmux__send_to_agent({
  agent_id: "agent:coach-...",
  text: envelope,
  press_enter: true
})

Both must appear in the same model turn. If you only emit step 1 (the plaintext envelope in your own pane), the message is undelivered — X never sees it.

Anti-patterns (these are the orphans)

Anti-pattern	Why it fails
Writing `[FROM=X TO=Y ...]` in your pane "for the log" without `send_to_agent`	Y never sees it; orchestrator-side log is not a communication channel
Splitting envelope and delivery across turns ("I'll send it next turn")	Forgetting is the default; next turn rarely happens
Calling `send_to_agent(text: "hi")` after composing a `FROM=/TO=` envelope but sending only the bare message text	Recipient loses the FROM/TO/TYPE metadata the envelope was built to carry
Emitting envelope to a CLAUDE_COUNTER summary instead of calling `send_to_agent`	The summary is yours, not the recipient's inbox

Acceptance test

If your turn output contains a [FROM=...TO=...TYPE=...] block, your turn's tool calls MUST also contain a matching mcp__cmux__send_to_agent (or mcp__cmux__send_to) call whose agent_id corresponds to the TO= target and whose text includes the envelope contents. No pair → orphan envelope → rule violation.

Done Signals

For autonomous cmux workers, the primary done signal is the contracted report file containing the exact DONE marker as its last line. Pane text, CLAUDE_COUNTER-adjacent markers, and wait_for(done) are secondary lifecycle hints unless wait_for is called with the same report_path and done_marker.

Use pane-visible DONE markers only when there is no file-backed contract. If the report file has the marker and the pane/registry stays ready, harvest the file and report the registry mismatch as health evidence instead of polling the pane.

If pane-visible TASK_DONE / Goal achieved appears and the owning lead did not receive a wake/notification/ack event, the completion still counts as file/task evidence but the agent lifecycle is unhealthy. Record completion_notification_missing and route the issue to cmuxlayer; the product requirement is not just "worker finished" but "lead was woken to harvest and verify it."

Collab Pattern

Copy ~/Gits/orchestrator/collab/TEMPLATE.md first — never write a collab from scratch. The collab-guard.py hook WILL block you.

# 1. Write collab file from template
# 2. Spawn agents with collab instructions
for entry in "search:Agent1" "perf:Agent2" "security:Agent3"; do
  angle="${entry%%:*}"; name="${entry#*:}"
  spawn_agent({
    repo: "TARGET_REPO",
    cli: "claude",
    prompt: "Read collab/FILE.md — you are " name ". Claim " angle ". Update collab when done."
  })
done

Log every action in collab: spawns, completions, blockers. No silent work.

Collab append protocol (2026-06-07):

Append at FILE BOTTOM, always. Mid-file insertions are invisible to a tail-reading pair — a SPIKE-0 result posted above the pair's later "standing by" line left the pair idle ~1h.
NEVER use context-matching patch tools (apply_patch anchored on a signature line) to append to a shared collab file — signature lines repeat, and the patch lands at the first match instead of EOF. Use an explicit end-of-file append.
@tag the recipient on every handoff (@agentName:).
Tail-read after every append — read the file tail and confirm your message is the last entry before moving on.

Name-claim protocol (2026-06-11, Etan standing rule): "Agents need to claim names in colabs and use them to monitor and communicate with eachother, so its not flappy."

CLAIM-ON-ENTRY — a seat's first post in any collab channel begins with a grep-stable claim line: > CLAIM name=<name> role=<lead|worker|weaver|orc> monitor=<task-id|none> monitor=none is an explicit contract: "no delivery guarantee — nudge me."
STABILITY — the name is immutable for the channel's life; post headers are ### <claimed-name> (<ts>), byte-identical to the claim. Rename only via > CLAIM name=<new> supersedes=<old> (rare, loud).
WORKERS INHERIT (the spawn rule — leads own this). Workers are named <lead-claim>-w<N>, assigned at spawn by the lead, claimed by the lead on the worker's behalf. The spawning lead posts the worker's claim line to the collab BEFORE the first dispatch to that worker. Never ad-hoc per-post identities.
ADDRESSING — @-mentions use claimed names ONLY; mentioning an unclaimed name is the mentioner's comms error. Channel monitors anchor on claimed names (@<name>|### <name>) — only safe because of 1-2.

Roster query: grep '^> CLAIM' <channel-file>.

Born: 2026-06-11 gen-16 night — 6 ephemeral worker identities in one night broke @-mentions and monitor targeting (voicebarClaude-builder/-pass2/dashboard-audio-agent/dashboard-v2-regen/graphify-pilot/-rollout).

Git Worktree Isolation

Parallel agents MUST use separate worktrees — without them, agents clobber each other's git state.

Claude subagents: Agent(isolation="worktree") (built-in)
cmux agents: Create worktrees manually before spawning, then launch the worker via spawn_agent. See worktrees skill for details.
Codex apply_patch: when your assigned worktree differs from the session cwd, every apply_patch filename MUST be an absolute path inside the worktree. exec_command.workdir does not apply to apply_patch; relative patch paths resolve against the session cwd and can mutate the main checkout silently.

Pane Hygiene — harvest, review, close

For finished one-shots and worker panes, treat harvest -> review -> close pane as one sequence. Capture the output, confirm the task result, then close/stop the pane. If the report file and DONE marker are verified but cmux still calls the pane idle or otherwise refuses a normal close, force-close the temporary worker pane and record the stale lifecycle evidence. Only panes hosting live processes, active review context, or an explicit testing surface stay open, and the collab/status must say why that pane is still live.

When a right-side worker shows visible TASK_DONE and its findings/report are captured, consolidate immediately: mark the collab row done or DONE-closeable, then close/archive according to the closure invariant. Registry ready/idle does not override visible TASK_DONE plus the contracted output file.

Leaving a completed worker visible "just in case" is not neutral. If the lead keeps a DONE worker open for review, install, live logs, or follow-up diff inspection, the Agent row must say KEPT_OPEN:<reason> with owner and next check. Otherwise, harvest/review/close it. Missing cleanup or missing reason is done_worker_left_open health evidence.

Closure Invariant — zero workers must mean terminal state

If the user returns to a collab workspace and there are zero worker panes/agents for that collab, that must be a trustworthy signal. It means every lane is terminal, not that unfinished work was hidden.

Before closing, stopping, archiving, or force-closing any worker pane, verify and record one of:

DONE: the collab row has report_path, exact done_marker, and the report file exists with that marker verified.
BLOCKED / NOT_GREEN: the collab row has a file-backed handoff with health evidence, next action, and owner.
TRANSFERRED: the collab row names the successor agent_id, goal file, report path, and delivery evidence.

If none of those is true, do not close the worker as cleanup. Mark closure_without_artifact / blocked:health, notify the lead, and leave a visible pane or file-backed handoff for the next session. A forced close without a terminal row is a production failure, even if the pane looked idle.

Essential Rules

spawn_agent for every worker — no hand-rolled launch flow
Discover workers before acting — use list_agents / my_agents
NEVER read_screen your own surface — recursive output
Verify after launch — wait_for first, get_agent_state second, read_screen only for disputes
Workers = caller/current-workspace splits, audits/research = separate named workspaces unless the user explicitly asks for a different workspace
Sequential launch — brief spacing still helps on first-run biometric prompts
Cross-workspace: always pass workspace ref — surfaces are workspace-scoped
Name everything — mcp__cmux__rename_tab on every surface
Update collab after every action — spawns, completions, blockers
AGENT_REGISTRY is mandatory — maintains state across compaction
BrainLayer agents are sequential — stagger by 10s+ (SQLite = single writer)
wait_for on spawn, not client-side polling — set up monitoring immediately
Don't give hour-scale estimates for <500-line tasks — parallel agents are 8-10x faster
Before the user leaves, every active worker needs either wait_for coverage or a file-based completion contract — no silent "I'll monitor."
Gems to ALL active agents — when you receive new research/context while agents run, send to ALL active workers immediately via send_to_agent. Contextualize per agent: include a 1-line "why this matters to YOUR task" — raw paste is noise. Don't hoard, don't defer, don't send to only the "obvious" recipient.
Respawn > absorb — frozen agent → read_screen(lines:80) to salvage partial work → brain_store progress → stop_agent → spawn_agent → resend SAME task with "already completed: [X]" context. Exception: remaining work <5 min AND you log "ABSORBING: [reason]" to collab. Default is ALWAYS respawn.
Brief agents about external events — Mac restart, user away, network outage → explicitly tell agents. They can't infer from timestamps.
Post-sprint retrospective — after every multi-agent sprint, brain_store what failed, what was corrected, what user said. Tags: ["orchestration", "incident", "cmux-agents"]. Without this, future sessions start from zero.
Max 2-3 concurrent worker agents — more causes AP watchdog kernel panics (April 5 crash: 9+ agents + enrichment + metro bundler = SoC watchdog expired). Before spawning a new agent, count running agents via list_agents. If >=3 workers active, wait for one to finish or kill a stale one. Enrichment daemon + build tools count toward the limit.
Skills don't hot-reload in long sessions — a Claude agent spawned 4+ hours ago runs stale skill versions. If a skill was updated mid-sprint, the running agent won't see the changes. Mitigation: (a) /compact forces re-read of skills, (b) orcClaude should notify long-running agents of critical skill changes via send_to_agent, (c) for critical fixes, respawn the agent instead of nudging.
Collab file BEFORE spawning workers — when dispatching 2+ agents on related tasks, write a collab file from TEMPLATE.md FIRST, then spawn workers with "Read collab/<file>.md" in their prompt. Do NOT dispatch agents via remembered surface numbers or bare launcher typing. Collab files are the coordination channel — without them, agents work in isolation and duplicate effort.
Agent health check BEFORE every follow-up — never message a worker without a recent get_agent_state or wait_for on its agent_id. If the registry looks wrong, use read_screen as the FR-06 fallback.
Do not pass model to spawn_agent — launcher/-s/-m rules → see repogolem (canonical; generated from ralph at/launchers.at).
Do not use headless/print modes for cmux agent sessions — headless/print rules → see repogolem (canonical; generated from ralph at/launchers.at).
Finished panes get closed only after terminal evidence — harvest, review, close. Force-close completed temporary workers after verifying their report/DONE marker if cmux leaves them idle. Never close an unfinished lane just to make the workspace clean; zero workers must mean DONE/BLOCKED/TRANSFERRED is recorded for every lane.
Monitor/cron/loop-payload rules → see cron-payload-discipline (canonical).
Pass role, not column; never type a leading @ into a pane — leads/orchestrators land LEFT, workers RIGHT automatically via role (ROLE GEOMETRY). @<name> goes in the collab .md, never in send_input/send_command (fires the receiver's file-picker, mangles delivery).
Report role/topology health — role mismatch, surprise workspace creation, registry/surface workspace disagreement, unexpected three-column layout, worker-in-left-column placement, lead-in-right-column placement, orphan/raw lead surfaces, bad repo/cwd labels, auto-* agents, null cli_session_id, non-resumability, dead inbox monitors, and wedged prompts are all health failures. Put them in the collab/report before claiming the fleet is healthy.
Reuse existing agents before spawning — if the user points at an existing worker or the same repo/workspace/role lane already has an idle/reusable agent, supersede that agent's goal with a file-backed contract instead of spawning a duplicate.
File report beats pane silence — when the worker writes the contracted report and final DONE marker, harvest the file immediately. Do not keep waiting on pane output or registry done.
Recover instead of parking — approved work authorizes in-scope commits/PR/merge through /pr-loop and safe daemon/MCP restarts. If an agent cannot continue after a restart, checkpoint, spawn/resume a managed successor, verify tools, and continue. Do not wait for Etan on recoverable blockers.
DONE must wake the lead — visible TASK_DONE, report DONE marker, or Goal achieved without a lead notification/ack path is completion_notification_missing. Harvest the report, record the health failure, and fix the active adapter guard.
Arm the guard before delegating — a Codex lead must record live inbox health, wait_for(done), or a background watcher/loop PID/task id that watches collab/report/DONE paths and wakes the lead. No guard means blocked:health.
Close completed workers or record why — after report/DONE harvest and review, close/archive temporary workers. If kept open, the Agent row must state KEPT_OPEN:<reason> with owner/next check. Otherwise it is done_worker_left_open.

Known Issues — cmux rename hooks (design proposal, do NOT edit without orcClaude review)

Background (2026-04-11): the cmux tab-rename auto-hook (launcher name → display name + color) has 5 observed bugs. Code changes are OUT OF SCOPE for skill-creator — this section is a design proposal for the next orcClaude session to dispatch.

#	Bug	Symptom	Proposed fix
5.1	Flat tab colors	All tabs use the same (or default) color; agent type not visually distinguishable	Map launcher function → color in a dict (claude=blue, codex=orange, cursor=purple, gemini=green, kiro=red). Set via `mcp__cmux__rename_tab(color=...)` on spawn.
5.2	Red-to-cyan weirdness	Some tabs flip from red (error/warning) to cyan unexpectedly; color state machine has a bad transition	Debug: instrument the rename hook to log every color-set call with timestamp + reason. Most likely cause: one code path sets color from agent state, another sets it from default, last write wins.
5.3	Nested naming collapses	Tab names for agents spawned in worktrees or nested panes lose their parent context (e.g. "golemsClaude > feat-X" → "claude")	When generating the display name, walk the surface parent chain and prepend up to 1 level of context. Truncate via ellipsis if longer than the tab width budget.
5.4	Weak semantic tag extraction	Tab names don't reflect what the agent is actually WORKING on (they just say "claude" instead of e.g. "claude: PR#232 fix")	Parse the task prompt on spawn — extract PR numbers (`PR#\d+`), issue refs (`#\d+`), and the first 3-5 imperative words. Fall back to launcher name if none found.
5.5	Launcher-to-display-name mapping missing	Tab shows raw function name (`golemsClaude -s`) instead of a friendly display (`golems • Claude`)	Add a `LAUNCHER_DISPLAY` lookup table keyed on the base launcher pattern (`{repo}Claude` → `{repo} • Claude`, `{repo}Cursor` → `{repo} • Cursor`, etc.). Regex extract `repo` + `agent`.

Where the hook lives: NOT located as of 2026-04-11 by skillCreatorBuddy. Not in ~/.claude/hooks/, not in ~/Gits/golems/hooks, not in ~/.config/cmux/settings.json. Likely candidates to check next session:

Swift cmux client source (may be built-in tab rename behavior, not a Python hook)
~/Gits/cmuxlayer/src/**/rename*
A LaunchAgent plist that watches cmux sockets
Part of the spawn_agent implementation / agent-registry wiring in cmuxlayer

Next action (for orcClaude, NOT for skill-creator):

Locate the actual rename hook source
Reproduce each bug in a safe surface
Fix behind a feature flag
A/B test against the current hook
Ship via /pr-loop

Until then: manually mcp__cmux__rename_tab + mcp__cmux__set_status after spawn, using the color/display-name conventions in the table above.

Model	Input Tokens	Output Tokens	Cost / Run	Cost / 1K Runs
Opus 4.6	5,129	2,195	$0.2416	$241.60
Sonnet 4.6	2,919	2,279	$0.0429	$42.90
Haiku 4.5	1,713	897	$0.0015	$1.50
Codex	2,500	800	$0.0285	$28.50
Gemini 2.5	2,200	700	$0.0125	$12.50
Kiro	2,000	600	$0.0156	$15.60

Model	p50	p95	Overhead
Opus 4.6	4.1s	6.5s	+56%
Sonnet 4.6	2.2s	3.7s	+65%
Haiku 4.5	1.7s	2.5s	+45%
Codex	3.2s	5.4s	+69%
Gemini 2.5	1.9s	3.1s	+63%
Kiro	2.4s	4.0s	+67%

Workflows

/cmux-agents:prompt-audit

Assertion	Opus 4.6	Sonnet 4.6	Haiku 4.5	Codex	Gemini 2.5	Kiro	Consensus
parallel-claude-spawn							5/6
cursor-audit-workflow							4/6
codex-worktree-setup							6/6
background-monitoring							5/6
gemini-research-routing							5/6
pane-recovery-reuse							5/6
t3-thread-per-surface							4/6
cli-routing-table							5/6

Lead/Orchestrator Monitor Law

ROLE GEOMETRY — leads/orchestrators land LEFT, workers land RIGHT (automatic)

Default Lifecycle (2026-04-19)

Existing Agent Reuse / Goal Supersession

Harness Command Semantics Map

Managed Health Gate

Lead Goal Contract

Lead/Orchestrator Monitor Law

ROLE GEOMETRY — leads/orchestrators land LEFT, workers land RIGHT (automatic)

Default Lifecycle (2026-04-19)

Existing Agent Reuse / Goal Supersession

Harness Command Semantics Map

Managed Health Gate

Lead Goal Contract

repoGolem Launcher Contract

Worker vs Audit Placement

Current Caveats

FR-01 — hyphenated repo names can mis-resolve to the wrong launcher

FR-06 — state parser ambiguity (booting vs idle-with-spinner)

First-run Touch ID note

Completion Signals — file > wait_for > get_agent_state > read_screen

Visual Proof / Screenshots

The file-based completion pattern (copy this)

Completion notification / lead wake requirement

FR-06 fallback when parser and screen disagree

STOP — Before Anything Else

MCP Primitives (use these for low-level ops)

Response Schemas (PR #76, 2026-04-17 — list_surfaces)

spawn_agent crash_recover + session auto-capture (PR #77, 2026-04-17)

Spawning Agents — launcher policy pointer (2026-06-05)

Spawning Agents (MANDATORY: use spawn_agent)

Auto-workspace-categorization by launcher root

NEVER Use These Commands (Common Mistakes)

Cursor /auto-run fallback

Task Routing

Agent Selection Guide

Post-Restart Truth-vs-Display (2026-06-06)

Resume-Freeze Doctrine (2026-06-07)

MCP / Daemon Recovery Authority

Worker-Brief Authoring Contract (2026-06-06)

Pre-Spawn Prompt Checklist

Prompt Size Ceiling — send_input freezes surfaces above ~2000 chars

The rule (hard cap + warn)

File-based handoff pattern (for prompts > 1,800 chars)

On a failed send_input: check the surface before retrying

Boot + deliver: FOCUS-FIRST (the reliable bundle for "send the prompt once booted")

Composer-Wedge Runtime Doctrine (2026-06-06)

Surface Health Check (MANDATORY — before ANY prompt delivery)

Parsing Agent Output

Monitoring Protocol

Inbox / Collab Subscription / Nudge Contract

Envelope-vs-Delivery Pairing (MANDATORY)

The rule

Same-turn pairing pattern (copy this)

Anti-patterns (these are the orphans)

Acceptance test

Done Signals

Collab Pattern

Git Worktree Isolation

Pane Hygiene — harvest, review, close

Closure Invariant — zero workers must mean terminal state

Essential Rules

Known Issues — cmux rename hooks (design proposal, do NOT edit without orcClaude review)

See Also

Behavior Evals

Adapter Evals

Token Usage

Cost per Run

Response Time (p50)

Response Time (p95)

Workflows

FR-06 — state parser ambiguity (`booting` vs idle-with-spinner)

Response Schemas (PR #76, 2026-04-17 — `list_surfaces`)

Spawning Agents (MANDATORY: use `spawn_agent`)

Cursor `/auto-run` fallback

Prompt Size Ceiling — `send_input` freezes surfaces above ~2000 chars