/cmux-agents
Spawn AI agents in cmux panes — Claude workers as splits, audits/research as surfaces. Covers Claude, Cursor, Gemini, Codex, Kiro. Includes monitoring, prompt delivery, and collab patterns. Use this skill whenever the user mentions cmux agents, terminal agents, split agents, multi-agent orchestration, or wants to spawn AI workers in visible terminal panes.
$ golems-cli skills install cmux-agents
Updated 2 weeks ago
Orchestration layer for AI agents in cmux panes. Low-level pane operations (splits, reads, sends) use cmux MCP tools — this skill handles the workflow on top.
STOP — Before Anything Else
1. Source agent-functions.sh:
source ~/.claude/commands/cmux-agents/scripts/agent-functions.sh
This gives you spawn-agent, agent-status, agent-nudge, agent-kill. 99% failure rate when hand-rolling cmux commands; these are NOT optional.
2. You WILL use spawn-agent for every agent. No exceptions. No hand-rolled cmux send + cd + claude.
3. AGENT_REGISTRY — maintain after CLAUDE_COUNTER in every response with active agents:
AGENT_REGISTRY:
| Surface | Tab | Task | Status | Last Check |
|---------|-----|------|--------|------------|
| surface:153 | cmux-analysis | Digest failures | WORKING | 12:35 |
Add on spawn. Update on check. Remove on kill.
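The add/update/remove cycle can be sketched as a few shell helpers over a plain-text file. This is a minimal sketch, not part of agent-functions.sh: the function names (registry_add, registry_update, registry_remove, registry_print), the REGISTRY_FILE variable, and the pipe-delimited storage format are all assumptions for illustration. Fields must not contain a `|` character.

```shell
# Hypothetical helpers; NOT part of agent-functions.sh.
# Rows are stored as "surface|tab|task|status|last_check", one per line.
REGISTRY_FILE="${REGISTRY_FILE:-${TMPDIR:-/tmp}/agent-registry.$$}"
: > "$REGISTRY_FILE"

# Add a row on spawn: registry_add <surface> <tab> <task>
registry_add() {
  printf '%s|%s|%s|%s|%s\n' "$1" "$2" "$3" "WORKING" "$(date +%H:%M)" >> "$REGISTRY_FILE"
}

# Update status and timestamp on check: registry_update <surface> <status>
registry_update() {
  awk -F'|' -v OFS='|' -v s="$1" -v st="$2" -v now="$(date +%H:%M)" \
    '$1 == s { $4 = st; $5 = now } { print }' "$REGISTRY_FILE" > "$REGISTRY_FILE.tmp" &&
    mv "$REGISTRY_FILE.tmp" "$REGISTRY_FILE"
}

# Remove the row on kill: registry_remove <surface>
registry_remove() {
  awk -F'|' -v s="$1" '$1 != s' "$REGISTRY_FILE" > "$REGISTRY_FILE.tmp" &&
    mv "$REGISTRY_FILE.tmp" "$REGISTRY_FILE"
}

# Print the registry in the markdown table layout shown above.
registry_print() {
  echo 'AGENT_REGISTRY:'
  echo '| Surface | Tab | Task | Status | Last Check |'
  echo '|---------|-----|------|--------|------------|'
  awk -F'|' '{ printf "| %s | %s | %s | %s | %s |\n", $1, $2, $3, $4, $5 }' "$REGISTRY_FILE"
}
```

Typical use would be `registry_add surface:153 cmux-analysis 'Digest failures'` right after a spawn, `registry_update surface:153 <status>` after each check, and `registry_print` at the end of every response.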
MCP Primitives (use these for low-level ops)
| Operation | MCP Tool |
|---|---|
| Create split | mcp__cmux__new_split |
| Read agent output | mcp__cmux__read_screen |
| Send command/text | mcp__cmux__send_input |
| Send keystroke | mcp__cmux__send_key |
| List surfaces | mcp__cmux__list_surfaces |
| Rename tab | mcp__cmux__rename_tab |
| Set status bar | mcp__cmux__set_status |
| Set progress | mcp__cmux__set_progress |
| Close surface | mcp__cmux__close_surface |
| Open browser | mcp__cmux__browser_surface |
Use MCP tools directly for ad-hoc pane interaction. Use spawn-agent for full agent lifecycle.
Spawning Agents (MANDATORY: use spawn-agent)
source ~/.claude/commands/cmux-agents/scripts/agent-functions.sh
# Create surface via MCP, then spawn
spawn-agent <repo> <surface> '<tab-name>' '<task prompt>' [options]
Options: --model sonnet|opus, --launcher <func>, --cli gemini|codex|cursor-audit|cursor-work|kiro
Examples:
spawn-agent golems surface:114 'T1 search-fix' 'Fix search ranking' --model sonnet
spawn-agent golems surface:115 'T2 audit' 'Audit code quality' --cli cursor-audit
spawn-agent orchestrator surface:116 'T3 research' 'Survey patterns' --cli gemini
spawn-agent golems surface:117 'T4 refactor' 'Refactor retry' --cli codex
Multiple agents: Create ALL splits first (3s between for Touch ID), verify $ prompt on each via read_screen, THEN spawn sequentially.
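The sequential part of a multi-agent launch can be sketched as a loop over a small plan. This assumes the splits already exist and each one has been verified at a shell prompt via the MCP tools; that step is not shown. The names spawn_all, SPAWN_CMD, SPAWN_DELAY, and the `surface|tab|task|options` plan format are assumptions for illustration, not part of agent-functions.sh. By default it only prints the commands (dry run).

```shell
# Sketch only: spawn_all, SPAWN_CMD, SPAWN_DELAY and the plan format are hypothetical.
# Dry run by default; set SPAWN_CMD=spawn-agent after sourcing agent-functions.sh.
SPAWN_CMD="${SPAWN_CMD:-echo spawn-agent}"
SPAWN_DELAY="${SPAWN_DELAY:-3}"   # seconds between spawns (Touch ID)

# spawn_all <repo>
# Reads "surface|tab|task|options" lines from stdin and spawns them one at a time.
spawn_all() (
  repo="$1"
  while IFS='|' read -r surface tab task opts; do
    [ -n "$surface" ] || continue   # skip blank lines
    # SPAWN_CMD and opts are intentionally unquoted so they split into words.
    # stdin is redirected so the spawned command cannot consume the plan.
    $SPAWN_CMD "$repo" "$surface" "$tab" "$task" $opts </dev/null
    sleep "$SPAWN_DELAY"
  done
)
```

Feeding it the two lines `surface:114|T1 search-fix|Fix search ranking|--model sonnet` and `surface:115|T2 audit|Audit code quality|--cli cursor-audit` on stdin with `spawn_all golems` prints the two spawn-agent commands in order, three seconds apart.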
Task Routing
| Need | Best CLI | Why |
|---|---|---|
| Deep reasoning, multi-file | Claude | Best reasoning, MCP, native worktrees |
| Codebase-wide audit | Cursor | @codebase indexing, text output |
| Fast structured output | Codex | Output contracts, GPT-5.4 |
| Large context research | Gemini | 1M tokens, free |
| Quick PRs | Codex | Fast, low overhead |
| Big refactors | Claude | --worktree, session resume |
| Cross-model verification | Claude + Cursor | Different blind spots |
Full CLI syntax and capabilities: adapters/ directory + adapters/capabilities.yaml.
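The routing table can be encoded as a small lookup so the choice is made the same way every time. A minimal sketch: the function name route_cli and the keyword labels on the left are assumptions; the CLI names on the right are taken from the table.

```shell
# Hypothetical helper: keyword labels are illustrative, CLI names come from the table.
# route_cli <need>
route_cli() {
  case "$1" in
    reasoning|multi-file|refactor) echo "Claude" ;;
    audit)                         echo "Cursor" ;;
    structured-output|quick-pr)    echo "Codex" ;;
    research)                      echo "Gemini" ;;
    cross-check)                   echo "Claude + Cursor" ;;
    *) echo "route_cli: unknown need: $1" >&2; return 1 ;;
  esac
}
```

For example, `route_cli audit` prints `Cursor`, and an unrecognized need returns a non-zero status instead of guessing.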
Pre-Spawn Prompt Checklist
Every agent task prompt MUST include:
- Max output length ("2-3 sentences" / "under 200 words")
- Output format ("use this template: ...")
- Audience — internal or client-facing (agents leak investigation details if not told)
- What NOT to include — internal deliberation, false alarms
- Done signal ("end with DONE_SIGNAL_NAME on its own line, right before CLAUDE_COUNTER")
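One way to make the checklist hard to skip is to build every task prompt through a helper that refuses to run with a missing item. This is a sketch under assumptions: build_prompt, its argument order, and the exact wording of each line are hypothetical.

```shell
# Hypothetical helper; argument order and wording are assumptions.
# build_prompt <task> <max-length> <format> <audience> <exclude> <done-signal>
build_prompt() {
  if [ "$#" -ne 6 ]; then
    echo "build_prompt: expected 6 arguments, got $#" >&2
    return 1
  fi
  for arg in "$@"; do
    if [ -z "$arg" ]; then
      echo "build_prompt: empty checklist item" >&2
      return 1
    fi
  done
  printf '%s\n' \
    "$1" \
    "Max length: $2." \
    "Output format: $3." \
    "Audience: $4." \
    "Do NOT include: $5." \
    "End with $6 on its own line, right before CLAUDE_COUNTER."
}
```

The result can be passed as the task prompt argument, for example `spawn-agent golems surface:114 'T1 search-fix' "$(build_prompt ...)" --model sonnet`.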
Full audit: workflows/prompt-audit.md
Monitoring Protocol
"I'll monitor them" is NOT monitoring. Monitoring = code on a schedule.
After spawning:
- Update AGENT_REGISTRY
- Run agent-status surface:N for each to verify boot (within 15s)
- For >2 agents or AFK: set up polling every 3-5 min
- Check agents at natural milestones in YOUR work, or every ~5 min idle
- When agent finishes, read output IMMEDIATELY — don't wait for user to ask
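"Code on a schedule" can be as small as a bounded polling loop. A minimal sketch, assuming agent-status takes a surface reference as its only argument (as in the step above); poll_agents, CHECK_CMD, and the argument order are hypothetical names introduced here.

```shell
# Sketch only: poll_agents and CHECK_CMD are hypothetical.
# CHECK_CMD defaults to agent-status (available after sourcing agent-functions.sh).
CHECK_CMD="${CHECK_CMD:-agent-status}"

# poll_agents <interval-seconds> <rounds> <surface>...
poll_agents() (
  interval="$1"; rounds="$2"; shift 2
  round=1
  while [ "$round" -le "$rounds" ]; do
    for surface in "$@"; do
      printf '[%s] round %s: %s\n' "$(date +%H:%M)" "$round" "$surface"
      # CHECK_CMD is intentionally unquoted so it can carry arguments.
      $CHECK_CMD "$surface" || echo "check failed for $surface" >&2
    done
    # Sleep between rounds, but not after the last one.
    if [ "$round" -lt "$rounds" ]; then
      sleep "$interval"
    fi
    round=$((round + 1))
  done
)
```

Running `poll_agents 240 15 surface:114 surface:115 &` would check both agents every 4 minutes for an hour in the background. Update AGENT_REGISTRY from whatever each check reports.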
Anti-patterns:
- Fire and forget — spawning without checking. You WILL miss failures.
- Checking only when user asks — agent may have been stuck 20 min.
- Not reading output files — agent finished, you never read results.
- Using Task tool when user says "cmux agents" — Task agents are invisible. cmux agents are VISIBLE. The user wants to SEE them.
Done Signals
Instruct agents to put the signal as the very last line before CLAUDE_COUNTER — not buried above a summary. Otherwise read_screen won't catch it.
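The placement rule can be checked mechanically on captured screen text. A minimal sketch, assuming the counter line contains the literal string CLAUDE_COUNTER and the screen text arrives on stdin; has_done_signal is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds only if the last non-blank line before the
# final CLAUDE_COUNTER line is exactly the given signal.
# has_done_signal <signal>   (screen text on stdin)
has_done_signal() {
  awk -v sig="$1" '
    /CLAUDE_COUNTER/ { found = (prev == sig); next }
    NF               { prev = (NF == 1) ? $1 : $0 }   # trim whitespace on single-token lines
    END              { exit !found }
  '
}
```

A signal buried above a summary fails the check, which is the case the rule above is meant to prevent.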
Collab Pattern
Copy ~/Gits/orchestrator/collab/TEMPLATE.md first — never write a collab from scratch. The collab-guard.py hook WILL block you.
# 1. Write collab file from template
# 2. Spawn agents with collab instructions
for entry in "search:Agent1" "perf:Agent2" "security:Agent3"; do
angle="${entry%%:*}"; name="${entry#*:}"
# Create split via MCP, get surface ID
spawn-agent TARGET_REPO "$SURFACE" "$name" \
"Read collab/FILE.md — you are $name. Claim $angle. Update collab when done." --model sonnet
sleep 3
done
Log every action in collab: spawns, completions, blockers. No silent work.
Git Worktree Isolation
Parallel agents MUST use separate worktrees — without them, agents clobber each other's git state.
- Claude subagents: Agent(isolation="worktree") (built-in)
- cmux agents: Create worktrees manually before spawning, then spawn-agent into the worktree dir. See the worktrees skill for details.
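Creating the worktrees up front might look like the sketch below. It uses only `git worktree add -b`; the agent/<name> branch scheme, the sibling `<repo>-<name>` directory layout, and the make_worktrees name are assumptions. How the resulting path is handed to spawn-agent depends on agent-functions.sh and is not shown.

```shell
# Sketch only: branch naming and directory layout are assumptions.
# make_worktrees <repo-dir> <agent-name>...
# Prints one worktree path per agent on stdout; git messages go to stderr.
make_worktrees() (
  repo="$1"; shift
  cd "$repo" || exit 1
  root="$(pwd -P)"
  for name in "$@"; do
    path="${root}-${name}"
    # -b creates a new branch for this worktree, starting at the current HEAD.
    git worktree add -b "agent/$name" "$path" >&2 || exit 1
    printf '%s\n' "$path"
  done
)
```

Each agent then works on its own branch in its own directory, so checkouts and index state never collide. `git worktree list` shows what exists, and `git worktree remove <path>` cleans up after a kill.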
Essential Rules
- spawn-agent for every agent — no hand-rolled cmux commands
- Discover surfaces before acting: use the list_surfaces MCP tool
- NEVER read_screen your own surface: recursive output
- Verify after launch — read_screen within 15s
- Workers = right splits, audits = down splits
- Sequential launch — 3s between spawns for biometric
- Cross-workspace: always pass workspace ref — surfaces are workspace-scoped
- Name everything — rename_tab on every surface
- Update collab after every action — spawns, completions, blockers
- AGENT_REGISTRY is mandatory — maintains state across compaction
- BrainLayer agents are sequential — stagger by 10s+ (SQLite = single writer)
- Polling on spawn, not on request — set up monitoring immediately
- Don't give hour-scale estimates for <500-line tasks — parallel agents are 8-10x faster
| Metric | Value | Detail |
|---|---|---|
| Best Pass Rate | 100% | Opus 4.6 |
| Assertions | 8 | 6 models tested |
| Avg Cost / Run | $0.0571 | across models |
| Fastest (p50) | 1.7s | Haiku 4.5 |
Behavior Evals
estimated (behavior eval not yet run)
Behavior Baseline
Adapter Evals
Phase 2C: cross-AI portability
Adapter Portability
| Assertion | Opus 4.6 | Sonnet 4.6 | Haiku 4.5 | Codex | Gemini 2.5 | Kiro | Consensus |
|---|---|---|---|---|---|---|---|
| parallel-claude-spawn | | | | | | | 5/6 |
| cursor-audit-workflow | | | | | | | 4/6 |
| codex-worktree-setup | | | | | | | 6/6 |
| background-monitoring | | | | | | | 5/6 |
| gemini-research-routing | | | | | | | 5/6 |
| pane-recovery-reuse | | | | | | | 5/6 |
| t3-thread-per-surface | | | | | | | 4/6 |
| cli-routing-table | | | | | | | 5/6 |
Token Usage and Cost per Run
| Model | Input Tokens | Output Tokens | Cost / Run | Cost / 1K Runs |
|---|---|---|---|---|
| Opus 4.6 | 5,129 | 2,195 | $0.2416 | $241.60 |
| Sonnet 4.6 | 2,919 | 2,279 | $0.0429 | $42.90 |
| Haiku 4.5 | 1,713 | 897 | $0.0015 | $1.50 |
| Codex | 2,500 | 800 | $0.0285 | $28.50 |
| Gemini 2.5 | 2,200 | 700 | $0.0125 | $12.50 |
| Kiro | 2,000 | 600 | $0.0156 | $15.60 |
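The $0.0571 average cost per run reported for this skill appears to be the unweighted mean of the Cost / Run column. A quick check under that assumption (avg_cost is a hypothetical helper name):

```shell
# Unweighted mean of one number per line on stdin, printed to 4 decimals.
avg_cost() {
  awk '{ sum += $1 } END { if (NR > 0) printf "%.4f\n", sum / NR }'
}

# Cost / Run values for the six models in the table above.
printf '%s\n' 0.2416 0.0429 0.0015 0.0285 0.0125 0.0156 | avg_cost
# prints 0.0571
```

Opus 4.6 accounts for most of that figure; the mean of the other five models alone is much lower.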
Response Time (p50 / p95)
| Model | p50 | p95 | Overhead |
|---|---|---|---|
| Opus 4.6 | 4.1s | 6.5s | +56% |
| Sonnet 4.6 | 2.2s | 3.7s | +65% |
| Haiku 4.5 | 1.7s | 2.5s | +45% |
| Codex | 3.2s | 5.4s | +69% |
| Gemini 2.5 | 1.9s | 3.1s | +63% |
| Kiro | 2.4s | 4.0s | +67% |
Last evaluated: 2026-03-12 · Real Phase 2C adapter eval · behavior section estimated · 3 CLIs tested
Changelog entries are derived from eval runs and skill version updates. Full cascading changelog (Phase 4D) coming soon.
| Best Pass Rate | Assertions | Models Tested | Evals Run |
|---|---|---|---|
| 100% | 8 | 6 | 9 |
- Initial release to Golems skill library
- 23 assertions across 9 eval scenarios
- 1 workflow included: prompt-audit
- Eval fixtures included