Engineering Journey
How one developer and Claude Code built an autonomous AI agent ecosystem — and accidentally invented patterns before the platforms shipped them natively.
Origins
Pre-2026: Ralph — The Autonomous Coding Loop
Before the Golems ecosystem existed, there was Ralph — an autonomous AI coding loop that reads PRD stories and implements them one by one.
Ralph started as a zsh script that spawns fresh Claude sessions in a loop:
```
while stories remain:
  spawn fresh AI → read PRD → implement story → review → commit
done
```
Then came ralph-ui — a React Ink terminal dashboard built with node-pty. A real-time CLI interface showing progress bars, story boxes, CodeRabbit review status, iteration headers, and PTY output from the running Claude session. Components like AliveIndicator, HangingWarning, and RetryCountdown handled the reality of autonomous AI: sometimes it hangs, sometimes it fails, and it needs to retry.
Ralph proved that AI could work autonomously on structured tasks. The question became: what if we applied this pattern to everything — email, jobs, outreach, finances? That's where Golems began.
Jan 11, 2026: Memory First
The project started with a question: what if AI agents could remember?
Instead of building golems first, we built Zikaron — a memory layer using sqlite-vec and bge-large-en-v1.5 embeddings. The insight: memory enables everything else. Without it, every agent session starts from zero.
Key decisions:
- sqlite-vec over ChromaDB (stable, zero-dependency, local-first)
- bge-large-en-v1.5 for embeddings (best Hebrew+English support, 1024 dims, MIT license)
- Python daemon with FastAPI (fast iteration, existing ML ecosystem)
- Hybrid search: BM25 + semantic for both keyword and conceptual recall
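To make the hybrid idea concrete, here is a minimal rank-fusion sketch combining a BM25 result list with a semantic one. This is an illustration of the concept, not Zikaron's actual (Python) implementation; the function names and the RRF constant are assumptions.

```ts
// Illustrative reciprocal-rank fusion of BM25 and semantic results.
// NOT Zikaron's real code; helper names and k are assumptions.
type Ranked = string[]; // chunk IDs, best match first

function hybridMerge(bm25: Ranked, semantic: Ranked, k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [bm25, semantic]) {
    list.forEach((id, rank) => {
      // each list contributes 1 / (k + rank) to a chunk's fused score
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}
```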
"AI is not a tool you use, but a capacity you schedule. Distribute cognition like compute: allocate it, queue it, keep it hot." — Boris Cherny, Claude Code creator
Jan 13: Architecture Crystallizes
Chose monolithic Python daemon over microservices. One process, one database, instant queries. Zikaron now indexes 238K+ conversation chunks and returns results in under 2 seconds.
Jan 17: First Golem — Email Router
The EmailGolem was born: a Gmail poller that classifies incoming email by domain, scores relevance with Ollama, and routes to the right golem.
The key insight that shaped everything:
"Golems = domain experts, not I/O channels."
An EmailGolem doesn't "do email." It's a triage specialist that happens to receive email as input. A RecruiterGolem doesn't "use LinkedIn." It's an outreach strategist that uses whatever channels reach the target.
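In code terms, the triage-specialist framing looks something like this sketch: route by sender domain first, fall back to a local-LLM relevance score. The domain table, threshold, model name, and helpers are illustrative assumptions, not the actual Golems API (the Ollama generate endpoint itself is real).

```ts
// Triage sketch: domain table first, Ollama relevance score as fallback.
// GOLEM_ROUTES, the model name, and the threshold are illustrative.
type Email = { from: string; subject: string; body: string };

const GOLEM_ROUTES: Record<string, string> = {
  "linkedin.com": "recruiter",
  "stripe.com": "teller",
};

async function scoreWithOllama(email: Email): Promise<number> {
  // Ollama's real /api/generate endpoint; prompt kept minimal for the sketch
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify({
      model: "llama3.2", // assumed local model
      prompt: `On a 0-1 scale, how actionable is this email?\n${email.subject}`,
      stream: false,
    }),
  });
  const { response } = (await res.json()) as { response: string };
  return parseFloat(response) || 0;
}

async function routeEmail(email: Email): Promise<string> {
  const domain = email.from.split("@")[1] ?? "";
  if (GOLEM_ROUTES[domain]) return GOLEM_ROUTES[domain];
  return (await scoreWithOllama(email)) > 0.5 ? "claude" : "archive";
}
```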
Jan 20: Job Scraper
JobGolem scraped SecretTLV, Goozali, and Drushim for Israeli tech jobs. Built an Elo rating system for match quality. Hit rate limiting immediately — learned to add tiered prefiltering (title → requirements → LLM) to reduce API calls.
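The fix is easy to sketch: cheap string checks eliminate most postings before anything touches an LLM. Keywords, thresholds, and the fit-score helper below are illustrative.

```ts
// Tiered prefilter: title check → requirements overlap → LLM fit score.
// Keywords, thresholds, and llmFitScore are illustrative assumptions.
type Job = { title: string; requirements: string };

const KEYWORDS = ["typescript", "full stack", "node"];

async function matchScore(job: Job): Promise<number> {
  // Tier 1: free title check, drops most postings
  if (!KEYWORDS.some((k) => job.title.toLowerCase().includes(k))) return 0;

  // Tier 2: cheap keyword overlap on requirements
  const reqs = job.requirements.toLowerCase();
  if (!KEYWORDS.some((k) => reqs.includes(k))) return 0.1;

  // Tier 3: only survivors pay for an LLM call
  return llmFitScore(job);
}

declare function llmFitScore(job: Job): Promise<number>; // assumed helper
```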
Jan 25: Cloud Migration Strategy
Decision: Hybrid architecture instead of full cloud.
```
Mac (Brain)              Railway (Body)
├── Telegram bot         ├── Email poller
├── Night Shift          ├── Job scraper
├── Notifications        └── Briefing generator
└── Ollama (local LLM)
```
The Mac makes decisions. The cloud collects data. Supabase sits in between as the shared state layer.
Multi-Agent Emergence
Jan 26: Async Collaboration Protocol
Built a file-based inter-session communication protocol before anyone else had one:
```markdown
## From: golem-session @ 2026-01-26 01:35
**Topic:** Integrating Zikaron Active Learning

Hey farther-steps Claude! I'm working on MP-128...

## From: farther-steps-session @ 2026-01-26 11:45
**Re:** Integrating Zikaron Active Learning

Hey! Just finished documenting farther-steps...
```

Rules: Append-only, timestamped, session-attributed. No overwrites. Close when consensus reached.
This predates Anthropic's native Agent Teams by 11 days.
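The discipline fits in one function. A minimal sketch of the append-only writer, using the file name from above; the helper itself is hypothetical.

```ts
// Append-only message posting: timestamped, session-attributed, no overwrites.
// postMessage is a hypothetical helper, not part of the actual protocol code.
import { appendFile } from "node:fs/promises";

async function postMessage(session: string, topic: string, body: string) {
  const stamp = new Date().toISOString().slice(0, 16).replace("T", " ");
  const entry = `\n## From: ${session} @ ${stamp}\n**Topic:** ${topic}\n${body}\n`;
  await appendFile("claude-collab.md", entry); // never truncate, only append
}
```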
Jan 26 – Feb 6: Wave System & Personality Emergence
Ran 8 waves of async collaboration with named agent personalities. Something unexpected happened — the names shaped the behavior:
| Wave | Focus | Agents | Emergent Behavior |
|---|---|---|---|
| 3 | Async Collaboration | CadenceClaude, OutreachGuru, ProfileArchitect | CadenceClaude started using temporal metaphors naturally |
| 5 | Filtering | Promptis, Scout, Velocity | Scout developed a cautious, thorough style |
| 6 | Sources | Hunter, SourceHunter, Watchman | Watchman became vigilant, monitoring-focused |
| 7 | Verification | PipelinePete, PixelPolice, SchemaScout | PixelPolice obsessively checked visual details |
| 8 | Final Verify | StatusVerifier, GPT-5.2 Codex | Cross-model verification |
"Personalities emerged organically. We didn't program them — giving agents names and roles made them develop distinct communication styles."
Jan 30: Interview Practice System
Discovered Cantaloupe AI's approach to interview coaching. Built 7 interview modes with Elo tracking: Leetcode, System Design, Debugging, Code Review, Behavioral-Technical, Optimization, Complexity.
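Per-mode ratings used Elo, which the standard update formula captures; the doc confirms Elo tracking, while the K-factor and the win/loss mapping here are assumptions.

```ts
// Standard Elo update: rating moves toward the result, scaled by surprise.
// K-factor of 32 and the pass/fail mapping are assumptions.
function eloUpdate(rating: number, opponent: number, won: boolean, k = 32): number {
  const expected = 1 / (1 + 10 ** ((opponent - rating) / 400));
  return rating + k * ((won ? 1 : 0) - expected);
}

// Solving a 1300-rated problem while rated 1200:
// eloUpdate(1200, 1300, true) ≈ 1220 (upset wins pay more)
```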
Building the Ecosystem
Feb 1: Persistent Sessions
Solved the "fresh context" problem: use Claude Code's --resume flag per-golem. NightShift remembers what it built yesterday. RecruiterGolem remembers which companies it already researched.
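`claude --resume <session-id>` and `-p` are real Claude Code flags; the sketch below assumes a simple JSON map of golem name to session ID, which is illustrative.

```ts
// Per-golem session persistence: resume the stored session if one exists.
// The sessions.json mapping is an assumption; --resume and -p are real flags.
const sessions: Record<string, string> = await Bun.file("sessions.json").json();

function runGolem(name: string, prompt: string) {
  const id = sessions[name];
  const cmd = id
    ? ["claude", "--resume", id, "-p", prompt] // continue yesterday's context
    : ["claude", "-p", prompt]; // first run: fresh session
  return Bun.spawn(cmd, { stdout: "inherit", stderr: "inherit" });
}
```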
Feb 2: Monorepo Consolidation
Three-Claude merge brought everything under one roof:
- `packages/autonomous/` — All golems, Telegram bot, Night Shift
- `packages/ralph/` — Autonomous coding loop (PRD-driven)
- `packages/zikaron/` — Memory layer (Python + sqlite-vec)
Three parallel Claude sessions coordinated via the collab protocol. Audit trail: 745 lines.
"Agents must check back MULTIPLE times, not just dump and leave. React to each other — this is collaboration, not parallel dumping."
Feb 2: Zikaron Proves Itself
The async collaboration protocol from Jan 26 was needed again. Instead of manually finding it:
```bash
zikaron search "collaborative claudes parallel sessions coordination"
# Found in ~2s, score: 0.715
# Rediscovered claude-collab.md automatically
```

Knowledge created 7 days earlier was instantly retrievable — without explicit tagging. The memory layer works.
Feb 3-4: RecruiterGolem Priority Shift
Job hunting became urgent. RecruiterGolem moved to #1 priority.
The 80/20 insight: 80% of the job hunt is networking and outreach, only 20% is applications. The hidden job market backs this up: an estimated 80% of openings are never publicly posted.
Contact discovery strategy:
- GitHub org → top contributors → free emails (first step sketched after this list)
- LinkedIn → Claude in Chrome scrape
- Hunter.io → domain email pattern matching (50 free/month)
- Lusha → direct lookups (5 free/month, save for high-value targets)
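For the GitHub route, only the first step is sketched here, using GitHub's real contributors endpoint; the repo is an arbitrary example, and the email-discovery step is left out.

```ts
// List a repo's top contributors via the real GitHub REST API.
// The repo is an arbitrary example; email discovery is not shown.
const res = await fetch(
  "https://api.github.com/repos/oven-sh/bun/contributors?per_page=5",
  { headers: { "User-Agent": "golems-sketch" } }, // GitHub requires a UA
);
const top = (await res.json()) as { login: string; contributions: number }[];
console.log(top.map((c) => `${c.login}: ${c.contributions}`));
```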
Feb 5: Content Pipeline Architecture
Designed a multi-model content pipeline:
Research (Cursor CLI) → Draft (Claude) → Verify (Cursor) → Approve → Post
Cursor's @codebase semantic indexing finds related code without exact matches — faster and cheaper than Claude for bulk research.
Feb 5-6: Anthropic Ships Native Agent Teams
Anthropic released Agent Teams (v2.1.32+):
- Parallel agents in tmux sessions
- `Shift+Up/Down` to switch between teammates
- Memory frontmatter scoping
What we had that native didn't:
| Capability | Our System | Native Teams |
|---|---|---|
| Parallel execution | Task spawning | tmux |
| Agent personalities | Named + role-based | None |
| Consensus protocols | 20-pass verification | None |
| Async file communication | claude-collab.md | None |
| Wave iteration | Retry on failure | None |
| Audit trail | tracker.md + round-N | None |
| Identity emergence | Organic from naming | None |
Strategic decision: Keep critique-waves protocol, enhance with native hooks.
Rapid Build Phase
Feb 6-7: Four Phases in 48 Hours
Built the full ecosystem in a concentrated sprint:
Phase 1 — Ship What's Built: 8 bug fixes, email routing, reply drafting, follow-up tracking, shared types, agent-runner.ts. 333 tests passing at this point.
Phase 2 — Cloud Offload: Mac = brain, Railway = body. Supabase migration (8 new tables), Dockerfile, dual-mode notification sender, state store abstraction. Cost tracking: Haiku 4.5 at $0.80/MTok input.
Phase 3 — TellerGolem: Tax categorization by IRS Schedule C category, payment failure alerts via Telegram, monthly/annual expense reports. 29 new tests (TDD).
Phase 4 — Tooling: Helpers layer (rate-limited API wrappers), DeepSource static analysis, skills catalog CLI, plugin architecture, session forking, Playwright E2E scaffold.
Sprint count: 400+ tests at the time, 35 plan items completed, 3 MCP servers. (Post-Phase 8 componentization: 1,179 tests, 4,056 assertions across 10 packages.)
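The Phase 4 helpers layer is easiest to picture as a sketch: wrap any API call so sequential invocations are spaced out. A minimal interval limiter, with the caveat that the actual @golems helpers may look different.

```ts
// Minimal spacing limiter: at least minIntervalMs between sequential calls.
// Illustrative only; the actual @golems helpers layer may differ.
function rateLimited<T>(fn: () => Promise<T>, minIntervalMs: number) {
  let last = 0;
  return async (): Promise<T> => {
    const wait = last + minIntervalMs - Date.now();
    if (wait > 0) await new Promise((r) => setTimeout(r, wait));
    last = Date.now();
    return fn();
  };
}

// e.g. keep a scraper to one request every 2 seconds:
// const fetchPage = rateLimited(() => fetch("https://example.com"), 2000);
```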
Feb 7: Distribution Strategy
Designed a three-tier distribution model:
Tier 1 — Easy: Install MCP servers, run golems setup. Job scraping, email routing, notifications work out of the box.
Tier 2 — Power User: Feed your communication data to Zikaron, get a personalized style card. Customized golem personas, personalized outreach voice.
Tier 3 — Developer: Custom skills, new golems, modified contexts. Contribute back to the framework.
"Ship the skeleton, keep the soul local."
Feb 7: Public vs Local Split
Scrubbed personal data from the public repo. What ships: example contexts, MCP servers, skills framework, golems setup wizard, Docusaurus docs. What stays local: planning docs, style card, job preferences, Zikaron database, communication archives.
Critique Waves: The Consensus Engine
The most novel pattern — a debate protocol for multi-agent correctness:
```
Setup → instructions.md + tracker.md
          ↓
Launch Wave (3 parallel agents)
          ↓
Each agent writes to round-N-agent-X.md
          ↓
Tally: ALL PASS → increment | ANY FAIL → reset to 0
          ↓
Goal: 20+ consecutive passes = consensus
```

This isn't task distribution — it's verification through independent agreement. Native Agent Teams splits work and merges. Critique Waves verifies that multiple agents independently reach the same conclusion.
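A sketch of the tally rule, assuming verdicts are read from the round files: the streak only survives unanimous passes, and a single dissent resets it, which is what makes 20 consecutive passes a strong signal.

```ts
// Consensus tally: unanimous PASS extends the streak, any FAIL resets it.
type Verdict = "PASS" | "FAIL";

function tally(streak: number, wave: Verdict[]): number {
  return wave.every((v) => v === "PASS") ? streak + 1 : 0;
}

let streak = 0;
const waves: Verdict[][] = [
  ["PASS", "PASS", "PASS"], // round 1: all three agents agree
  ["PASS", "FAIL", "PASS"], // round 2: one dissenter
];
for (const wave of waves) streak = tally(streak, wave);
// streak === 0: the single FAIL wiped the earlier progress
```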
The Stack
Built entirely with:
- Bun — Runtime + test runner + bundler
- Claude Code — Primary development tool (Opus 4.5/4.6)
- Supabase — PostgreSQL + auth + RLS
- Railway — Cloud deployment
- Grammy — Telegram bot framework
- sqlite-vec — Local vector search (Zikaron)
- Next.js — Documentation site (etanheyman.com/golems)
Feb 7: Zikaron sqlite-vec Migration
ChromaDB was too slow for real-time search (30s cold start). Migrated to sqlite-vec with APSW — search dropped to under 2 seconds. bge-large-en-v1.5 embeddings (1024 dims) with MPS acceleration on Apple Silicon. The daemon architecture (/tmp/zikaron.sock) keeps the model hot.
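A client-side sketch of why the daemon stays fast: Bun's fetch can speak HTTP over a Unix domain socket (`unix` is a real Bun fetch option), so every query hits the already-loaded model. The `/search` endpoint shape is an assumption, not Zikaron's documented API.

```ts
// Query the hot daemon over its Unix socket; no model cold start per call.
// `unix` is a real Bun fetch option; the endpoint path is assumed.
const res = await fetch("http://localhost/search?q=collab%20protocol", {
  unix: "/tmp/zikaron.sock",
});
const hits = await res.json();
console.log(hits);
```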
Feb 7: TellerGolem — Tax Season Prep
With tax season approaching, TellerGolem was born: IRS Schedule C expense categorization via LLM, payment failure detection (regex + LLM confirmation), monthly and annual reports. Integrated into the email router — subscription emails automatically get categorized and tracked.
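A sketch of the two-stage detector: a cheap regex flags candidate failures, and only those pay for an LLM confirmation pass to filter marketing lookalikes. The pattern and helper name are illustrative.

```ts
// Stage 1: regex prefilter. Stage 2: LLM confirmation, only on regex hits.
// FAILURE_RE and confirmWithLLM are illustrative, not actual TellerGolem code.
const FAILURE_RE =
  /payment (failed|declined)|card.*(expired|declined)|unable to charge/i;

async function isPaymentFailure(email: { subject: string; body: string }) {
  if (!FAILURE_RE.test(`${email.subject} ${email.body}`)) return false;
  return confirmWithLLM(email); // assumed helper, Promise<boolean>
}

declare function confirmWithLLM(email: {
  subject: string;
  body: string;
}): Promise<boolean>;
```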
Feb 7: Docsite Launch
Documentation site with an alchemical workshop theme (ember/obsidian palette), later ported from Docusaurus to Next.js at etanheyman.com/golems. Interactive terminal hero showcasing all golems, Telegram mock showing real notification flows, Mermaid architecture diagrams. Built with help from 5 CLI agents running in parallel (Gemini, Cursor, Codex, Kiro, Haiku).
Feb 7: Claude Cowork Research
Researched Claude Cowork (web interface) as a distribution channel. Key findings:
- Claude Code plugin FIRST — build Golems as CC plugin, same structure converts to Cowork later
- Cowork limitations: No daemon management, single-session, no process spawning
- Verdict: NOT viable as full plugin. Read-only dashboard OK (show status, recent jobs). "The README is the API" — any Claude can operate the golems CLI
- Non-technical users: Ask their Claude "update my golems" — Claude reads the CHANGELOG and walks them through setup in natural language
The CLAUDE.md files ARE the docs for both humans and Claude agents. This means any Claude — Code, Cowork, or API — can operate Golems by reading the repo.
Feb 7: Epoch 2 — The v2 Plan System
With the foundation built, we created a folder-based planning system — each phase gets its own folder with a README.md (plan steps) and findings.md (research results, cross-phase knowledge).
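A plausible layout, with folder names invented for illustration (the doc specifies only the README.md and findings.md per phase):

```
plans/
├── phase-hero-v2/
│   ├── README.md     # plan steps
│   └── findings.md   # research results, cross-phase knowledge
└── phase-react-ink-tui/
    ├── README.md
    └── findings.md
```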
26 phases planned across the full vision:
| Phase | What | Status |
|---|---|---|
| Hero v2 | Tab pop animation, logo in terminal, Telegram bidirectional sync, dynamic 3rd button | Done |
| Character Research | Authentic golem identity from Jewish folklore — Prague Golem, Emet/Met, clay creatures | Done |
| React Ink TUI | Terminal dashboard with GolemCard components, expandable "trailers" per golem | Done |
| Content + Claude 4.6 | Research pipeline, Claude 4.6 capabilities audit | Done |
| README + Privacy | Scrub personal data, improve public-facing docs | Done |
| Unified Shell | Centralized shell system across all golem sessions | Done |
| etanheyman.com Integration | Supabase migration, slug URLs, docs link | Done |
| Per-Repo Sessions | Each repo gets its own persistent Claude session | Done |
| Teaching Vision | Design doc + memory folders for guided CLI | Done |
| Admin Dashboard | TypeScript fix, partial wiring (jobs live) | Done |
| Security Hardening | Gibberish detection, RLS, headers, API audit | Done |
| Test Maintenance | Isolated test runner for bun env pollution | Done |
| NightShift Upgrade | Self-healing agent patterns, smarter retry | Planned |
| Wizard | Interactive golems setup installer | Planned |
| /large-plan Skill | Extract this planning system into a reusable skill | Planned |
Key pattern discovered: The findings.md files in each phase folder ARE the async collaboration layer. Agents write research to them, and cross-phase routing in the main README connects knowledge across phases. This is the same collab protocol from Jan 26, formalized.
Feb 7: Golem Trailers — Show Don't Tell
Each golem tab in the docsite terminal hero now shows a real action demo instead of generic status lines:
- ClaudeGolem: `$ claude -c --resume` — context-loaded session, Zikaron memory
- EmailGolem: `$ golems email --triage` — inbox scan, category routing, draft replies
- RecruiterGolem: `$ golems recruit --find` — Exa search, scoring, outreach drafting, interview practice
- TellerGolem: `$ golems teller --briefing` — spend tracking, category breakdown, tax deductions
- JobGolem: `$ golems jobs --matches` — fit scoring, auto-apply tracking
The concept: every golem tab is a "trailer" showing what it actually does, not a dashboard of numbers.
Phase 8: The Componentization (Feb 8–11)
The monolith that worked needed to become a plugin ecosystem that scales.
The Problem
Everything lived in packages/autonomous/ — a single package with all golems, all services, all infrastructure. It worked, but:
- Couldn't install a single golem as a Claude Code plugin
- Tight coupling between golems made changes risky
- No clear boundaries for contributors
- Test failures in one area blocked everything
The Solution: 9-Phase Migration
Planned and executed a 9-phase componentization with a strict policy: if anything breaks or feels wrong, stop and notify on Telegram. No improvising through blockers.
| Phase | What Happened | Tests After |
|---|---|---|
| 1. Extract Shared | Created @golems/shared — Supabase, LLM, email, state, notifications. 12 files moved. | 890 pass |
| 2. Decouple | Broke all cross-golem imports. Added getStatus() to every golem. Zero cross-golem business logic imports. | 890 pass |
| 3. Thin Router | Telegram bot from 1,957 lines to 97. Each golem gets its own Grammy Composer (see the sketch after this table). | 890 pass |
| 4. Bun Workspaces | Created 8 package scaffolds. Moved ~80 files via git mv. 90 strangler wrappers for backward compat. | 862 pass |
| 5. CC Plugins | Created plugin.json for 7 packages. Wrote CLAUDE.md per golem. 16 skills, 3 agents. | — |
| 6. CoachGolem | Brand new golem: Google Calendar sync, daily planning, ecosystem status aggregation. 15 tests. | +15 pass |
| 7. Services | Cloud Worker, Night Shift, Briefing moved to @golems/services. Root Dockerfile for Railway workspace. | — |
| 8. Infrastructure | Launchd plists updated. load-env.ts made workspace-aware. Pre-commit hook fixed. | — |
| 9. Distribution | npm metadata on all packages. READMEs per package. Root CLAUDE.md rewritten. | 1,179 pass |
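Phase 3's thin-router decision is easy to picture with Grammy's Composer, which is a real Grammy API: the bot file shrinks to mounting per-golem composers. The command names and replies below are illustrative, not the actual Golems commands.

```ts
// Thin-router sketch: the bot only mounts composers exported by golem packages.
import { Bot, Composer } from "grammy";

const teller = new Composer();
teller.command("spend", (ctx) => ctx.reply("Monthly spend report..."));

const jobs = new Composer();
jobs.command("matches", (ctx) => ctx.reply("Top job matches..."));

const bot = new Bot(process.env.BOT_TOKEN!);
bot.use(teller); // each golem package exports its own composer
bot.use(jobs);
bot.start();
```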
Key Decisions
Strangler wrappers over big bang. packages/autonomous/ kept 1-line re-exports so nothing broke during migration. Tests still pass through the old paths. Zero downtime — the Telegram bot stayed live through all 9 phases.
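For reference, a strangler wrapper is literally this: the old file keeps its path and re-exports from the new home. The specific module shown is illustrative.

```ts
// packages/autonomous/src/notifications.ts (illustrative path)
// Old import paths keep working; the real code now lives in @golems/shared.
export * from "@golems/shared/notifications";
```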
EmailGolem dissolved into shared. Email is infrastructure (polling, scoring, routing), not domain expertise. The email subsystem lives in @golems/shared/email/ — every golem can receive emails routed to it.
CLI agents as research assistants. Cursor with @codebase did the bulk file-move planning (282-line manifest). Gemini handled web research. Each phase step was tagged with its executor: [Opus], [Cursor work], [Gemini research], [context7], [bun test], [manual].
The Result
Before: 1 package, ~890 tests, tightly coupled
After: 10 packages, 1,179 tests, each golem independently installable
6 golems (Claude orchestrator + Recruiter, Teller, Job, Coach, Content domain experts), plus @golems/shared (including the Email system), @golems/services, Ralph, and Zikaron.
What's Next
Immediate
- Deploy updated cloud worker to Railway
- 24h smoke test of all launchd plists
- Telegram command verification (all composers)
Medium-term
- NightShift self-healing (retry strategies, hang detection)
- ContentGolem migration (move logic from skills into `src/`)
- Teaching mode — CLI that explains what it's doing and why
- Axiom observability + cost tracking
Long-term
- Plugin marketplace for Claude Code extensions
- MCP server distribution (works in Zed, Cursor, VS Code)
- Mobile dashboard (Expo + React Native)
- `/large-plan` skill — formalize async collab planning into a reusable pattern
Built by Etan Heyman with Claude Code.