How Golems (Autonomous AI Agents) Works
12 packages in a Bun monorepo. @golems/shared is the foundation: Supabase client, multi-backend LLM routing, email processing, state management. Domain golems (jobs, recruiter, coach, teller, content) are self-contained Claude Code plugins. 60+ AI-agnostic skills with eval framework. The dashboard is a Next.js app with 2D canvas knowledge graph and Neural Observatory. Each package deploys independently but shares types and utilities through the foundation layer.
Foundation
@golems/shared
Domain
7 golems
Skills
60+ AI-agnostic
Dashboard
KG Canvas + Observatory
Telegram
Grammy bot
Two environments, each tuned for its workload. Railway runs the cloud worker: scheduled cron tasks for email polling (hourly), job scraping (3x/day Sun-Thu), daily briefings, and content learning. macOS handles real-time services: Telegram bot (Grammy, port 3847), BrainLayer indexing, VoiceLayer I/O, and Night Shift autonomous coding at 4am via launchd.
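The cron cadence above can be sketched as a task table. This is a hypothetical illustration: the task names and the exact fire times are placeholders, only the frequencies (hourly, 3x/day Sun-Thu, daily) come from the description above.

```typescript
// Hypothetical cron table for the Railway cloud worker. Task names and
// specific times are illustrative; only the cadences match the text.
const CRON_SCHEDULES: Record<string, string> = {
  emailPolling: "0 * * * *",              // hourly
  jobScraping: "0 8,13,18 * * 0-4",       // 3x/day, Sun-Thu
  dailyBriefing: "0 7 * * *",             // once daily
  contentLearning: "0 2 * * *",           // once daily, overnight
};

for (const [task, expr] of Object.entries(CRON_SCHEDULES)) {
  console.log(`${task}: ${expr}`);
}
```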
Every LLM call goes through a routing layer that prefers free models. The hierarchy: MLX on Apple Silicon (21-87% faster than Ollama, $0), local GLM-4.7-Flash via Ollama ($0), Gemini 2.5 Flash-Lite (free tier, 1K RPD), Groq Llama 4 Scout (free tier), then Claude Haiku 4.5 (paid, last resort). Same runLLM() interface everywhere. Backend selection is just an env var.
// Same interface, any backend
import { runLLM } from "@golems/shared/lib/llm";
const result = await runLLM(prompt);
// Routes based on LLM_BACKEND env var:
// "mlx" → local Apple Silicon (fastest)
// "glm" → local Ollama (free)
// "gemini" → cloud free tier
// "haiku" → paid fallback
Consumer code is identical regardless of LLM backend.
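The free-first hierarchy can be encoded as an ordered preference list. Note the text says backend selection is driven by an env var; the `pickBackend` helper below is a hypothetical sketch of the preference order, not the actual `@golems/shared` API.

```typescript
// Illustrative sketch of the free-first backend hierarchy.
// pickBackend and the availability set are assumptions for this example.
type Backend = "mlx" | "glm" | "gemini" | "groq" | "haiku";

// Ordered: free local first, free cloud tiers next, paid last resort.
const HIERARCHY: Backend[] = ["mlx", "glm", "gemini", "groq", "haiku"];

function pickBackend(available: Set<Backend>): Backend {
  for (const b of HIERARCHY) {
    if (available.has(b)) return b;
  }
  throw new Error("no LLM backend available");
}

// e.g. a cloud worker without Apple Silicon or local Ollama:
console.log(pickBackend(new Set<Backend>(["gemini", "groq", "haiku"])));
```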
Ralph turns PRD stories into working code without human intervention. PR Loop v2 enforces CodeRabbit AI review on every commit. Failed reviews trigger automatic fix-iterate-review cycles (max 3 attempts). Night Shift extends this at 4am: scans repos for TODOs, creates worktrees, ships PRs while the developer sleeps. OrcClaude v2.0 coordinates multi-agent sprints with planning topology and structured response markers.
PRD
Stories + criteria
OrcClaude
Coordinate agents
Implement
Parallel workers
CodeRabbit
AI review gate
PR Loop v2
Review-enforced
Every autonomous commit must pass AI code review first. If CodeRabbit finds issues, Ralph fixes them automatically. If the fix fails after 3 attempts, it creates a BUG story instead of shipping broken code.
8 MCP servers powering every golem. BrainLayer: 12 tools (3 core memory + 9 knowledge graph/lifecycle) with BrainBar daemon. Email: 7 tools for triage. VoiceLayer: 2 voice tools with MCP daemon. Plus Supabase for database, Exa for web search, Sophtron for financial data, GLM for local inference. Each golem declares which MCP servers it needs via context profiles.
BrainLayer
12 tools + daemon
Email
7 tools
VoiceLayer
2 tools + daemon
Supabase
SQL + DDL
Others
25+ tools
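A context profile might declare a golem's MCP dependencies like this. The interface and field names are illustrative assumptions, not the actual plugin schema.

```typescript
// Hypothetical shape of a golem context profile declaring which MCP
// servers it needs. Field names are illustrative, not the real schema.
interface ContextProfile {
  golem: string;
  mcpServers: string[]; // subset of the MCP servers listed above
}

const tellerProfile: ContextProfile = {
  golem: "teller",
  // A finance golem plausibly needs the database, financial data, and memory:
  mcpServers: ["supabase", "sophtron", "brainlayer"],
};

console.log(tellerProfile.mcpServers.join(", "));
```

Declaring dependencies per golem keeps each plugin's context window lean: only the declared servers' tools are loaded.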