How BrainLayer — Persistent Memory for AI Agents works
Every Claude Code session produces JSONL transcripts. BrainLayer's 4-stage pipeline turns these into searchable knowledge. Extract parses session files and detects continuation chains. Classify identifies content types (user messages, AI code, stack traces, file reads) with content-aware length thresholds. Chunk splits text using tree-sitter for code and paragraph boundaries for prose, targeting ~2000 chars per chunk. Embed generates 1024-dim vectors with bge-large-en-v1.5 and stores them in SQLite via sqlite-vec.
Extract
JSONL → sessions
Classify
Content-type detection
Chunk
AST-aware splitting
Embed
bge-large 1024-dim
Store
SQLite + sqlite-vec
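The chunking stage above can be sketched for the prose case: pack paragraphs greedily toward the ~2000-character target. This is an illustrative sketch, not BrainLayer's actual implementation (function names and the greedy strategy are assumptions; code files are instead split on tree-sitter AST boundaries):

```python
# Greedy paragraph-boundary chunker targeting ~2000 chars per chunk.
# (Code files would be split on tree-sitter AST node boundaries instead.)
TARGET = 2000

def chunk_prose(text: str, target: int = TARGET) -> list[str]:
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Start a new chunk if appending this paragraph would exceed the target.
        if current and len(current) + len(para) + 2 > target:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

doc = "\n\n".join(f"Paragraph {i}: " + "x" * 400 for i in range(10))
chunks = chunk_prose(doc)
```

Paragraphs are never split mid-way, so chunks land just under the target rather than exactly on it.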
Vector similarity alone misses exact keyword matches. BrainLayer runs two strategies in parallel: semantic search with 1024-dim embeddings (KNN, 3x oversampling) and FTS5 keyword search for exact hits. Reciprocal Rank Fusion combines both ranked lists with score = 1/(k + rank), where k=60 keeps any single high rank from dominating the final ordering.
# Reciprocal Rank Fusion (k=60)
fused = {}
for chunk_id in all_results:
    score = 0.0
    if chunk_id in semantic_results:
        score += 1.0 / (60 + semantic_results[chunk_id])  # rank in semantic list
    if chunk_id in fts_results:
        score += 1.0 / (60 + fts_results[chunk_id])       # rank in FTS5 list
    fused[chunk_id] = score
return sorted(fused, key=fused.get, reverse=True)[:n]
Results appearing in both ranked lists accumulate two reciprocal-rank terms, so they outscore results that appear in only one list.
Raw chunks need structure. A local LLM (GLM-4.7-Flash or Qwen2.5-Coder-14B via MLX on Apple Silicon) enriches each chunk with 10 metadata fields: summary, tags, importance (1-10), intent, primary code symbols, a hypothetical query for HyDE retrieval, epistemic level, version scope, tech debt impact, and external deps. Chunks are processed in batches of 50-100, with 5-minute stall detection per chunk.
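The 10 enrichment fields can be modeled as a schema the local LLM must fill in per chunk. The field list follows the paragraph above; the dataclass shape, field names, and validation are illustrative assumptions, not BrainLayer's actual types:

```python
from dataclasses import dataclass

@dataclass
class Enrichment:
    # The 10 metadata fields the local LLM fills in per chunk.
    summary: str
    tags: list[str]
    importance: int            # 1-10
    intent: str
    code_symbols: list[str]    # primary code symbols referenced
    hyde_query: str            # hypothetical query for HyDE retrieval
    epistemic_level: str
    version_scope: str
    tech_debt_impact: str
    external_deps: list[str]

    def __post_init__(self) -> None:
        # Reject out-of-range importance so a bad LLM response fails loudly.
        if not 1 <= self.importance <= 10:
            raise ValueError("importance must be 1-10")
```

Validating each parsed response against a schema like this is what makes batch enrichment safe to run unattended.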
Enriching 313K+ chunks at ~$0.01/chunk via a cloud API would cost about $3,139. Running GLM-4.7-Flash locally costs $0, quality is comparable for structured extraction tasks, and no data leaves the machine.
Everything lives in a single .db file: SQLite + sqlite-vec for vectors, FTS5 for keywords. Not a compromise. The database ships with the package, needs zero infrastructure, and handles concurrent access from the daemon, MCP server, and enrichment workers via APSW with a 5-second busy timeout.
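The concurrent-access setup can be sketched with the stdlib sqlite3 module (BrainLayer uses APSW; the `timeout=5.0` argument mirrors the 5-second busy timeout, and WAL mode is an assumption the source does not specify):

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "brain.db")

def open_db(path: str) -> sqlite3.Connection:
    # Each process (daemon, MCP server, enrichment worker) opens the same file.
    # timeout=5.0: a connection blocked by another writer's lock retries for
    # up to 5 seconds instead of failing immediately with "database is locked".
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")  # readers don't block the writer
    return conn

writer = open_db(db_path)
writer.execute("CREATE TABLE IF NOT EXISTS chunks (id INTEGER PRIMARY KEY, body TEXT)")
writer.execute("INSERT INTO chunks (body) VALUES ('hello')")
writer.commit()

reader = open_db(db_path)  # a second connection, as another process would open
count = reader.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
```

One file plus a busy timeout is the entire "infrastructure": no server process to deploy, back up, or keep alive.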
10 MCP tools expose BrainLayer's full capability to any Claude Code session. 3 core memory tools handle search, persistence, and recall. 7 knowledge graph and lifecycle tools add entity extraction, digestion with 3 modes, and real-time pubsub. The toolset started at 14 specialized tools and was refined down to 10 that cover every use case. The BrainBar daemon provides MCP over a Unix socket for always-on access.
brain_search: Semantic + keyword hybrid search across all indexed chunks
brain_store: Persist decisions, learnings, bugs, and TODOs with auto-tagging
brain_recall: Session context, operational history, and work summaries
brain_digest: 3-mode ingestion: full content, faceted tags (Gemini 2.5 Flash), tiered selectivity (T0-T3)
brain_entity: Look up known entities in the knowledge graph with relations
brain_update: Update, archive, or merge existing memory chunks
brain_expand: Drill into search results with surrounding context from the same session
brain_tags: Discover, search, and suggest tags across the knowledge base
brain_subscribe / brain_unsubscribe: Pubsub for real-time memory update notifications across sessions
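Over the wire, an MCP tool invocation is a JSON-RPC 2.0 `tools/call` request. A sketch for brain_search follows; the argument names (`query`, `limit`) are illustrative assumptions, not confirmed by the source:

```python
import json

# Hypothetical brain_search invocation as an MCP JSON-RPC 2.0 request.
# The MCP method name "tools/call" is standard; the tool arguments are assumed.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "brain_search",
        "arguments": {"query": "sqlite busy timeout fix", "limit": 5},
    },
}
payload = json.dumps(request)
decoded = json.loads(payload)
```

In BrainLayer's case this payload would travel over the BrainBar daemon's Unix socket rather than stdio or HTTP.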