How BrainLayer — Persistent Memory for AI Agents works
Every Claude Code session produces JSONL transcripts. BrainLayer's 4-stage pipeline turns these into searchable knowledge. Extract parses session files and detects continuation chains. Classify identifies content types (user messages, AI code, stack traces, file reads) with content-aware length thresholds. Chunk splits text using tree-sitter for code and paragraph boundaries for prose, targeting ~2000 chars per chunk. Embed generates 1024-dim vectors with bge-large-en-v1.5 and stores them in SQLite via sqlite-vec.
Extract
JSONL → sessions
Classify
Content-type detection
Chunk
AST-aware splitting
Embed
bge-large 1024-dim
Store
SQLite + sqlite-vec
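The chunking stage above can be sketched for the prose case: pack paragraphs greedily toward the ~2000-character target. This is an illustrative sketch, not BrainLayer's actual implementation (function names and the greedy strategy are assumptions; code files are instead split on tree-sitter AST boundaries):

```python
# Greedy paragraph-boundary chunker targeting ~2000 chars per chunk.
# (Code files would be split on tree-sitter AST node boundaries instead.)
TARGET = 2000

def chunk_prose(text: str, target: int = TARGET) -> list[str]:
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Start a new chunk if appending this paragraph would exceed the target.
        if current and len(current) + len(para) + 2 > target:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

doc = "\n\n".join(f"Paragraph {i}: " + "x" * 400 for i in range(10))
chunks = chunk_prose(doc)
```

Paragraphs are never split mid-way, so chunks land just under the target rather than exactly on it.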
Vector similarity alone misses exact keyword matches. BrainLayer runs two strategies in parallel: semantic search with 1024-dim embeddings (KNN, 3x oversampling) and FTS5 keyword search for exact hits. Reciprocal Rank Fusion combines both ranked lists with score = 1/(k + rank), where k=60 keeps any single high rank from dominating the final ordering.
# Reciprocal Rank Fusion (k=60)
fused = {}
for chunk_id in all_results:
    score = 0.0
    if chunk_id in semantic_results:
        score += 1.0 / (60 + semantic_results[chunk_id])  # rank in semantic list
    if chunk_id in fts_results:
        score += 1.0 / (60 + fts_results[chunk_id])       # rank in FTS5 list
    fused[chunk_id] = score
return sorted(fused, key=fused.get, reverse=True)[:n]
Results appearing in both ranked lists accumulate two reciprocal-rank terms, so they outscore results that appear in only one list.
Raw chunks need structure. A local LLM (GLM-4.7-Flash or Qwen2.5-Coder-14B via MLX on Apple Silicon) enriches each chunk with 10 metadata fields: summary, tags, importance (1-10), intent, primary code symbols, a hypothetical query for HyDE retrieval, epistemic level, version scope, tech debt impact, and external deps. Chunks are processed in batches of 50-100, with 5-minute stall detection per chunk.
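The 10 enrichment fields can be modeled as a schema the local LLM must fill in per chunk. The field list follows the paragraph above; the dataclass shape, field names, and validation are illustrative assumptions, not BrainLayer's actual types:

```python
from dataclasses import dataclass

@dataclass
class Enrichment:
    # The 10 metadata fields the local LLM fills in per chunk.
    summary: str
    tags: list[str]
    importance: int            # 1-10
    intent: str
    code_symbols: list[str]    # primary code symbols referenced
    hyde_query: str            # hypothetical query for HyDE retrieval
    epistemic_level: str
    version_scope: str
    tech_debt_impact: str
    external_deps: list[str]

    def __post_init__(self) -> None:
        # Reject out-of-range importance so a bad LLM response fails loudly.
        if not 1 <= self.importance <= 10:
            raise ValueError("importance must be 1-10")
```

Validating each parsed response against a schema like this is what makes batch enrichment safe to run unattended.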
Enriching 313K+ chunks at ~$0.01/chunk via a cloud API would cost about $3,139. Running GLM-4.7-Flash locally costs $0, quality is comparable for structured extraction tasks, and no data leaves the machine.
Everything lives in a single .db file: SQLite + sqlite-vec for vectors, FTS5 for keywords. Not a compromise. The database ships with the package, needs zero infrastructure, and handles concurrent access from the daemon, MCP server, and enrichment workers via APSW with a 5-second busy timeout.
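The concurrent-access setup can be sketched with the stdlib sqlite3 module (BrainLayer uses APSW; the `timeout=5.0` argument mirrors the 5-second busy timeout, and WAL mode is an assumption the source does not specify):

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "brain.db")

def open_db(path: str) -> sqlite3.Connection:
    # Each process (daemon, MCP server, enrichment worker) opens the same file.
    # timeout=5.0: a connection blocked by another writer's lock retries for
    # up to 5 seconds instead of failing immediately with "database is locked".
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")  # readers don't block the writer
    return conn

writer = open_db(db_path)
writer.execute("CREATE TABLE IF NOT EXISTS chunks (id INTEGER PRIMARY KEY, body TEXT)")
writer.execute("INSERT INTO chunks (body) VALUES ('hello')")
writer.commit()

reader = open_db(db_path)  # a second connection, as another process would open
count = reader.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
```

One file plus a busy timeout is the entire "infrastructure": no server process to deploy, back up, or keep alive.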
10 MCP tools expose BrainLayer's full capability to any Claude Code session. 3 core memory tools handle search, persistence, and recall. 7 knowledge graph and lifecycle tools add entity extraction, digestion with 3 modes, and real-time pubsub. The toolset started at 14 specialized tools and was refined down to 10 that cover every use case. The BrainBar daemon provides MCP over a Unix socket for always-on access.
brain_search: Semantic + keyword hybrid search across all indexed chunks
brain_store: Persist decisions, learnings, bugs, and TODOs with auto-tagging
brain_recall: Session context, operational history, and work summaries
brain_digest: 3-mode ingestion: full content, faceted tags (Gemini 2.5 Flash), tiered selectivity (T0-T3)
brain_entity: Look up known entities in the knowledge graph with relations
brain_update: Update, archive, or merge existing memory chunks
brain_expand: Drill into search results with surrounding context from the same session
brain_tags: Discover, search, and suggest tags across the knowledge base
brain_subscribe / brain_unsubscribe: Pubsub for real-time memory update notifications across sessions
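Over the wire, an MCP tool invocation is a JSON-RPC 2.0 `tools/call` request. A sketch for brain_search follows; the argument names (`query`, `limit`) are illustrative assumptions, not confirmed by the source:

```python
import json

# Hypothetical brain_search invocation as an MCP JSON-RPC 2.0 request.
# The MCP method name "tools/call" is standard; the tool arguments are assumed.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "brain_search",
        "arguments": {"query": "sqlite busy timeout fix", "limit": 5},
    },
}
payload = json.dumps(request)
decoded = json.loads(payload)
```

In BrainLayer's case this payload would travel over the BrainBar daemon's Unix socket rather than stdio or HTTP.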