VoiceLayer adds bidirectional voice to AI coding agents via the Model Context Protocol. It provides 5 voice modes (announce, brief, consult, converse, think) for different interaction patterns — from fire-and-forget status updates to full voice Q&A with local speech-to-text. It uses edge-tts for neural text-to-speech and whisper.cpp for local transcription (~300ms on Apple Silicon). Session booking prevents mic conflicts between parallel Claude sessions. Everything runs locally with zero cloud APIs.
voice_speak and voice_ask cover the full range, from fire-and-forget TTS to interactive Q&A, with automatic mode detection. The VoiceBar daemon (renamed from FlowBar) handles both directions.
Local STT via whisper.cpp and Wispr Flow backends at ~300ms. No cloud APIs, no data leaving your machine.
Speech
User voice input
STT
whisper.cpp ~300ms
Voice Tools
2 tools, auto detection
Session Mgr
Lockfile mutex
TTS Output
edge-tts neural
bunx voicelayer-mcp

Typing every interaction with AI coding agents felt wrong. QA testing, code review, and design discussions should be conversations — not typing marathons. Existing voice platforms charge per-minute and send data to the cloud.
Designed 5 distinct modes for different moments: announce (fire-and-forget status), brief (agent reads back findings), consult (checkpoint before action), converse (full bidirectional Q&A), and think (silent notes to markdown).
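The five-way split above can be pictured as a small dispatcher. This is a minimal sketch, not VoiceLayer's actual API: the `VoiceRequest` shape and `pickMode` helper (including the 200-character threshold for brief) are hypothetical.

```typescript
// Hypothetical sketch: routing a request to one of the five voice modes.
type VoiceMode = "announce" | "brief" | "consult" | "converse" | "think";

interface VoiceRequest {
  text: string;
  expectsReply?: boolean; // agent needs the user's spoken answer
  blocking?: boolean;     // agent should pause for a checkpoint before acting
  silent?: boolean;       // write a note to markdown instead of speaking
}

function pickMode(req: VoiceRequest): VoiceMode {
  if (req.silent) return "think";            // silent notes to markdown
  if (req.expectsReply) return "converse";   // full bidirectional Q&A
  if (req.blocking) return "consult";        // checkpoint before action
  if (req.text.length > 200) return "brief"; // read back longer findings
  return "announce";                         // fire-and-forget status
}
```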
Speech-to-text runs locally using whisper.cpp with CoreML/Metal acceleration — transcription in ~200-400ms on Apple Silicon. No cloud APIs, no per-minute billing, no data leaving the machine.
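Invoking whisper.cpp locally might look like the sketch below. The wrapper function, binary path, and defaults are assumptions, not VoiceLayer's code; `-m` (model), `-f` (input WAV), and `-t` (threads) are standard whisper.cpp CLI flags.

```typescript
// Hypothetical sketch: building the argv for a local whisper.cpp call.
function whisperArgs(
  wavPath: string,
  modelPath = "models/ggml-base.en.bin", // assumed default model path
  threads = 4,
): string[] {
  return ["-m", modelPath, "-f", wavPath, "-t", String(threads)];
}

// Usage (hypothetical): Bun.spawn(["./whisper-cli", ...whisperArgs("mic.wav")])
```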
Lockfile-based mutex prevents mic conflicts. Only one voice session at a time — other Claude sessions see "line busy" and fall back to text. Stale locks from dead processes are auto-cleaned.
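The lockfile mutex described above can be sketched roughly as follows; the lock path and function names are illustrative, not VoiceLayer's actual implementation. The atomic `wx` create flag guarantees only one writer wins, and signal 0 probes whether the lock holder is still alive.

```typescript
import { writeFileSync, readFileSync, unlinkSync } from "node:fs";

// Hypothetical lock path; VoiceLayer's real path may differ.
const LOCK = "/tmp/voicelayer-mic-demo.lock";

function pidAlive(pid: number): boolean {
  try { process.kill(pid, 0); return true; } // signal 0 = existence check only
  catch { return false; }
}

function acquire(): boolean {
  try {
    writeFileSync(LOCK, String(process.pid), { flag: "wx" }); // atomic create
    return true;
  } catch {
    const holder = Number(readFileSync(LOCK, "utf8"));
    if (!pidAlive(holder)) {
      unlinkSync(LOCK); // stale lock from a dead process: auto-clean
      writeFileSync(LOCK, String(process.pid), { flag: "wx" });
      return true;
    }
    return false; // "line busy": caller falls back to text
  }
}

function release(): void {
  try { unlinkSync(LOCK); } catch {}
}
```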
75 tests with 178 assertions. 7 MCP tools (5 modes + 2 aliases). Full CI pipeline, branch protection, TypeScript strict mode. Docs site live at etanhey.github.io/voicelayer.
11 MCP tools, 312K+ indexed chunks, hybrid semantic+keyword search, knowledge graph with entity resolution, local LLM enrichment via Groq/MLX/Ollama. pip install brainlayer.
Autonomous AI agent ecosystem — 12 Bun workspace packages, 7 domain agents, 60+ AI-agnostic skills, multi-LLM routing, Night Shift autonomous coding at 4am. 1,073 tests.
Singleton voice service via socat with dual-protocol support (NDJSON + MCP Content-Length). Auto-starts via macOS LaunchAgent. Always available, zero manual setup after install.
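Dual-protocol support presumably means the daemon must tell the two framings apart on the wire. A minimal sketch of both framings and a detector (helper names are hypothetical; NDJSON is one JSON object per newline, while Content-Length framing prefixes the body with an LSP-style header):

```typescript
// Newline-delimited JSON: one message per line.
function frameNdjson(msg: object): string {
  return JSON.stringify(msg) + "\n";
}

// Content-Length framing: header gives the UTF-8 byte length of the body.
function frameContentLength(msg: object): string {
  const body = JSON.stringify(msg);
  const len = Buffer.byteLength(body, "utf8");
  return `Content-Length: ${len}\r\n\r\n${body}`;
}

// A server reading from one socket can sniff the first bytes to pick a codec.
function detectFraming(chunk: string): "ndjson" | "content-length" {
  return chunk.startsWith("Content-Length:") ? "content-length" : "ndjson";
}
```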
Neural-quality TTS with word-boundary text splitting for long messages. Auto-chunks at sentence boundaries to avoid truncation. Free, local, multiple voices.
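Chunking at sentence boundaries could look like the sketch below; the length limit and the sentence regex are assumptions for illustration, not VoiceLayer's actual values.

```typescript
// Hypothetical sketch: split long text into TTS-sized chunks, breaking only
// at sentence boundaries so no chunk ends mid-sentence.
function chunkBySentence(text: string, maxLen = 280): string[] {
  // Greedy split on sentence-ending punctuation plus trailing whitespace.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const chunks: string[] = [];
  let current = "";
  for (const s of sentences) {
    if (current && (current + s).length > maxLen) {
      chunks.push(current.trim()); // flush before the limit is exceeded
      current = s;
    } else {
      current += s;
    }
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```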