VoiceLayer — Voice I/O for AI Coding Agents
VoiceLayer adds bidirectional voice to AI coding agents via the Model Context Protocol. It provides 5 voice modes (announce, brief, consult, converse, think) for different interaction patterns, from fire-and-forget status updates to full voice Q&A with local speech-to-text. It uses edge-tts for neural text-to-speech and whisper.cpp for local transcription (~300ms on Apple Silicon). Session booking prevents mic conflicts between parallel Claude sessions. Everything runs locally with zero cloud APIs.
Project Journey
The Problem
Typing every interaction with AI coding agents felt wrong. QA testing, code review, and design discussions should be conversations, not typing marathons. Existing voice platforms charge by the minute and send data to the cloud.
5 Voice Modes
Designed 5 distinct modes for different moments: announce (fire-and-forget status), brief (agent reads back findings), consult (checkpoint before action), converse (full bidirectional Q&A), and think (silent notes to markdown).
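A minimal sketch of how the five modes might dispatch, assuming hypothetical `speak()` and `listen()` helpers standing in for the edge-tts and whisper.cpp layers; VoiceLayer's actual tool names and signatures may differ:

```python
from enum import Enum

class VoiceMode(Enum):
    ANNOUNCE = "announce"  # fire-and-forget status update
    BRIEF = "brief"        # agent reads back findings, no reply expected
    CONSULT = "consult"    # checkpoint: speak, then wait for approval
    CONVERSE = "converse"  # full bidirectional Q&A
    THINK = "think"        # silent note to markdown, no audio

def speak(text: str) -> None:
    print(f"[tts] {text}")        # stand-in for edge-tts playback

def listen() -> str:
    return input("[stt] > ")      # stand-in for mic capture + whisper.cpp

def handle(mode: VoiceMode, text: str) -> str | None:
    """Route a message through the requested voice mode."""
    if mode is VoiceMode.THINK:
        with open("notes.md", "a") as f:   # silent: append, never speak
            f.write(f"- {text}\n")
        return None
    speak(text)                            # every other mode produces speech
    if mode in (VoiceMode.CONSULT, VoiceMode.CONVERSE):
        return listen()                    # block on the mic for a reply
    return None                            # announce/brief never wait
```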
Local STT via whisper.cpp
Speech-to-text runs locally using whisper.cpp with CoreML/Metal acceleration — transcription in ~200-400ms on Apple Silicon. No cloud APIs, no per-minute billing, no data leaving the machine.
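As a sketch of this step, transcription can be a single call into a whisper.cpp CLI build; the binary name (`whisper-cli` here, `main` in older builds), the model path, and the exact flags are assumptions to adjust for your install:

```python
import subprocess

def transcribe(wav_path: str,
               model: str = "models/ggml-base.en.bin") -> str:
    """Run whisper.cpp on a 16 kHz mono WAV and return plain text."""
    result = subprocess.run(
        ["whisper-cli",
         "-m", model,           # GGML model file
         "-f", wav_path,        # input audio
         "--no-timestamps"],    # plain text only, no [00:00] markers
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```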
Session Booking
A lockfile-based mutex prevents mic conflicts. Only one voice session runs at a time; other Claude sessions see "line busy" and fall back to text. Stale locks from dead processes are auto-cleaned.
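A minimal sketch of the booking scheme, assuming a fixed lock path and PID-based liveness checks; this is illustrative, not VoiceLayer's exact implementation:

```python
import os

LOCK = "/tmp/voicelayer.lock"

def book_session() -> bool:
    """Try to claim the mic; False means 'line busy'."""
    try:
        # O_CREAT | O_EXCL fails atomically if the lock already exists
        fd = os.open(LOCK, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.write(fd, str(os.getpid()).encode())  # record the owner PID
        os.close(fd)
        return True
    except FileExistsError:
        try:
            with open(LOCK) as f:
                pid = int(f.read())
            os.kill(pid, 0)                      # signal 0 = liveness probe
        except (ValueError, FileNotFoundError, ProcessLookupError):
            # garbled lock, vanished lock, or dead owner: reclaim it
            try:
                os.remove(LOCK)
            except FileNotFoundError:
                pass
            return book_session()                # one clean retry
        except PermissionError:
            pass                                 # pid alive under another user
        return False                             # owner is alive: line busy

def release_session() -> None:
    if os.path.exists(LOCK):
        os.remove(LOCK)
```

The atomic `O_CREAT | O_EXCL` create is the key design choice: two sessions racing for the mic can never both win, and the recorded PID lets a later session distinguish a live owner from a stale lock.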