What VoiceLayer (Voice I/O for AI Coding Agents) can do
From fire-and-forget to full conversation — automatically
voice_speak handles one-way text-to-speech (announcements, briefings, status updates). voice_ask handles bidirectional Q&A and books a voice session for the exchange. Auto-mode detection chooses the right interaction pattern from context, so there is no manual mode switching.
~300ms transcription, no cloud required
Recording uses sox to capture 16 kHz mono PCM, which is processed in 1-second chunks with RMS energy detection. Transcription runs locally through whisper.cpp and takes ~200–400 ms on Apple Silicon with the ggml-large-v3-turbo model. Where a local setup isn't available, a cloud fallback uses the Wispr Flow WebSocket API. The backend is selected automatically based on what's installed.
{
  "mcpServers": {
    "qa-voice": {
      "command": "bunx",
      "args": ["voicelayer-mcp"],
      "env": {
        "QA_VOICE_STT_BACKEND": "auto",
        "QA_VOICE_TTS_VOICE": "en-US-JennyNeural"
      }
    }
  }
}
MCP config with STT backend auto-detection
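The 1-second RMS energy detection used during recording can be sketched as follows. This is a minimal illustration, not VoiceLayer's code: it assumes sox emits signed 16-bit little-endian samples, and the SILENCE_THRESHOLD value and the function names rmsEnergy, splitChunks, and isSpeech are hypothetical.

```typescript
// Sketch of 1-second RMS energy detection over 16 kHz mono PCM.
// Assumption: samples are signed 16-bit little-endian.
// SILENCE_THRESHOLD is a hypothetical value, not VoiceLayer's actual setting.
const SAMPLE_RATE = 16_000; // samples per second (16 kHz mono)
const BYTES_PER_SAMPLE = 2; // 16-bit PCM
const CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE; // one second of audio
const SILENCE_THRESHOLD = 0.01; // hypothetical normalized RMS cutoff

/** Root-mean-square energy of a 16-bit LE PCM buffer, normalized to [0, 1]. */
function rmsEnergy(pcm: Uint8Array): number {
  const view = new DataView(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  const samples = Math.floor(pcm.byteLength / BYTES_PER_SAMPLE);
  if (samples === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples; i++) {
    const s = view.getInt16(i * BYTES_PER_SAMPLE, true) / 32768; // little-endian
    sumSquares += s * s;
  }
  return Math.sqrt(sumSquares / samples);
}

/** Split a PCM byte stream into whole one-second chunks plus leftover bytes. */
function splitChunks(pcm: Uint8Array): { chunks: Uint8Array[]; rest: Uint8Array } {
  const chunks: Uint8Array[] = [];
  let offset = 0;
  while (offset + CHUNK_BYTES <= pcm.byteLength) {
    chunks.push(pcm.subarray(offset, offset + CHUNK_BYTES));
    offset += CHUNK_BYTES;
  }
  return { chunks, rest: pcm.subarray(offset) };
}

/** A chunk counts as speech when its energy reaches the threshold. */
function isSpeech(chunk: Uint8Array, threshold = SILENCE_THRESHOLD): boolean {
  return rmsEnergy(chunk) >= threshold;
}
```

A recorder would feed sox's stdout into splitChunks, carry rest over to the next read, and call isSpeech on each chunk to decide when the speaker has gone quiet.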
One microphone, no conflicts
Only one Claude session can use the microphone at a time. A lockfile at /tmp/voicelayer-session.lock records the owning PID, session ID, and start timestamp. The lockfile is created with the exclusive wx open flag, which fails if the file already exists, so two sessions cannot both acquire it. Dead-process detection sends signal 0 to the owning PID: if that process no longer exists, the stale lock is cleaned up automatically.
// Other sessions see:
{
  isError: true,
  content: [{
    type: "text",
    text: "Line is busy — session abc123 " +
      "(PID 4821) since 14:30:00. " +
      "Fall back to text input."
  }]
}
"Line busy" response with owner details
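A minimal sketch of the lockfile scheme, assuming the node:fs API available in Node and Bun. Only the exclusive wx flag and the signal-zero liveness check come from the description above; the JSON record shape and the names acquireLock, releaseLock, readLock, and processAlive are hypothetical.

```typescript
import { closeSync, openSync, readFileSync, unlinkSync, writeSync } from "node:fs";

// Hypothetical record shape; VoiceLayer's actual file format may differ.
interface SessionLock {
  pid: number;
  sessionId: string;
  startedAt: string;
}

/** Signal 0 runs the existence/permission check without delivering a signal. */
function processAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // ESRCH: no such process. EPERM: it exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

/** Read the lock record, or null if the file is missing or half-written. */
function readLock(path: string): SessionLock | null {
  try {
    return JSON.parse(readFileSync(path, "utf8")) as SessionLock;
  } catch {
    return null;
  }
}

/** Returns null when the lock was acquired, or the live owner when the line is busy. */
function acquireLock(path: string, sessionId: string): SessionLock | null {
  const record: SessionLock = { pid: process.pid, sessionId, startedAt: new Date().toISOString() };
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      // "wx": create exclusively; throws EEXIST if the file already exists.
      const fd = openSync(path, "wx");
      try {
        writeSync(fd, JSON.stringify(record));
      } finally {
        closeSync(fd);
      }
      return null;
    } catch (err) {
      if ((err as { code?: string }).code !== "EEXIST") throw err;
    }
    const owner = readLock(path);
    if (owner && processAlive(owner.pid)) return owner; // line is busy
    if (owner) {
      // Stale lock left by a dead process: remove it, then retry.
      try { unlinkSync(path); } catch { /* already removed by another session */ }
    }
  }
  throw new Error("could not acquire lock");
}

/** Remove the lock only if this process owns it. */
function releaseLock(path: string): void {
  if (readLock(path)?.pid === process.pid) unlinkSync(path);
}
```

The owner returned by acquireLock supplies the session ID, PID, and start time for the busy message. Removing a stale lock and retrying is not fully race-free: two sessions that find the same dead owner at the same moment could interfere, so a stricter handoff would be needed where that matters.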
Free, high-quality speech with word-boundary splitting
Microsoft Edge-TTS provides neural-quality speech synthesis at zero cost. Long messages are automatically split at word boundaries to prevent truncation. Speech rate adjusts automatically to content length: shorter messages play faster, while longer explanations slow down by up to 15%. Each voice mode has its own default rate.
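A sketch of word-boundary chunking and length-based rate selection. The word-count breakpoints, the +10% speed-up, and the function names are hypothetical values chosen for illustration; only the 15% maximum slow-down comes from the text. The rate is returned as a signed percentage string (for example "-15%"), the style edge-tts accepts for its rate option.

```typescript
// Hypothetical tuning values; only MAX_SLOW_DOWN comes from the description.
const SHORT_WORDS = 15; // at or below this many words: full speed-up
const LONG_WORDS = 150; // at or above this many words: full slow-down
const SPEED_UP = 10; // percent faster for short messages
const MAX_SLOW_DOWN = 15; // percent slower for long explanations

function words(text: string): string[] {
  return text.split(/\s+/).filter((w) => w.length > 0);
}

/** Split text into chunks of at most maxChars, breaking only between words.
 *  A single word longer than maxChars becomes its own chunk rather than being cut. */
function chunkAtWordBoundaries(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const word of words(text)) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length <= maxChars || !current) {
      current = candidate;
    } else {
      chunks.push(current);
      current = word;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

/** Interpolate linearly from +SPEED_UP to -MAX_SLOW_DOWN, relative to the mode's base rate. */
function rateFor(text: string, basePercent = 0): string {
  const n = words(text).length;
  const t = Math.min(1, Math.max(0, (n - SHORT_WORDS) / (LONG_WORDS - SHORT_WORDS)));
  const percent = basePercent + Math.round(SPEED_UP - t * (SPEED_UP + MAX_SLOW_DOWN));
  return `${percent >= 0 ? "+" : ""}${percent}%`;
}
```

Passing a per-mode basePercent keeps each voice mode's default while still applying the length adjustment on top of it.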
Singleton voice service via socat — always on
VoiceLayer runs as a macOS MCP daemon that speaks two wire protocols: newline-delimited JSON (NDJSON) and MCP messages framed with Content-Length headers. A socat-based singleton ensures only one voice service instance runs, even across multiple Claude sessions. The daemon auto-starts via a macOS LaunchAgent. Users stop it explicitly through a signal file, and a 5-minute orphan timeout cleans up abandoned session bookings.
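Dual-protocol support means one reader has to accept both framings on the same stream. The sketch below assumes that a message beginning with "Content-Length:" carries a header block ending in a blank line ("\r\n\r\n") followed by a body of that many bytes, and that anything else is one JSON object per line. parseFrames and the Frame type are hypothetical names, not part of VoiceLayer's API.

```typescript
// Sketch of a reader that accepts NDJSON lines and Content-Length-framed
// messages on the same stream. Content-Length counts bytes, so parsing is byte-based.
const decoder = new TextDecoder();
const HEADER_PREFIX = new TextEncoder().encode("Content-Length:");

type Framing = "ndjson" | "content-length";
interface Frame {
  framing: Framing;
  message: unknown;
}

function startsWith(buf: Uint8Array, prefix: Uint8Array): boolean {
  if (buf.length < prefix.length) return false;
  for (let i = 0; i < prefix.length; i++) if (buf[i] !== prefix[i]) return false;
  return true;
}

function indexOfSeq(buf: Uint8Array, seq: number[]): number {
  outer: for (let i = 0; i + seq.length <= buf.length; i++) {
    for (let j = 0; j < seq.length; j++) if (buf[i + j] !== seq[j]) continue outer;
    return i;
  }
  return -1;
}

/** Parse every complete frame; return the frames and any unconsumed bytes. */
function parseFrames(input: Uint8Array): { frames: Frame[]; rest: Uint8Array } {
  const frames: Frame[] = [];
  let buf = input;
  while (buf.length > 0) {
    if (startsWith(buf, HEADER_PREFIX)) {
      const headerEnd = indexOfSeq(buf, [13, 10, 13, 10]); // "\r\n\r\n"
      if (headerEnd < 0) break; // header not complete yet
      const header = decoder.decode(buf.subarray(0, headerEnd));
      const match = /Content-Length:\s*(\d+)/i.exec(header);
      if (!match) throw new Error(`bad header: ${header}`);
      const bodyStart = headerEnd + 4;
      const bodyEnd = bodyStart + Number(match[1]);
      if (buf.length < bodyEnd) break; // body not complete yet
      const body = decoder.decode(buf.subarray(bodyStart, bodyEnd));
      frames.push({ framing: "content-length", message: JSON.parse(body) });
      buf = buf.subarray(bodyEnd);
    } else {
      const newline = buf.indexOf(10); // "\n"
      if (newline < 0) break; // line not complete yet
      const line = decoder.decode(buf.subarray(0, newline)).trim();
      if (line) frames.push({ framing: "ndjson", message: JSON.parse(line) });
      buf = buf.subarray(newline + 1);
    }
  }
  return { frames, rest: buf };
}
```

The caller keeps rest and prepends it to the next socket read, so messages split across reads are parsed once the remaining bytes arrive.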