Voice AI · Realtime prototype

Jarvis Voice Sight

A mock-first realtime voice agent prototype with always-listening interaction, VAD, barge-in, turn isolation, sentence-level streaming TTS, and Ollama/vLLM runtime support.

Problem

Voice assistants often fail when conversation becomes interruptible, long-form, or latency-sensitive. The user needs to be able to interrupt or keep speaking without the system losing turn state.

System response

The prototype separates the frontend from an orchestrator and replaceable ASR, LLM, TTS, and emotion adapters, then measures whether users continue beyond the first exchange.
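As a rough illustration of that separation, the sketch below wires an orchestrator to swappable adapters. All interface, class, and method names here are hypothetical and chosen for illustration; they are not taken from the project's code. The emotion adapter is left out for brevity, and the mock adapters return canned values so the pipeline can run without a microphone or a model.

```typescript
// Hypothetical adapter contracts (illustrative names, not the project's API).
interface AsrAdapter {
  transcribe(audio: Float32Array): Promise<string>;
}
interface LlmAdapter {
  complete(prompt: string): Promise<string>;
}
interface TtsAdapter {
  synthesize(text: string): Promise<Float32Array>;
}

// Mock adapters: deterministic stand-ins that keep the same contract.
class MockAsr implements AsrAdapter {
  async transcribe(_audio: Float32Array): Promise<string> {
    return "hello";
  }
}
class MockLlm implements LlmAdapter {
  async complete(prompt: string): Promise<string> {
    return `echo: ${prompt}`;
  }
}
class MockTts implements TtsAdapter {
  async synthesize(text: string): Promise<Float32Array> {
    // One silent sample per character, just to have a non-empty buffer.
    return new Float32Array(text.length);
  }
}

interface TurnResult {
  transcript: string;
  reply: string;
  audio: Float32Array;
}

// The orchestrator depends only on the interfaces, so a real ASR, LLM
// (for example an Ollama- or vLLM-backed one), or TTS can replace a mock.
class Orchestrator {
  private asr: AsrAdapter;
  private llm: LlmAdapter;
  private tts: TtsAdapter;

  constructor(asr: AsrAdapter, llm: LlmAdapter, tts: TtsAdapter) {
    this.asr = asr;
    this.llm = llm;
    this.tts = tts;
  }

  // Runs one utterance through ASR, then the LLM, then TTS.
  async handleUtterance(audio: Float32Array): Promise<TurnResult> {
    const transcript = await this.asr.transcribe(audio);
    const reply = await this.llm.complete(transcript);
    const speech = await this.tts.synthesize(reply);
    return { transcript, reply, audio: speech };
  }
}
```

Because the orchestrator only sees the three interfaces, swapping `MockLlm` for a real runtime-backed adapter would not require changes to the frontend or to the other adapters.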

Evidence surface

  • Realtime mode, stale-audio discard, cancellable TTS queue, and sentence-level audio streaming.
  • Mock mode for microphone-unavailable contexts while preserving the full pipeline contract.
  • Typecheck, lint, test, benchmark, realtime smoke, health, preflight, and demo scripts.
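One way the stale-audio discard, cancellable TTS queue, and sentence-level streaming listed above could fit together is sketched below. This is a minimal sketch under assumptions: `splitSentences`, `CancellableTtsQueue`, and the turn-ID scheme are hypothetical names, not the prototype's actual implementation, and the sentence splitter is deliberately naive (it would also split on decimals and abbreviations).

```typescript
// Naive sentence splitter: cuts after runs of ., !, or ?.
// Any trailing text without terminal punctuation becomes the last item.
function splitSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g);
  return (matches ?? []).map((s) => s.trim()).filter((s) => s.length > 0);
}

interface QueuedSentence {
  turnId: number;
  text: string;
}

// Hypothetical queue: every item is tagged with the turn that produced it.
class CancellableTtsQueue {
  private currentTurn = 0;
  private queue: QueuedSentence[] = [];

  // Starts a new turn (for example on barge-in) and cancels
  // everything still queued for earlier turns.
  beginTurn(): number {
    this.currentTurn += 1;
    this.queue = [];
    return this.currentTurn;
  }

  // Queues a reply one sentence at a time.
  // Returns false and queues nothing if the turn is no longer current.
  enqueue(turnId: number, reply: string): boolean {
    if (turnId !== this.currentTurn) return false;
    for (const text of splitSentences(reply)) {
      this.queue.push({ turnId, text });
    }
    return true;
  }

  // Next sentence to synthesize, or undefined when the queue is empty.
  next(): string | undefined {
    return this.queue.shift()?.text;
  }

  // Stale-audio check: audio synthesized for an older turn
  // should be dropped rather than played.
  isPlayable(turnId: number): boolean {
    return turnId === this.currentTurn;
  }

  get pending(): number {
    return this.queue.length;
  }
}

// Usage: the user interrupts after the first sentence has been taken.
const demoQueue = new CancellableTtsQueue();
const demoFirstTurn = demoQueue.beginTurn();
demoQueue.enqueue(demoFirstTurn, "Sure. Here is a long answer. It keeps going.");
demoQueue.next(); // first sentence goes to TTS
demoQueue.beginTurn(); // barge-in: remaining sentences are cancelled
```

Tagging each sentence with a turn ID keeps the check cheap: anything that arrives late from a slow LLM or TTS call can be compared against the current turn and discarded without inspecting the audio itself.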

Toolkit

TypeScript · Voice AI · VAD · Streaming TTS · Ollama · vLLM

Next validation layer

Add session memory and retrieval as the next design layer only once the continuous conversation loop is consistently stable.