voidstarr/minimal-voice-assistant

πŸŽ™οΈ Real-time Voice Assistant

A minimal, real-time voice assistant powered by WebRTC streaming, combining low-latency audio processing with off-the-shelf speech components (Silero VAD, Whisper, Kokoro).

✨ Features

  • 🌊 WebRTC Streaming: Ultra-low latency audio streaming via FastRTC
  • 🎯 Advanced VAD: Silero voice activity detection with configurable parameters
  • 🎤 Speech Recognition: Whisper ASR for accurate speech-to-text transcription
  • 🔊 Neural TTS: High-quality Kokoro text-to-speech with natural-sounding voices
  • 🤖 AI Integration: Support for both Ollama (local) and OpenRouter GPT-5 Nano (cloud)
  • 💬 Context-Aware: Maintains conversation history for coherent responses
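
The context-aware behavior can be sketched as a rolling list of chat messages capped at a fixed length. The class name and cap below are illustrative, not the repository's actual implementation:

```python
from collections import deque

class ConversationHistory:
    """Rolling chat history capped at a fixed number of messages (illustrative)."""

    def __init__(self, max_messages: int = 20):
        self.messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def as_list(self) -> list[dict]:
        # Shape used by OpenAI-style chat APIs and Ollama's chat endpoint
        return list(self.messages)

history = ConversationHistory(max_messages=4)
history.add("user", "Hello")
history.add("assistant", "Hi! How can I help?")
history.add("user", "What's the weather?")
history.add("assistant", "I can't check live weather.")
history.add("user", "OK")  # oldest message is dropped once the cap is hit
print(len(history.as_list()))               # → 4
print(history.as_list()[0]["content"])      # → Hi! How can I help?
```

Capping the history bounds both prompt size and LLM latency, at the cost of the assistant forgetting older turns.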

πŸ—οΈ Architecture

┌─────────────┐     WebRTC     ┌──────────────────┐
│   Browser   │◄──────────────►│   FastRTC        │
│  (Client)   │   Ultra-low    │   Stream         │
└─────────────┘    latency     └──────────────────┘
                                        │
                                        ▼
                              ┌──────────────────┐
                              │  Silero VAD      │
                              │  (Voice Activity)│
                              └──────────────────┘
                                        │
                                        ▼
                              ┌──────────────────┐
                              │  Whisper ASR     │
                              │  (Speech-to-Text)│
                              └──────────────────┘
                                        │
                                        ▼
                              ┌──────────────────┐
                              │  LLM Processing  │
                              │  (Ollama/OpenAI) │
                              └──────────────────┘
                                        │
                                        ▼
                              ┌──────────────────┐
                              │  Kokoro TTS      │
                              │  (Text-to-Speech)│
                              └──────────────────┘
                                        │
                                        ▼
                              ┌──────────────────┐
                              │  Audio Stream    │
                              │  (Back to client)│
                              └──────────────────┘

📋 Requirements

  • Python: 3.12 or higher
  • uv: Modern Python package manager
  • System Dependencies:
    • Build tools (gcc/g++)
    • Python development headers
    • espeak-ng for phonemization
  • AI Backend: Either:
    • Ollama with gemma3:4b model (local), OR
    • OpenRouter API key for GPT-5 Nano (cloud)

🚀 Quick Setup

1. Clone the Repository

git clone https://github.com/voidstarr/minimal-voice-assistant.git
cd minimal-voice-assistant

2. Run Setup Script

The setup script will automatically:

  • Install uv if not present
  • Install system dependencies (gcc, python3-dev, espeak)
  • Create a virtual environment
  • Install all Python dependencies
  • Generate SSL certificates for HTTPS

chmod +x setup.sh
./setup.sh

3. Configure LLM Backend

Choose ONE of the following options:

Option A: Local Ollama (Recommended for Privacy)

  1. Install Ollama from ollama.ai
  2. Pull the required model:
    ollama pull gemma3:4b

Option B: OpenRouter Cloud API

  1. Create a .env file:
    echo "OPENROUTER_API_KEY=your_api_key_here" > .env
  2. Get your API key from OpenRouter
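
For illustration, here is a minimal stdlib-only .env loader showing how the key above reaches the process environment. The project itself may rely on the python-dotenv package instead (an assumption), but the parsing rules are the same:

```python
import os
import tempfile
from pathlib import Path

def load_env(path: str) -> None:
    """Minimal .env loader (illustrative; python-dotenv does this more robustly)."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Demo against a temporary .env file
with tempfile.TemporaryDirectory() as d:
    env_file = Path(d) / ".env"
    env_file.write_text("OPENROUTER_API_KEY=your_api_key_here\n")
    load_env(str(env_file))

print(os.environ["OPENROUTER_API_KEY"])  # → your_api_key_here
```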

4. Run the Assistant

chmod +x run.sh
./run.sh

The assistant will start at https://localhost:7860

🎯 Usage

  1. Open the Interface: Navigate to https://localhost:7860 in your browser

    • Accept the self-signed certificate warning (this is expected for local development)
  2. Grant Microphone Access: Allow browser microphone permissions when prompted

  3. Start Speaking: The assistant automatically detects when you start and stop speaking using VAD

  4. Receive Responses: The AI processes your speech and responds with a natural-sounding voice

βš™οΈ Configuration

Voice Activity Detection (VAD)

Edit voice_assistant.py to adjust VAD parameters:

self.vad_options = SileroVadOptions(
    threshold=0.5,                    # Speech detection sensitivity
    min_speech_duration_ms=250,       # Minimum speech length
    max_speech_duration_s=30.0,       # Maximum speech length
    min_silence_duration_ms=500,      # Silence before processing
    window_size_samples=1024,         # VAD processing window
    speech_pad_ms=200                 # Padding around speech
)

TTS Voice

Change the voice in the generate_tts method:

samples, sample_rate = self.kokoro.create(
    text, voice="af_heart", speed=1.0, lang="en-us"
)

Available voices depend on your Kokoro model configuration.

πŸ“ Project Structure

minimal-voice-assistant/
├── voice_assistant.py      # Main application
├── pyproject.toml          # Project dependencies
├── setup.sh                # Setup script
├── run.sh                  # Run script
├── models/                 # TTS models
│   └── kokoro-v1.0.onnx    # Kokoro TTS model
├── ssl_certs/              # SSL certificates
│   ├── cert.pem
│   └── key.pem
└── README.md               # This file

🔧 Troubleshooting

SSL Certificate Warnings

Issue: Browser shows security warning

Solution: This is expected for self-signed certificates. Click "Advanced" and "Proceed" to continue. For production, use proper SSL certificates.

Microphone Not Working

Issue: No audio detected

Solution:

  • Ensure HTTPS is enabled (required for WebRTC)
  • Grant microphone permissions in browser
  • Check browser console for errors
  • Verify microphone works in other applications

Ollama Connection Failed

Issue: "Ollama connection failed" error

Solution:

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama if needed
ollama serve

# Pull the model
ollama pull gemma3:4b

espeak Not Found

Issue: Phonemizer or espeak errors

Solution: Reinstall system dependencies:

# Ubuntu/Debian
sudo apt-get install espeak espeak-data libespeak-dev

# Fedora/RHEL
sudo dnf install espeak espeak-devel

Model Loading Errors

Issue: TTS or ASR models fail to load

Solution:

  • Ensure models/kokoro-v1.0.onnx is present
  • Check sufficient RAM (4GB+ recommended)
  • Verify Python 3.12+ is installed

πŸ› οΈ Development

Manual Installation

If you prefer not to use the setup script:

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment
uv venv

# Activate environment
source .venv/bin/activate  # Linux/Mac
# or
.venv\Scripts\activate     # Windows

# Install dependencies
uv pip install -e .

Running Without SSL

For development without HTTPS (note: browsers require a secure context for microphone access, so WebRTC typically only works over plain HTTP on localhost):

# In voice_assistant.py, modify the launch call:
interface.launch(
    server_name="0.0.0.0",
    server_port=7860,
    share=False
)

📊 Performance

  • Latency: ~200-500ms end-to-end (depends on LLM)
  • CPU Usage: Moderate (optimized for CPU-only operation)
  • RAM Usage: ~2-4GB (with models loaded)
  • Network: Minimal (except for OpenRouter API calls)

🤝 Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

πŸ™ Acknowledgments

πŸ”— Links

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published