Add streaming transcription functionality and improve Python version handling #31

AlexanderMakarov · 2025-12-30T10:28:42Z

Summary

This PR introduces streaming transcription functionality to VOXD, enabling real-time incremental typing as you speak. Additionally, it includes improvements to Python version handling in installation scripts (inspired by PR #15).

🎙️ Streaming Transcription Feature

Overview

VOXD now supports streaming transcription by default, which means text appears incrementally as you speak, not after recording stops. This provides a more natural and responsive voice-typing experience.

Key Features

Real-time typing: Text appears word-by-word or phrase-by-phrase as it's transcribed (typically every 2 seconds or 3 words)
Chunk-based processing: Audio is processed in overlapping chunks (default: 3 seconds) for continuous transcription
Incremental updates: Text is typed incrementally during recording, making it feel like natural voice-typing
Seamless experience: You see your words appear in real-time, providing immediate feedback
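
To make the chunk-based processing above concrete, here is a minimal sketch of how overlapping chunk boundaries could be computed. It is not the code from this PR: the function name `chunk_bounds` and the 16 kHz sample rate are assumptions for illustration, while the 3.0 s chunk size and 0.5 s overlap mirror the defaults listed under Configuration Options.

```python
from typing import Iterator, Tuple


def chunk_bounds(
    total_samples: int,
    sample_rate: int = 16000,      # assumed sample rate, not taken from the PR
    chunk_seconds: float = 3.0,    # mirrors streaming_chunk_seconds
    overlap_seconds: float = 0.5,  # mirrors streaming_overlap_seconds
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices of consecutive overlapping chunks."""
    chunk = int(chunk_seconds * sample_rate)
    step = chunk - int(overlap_seconds * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk")
    start = 0
    while start < total_samples:
        end = min(start + chunk, total_samples)
        yield start, end
        if end == total_samples:
            break  # the final (possibly shorter) chunk has been emitted
        start += step


# 8 seconds of audio at 16 kHz
bounds = list(chunk_bounds(total_samples=8 * 16000))
```

Each chunk starts 2.5 s after the previous one, so neighbouring chunks share 0.5 s of audio. That shared audio gives the transcriber context across chunk boundaries, at the cost of transcribing some audio twice.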

How It Works

Press hotkey to start → VOXD begins recording and transcribing
As you speak → Text appears incrementally in your focused application
Press hotkey again → Finalizes any remaining transcription and copies to clipboard

Implementation Details

New Components:

StreamingWhisperTranscriber (src/voxd/core/streaming_transcriber.py): Processes audio in chunks and emits incremental text updates
StreamingCoreProcessThread (src/voxd/core/streaming_core.py): Orchestrates streaming recording, transcription, and typing for GUI/tray modes
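
Because chunks overlap, the transcript of one chunk can repeat the last words of the previous one. The sketch below shows one possible way to drop such repeated words before typing. It is a hypothetical helper (`merge_overlap` and `max_overlap_words` are illustrative names), not the logic actually used by StreamingWhisperTranscriber.

```python
def merge_overlap(previous: str, current: str, max_overlap_words: int = 5) -> str:
    """Return the part of `current` that does not repeat the tail of `previous`.

    Compares the last n words of `previous` with the first n words of
    `current` (case-insensitively), trying the longest overlap first.
    """
    prev_words = previous.split()
    cur_words = current.split()
    limit = min(max_overlap_words, len(prev_words), len(cur_words))
    for n in range(limit, 0, -1):
        tail = [w.lower() for w in prev_words[-n:]]
        head = [w.lower() for w in cur_words[:n]]
        if tail == head:
            return " ".join(cur_words[n:])
    # No overlap found: everything in `current` is new text.
    return " ".join(cur_words)


new_text = merge_overlap("the quick brown fox", "brown fox jumps over")
```

Exact word matching is a simplification: a model may transcribe the same audio differently in two chunks, so a real implementation would likely need fuzzier matching or timestamps.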

Configuration Options:
streaming_enabled: true # Enable/disable streaming mode
streaming_chunk_seconds: 3.0 # Audio chunk size in seconds
streaming_overlap_seconds: 0.5 # Overlap between chunks
streaming_emit_interval_seconds: 2.0 # Minimum time between text updates
streaming_emit_word_count: 3 # Minimum words before emitting text
streaming_typing_delay: 0.01 # Delay between typed characters
streaming_min_chars_to_type: 3 # Minimum characters before typing
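
As an illustration of how the emit-related options above might interact, here is a small sketch of an emit gate. The class `EmitGate` and its method names are hypothetical. Combining the interval and the word count with "or" is an assumption based on the "every 2 seconds or 3 words" wording under Key Features; the PR's actual logic may differ.

```python
from dataclasses import dataclass


@dataclass
class EmitGate:
    interval_seconds: float = 2.0  # mirrors streaming_emit_interval_seconds
    word_count: int = 3            # mirrors streaming_emit_word_count
    min_chars: int = 3             # mirrors streaming_min_chars_to_type
    last_emit: float = 0.0         # time of the previous emit, in seconds

    def should_emit(self, pending: str, now: float) -> bool:
        """Decide whether the pending text should be typed now."""
        text = pending.strip()
        if len(text) < self.min_chars:
            return False  # too short to be worth typing
        enough_words = len(text.split()) >= self.word_count
        enough_time = (now - self.last_emit) >= self.interval_seconds
        return enough_words or enough_time  # assumed "or" semantics

    def mark_emitted(self, now: float) -> None:
        """Record that text was typed at time `now`."""
        self.last_emit = now


gate = EmitGate()
```

In use, the caller would pass a monotonic clock reading (for example `time.monotonic()`) as `now` and call `mark_emitted` after typing.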

Modes Supported:

✅ CLI mode (voxd --rh)
✅ GUI mode (voxd --gui)
✅ Tray mode (voxd --tray)

Backward Compatibility:
Streaming is enabled by default but can be disabled via config to use the traditional "record-then-transcribe" behavior.

🐍 Python Version Improvements

This PR also includes improvements from PR #15 that remove hard-coded Python version checks:

Before: Only supported specific versions (3.9, 3.10, 3.11, 3.12, 3.13)
After: Uses >= 3.9 check, making it compatible with future Python versions automatically

Changes:

Updated packaging/voxd.wrapper to use version comparison (>= 3.9) instead of hard-coded version lists
Improved Python version detection logic to be more flexible and future-proof
Updated venv creation to use latest available Python version
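
The difference between a hard-coded list and a >= 3.9 comparison can be sketched as follows. The real check lives in the shell script packaging/voxd.wrapper; this Python version only illustrates the idea, and the names `parse_version`, `is_supported` and `MIN_VERSION` are made up for the example.

```python
import re
from typing import Optional, Tuple

MIN_VERSION = (3, 9)


def parse_version(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a string such as 'Python 3.12.1'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_supported(text: str) -> bool:
    """True if the reported version is at least MIN_VERSION."""
    version = parse_version(text)
    # Tuples compare numerically, so (3, 10) >= (3, 9) holds,
    # whereas the string "3.10" sorts before "3.9".
    return version is not None and version >= MIN_VERSION


supported = [v for v in ("Python 3.8.10", "Python 3.9.18", "Python 3.14.0")
             if is_supported(v)]
```

Comparing numeric components rather than strings is the important detail: a plain string comparison would wrongly reject 3.10 and later.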

Testing

Tested on:

✅ CLI mode with hotkey-controlled recording
✅ GUI mode with button-triggered recording
✅ Tray mode with hotkey-triggered recording
✅ Python version detection with various Python versions

Streaming transcription works as expected, providing real-time feedback during dictation. The Python version improvements ensure compatibility with future Python releases.

Benefits

Better UX: Users see their words appear in real-time, making voice-typing feel more natural
Immediate feedback: No need to wait until recording stops to see transcribed text
Future-proof: Python version handling supports upcoming Python versions automatically
Backward compatible: Can be disabled if users prefer the old behavior

mattsn0w · 2025-12-31

I tested your PR on my Omarchy 3.2.x ThinkPad (T490) and have some feedback.
Note that some of the issues I encountered are likely because my initial install of voxd used the release package voxd-1.7.0-1-x86_64.pkg.tar.zst for Arch Linux, while I tested your patch using setup.sh.

v1.7.0 installs packaging/voxd.wrapper to /usr/bin/voxd.
The systemd service unit file packaging/voxd-tray.service has ExecStart= set to the voxd.wrapper script at /usr/bin/voxd. This should be updated to point to the user-relative path.

If you do an Actions build of this in your fork, I can test it.

AlexanderMakarov · 2025-12-31T05:31:20Z

Hi @mattsn0w,
Thank you for the feedback and testing it!

I haven't tried voxd.wrapper and have only worked with setup.sh. I am on Linux Mint 21.3. I will try to fix the issues you mentioned anyway.

More generally, the idea of adding streaming to voxd led me to the need to speed up whisper.cpp, and I am now migrating to https://github.com/SYSTRAN/faster-whisper, which promises 4x speed for the same Whisper models. Streaming requires at least a 2x speed-up in transcription, and I don't have a (proper) GPU on my laptop. faster-whisper is a different beast, but it is tuned for real-time transcription, provides an embedded Python API, and offers word-level timestamps, which are very handy. So I will first try to implement this migration in my fork, https://github.com/AlexanderMakarov/voxd, because I don't have proper speech-to-text on my laptop yet.

AlexanderMakarov · 2026-01-04T07:00:55Z

@mattsn0w I've implemented the fix. BTW, this is not something introduced by my changes but the general behavior of the repo: installation from the package uses different paths than setup.sh.

As for my idea to switch to faster-whisper: I found that updating the VOXD repo with it is not the best way, so I switched to the simpler "Soupawhisper" repo (no UI, only notifications) and implemented streaming in my fork of it: https://github.com/AlexanderMakarov/soupawhisper

Note that with streaming, transcription quality drops significantly (with Whisper models).

fredericbirke and others added 4 commits November 1, 2025 22:45

8f3df0a Remove hard coded python versions and use > instead
cb4a89d Add streaming input functionality
15f568e Merge PR jakovius#15: Remove hard coded python versions and use >= instead
a82c276 Updated to use latest Python version to make venv.
08b123e Add function to find voxd executable and update service unit accordingly.
