Add streaming transcription functionality and improve Python version handling #31
Summary
This PR introduces streaming transcription functionality to VOXD, enabling real-time incremental typing as you speak. Additionally, it includes improvements to Python version handling in installation scripts (inspired by PR #15).
🎙️ Streaming Transcription Feature
Overview
VOXD now supports streaming transcription by default, which means text appears incrementally as you speak, not after recording stops. This provides a more natural and responsive voice-typing experience.
Key Features
How It Works
Implementation Details
New Components:
StreamingWhisperTranscriber (src/voxd/core/streaming_transcriber.py): Processes audio in chunks and emits incremental text updates
StreamingCoreProcessThread (src/voxd/core/streaming_core.py): Orchestrates streaming recording, transcription, and typing for GUI/tray modes
Configuration Options:
streaming_enabled: true # Enable/disable streaming mode
streaming_chunk_seconds: 3.0 # Audio chunk size in seconds
streaming_overlap_seconds: 0.5 # Overlap between chunks
streaming_emit_interval_seconds: 2.0 # Minimum time between text updates
streaming_emit_word_count: 3 # Minimum words before emitting text
streaming_typing_delay: 0.01 # Delay between typed characters
streaming_min_chars_to_type: 3 # Minimum characters before typing
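To illustrate how the chunk, overlap, and emit settings above might interact, here is a minimal sketch. It is not taken from the VOXD source: the function names `iter_chunks` and `should_emit` are hypothetical, and the rule that both the word-count and interval thresholds must be met before emitting is an assumption about how the two settings combine.

```python
from typing import Iterator, Sequence


def iter_chunks(
    samples: Sequence[float],
    sample_rate: int,
    chunk_seconds: float = 3.0,    # mirrors streaming_chunk_seconds
    overlap_seconds: float = 0.5,  # mirrors streaming_overlap_seconds
) -> Iterator[Sequence[float]]:
    """Yield fixed-size windows of audio that overlap their predecessor.

    Hypothetical helper: each window starts (chunk - overlap) samples after
    the previous one, so words on a boundary appear in both windows.
    """
    chunk = int(chunk_seconds * sample_rate)
    overlap = int(overlap_seconds * sample_rate)
    step = chunk - overlap
    if chunk <= 0 or step <= 0:
        raise ValueError("chunk must be positive and larger than overlap")
    for start in range(0, len(samples), step):
        yield samples[start:start + chunk]
        # Stop once a window has reached the end of the recording.
        if start + chunk >= len(samples):
            break


def should_emit(
    pending_text: str,
    seconds_since_last_emit: float,
    min_words: int = 3,           # mirrors streaming_emit_word_count
    min_interval: float = 2.0,    # mirrors streaming_emit_interval_seconds
) -> bool:
    """Assumed gating rule: emit only when BOTH thresholds are satisfied."""
    enough_words = len(pending_text.split()) >= min_words
    enough_time = seconds_since_last_emit >= min_interval
    return enough_words and enough_time
```

With the default values, a 7-second recording would be split into windows starting at 0.0 s, 2.5 s, and 5.0 s, the last one shorter than the rest.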
Modes Supported:
voxd --rh
voxd --gui
voxd --tray
Backward Compatibility:
Streaming is enabled by default but can be disabled via config to use the traditional "record-then-transcribe" behavior.
🐍 Python Version Improvements
This PR also includes improvements from PR #15 that remove hard-coded Python version checks:
>= 3.9 check, making it compatible with future Python versions automatically
Changes:
packaging/voxd.wrapper to use version comparison (>= 3.9) instead of hard-coded version lists
Testing
Tested on:
Streaming transcription works as expected, providing real-time feedback during dictation. The Python version improvements ensure compatibility with future Python releases.
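The difference between a minimum-version comparison and a hard-coded version list can be sketched in Python as follows. This is an illustration only, not the code in packaging/voxd.wrapper; the names `MIN_PYTHON` and `is_supported` are hypothetical.

```python
import sys
from typing import Sequence, Tuple

# Assumed minimum, matching the ">= 3.9" check described in this PR.
MIN_PYTHON: Tuple[int, int] = (3, 9)


def is_supported(
    version: Sequence[int] = sys.version_info,
    minimum: Tuple[int, int] = MIN_PYTHON,
) -> bool:
    """Return True when (major, minor) is at or above the minimum.

    Tuple comparison is lexicographic, so newer releases pass without
    having to be added to an explicit list of accepted versions.
    """
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    if not is_supported():
        sys.exit(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or newer is required")
```

A hard-coded list such as `("3.9", "3.10", "3.11")` would reject any later release until someone edits the list; the comparison above does not need that maintenance.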
Benefits
Related