An open-source video transcription tool focused on one primary use case: transcribing video files to text, with support for multiple model types.
- Video-First Design: Primary workflow is video → audio → transcription
- FFmpeg Integration: Automatic video-to-audio conversion using user-provided FFmpeg
- Multiple Model Support: Plugin-based system supporting Whisper and HuggingFace ASR models
- GPU Acceleration: Automatic CUDA detection and support
- Multiple Output Formats: Save transcriptions as TXT (with timestamps), SRT, or VTT
- Progress Tracking: Real-time progress indicators for conversion and transcription
- Drag and Drop: Drag video/audio files directly onto the application window
- Test Mode: Transcribe only the first 5 minutes for quick testing
- Timestamps: TXT output includes timestamps at the beginning of each line
- Auto-Setup: Automatic virtual environment creation and dependency installation
- Python 3.11 or 3.12
- FFmpeg (auto-downloaded on Windows, user-provided on Linux/macOS)
- NVIDIA GPU (optional, for CUDA acceleration)
Run the setup script:
# Windows
setup.bat

# Linux/macOS
./setup.sh
Or manually:
python install.py
The installer will:
- Detect installed Python versions (using Python Launcher)
- Let you choose which Python version to use (auto-selects 3.11 or 3.12 if available)
- Verify Python version is 3.11 or 3.12
- Create a virtual environment
- Install all dependencies
- Detect GPU and install the appropriate PyTorch version (see the sketch below)
- Download FFmpeg automatically (Windows only)
- Generate starter scripts
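The GPU step boils down to probing for an NVIDIA driver and picking the matching PyTorch wheel index. A minimal sketch of that logic, assuming a check via nvidia-smi (the helper names and exact flow in install.py may differ):

```python
# Illustrative sketch only; install.py's actual detection logic may differ.
import shutil
import subprocess
import sys

def has_nvidia_gpu() -> bool:
    """Return True if nvidia-smi exists and reports at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout

# Pick the PyTorch wheel index: CUDA build if a GPU is present, CPU build otherwise.
if has_nvidia_gpu():
    args = ["torch", "--index-url", "https://download.pytorch.org/whl/cu128"]
else:
    args = ["torch"]
subprocess.run([sys.executable, "-m", "pip", "install", *args], check=True)
```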
For detailed installation instructions, see Installation Guide
Create virtual environment:
python -m venv venv
Activate virtual environment:
# Windows
venv\Scripts\activate

# Linux/macOS
source venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Install PyTorch (with CUDA if GPU available):
# GPU version (if NVIDIA GPU available)
pip install torch --index-url https://download.pytorch.org/whl/cu128

# CPU version
pip install torch
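Whichever variant you install, you can verify the result from inside the virtual environment:

```python
import torch

print(torch.__version__)          # CUDA builds report a +cuXXX suffix, e.g. 2.x.x+cu128
print(torch.cuda.is_available())  # True means the GPU build found a usable device
```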
Run the application:

# Windows
run.bat
# Linux/macOS
./run.sh

Or manually:
# Activate venv first
python main.py

Configuration options:
- FFmpeg Path: Set the path to your FFmpeg executable in Settings (auto-configured on Windows)
- Model Selection: Choose Whisper model, quantization, and device
- Language: Select input language (or auto-detect)
- Output Format: Choose TXT, SRT, or VTT format
To transcribe a file:
- Select a File:
  - Click "Select Video/Audio File" to choose a file, OR
  - Drag and drop a video/audio file directly onto the window
- Choose Mode:
  - Full File: Transcribe the entire file
  - Test Mode (5 min): Transcribe only the first 5 minutes (for testing)
- The tool then automatically:
  - Converts video to audio (if it is a video file), with progress shown in real time (see the FFmpeg sketch below)
  - Transcribes the audio, with progress updated during processing
  - Saves the transcription in the selected format
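The conversion step is essentially an FFmpeg audio extraction. A rough sketch of the kind of call converter.py might make (the exact flags and helper name here are assumptions, not the tool's verified code):

```python
import subprocess

def extract_audio(ffmpeg_path: str, video_path: str, audio_path: str) -> None:
    """Extract a mono 16 kHz WAV track, the format Whisper-style models expect."""
    subprocess.run(
        [
            ffmpeg_path, "-y", "-i", video_path,
            "-vn",           # drop the video stream
            "-ac", "1",      # downmix to mono
            "-ar", "16000",  # resample to 16 kHz
            audio_path,
        ],
        check=True,
    )

# extract_audio("ffmpeg", "my_video.mp4", "my_video.wav")
```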
Output Location: The transcription file is saved in the same directory as your input file, with the same basename. For example, my_video.mp4 → my_video.txt.
Output Format: TXT files include timestamps at the beginning of each line (e.g., 0:35 Transcribed text), making it easy to navigate the transcription.
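The 0:35-style prefix is just the segment's start time rendered as minutes and seconds. As an illustration (not the tool's exact code):

```python
def format_timestamp(seconds: float) -> str:
    """Render a segment start time as M:SS, e.g. 35.2 -> '0:35'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"

# Each TXT line then becomes: f"{format_timestamp(segment.start)} {segment.text.strip()}"
```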
For complete usage instructions with screenshots, see User Guide
open-video-transcribe/
├── main.py                  # Entry point
├── install.py               # Auto venv creation & dependency installer
├── requirements.txt         # Python dependencies
├── config.yaml              # User configuration
├── run.bat / run.sh         # Starter scripts
├── setup.bat / setup.sh     # Setup scripts
│
├── core/
│   ├── controller.py        # Main orchestrator
│   ├── logging_config.py    # Logging setup
│   ├── exceptions.py        # Custom exceptions
│   │
│   ├── audio/
│   │   └── converter.py     # FFmpeg video-to-audio conversion
│   │
│   ├── models/
│   │   ├── base.py              # Abstract base class for models
│   │   ├── whisper_adapter.py   # Whisper model adapter
│   │   └── registry.py          # Model registry/discovery
│   │
│   └── transcription/
│       ├── service.py       # Transcription service
│       └── progress.py      # Progress tracking
│
├── gui/
│   ├── main_window.py       # Main GUI window
│   ├── progress_dialog.py   # Progress indicator
│   └── settings_dialog.py   # Settings/configuration UI
│
└── config/
    └── manager.py           # Configuration management
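The plugin-based model support centres on core/models/: base.py defines the adapter interface and registry.py maps model types to adapters. A hedged sketch of what that contract could look like (the real class and function names may differ):

```python
# Illustrative sketch; the actual base.py / registry.py interfaces may differ.
from abc import ABC, abstractmethod
from typing import Iterator

class TranscriptionModel(ABC):
    """Interface every model adapter (Whisper, future HuggingFace ASR) implements."""

    @abstractmethod
    def load(self, name: str, device: str, quantization: str) -> None:
        """Load model weights onto the requested device."""

    @abstractmethod
    def transcribe(self, audio_path: str, language: str | None = None) -> Iterator[tuple[float, float, str]]:
        """Yield (start_seconds, end_seconds, text) segments."""

# Registry so the controller can resolve the "type" field from config.yaml.
MODEL_REGISTRY: dict[str, type[TranscriptionModel]] = {}

def register_model(model_type: str):
    def decorator(cls: type[TranscriptionModel]) -> type[TranscriptionModel]:
        MODEL_REGISTRY[model_type] = cls
        return cls
    return decorator
```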
The config.yaml file stores user preferences:
ffmpeg_path: ""   # User-provided path

model:
  type: whisper
  name: large-v3
  quantization: float16
  device: cuda

languages:
  input: auto
  output: en

output:
  format: txt
  save_location: same_as_input

Supported input formats:
- Video: MP4, AVI, MKV, WebM, MOV, FLV, WMV, M4V
- Audio: MP3, WAV, AAC, FLAC, M4A, OGG

Output formats:
- TXT: Plain text
- SRT: SubRip subtitle format
- VTT: WebVTT subtitle format
Currently supported models:
- Whisper models via faster-whisper (see the sketch below):
  - tiny, base, small, medium, large-v1, large-v2, large-v3
  - distil-small.en, distil-medium.en, distil-large-v2, distil-large-v3
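Stripped of GUI and progress handling, the faster-whisper path presumably reduces to something like the following (a sketch; whisper_adapter.py may pass different parameters):

```python
from faster_whisper import WhisperModel

# Model name, device, and compute type mirror the config.yaml settings.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# language=None lets the model auto-detect the spoken language.
segments, info = model.transcribe("my_video.wav", language=None)
for segment in segments:
    print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")
```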
Future support planned:
- HuggingFace ASR models
FFmpeg not found:
- Windows: FFmpeg is downloaded automatically during installation. If it is missing, re-run setup.bat
- Linux/macOS: Install FFmpeg via your package manager (see Installation Guide)
- Ensure FFmpeg path is set correctly in Settings
- Download manually from: https://ffmpeg.org/download.html
Model download or loading fails:
- Check your internet connection (models are downloaded from HuggingFace)
- Ensure sufficient disk space
- Try a smaller model if memory is limited
GPU not detected or CUDA errors:
- Verify NVIDIA drivers are installed
- Check CUDA compatibility with PyTorch version
- Fall back to CPU mode if GPU issues persist (see the sketch below)
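The CPU fallback mentioned above can be as simple as retrying model creation with a CPU-friendly compute type; a hedged sketch (not necessarily the tool's exact behaviour):

```python
from faster_whisper import WhisperModel

def load_model(name: str = "large-v3") -> WhisperModel:
    try:
        # Preferred path: CUDA with float16.
        return WhisperModel(name, device="cuda", compute_type="float16")
    except Exception:
        # CUDA/cuDNN missing or broken: fall back to CPU with int8 quantization.
        return WhisperModel(name, device="cpu", compute_type="int8")
```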
Open source - see LICENSE file for details.
The setup.bat script (Windows) automatically detects installed Python versions and allows you to choose:
- Auto-detection: Uses the Python Launcher (py) to find all installed versions (see the sketch below)
- Auto-selection: Automatically selects Python 3.12 or 3.11 if available
- Manual selection: Menu to choose from available versions
- Custom path: Option to specify a custom Python executable path
- Validation: Verifies selected Python is version 3.11 or 3.12 before proceeding
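Under the hood, the Python Launcher can list installed interpreters itself (py -0p prints one line per install); install.py presumably parses output along these lines (illustrative sketch):

```python
import subprocess

# "py -0p" lists installed interpreters, one per line, with their paths.
result = subprocess.run(["py", "-0p"], capture_output=True, text=True)
for line in result.stdout.splitlines():
    if line.strip():
        print(line.strip())  # e.g. a version tag followed by the python.exe path
```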
Contributions welcome! Please follow the existing code style and architecture patterns.
For AI agents working on this project, see AGENTS.md for detailed architecture and development guidelines.