Open Video Transcribe

An open-source video transcription tool focused on one primary use case: transcribing video files to text, with support for multiple model types.

Features

  • Video-First Design: Primary workflow is video → audio → transcription
  • FFmpeg Integration: Automatic video-to-audio conversion using user-provided FFmpeg
  • Multiple Model Support: Plugin-based system supporting Whisper and HuggingFace ASR models
  • GPU Acceleration: Automatic CUDA detection and support
  • Multiple Output Formats: Save transcriptions as TXT (with timestamps), SRT, or VTT
  • Progress Tracking: Real-time progress indicators for conversion and transcription
  • Drag and Drop: Drag video/audio files directly onto the application window
  • Test Mode: Transcribe only the first 5 minutes for quick testing
  • Timestamps: TXT output includes timestamps at the beginning of each line
  • Auto-Setup: Automatic virtual environment creation and dependency installation

Requirements

  • Python 3.11 or 3.12
  • FFmpeg (auto-downloaded on Windows, user-provided on Linux/macOS)
  • NVIDIA GPU (optional, for CUDA acceleration)

Installation

Quick Start

  1. Run the setup script:

    # Windows
    setup.bat
    
    # Linux/macOS
    ./setup.sh

    Or manually:

    python install.py
  2. The installer will:

    • Detect installed Python versions (using Python Launcher)
    • Let you choose which Python version to use (auto-selects 3.11 or 3.12 if available)
    • Verify Python version is 3.11 or 3.12
    • Create a virtual environment
    • Install all dependencies
    • Detect GPU and install appropriate PyTorch version
    • Download FFmpeg automatically (Windows only)
    • Generate starter scripts
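
The GPU-detection step above can be pictured roughly as follows. This is a hedged sketch, not the actual install.py logic; the helper name pick_torch_index_url and the fallback index URL are illustrative assumptions.

    # Illustrative sketch only (not the real install.py): pick a PyTorch wheel index based on GPU presence
    import shutil
    import subprocess

    def pick_torch_index_url() -> str:
        """Return a CUDA wheel index if an NVIDIA GPU is visible, else the default (CPU) index."""
        if shutil.which("nvidia-smi"):
            try:
                subprocess.run(["nvidia-smi"], check=True, capture_output=True)
                return "https://download.pytorch.org/whl/cu128"  # CUDA wheels, as in the manual install below
            except subprocess.CalledProcessError:
                pass  # nvidia-smi present but failing -> treat as no usable GPU
        return "https://pypi.org/simple"  # default index -> CPU-only torch

    print(pick_torch_index_url())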

📖 For detailed installation instructions, see Installation Guide

Manual Installation

  1. Create virtual environment:

    python -m venv venv
  2. Activate virtual environment:

    # Windows
    venv\Scripts\activate
    
    # Linux/macOS
    source venv/bin/activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Install PyTorch (with CUDA if GPU available):

    # GPU version (if NVIDIA GPU available)
    pip install torch --index-url https://download.pytorch.org/whl/cu128
    
    # CPU version
    pip install torch

Usage

Running the Application

# Windows
run.bat

# Linux/macOS
./run.sh

Or manually:

# Activate venv first
python main.py

Configuration

  1. FFmpeg Path: Set the path to your FFmpeg executable in Settings (auto-configured on Windows)
  2. Model Selection: Choose Whisper model, quantization, and device
  3. Language: Select input language (or auto-detect)
  4. Output Format: Choose TXT, SRT, or VTT format

Workflow

  1. Select a File:
    • Click "Select Video/Audio File" to choose a file, OR
    • Drag and drop a video/audio file directly onto the window
  2. Choose Mode:
    • Full File: Transcribe the entire file
    • Test Mode (5 min): Transcribe only the first 5 minutes (for testing)
  3. The tool automatically:
    • Converts video to audio (for video files) - progress shown in real time
    • Transcribes the audio - progress updated during processing
    • Saves the transcription in the selected format
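
The conversion step in the list above boils down to a plain FFmpeg call. This is a minimal sketch assuming ffmpeg is on PATH (or at the path configured in Settings); the bundled core/audio/converter.py adds progress reporting and error handling on top of this.

    # Minimal video -> audio extraction with FFmpeg (illustrative; not the app's converter.py)
    import subprocess

    def extract_audio(video_path: str, audio_path: str) -> None:
        # -vn drops the video stream; -ac 1 -ar 16000 yields mono 16 kHz audio suitable for ASR
        subprocess.run(
            ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", audio_path],
            check=True,
        )

    extract_audio("my_video.mp4", "my_video.wav")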

Output Location: The transcription file is saved in the same directory as your input file, with the same basename. For example, my_video.mp4 → my_video.txt.

Output Format: TXT files include timestamps at the beginning of each line (e.g., 0:35 Transcribed text), making it easy to navigate the transcription.
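
For illustration only, the M:SS prefix described above could be derived from a segment's start time like this (the helper name format_txt_line is hypothetical and not part of the codebase):

    def format_txt_line(start_seconds: float, text: str) -> str:
        # e.g. 35.2 seconds -> "0:35 Transcribed text"
        minutes, seconds = divmod(int(start_seconds), 60)
        return f"{minutes}:{seconds:02d} {text.strip()}"

    print(format_txt_line(35.2, "Transcribed text"))  # -> 0:35 Transcribed text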

📖 For complete usage instructions with screenshots, see User Guide

Project Structure

open-video-transcribe/
├── main.py                 # Entry point
├── install.py              # Auto venv creation & dependency installer
├── requirements.txt        # Python dependencies
├── config.yaml             # User configuration
├── run.bat / run.sh        # Starter scripts
├── setup.bat / setup.sh    # Setup scripts
│
├── core/
│   ├── controller.py       # Main orchestrator
│   ├── logging_config.py   # Logging setup
│   ├── exceptions.py       # Custom exceptions
│   │
│   ├── audio/
│   │   └── converter.py    # FFmpeg video-to-audio conversion
│   │
│   ├── models/
│   │   ├── base.py         # Abstract base class for models
│   │   ├── whisper_adapter.py    # Whisper model adapter
│   │   └── registry.py     # Model registry/discovery
│   │
│   └── transcription/
│       ├── service.py      # Transcription service
│       └── progress.py     # Progress tracking
│
├── gui/
│   ├── main_window.py      # Main GUI window
│   ├── progress_dialog.py  # Progress indicator
│   └── settings_dialog.py  # Settings/configuration UI
│
└── config/
    └── manager.py          # Configuration management

Configuration

The config.yaml file stores user preferences:

ffmpeg_path: ""  # User-provided path
model:
  type: whisper
  name: large-v3
  quantization: float16
  device: cuda
languages:
  input: auto
  output: en
output:
  format: txt
  save_location: same_as_input
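
These values can be read with PyYAML as shown below, assuming PyYAML is installed in the virtual environment; this is illustrative only and the real config/manager.py may differ.

    # Minimal sketch of loading config.yaml (not the project's config/manager.py)
    import yaml

    with open("config.yaml", encoding="utf-8") as fh:
        config = yaml.safe_load(fh)

    print(config["model"]["name"])      # e.g. large-v3
    print(config["output"]["format"])   # e.g. txt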

Supported Formats

Video Formats

  • MP4, AVI, MKV, WebM, MOV, FLV, WMV, M4V

Audio Formats

  • MP3, WAV, AAC, FLAC, M4A, OGG

Output Formats

  • TXT: Plain text
  • SRT: SubRip subtitle format
  • VTT: WebVTT subtitle format

Model Support

Currently supports:

  • Whisper models via faster-whisper
    • tiny, base, small, medium, large-v1, large-v2, large-v3
    • distil-small.en, distil-medium.en, distil-large-v2, distil-large-v3

Future support planned:

  • HuggingFace ASR models
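
For reference, the model, quantization, and device settings map onto faster-whisper roughly as follows. This is a minimal sketch of direct library use; the application's whisper_adapter.py wraps this and may differ in detail, and audio.wav is a placeholder path.

    # Minimal direct faster-whisper usage (illustrative; the app wraps this in whisper_adapter.py)
    from faster_whisper import WhisperModel

    # model name, device, and compute_type correspond to the config.yaml model section
    model = WhisperModel("large-v3", device="cuda", compute_type="float16")

    # language=None lets the model auto-detect the spoken language
    segments, info = model.transcribe("audio.wav", language=None)
    for segment in segments:
        print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")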

Troubleshooting

FFmpeg Not Found

  • Windows: FFmpeg is downloaded automatically during installation. If missing, re-run setup.bat
  • Linux/macOS: Install via package manager (see Installation Guide)
  • Ensure FFmpeg path is set correctly in Settings
  • Download manually from: https://ffmpeg.org/download.html

Model Loading Fails

  • Check internet connection (models are downloaded from HuggingFace)
  • Ensure sufficient disk space
  • Try a smaller model if memory is limited

CUDA Errors

  • Verify NVIDIA drivers are installed
  • Check CUDA compatibility with PyTorch version
  • Fall back to CPU mode if GPU issues persist
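
A quick way to run the checks above from the project's virtual environment, using standard PyTorch calls (independent of this tool):

    import torch

    print(torch.__version__)             # installed PyTorch build
    print(torch.version.cuda)            # CUDA version the build targets (None for CPU-only builds)
    print(torch.cuda.is_available())     # False usually means a driver/build mismatch or no NVIDIA GPU
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))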

License

Open source; see the LICENSE file for details.

Python Version Selection

The setup.bat script (Windows) automatically detects installed Python versions and allows you to choose:

  • Auto-detection: Uses Python Launcher (py) to find all installed versions
  • Auto-selection: Automatically selects Python 3.12 or 3.11 if available
  • Manual selection: Menu to choose from available versions
  • Custom path: Option to specify a custom Python executable path
  • Validation: Verifies selected Python is version 3.11 or 3.12 before proceeding
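
As an illustration of the detection step, the Windows Python Launcher can list installed interpreters; this Python sketch approximates what setup.bat does. The "py -0p" output format varies between launcher versions, and paths containing spaces would need sturdier parsing.

    # Rough, Windows-only sketch of interpreter detection via the Python Launcher
    import subprocess

    # "py -0p" lists installed Pythons with their executable paths
    listing = subprocess.run(["py", "-0p"], capture_output=True, text=True).stdout

    candidates = [
        line.split()[-1]
        for line in listing.splitlines()
        if "3.12" in line or "3.11" in line
    ]
    print(candidates[0] if candidates else "No Python 3.11/3.12 found")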

Contributing

Contributions welcome! Please follow the existing code style and architecture patterns.

For AI agents working on this project, see AGENTS.md for detailed architecture and development guidelines.