
Cross-platform desktop application for offline audio transcription. Features a modern React UI, real-time model management, and local processing using OpenAI's Whisper models via ONNX Runtime.


ArtisticMusician/VocalText


VocalText 🎙️

A powerful, GPU-accelerated offline audio transcriber using OpenAI's Whisper models. Built with Python and Gradio.

Features

  • GPU Acceleration: Uses CUDA for fast transcription on NVIDIA GPUs, with CPU fallback when no GPU is available.
  • Offline Privacy: All processing happens locally on your machine.
  • Multiple Models: Choose from Tiny (fastest) to Large (most accurate).
  • Simple UI: Clean web interface powered by Gradio.
  • Export: Save transcriptions directly to text files.

Requirements

  • Python 3.10 or higher
  • NVIDIA GPU (recommended for speed; CPU-only machines also work)
  • FFmpeg installed and available on the system PATH
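Before installing, the requirements above can be checked with a short script. This is a minimal sketch, not part of the repository (`check_requirements` is a hypothetical helper). It only checks the Python version and whether `ffmpeg` can be found on PATH; the GPU is optional, so it is not checked here.

```python
import shutil
import sys


def check_requirements(min_python=(3, 10)):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # VocalText requires Python 3.10 or higher.
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version.split()[0]}"
        )
    # FFmpeg must be discoverable on PATH so audio files can be decoded.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    for issue in issues:
        print("Missing requirement:", issue)
    if not issues:
        print("All requirements satisfied.")
```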

Installation

  1. Clone the repository:

    git clone https://github.com/ArtisticMusician/VocalText.git
    cd VocalText
  2. Create and activate a conda environment:

    conda create -n vocalText python=3.10
    conda activate vocalText
  3. Install dependencies:

    # Install PyTorch with CUDA support (for NVIDIA GPUs)
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
    
    # Install other requirements
    pip install -r requirements.txt
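To confirm that the CUDA build of PyTorch was installed rather than a CPU-only wheel, a quick check can be run inside the activated environment. This is a sketch; `pick_device` is a hypothetical helper, and VocalText's own device selection may differ.

```python
def pick_device():
    """Return "cuda" if PyTorch is installed with CUDA support and sees a GPU, else "cpu"."""
    try:
        import torch  # imported lazily so the check still runs if PyTorch is missing
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    try:
        import torch
        # torch.version.cuda is None for CPU-only wheels.
        print("PyTorch", torch.__version__, "| CUDA build:", torch.version.cuda)
    except ImportError:
        print("PyTorch is not installed in this environment.")
    print("Device available to PyTorch:", pick_device())
```

If it reports `cpu` on a machine with an NVIDIA GPU, the CPU-only wheel or an outdated NVIDIA driver is a likely cause.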

Usage

Option 1: Double-click start_app.bat (Windows only)

Option 2: Command line

    conda activate vocalText
    python app.py

The interface will open in your default web browser automatically.

Models

Models are downloaded automatically on first use to your local cache.

Model    Parameters   Speed       Accuracy
Tiny     39 M         Very fast   Low
Base     74 M         Fast        Moderate
Small    244 M        Moderate    Good
Medium   769 M        Slow        High
Large    1550 M       Slowest     Best
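Where exactly the "local cache" lives depends on the Whisper backend. Assuming the app uses the `openai-whisper` package (an assumption; the README does not name the package), checkpoints are stored under `$XDG_CACHE_HOME/whisper`, or `~/.cache/whisper` when that variable is unset. The helpers below (hypothetical names) show that location and which checkpoints are already present, i.e. which models will skip the first-use download. Other backends cache their files elsewhere.

```python
import os
from pathlib import Path


def whisper_cache_dir():
    """Default openai-whisper download directory: $XDG_CACHE_HOME/whisper or ~/.cache/whisper."""
    default = os.path.join(os.path.expanduser("~"), ".cache")
    return Path(os.getenv("XDG_CACHE_HOME", default)) / "whisper"


def cached_checkpoints(cache_dir=None):
    """Return the sorted file names of checkpoints (*.pt) already downloaded."""
    cache_dir = Path(cache_dir) if cache_dir is not None else whisper_cache_dir()
    if not cache_dir.is_dir():
        return []  # nothing has been downloaded yet
    return sorted(p.name for p in cache_dir.glob("*.pt"))


if __name__ == "__main__":
    print("Cache directory:", whisper_cache_dir())
    print("Downloaded checkpoints:", cached_checkpoints() or "none yet")
```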

Order of Operations

  1. Select a model from the dropdown menu.
  2. Import an audio file.
    • You can trim the audio file down in the viewport by clicking the scissor icon.
  3. Click the "Transcribe" button.
    • If the selected model hasn't been downloaded yet, it is downloaded first, which can take a while for the larger models. Transcription starts once the download finishes.
  4. The transcription will be displayed in the viewport.
  5. You can copy the text from the viewport or save it to a text file.
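Steps 3 to 5 can also be reproduced from a script, outside the UI. The sketch below assumes the `openai-whisper` package as the backend; `transcribe` and `save_transcript` are hypothetical helpers, not functions from VocalText's app.py.

```python
from pathlib import Path


def save_transcript(text, audio_path, out_dir=None):
    """Write text to a UTF-8 .txt file named after the audio file and return its path."""
    audio_path = Path(audio_path)
    out_dir = Path(out_dir) if out_dir is not None else audio_path.parent
    out_dir.mkdir(parents=True, exist_ok=True)
    # e.g. "interview.mp3" -> "interview.txt"
    out_path = out_dir / (audio_path.stem + ".txt")
    out_path.write_text(text, encoding="utf-8")
    return out_path


def transcribe(audio_path, model_name="base"):
    """Transcribe one file (assumes openai-whisper, imported as `whisper`)."""
    import whisper  # lazy import so save_transcript works without whisper installed

    # load_model downloads the checkpoint on first use, then reads it from the cache.
    model = whisper.load_model(model_name)
    # FP16 is only supported on GPU; request FP32 on CPU to skip whisper's fallback warning.
    on_gpu = next(model.parameters()).device.type == "cuda"
    result = model.transcribe(str(audio_path), fp16=on_gpu)
    return result["text"].strip()
```

For example, `save_transcript(transcribe("interview.mp3", "small"), "interview.mp3")` writes `interview.txt` next to the audio file. The `whisper` import sits inside `transcribe` so that saving works even where the package is not installed.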

(Screenshots: Audio Features, Trim Audio)

License

MIT License
