A powerful, GPU-accelerated offline audio transcriber using OpenAI's Whisper models. Built with Python and Gradio.
- GPU Acceleration: Uses CUDA for fast transcription when an NVIDIA GPU is available (falls back to CPU).
- Offline Privacy: All processing happens locally on your machine.
- Multiple Models: Choose from Tiny (fastest) to Large (most accurate).
- Simple UI: Clean web interface powered by Gradio.
- Export: Save transcriptions directly to text files.
- Python 3.10 or higher
- NVIDIA GPU (Recommended for speed, but works on CPU)
- FFmpeg installed and added to system PATH.
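Whisper shells out to FFmpeg for audio decoding, so it must be discoverable on PATH. A quick stdlib check (no third-party packages needed):

```python
# Check that FFmpeg is discoverable on the system PATH.
import shutil

ffmpeg_path = shutil.which("ffmpeg")
print("FFmpeg found at:", ffmpeg_path if ffmpeg_path else "NOT FOUND -- install it and add it to PATH")
```

If this prints `NOT FOUND`, install FFmpeg and restart your terminal before running the app.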
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/VocalText.git
  cd VocalText
  ```

- Create a virtual environment:

  ```bash
  conda create -n vocalText python=3.10
  conda activate vocalText
  ```

- Install dependencies:

  ```bash
  # Install PyTorch with CUDA support (for NVIDIA GPUs)
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  # Install other requirements
  pip install -r requirements.txt
  ```
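After installing, you can verify that PyTorch sees your GPU. This sketch guards the import so it also runs cleanly before PyTorch is installed:

```python
# Sanity check: is PyTorch installed, and is CUDA usable?
import importlib.util

def cuda_status() -> str:
    """Report CUDA availability without crashing if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    return "cuda available" if torch.cuda.is_available() else "cpu only"

print(cuda_status())
```

If this prints `cpu only` on a machine with an NVIDIA GPU, you likely installed the CPU-only PyTorch build; reinstall using the CUDA index URL above.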
Option 1: Double-click `start_app.bat` (Windows only).

Option 2: Command line:

```bash
conda activate vocalText
python app.py
```

The interface will open in your default web browser automatically.
Models are downloaded automatically on first use to your local cache.
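By default, the openai-whisper package caches model weights under `~/.cache/whisper` (overridable via the `XDG_CACHE_HOME` environment variable). A small stdlib snippet to locate the cache on your machine:

```python
# Resolve the default Whisper model cache directory.
# Assumes the openai-whisper convention: $XDG_CACHE_HOME/whisper, else ~/.cache/whisper.
import os
from pathlib import Path

def whisper_cache_dir() -> Path:
    base = os.environ.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))
    return Path(base) / "whisper"

print(whisper_cache_dir())
```

Deleting files in this directory frees disk space; the model will simply be re-downloaded the next time you select it.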
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| Tiny | 39 MB | Very Fast | Low |
| Base | 74 MB | Fast | Moderate |
| Small | 244 MB | Moderate | Good |
| Medium | 769 MB | Slow | High |
| Large | 1.5 GB | Slowest | Best |
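When choosing a model, a simple rule of thumb is to pick the largest one that fits your download/VRAM budget. A hypothetical helper (sizes mirror the table above; these are download sizes, not runtime memory):

```python
# Hypothetical helper: pick the largest Whisper model within a size budget (MB).
MODEL_SIZES_MB = {"tiny": 39, "base": 74, "small": 244, "medium": 769, "large": 1536}

def pick_model(budget_mb: int) -> str:
    """Return the largest model no bigger than budget_mb, defaulting to 'tiny'."""
    fitting = [name for name, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    return max(fitting, key=MODEL_SIZES_MB.get) if fitting else "tiny"

print(pick_model(300))  # -> "small"
```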
- Select a model from the dropdown menu.
- Upload an audio file.
- You can trim the audio file down in the viewport by clicking the scissor icon.
- Click the "Transcribe" button.
- If the selected model hasn't been downloaded yet, the first transcription will take a while as the model downloads. Once it is cached, transcription starts immediately.
- The transcription will be displayed in the viewport.
- You can copy the text from the viewport or save it to a text file.
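The steps above can also be run without the UI via the openai-whisper Python API. This is a minimal sketch, not the app's actual implementation; `audio.mp3` and `output.txt` are placeholder paths:

```python
def transcribe_to_file(audio_path: str, out_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with Whisper and save the text. Requires openai-whisper and FFmpeg."""
    import whisper  # imported lazily so this module loads even without the package

    model = whisper.load_model(model_name)   # downloads the model on first use
    result = model.transcribe(audio_path)
    text = result["text"].strip()
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(text)
    return text

# Example (uncomment with a real audio file):
# print(transcribe_to_file("audio.mp3", "output.txt", model_name="small"))
```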
MIT License


