A Powerful Open Source Video Translation / Audio Transcription / AI Dubbing / Subtitle Translation Tool
pyVideoTrans is dedicated to seamlessly converting videos from one language to another, offering a complete workflow that includes speech recognition, subtitle translation, multi-role dubbing, and audio-video synchronization. It supports both local offline deployment and a wide variety of mainstream online APIs.
- 🎥 Fully Automatic Video Translation: One-click workflow: Speech Recognition (ASR) -> Subtitle Translation -> Speech Synthesis (TTS) -> Video Synthesis.
- 🎙️ Audio Transcription / Subtitle Generation: Batch convert audio/video to SRT subtitles, supporting Speaker Diarization to distinguish between different roles.
- 🗣️ Multi-Role AI Dubbing: Assign different AI dubbing voices to different speakers.
- 🧬 Voice Cloning: Integrates models like F5-TTS, CosyVoice, GPT-SoVITS for zero-shot voice cloning.
- 🧠 Powerful Model Support:
  - ASR: Faster-Whisper (Local), OpenAI Whisper, Alibaba Qwen, ByteDance Volcano, Azure, Google, etc.
  - LLM Translation: DeepSeek, ChatGPT, Claude, Gemini, Ollama (Local), Alibaba Bailian, etc.
  - TTS: Edge-TTS (Free), OpenAI, Azure, Minimaxi, ChatTTS, ChatterBox, etc.
- 🖥️ Interactive Editing: Supports pausing and manual proofreading at each stage (recognition, translation, dubbing) to ensure accuracy.
- 🛠️ Utility Toolkit: Includes auxiliary tools such as vocal separation, video/subtitle merging, audio-video alignment, and transcript matching.
- 💻 Command Line Interface (CLI): Supports headless operation, convenient for server deployment or batch processing.
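
As a rough illustration of the batch-processing use case, the sketch below builds one `cli.py` speech-to-text call per media file in a folder. It reuses only the `--task stt`, `--name`, and `--model_name` flags that appear in the CLI examples in this README; the helper names, the list of file extensions, and the `dry_run` switch are assumptions made for this example and are not part of pyVideoTrans.

```python
import subprocess
from pathlib import Path

# Extensions treated as media files here; this set is an assumption, adjust as needed.
MEDIA_SUFFIXES = {".mp4", ".mkv", ".mov", ".wav", ".mp3", ".m4a"}


def build_stt_command(media_path, model_name="large-v3"):
    """Return the argument list for one speech-to-text run of cli.py."""
    return [
        "uv", "run", "cli.py",
        "--task", "stt",
        "--name", str(media_path),
        "--model_name", model_name,
    ]


def find_media(folder):
    """List media files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in MEDIA_SUFFIXES
    )


def transcribe_folder(folder, repo_dir=".", dry_run=True):
    """Print (dry_run=True) or execute one CLI call per media file.

    `repo_dir` is assumed to be the cloned pyvideotrans directory, because
    `uv run cli.py` has to be started from there.
    """
    commands = [build_stt_command(p.resolve()) for p in find_media(folder)]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, cwd=repo_dir, check=True)
    return commands
```

Calling `transcribe_folder("./media")` only prints the commands; passing `dry_run=False` runs them one after another and stops at the first failing call.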
We provide a pre-packaged .exe version for Windows 10/11 users, requiring no Python environment configuration.
- Download: Click to download the latest pre-packaged version
- Unzip: Extract the compressed file to a path (e.g., `D:\pyVideoTrans`).
- Run: Double-click `sp.exe` inside the folder to launch.
Note:
- Do not run directly from within the compressed archive.
- To use GPU acceleration, ensure CUDA 12.8 and cuDNN 9.11 are installed.
We recommend using uv for package management for faster speed and better environment isolation.
- Python: Recommended versions 3.10 to 3.12
- FFmpeg: Must be installed and configured in the environment variables.
  - macOS: `brew install ffmpeg libsndfile git`
  - Linux (Ubuntu/Debian): `sudo apt-get install ffmpeg libsndfile1-dev`
  - Windows: Download FFmpeg and configure Path, or place `ffmpeg.exe` and `ffprobe.exe` directly in the project directory.
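
The prerequisites above can be checked before installing anything else. The following is a minimal sketch, not part of pyVideoTrans: it tests whether the running interpreter falls in the recommended 3.10 to 3.12 range and whether `ffmpeg` and `ffprobe` can be found on the PATH or, as the Windows note allows, directly in the project directory. The function names and the `project_dir` parameter are hypothetical.

```python
import shutil
import sys
from pathlib import Path


def python_version_ok(version=None):
    """True if the (major, minor) version lies in the recommended 3.10-3.12 range."""
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and 10 <= minor <= 12


def missing_tools(tools=("ffmpeg", "ffprobe"), project_dir="."):
    """Return the tools found neither on PATH nor in `project_dir`."""
    missing = []
    for tool in tools:
        on_path = shutil.which(tool) is not None
        # Also accept a binary placed directly in the project directory.
        local = any((Path(project_dir) / name).is_file() for name in (tool, tool + ".exe"))
        if not (on_path or local):
            missing.append(tool)
    return missing


def preflight(project_dir="."):
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if not python_version_ok():
        found = ".".join(str(part) for part in sys.version_info[:2])
        problems.append(f"Python {found} is outside the recommended 3.10-3.12 range")
    for tool in missing_tools(project_dir=project_dir):
        problems.append(f"{tool} was not found on PATH or in {project_dir}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

An empty result from `preflight()` only means these two checks passed; it does not verify `libsndfile` or any GPU setup.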

Install uv:

    # macOS/Linux
    curl -LsSf https://astral.sh/uv/install.sh | sh
    # Windows (PowerShell)
    powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

    # 1. Clone the repository (Ensure path has no spaces/Chinese characters)
    git clone https://github.com/jianchang512/pyvideotrans.git
    cd pyvideotrans
    # 2. Install dependencies (uv automatically syncs environment)
    uv sync

Launch GUI:

    uv run sp.py

Use CLI:

    # Video Translation Example
    uv run cli.py --task vtv --name "./video.mp4" --source_language_code zh --target_language_code en
    # Audio to Subtitle Example
    uv run cli.py --task stt --name "./audio.wav" --model_name large-v3

If you have an NVIDIA graphics card, execute the following commands to install the CUDA-supported PyTorch version:

    # Uninstall CPU version
    uv remove torch torchaudio
    # Install CUDA version (Example for CUDA 12.x)
    uv add torch==2.7 torchaudio==2.7 --index-url https://download.pytorch.org/whl/cu128
    uv add nvidia-cublas-cu12 nvidia-cudnn-cu12

| Category | Channel/Model | Description |
|---|---|---|
| ASR (Speech Recognition) | Faster-Whisper (Local) | Recommended, fast speed, high accuracy |
| | WhisperX / Parakeet | Supports timestamp alignment & speaker diarization |
| | Alibaba Qwen3-ASR / ByteDance Volcano | Online API, excellent for Chinese |
| Translation (LLM/MT) | DeepSeek / ChatGPT | Supports context understanding, more natural translation |
| | Google / Microsoft | Traditional machine translation, fast speed |
| | Ollama / M2M100 | Fully local offline translation |
| TTS (Speech Synthesis) | Edge-TTS | Microsoft free interface, natural effect |
| | F5-TTS / CosyVoice | Supports Voice Cloning, requires local deployment |
| | GPT-SoVITS / ChatTTS | High-quality open-source TTS |
| | 302.AI / OpenAI / Azure | High-quality commercial API |
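
For the local models in this table, GPU acceleration depends on the CUDA build of PyTorch from the installation step actually being active. A small check along the following lines can confirm that; it is only a sketch, the function name `cuda_status` is made up for this example, and it relies solely on standard PyTorch calls (`torch.cuda.is_available`, `torch.version.cuda`, `torch.cuda.get_device_name`).

```python
def cuda_status():
    """Describe whether the installed PyTorch build can use an NVIDIA GPU."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} is installed, but CUDA is not available"
    device = torch.cuda.get_device_name(0)
    return f"PyTorch {torch.__version__} with CUDA {torch.version.cuda} on {device}"


if __name__ == "__main__":
    print(cuda_status())
```

Saved under a name of your choice inside the project directory (for example a hypothetical `check_cuda.py`), it can be run in the project environment with `uv run check_cuda.py`.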
- Official Documentation: https://pyvideotrans.com (Includes detailed tutorials, API configuration guides, FAQ)
- Online Q&A Community: https://bbs.pyvideotrans.com (Submit error logs for automated AI analysis and answers)
This software is an open-source, free, non-commercial project. Users are solely responsible for any legal consequences arising from the use of this software (including but not limited to calling third-party APIs or processing copyrighted video content). Please comply with local laws and regulations and the terms of use of relevant service providers.
This project mainly relies on the following open-source projects (partial):
Created by jianchang512