A command-line tool and Python package for downloading audio from YouTube videos and transcribing them using OpenAI's Whisper model.
Note: This package downloads YouTube audio with yt-dlp, which is more reliable and more actively maintained than pytube.
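For readers curious what an audio-only download with yt-dlp can look like, here is a minimal sketch. It is not necessarily how this package implements its download step; `build_ydl_options` and `download_audio_sketch` are hypothetical names used only for illustration, and the mp3 codec is an assumption.

```python
from pathlib import Path


def build_ydl_options(output_stem: str, codec: str = "mp3") -> dict:
    """Build yt-dlp options that keep only the audio track (hypothetical helper)."""
    return {
        # Prefer the best audio-only stream, falling back to the best combined one.
        "format": "bestaudio/best",
        # yt-dlp fills in the extension; the post-processor then rewrites it.
        "outtmpl": f"{output_stem}.%(ext)s",
        # Re-encode the download to the requested codec via ffmpeg.
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": codec}
        ],
        "quiet": True,
    }


def download_audio_sketch(url: str, output_stem: str = "audio") -> Path:
    """Download the audio of `url` and return the expected output path."""
    import yt_dlp  # imported lazily so the options helper works without it

    with yt_dlp.YoutubeDL(build_ydl_options(output_stem)) as ydl:
        ydl.download([url])
    return Path(f"{output_stem}.mp3")
```

The `FFmpegExtractAudio` post-processor is one reason ffmpeg has to be available on the system.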
- Download audio from YouTube videos
- Transcribe audio files with Whisper
- Support for multiple Whisper models (tiny, base, small, medium, large, turbo)
- Language detection and specification
- Translation to English
- Multiple output formats (txt, json, srt, vtt)
- Flexible CLI: download only, transcribe only, or both
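To make the subtitle output formats more concrete, the sketch below renders timed segments as SRT cues. It assumes segments shaped like the entries in Whisper's `result["segments"]` (dicts with `start`, `end`, and `text`). The helper names are hypothetical, and this is not necessarily the writer the package itself uses.

```python
from __future__ import annotations


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments (dicts with 'start', 'end', 'text') as SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

VTT differs mainly in its `WEBVTT` header and in using a period rather than a comma before the milliseconds.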
This tool requires ffmpeg to be installed on your system (needed for both Whisper and yt-dlp):
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on macOS using Homebrew
brew install ffmpeg
# on Windows using Chocolatey
choco install ffmpeg
# on Windows using Scoop
scoop install ffmpeg

# Install from the current directory
pip install .
# Or install directly from GitHub
pip install git+https://github.com/Jiayou-Chao/transcribe_youtube.git

The tool provides three main commands (download, transcribe, and run):
# Basic usage
transcribe-youtube download --url "https://www.youtube.com/watch?v=VIDEO_ID"
# Specify output path
transcribe-youtube download --url "https://www.youtube.com/watch?v=VIDEO_ID" --output "audio.mp3"

# Basic usage (prints to stdout)
transcribe-youtube transcribe --file "audio.mp3"
# With model selection
transcribe-youtube transcribe --file "audio.mp3" --model medium
# Specify language (auto-detected if not specified)
transcribe-youtube transcribe --file "audio.mp3" --language Japanese
# Translate to English
transcribe-youtube transcribe --file "audio.mp3" --task translate
# Save to file with specific format
transcribe-youtube transcribe --file "audio.mp3" --output-dir "./transcripts" --output-format srt

# Basic usage (prints to stdout)
transcribe-youtube run --url "https://www.youtube.com/watch?v=VIDEO_ID"
# With all options
transcribe-youtube run --url "https://www.youtube.com/watch?v=VIDEO_ID" \
  --model medium \
  --language Japanese \
  --task translate \
  --output-dir "./transcripts" \
  --output-format json \
  --keep-audio

| Model | Parameters | Languages | Relative Speed |
|---|---|---|---|
| tiny | 39M | Multilingual | ~10x |
| tiny.en | 39M | English only | ~10x |
| base | 74M | Multilingual | ~7x |
| base.en | 74M | English only | ~7x |
| small | 244M | Multilingual | ~4x |
| small.en | 244M | English only | ~4x |
| medium | 769M | Multilingual | ~2x |
| medium.en | 769M | English only | ~2x |
| large | 1550M | Multilingual | 1x |
| turbo | 809M | Multilingual (default) | ~8x |
Note: For English-only audio, the .en models tend to perform better, especially tiny.en and base.en. The turbo model is an optimized version of large-v3 that offers faster transcription with minimal loss of accuracy.
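The table and note above can be turned into a small selection helper. This is a sketch only: `pick_model` is a hypothetical function, not part of this package, and it encodes just what the table shows, namely that `.en` variants exist for tiny, base, small, and medium.

```python
# Sizes that have an English-only ".en" variant, per the table above.
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}
# All multilingual model names listed in the table.
MULTILINGUAL_MODELS = ENGLISH_ONLY_SIZES | {"large", "turbo"}


def pick_model(size: str = "turbo", english_only: bool = False) -> str:
    """Return a Whisper model name, preferring the .en variant when available."""
    if size not in MULTILINGUAL_MODELS:
        raise ValueError(f"unknown model size: {size!r}")
    if english_only and size in ENGLISH_ONLY_SIZES:
        return f"{size}.en"
    # large and turbo have no .en variant, so the multilingual name is returned.
    return size
```

The returned name can then be passed wherever a model name is expected, for example as the value of `--model`.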
You can also use the package programmatically:
from transcribe_youtube import core
# Download audio
audio_path = core.download_audio("https://www.youtube.com/watch?v=VIDEO_ID")
# Transcribe audio
result = core.transcribe_audio(
    audio_path,
    model_name="medium",
    language="en",
    output_dir="./transcripts",
    output_format="txt"
)
# Download and transcribe in one step
result, audio_path = core.run(
    "https://www.youtube.com/watch?v=VIDEO_ID",
    model_name="medium",
    language="en",
    task="transcribe",
    output_dir="./transcripts",
    output_format="txt",
    keep_audio=True
)

MIT License