This repository provides a Google Colab-based implementation of OpenAI's Whisper AI for transcribing audio files into text.
Simply clone the repo, run the notebook, upload an audio file, and get an accurate transcription in seconds!
- ✅ **No setup required** – Just open the `.ipynb` file in Colab
- ✅ **User-friendly** – Upload an audio file, run a single command, and get transcriptions
- ✅ **Supports multiple languages** with the Whisper `medium` model
- ✅ **Works with multiple audio formats** including MP3, WAV, M4A, and FLAC
Whisper AI supports a variety of common audio formats. You can upload files in any of the following formats:
| Format | File Extension | Description |
|---|---|---|
| MP3 | `.mp3` | Compressed audio format, widely used |
| WAV | `.wav` | High-quality, uncompressed audio |
| M4A | `.m4a` | Common format for iOS recordings |
| FLAC | `.flac` | Lossless audio format, high fidelity |
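As a quick sanity check before uploading, you can verify a file's extension against the formats in the table above. This is a minimal sketch: `is_supported_format` is a hypothetical helper for illustration, not part of this repository.

```python
from pathlib import Path

# Extensions Whisper handles out of the box (per the table above)
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}

def is_supported_format(filename: str) -> bool:
    """Return True if the file extension is one listed in the formats table."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS

print(is_supported_format("interview.MP3"))  # True
print(is_supported_format("meeting.ogg"))    # False
```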
💡 Note: Files in other formats can be converted using FFmpeg before transcription.
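For files outside that list, a conversion step with FFmpeg might look like the sketch below. The helper name and the choice of WAV output are assumptions for illustration, not something this repo provides.

```python
import subprocess

def ffmpeg_convert_cmd(src: str, dst: str) -> list:
    """Build an FFmpeg command converting src to dst (hypothetical helper).

    -y overwrites the output file if it already exists.
    """
    return ["ffmpeg", "-y", "-i", src, dst]

cmd = ffmpeg_convert_cmd("recording.ogg", "recording.wav")
print(" ".join(cmd))  # ffmpeg -y -i recording.ogg recording.wav
# In Colab (where FFmpeg is preinstalled) you would then run:
# subprocess.run(cmd, check=True)
```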
Open a new Colab notebook and run the following:

```python
!git clone https://github.com/asadsandhu/whisper-audio-to-text.git
%cd whisper-audio-to-text
```

Run the following commands one by one to set up Whisper AI and FFmpeg:

```python
# Install Whisper AI
!pip install git+https://github.com/openai/whisper.git

# Install FFmpeg (required for audio processing)
!sudo apt update && sudo apt install ffmpeg
```

If you want to transcribe your own audio file, run this in a Colab cell:
```python
from google.colab import files
uploaded = files.upload()
```

🔹 After uploading your file, remember its name.
🔹 In step 5, replace `"Sample.mp3"` with the name of your uploaded file.
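`files.upload()` returns a dict keyed by filename, so you can read the name back instead of retyping it. This is a sketch assuming a single file was uploaded; `first_uploaded_name` is a hypothetical helper, and the dict below just mimics the shape Colab returns.

```python
def first_uploaded_name(uploaded: dict) -> str:
    """Return the name of the first (usually only) uploaded file."""
    return next(iter(uploaded))

# In Colab, `uploaded` comes from files.upload(); here we mimic its shape:
uploaded = {"my_recording.mp3": b"...audio bytes..."}
print(first_uploaded_name(uploaded))  # my_recording.mp3
```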
If you don't want to upload a file and just want to test Whisper AI, run this command:

```python
!wget https://github.com/asadsandhu/whisper-audio-to-text/raw/main/Sample.mp3 -O Sample.mp3
```

🔹 This will automatically download a sample MP3 file (`Sample.mp3`) from GitHub.
🔹 If you already uploaded a file, skip this step.
Now, transcribe the audio file with Whisper:

```python
!whisper "Sample.mp3" --model medium
```

🔹 If you uploaded your own file in step 3, replace `"Sample.mp3"` with your file name.
🔹 You can change the model by replacing `medium` with any of the available Whisper models.
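If you prefer driving the same step from Python instead of a `!` shell line, you can assemble the CLI call for `subprocess`. This is a sketch; `build_whisper_cmd` is a hypothetical helper mirroring the command above.

```python
import subprocess

def build_whisper_cmd(audio_file: str, model: str = "medium") -> list:
    """Build the whisper CLI invocation used in the step above."""
    return ["whisper", audio_file, "--model", model]

cmd = build_whisper_cmd("Sample.mp3", model="medium")
print(" ".join(cmd))  # whisper Sample.mp3 --model medium
# To actually run it (after the install step):
# subprocess.run(cmd, check=True)
```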
| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| `tiny` | 39 MB | 🚀 Very Fast | ❌ Low Accuracy | Quick testing, low-end devices |
| `base` | 74 MB | ⚡ Fast | 🔸 Moderate Accuracy | Short recordings, general use |
| `small` | 244 MB | ⚡ Moderate | 🔹 Good Accuracy | Standard transcription tasks |
| `medium` | 769 MB | ⏳ Slower | ✅ High Accuracy | Most users, multilingual support |
| `large` | 1550 MB | 🐢 Slowest | 🔥 Best Accuracy | Research, high-quality needs |
🔹 Larger models provide better accuracy but take longer to process.
🔹 If speed is a priority, use `small` or `base`. If accuracy is more important, use `medium` or `large`.
🔹 Example: Using the `large` model
```python
!whisper "Sample.mp3" --model large
```

| Scenario | Steps to Follow |
|---|---|
| Upload your own audio file | Run Step 3 → Skip Step 4 → Run Step 5 (rename file) |
| Use the sample audio file | Skip Step 3 → Run Step 4 → Run Step 5 |
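The speed-versus-accuracy guidance above can be sketched as a tiny helper that picks a model from your priority. The mapping mirrors the model table and notes; `pick_model` is a hypothetical name, not part of this repo.

```python
def pick_model(priority: str) -> str:
    """Map a priority ('speed' or 'accuracy') to a Whisper model name."""
    if priority == "speed":
        return "base"      # small or base when speed matters
    if priority == "accuracy":
        return "medium"    # medium or large when accuracy matters
    return "medium"        # the default this repo uses throughout

print(pick_model("speed"))     # base
print(pick_model("accuracy"))  # medium
```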
A sample MP3 file (Sample.mp3) is included in this repository for quick testing. You can use it to check the functionality before uploading your own files.
- Google Colab (runs in the cloud, no local installation needed)
- Python 3.7+
- Whisper AI (`pip install git+https://github.com/openai/whisper.git`)
- FFmpeg (pre-installed in Colab)
✅ **Minimal Effort** – Just upload and run, no extra configuration
✅ **Highly Accurate** – Uses Whisper's `medium` model for precise transcriptions
✅ **Cloud-Based** – Runs on Google Colab's free GPU resources
✅ **Multiple Formats** – Supports MP3, WAV, M4A, FLAC
Contributions are welcome! If you find any issues or have ideas for improvements, feel free to submit a pull request or open an issue.
This project is licensed under the MIT License.
If you find this repository helpful, feel free to star 🌟 it on GitHub! 🚀