🎧 Transcribe any audio to text in seconds using OpenAI Whisper, right in Google Colab. No setup needed! Upload your MP3, WAV, M4A, or FLAC file and get accurate, multilingual transcriptions powered by Whisper’s medium model, all free in the cloud. ☁️

asadsandhu/Whisper-Audio-To-Text

Whisper Audio-to-Text (Google Colab)


🚀 Overview

This repository provides a Google Colab notebook that uses OpenAI’s Whisper to transcribe audio files into text.
Clone the repo, run the notebook, upload an audio file, and get a transcription without installing anything on your own machine.

✅ No setup required – Just open the .ipynb file in Colab
✅ User-friendly – Upload an audio file, run a single command, and get transcriptions
✅ Supports multiple languages with the Whisper medium model
✅ Works with multiple audio formats including MP3, WAV, M4A, and FLAC


📌 Supported Audio Formats

Whisper AI supports a variety of common audio formats. You can upload files in any of the following formats:

| Format | File Extension | Description |
|--------|----------------|-------------|
| MP3 | .mp3 | Compressed audio format, widely used |
| WAV | .wav | High-quality, uncompressed audio |
| M4A | .m4a | Common format for iOS recordings |
| FLAC | .flac | Lossless audio format, high fidelity |

💡 Note: Files in other formats can be converted using FFmpeg before transcription.
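
The conversion mentioned in the note could look roughly like the sketch below. It is only an illustration: the helper names `build_ffmpeg_command` and `convert_audio` and the file name `recording.ogg` are made up for this example and are not part of the repository. It assumes the `ffmpeg` binary is on the PATH and lets FFmpeg infer the output format from the target extension.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str) -> list:
    """Build an FFmpeg command that re-encodes `src` into `dst`.

    -y overwrites an existing output file, -i names the input file.
    FFmpeg picks the output format from the extension of `dst`.
    """
    return ["ffmpeg", "-y", "-i", src, dst]


def convert_audio(src: str, target_ext: str = ".wav") -> str:
    """Convert `src` to `target_ext` next to the original and return the new path."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    dst = str(Path(src).with_suffix(target_ext))
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst


# Hypothetical usage in a Colab cell:
#   wav_path = convert_audio("recording.ogg")   # writes recording.wav
```

The command builder is kept separate from the function that runs it so the arguments can be inspected before anything is executed.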


📖 How to Use

1️⃣ Clone This Repository in Google Colab

Open a new Colab notebook and run the following:

!git clone https://github.com/asadsandhu/whisper-audio-to-text.git
%cd whisper-audio-to-text

2️⃣ Install Required Dependencies

Run the following commands to install Whisper and FFmpeg:

# Install Whisper from the official GitHub repository
!pip install git+https://github.com/openai/whisper.git
# Install FFmpeg (required for audio decoding); -y confirms the install without prompting
!sudo apt update && sudo apt install -y ffmpeg

3️⃣ Upload an Audio File (Optional - Skip if using Sample.mp3)

If you want to transcribe your own audio file, run this in a Colab cell:

from google.colab import files
uploaded = files.upload()

🔹 After uploading your file, remember its name.
🔹 In step 5, replace "Sample.mp3" with the name of your uploaded file.
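
Instead of remembering the name by hand, it can be read from the value that `files.upload()` returns, which is a dict mapping each uploaded file name to its bytes. The helper below is a small sketch; `first_uploaded_name` is a name invented for this example, not something the repository defines.

```python
def first_uploaded_name(uploaded: dict) -> str:
    """Return the name of the first uploaded file.

    `uploaded` is the dict returned by `files.upload()` in Colab,
    which maps each file name to that file's contents as bytes.
    """
    if not uploaded:
        raise ValueError("no file was uploaded")
    return next(iter(uploaded))


# In a Colab cell (google.colab is only available inside Colab):
#   from google.colab import files
#   uploaded = files.upload()
#   audio_file = first_uploaded_name(uploaded)
#   print(audio_file)
```

If several files are uploaded at once, only the first one is returned; iterate over `uploaded` to handle all of them.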


4️⃣ (Alternative) Use the Sample Audio File (Skip if you uploaded your own)

If you don’t want to upload a file and just want to test Whisper AI, run this command:

!wget https://github.com/asadsandhu/whisper-audio-to-text/raw/main/Sample.mp3 -O Sample.mp3

🔹 This will automatically download a sample MP3 file (Sample.mp3) from GitHub.
🔹 If you already uploaded a file, skip this step.


5️⃣ Run Whisper AI for Transcription

Now, transcribe the audio file with Whisper:

!whisper "Sample.mp3" --model medium

🔹 If you uploaded your own file in step 3, replace "Sample.mp3" with your file name.
🔹 You can change the model by replacing medium with any of the available Whisper models.
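
The same transcription can also be run from Python rather than the command line. The sketch below uses Whisper's `load_model` and `transcribe` calls; the wrapper names `transcribe` and `format_segments` are made up for this example. It assumes Whisper is already installed as in step 2, and the import is done lazily so the formatting helper can be used on its own.

```python
def transcribe(path: str, model_name: str = "medium") -> dict:
    """Transcribe `path` with Whisper's Python API and return the result dict."""
    import whisper  # imported here so format_segments works without Whisper installed

    model = whisper.load_model(model_name)
    return model.transcribe(path)


def format_segments(segments) -> str:
    """Render Whisper segments as `[start -> end] text` lines (times in seconds)."""
    lines = []
    for seg in segments:
        text = seg["text"].strip()
        lines.append(f"[{seg['start']:.2f} -> {seg['end']:.2f}] {text}")
    return "\n".join(lines)


# Hypothetical usage in a Colab cell:
#   result = transcribe("Sample.mp3")
#   print(result["text"])                       # full transcript
#   print(format_segments(result["segments"]))  # one line per timed segment
```

The result dict also carries a `language` key with the detected language, which can help when checking multilingual recordings.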

Available Whisper Models & Their Performance

| Model | Parameters | Speed | Accuracy | Best For |
|-------|------------|-------|----------|----------|
| tiny | 39 M | 🚀 Very Fast | ❌ Low Accuracy | Quick testing, low-end devices |
| base | 74 M | ⚡ Fast | 🔸 Moderate Accuracy | Short recordings, general use |
| small | 244 M | ⚡ Moderate | 🔹 Good Accuracy | Standard transcription tasks |
| medium | 769 M | ⏳ Slower | ✅ High Accuracy | Most users, multilingual support |
| large | 1550 M | 🕒 Slowest | 🔥 Best Accuracy | Research, high-quality needs |

🔹 Larger models provide better accuracy but take longer to process.
🔹 If speed is a priority, use small or base. If accuracy is more important, use medium or large.
🔹 Example: Using the large model

!whisper "Sample.mp3" --model large
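
When the model name comes from a variable, a small wrapper can catch typos before a long run starts. This is a sketch only: `KNOWN_MODELS` lists just the five names from the table above (Whisper ships other variants that are not covered here), and `build_whisper_command` is a hypothetical helper, not part of Whisper or this repository.

```python
import subprocess

# Only the five model names listed in this README; other variants exist upstream.
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large")


def build_whisper_command(audio_path: str, model: str = "medium") -> list:
    """Build the argument list for the `whisper` CLI, rejecting unknown model names."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {KNOWN_MODELS}")
    return ["whisper", audio_path, "--model", model]


# Hypothetical usage (requires the whisper CLI from step 2):
#   subprocess.run(build_whisper_command("Sample.mp3", "large"), check=True)
```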

💡 Example Usage Flow

| Scenario | Steps to Follow |
|----------|-----------------|
| Upload your own audio file | Run Step 3 → Skip Step 4 → Run Step 5 (replace "Sample.mp3" with your file name) |
| Use the sample audio file | Skip Step 3 → Run Step 4 → Run Step 5 |
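
The two scenarios can be folded into one helper that falls back to the sample file when nothing was uploaded. This is a rough sketch: `choose_audio` and `SUPPORTED_EXTENSIONS` are names invented for this example, and the extension check only covers the four formats listed in this README.

```python
from pathlib import Path

# The four formats this README lists as supported.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}


def choose_audio(uploaded_name=None, sample_name="Sample.mp3"):
    """Return the uploaded file name if given, otherwise the sample file name."""
    name = uploaded_name or sample_name
    if Path(name).suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"{name!r} is not a listed format; convert it with FFmpeg first")
    return name


# Hypothetical usage:
#   audio_file = choose_audio()             # sample file from step 4
#   audio_file = choose_audio("talk.m4a")   # your own upload from step 3
```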

🔊 Sample File for Testing

A sample MP3 file (Sample.mp3) is included in this repository for quick testing. You can use it to check the functionality before uploading your own files.


📌 Requirements

  • Google Colab (runs in the cloud, no local installation needed)
  • Python 3.8+
  • Whisper AI (pip install openai-whisper, or the GitHub install shown in step 2)
  • FFmpeg (pre-installed in Colab; the command in step 2 installs it if it is missing)
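
A quick way to confirm these requirements inside a runtime is sketched below. `check_requirements` is a hypothetical helper; the default minimum of Python 3.8 is an assumption based on what upstream Whisper declares, so adjust `min_python` if your setup differs.

```python
import importlib.util
import shutil
import sys


def check_requirements(min_python=(3, 8)) -> dict:
    """Report whether Python, the whisper package, and ffmpeg are available."""
    return {
        # True when the running interpreter is at least `min_python`.
        "python": sys.version_info >= min_python,
        # True when a module named `whisper` can be imported.
        "whisper": importlib.util.find_spec("whisper") is not None,
        # True when an `ffmpeg` executable is on the PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


# Hypothetical usage:
#   for name, ok in check_requirements().items():
#       print(f"{name}: {'ok' if ok else 'missing'}")
```

Note that the `whisper` check only tells you a module with that name is importable, not that it is OpenAI's package.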

🎯 Why Use This Repository?

✔ Minimal Effort – Just upload and run, no extra configurations
✔ Highly Accurate – Uses Whisper’s medium model for precise transcriptions
✔ Cloud-Based – Runs on Google Colab’s free GPU resources
✔ Multiple Formats – Supports MP3, WAV, M4A, FLAC


🤝 Contributing

Contributions are welcome! If you find any issues or have ideas for improvements, feel free to submit a pull request or open an issue.


📜 License

This project is licensed under the MIT License.


🔗 Connect with Me

If you find this repository helpful, feel free to star 🌟 it on GitHub! 🚀

