This repository contains an implementation of the audio identification algorithm described in the original Shazam paper (Wang, 2003). The algorithm uses audio fingerprint hashing for fast and accurate audio retrieval.
- Extracts audio fingerprints from sound samples
- Matches audio queries against a database of known fingerprints
- Uses a robust hashing technique for fast retrieval
- Implements the core principles of the Shazam algorithm
To run this notebook, install the required dependencies with Conda using the provided environment.yml file:
conda env create -f environment.yml
conda activate ir
- Load an audio file into the system.
- The algorithm extracts unique fingerprints from the audio signal.
- The fingerprints are then stored and used for matching against new queries.
- The system identifies and returns the best-matching audio sample.
- Preprocessing: The input audio is transformed into a spectrogram.
- Feature Extraction: Prominent spectral peaks (the constellation map) are detected and combined into hash fingerprints.
- Database Matching: The fingerprints are compared against a stored database of known audio fingerprints.
- Result Output: If a match is found, the corresponding audio file is identified and retrieved.
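The Preprocessing and Feature Extraction steps above can be sketched in plain NumPy. This is a minimal illustration, not the notebook's code: the window size (`n_fft`), hop length, peak neighbourhood and magnitude threshold are hypothetical values, and the notebook may well compute its spectrogram and peaks with a different library or parameters. A bin counts as a peak of the constellation map if it is the maximum within its time-frequency neighbourhood and clears the threshold.

```python
import numpy as np


def spectrogram(signal, n_fft=1024, hop=512):
    """Magnitude STFT with a Hann window, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # rfft along each frame, then transpose so rows are frequency bins
    return np.abs(np.fft.rfft(frames, axis=1)).T


def constellation_map(spec, neighborhood=(15, 15), min_magnitude=None):
    """Return sorted (frame, freq_bin) pairs that are local maxima of `spec`."""
    if min_magnitude is None:
        # Hypothetical threshold: keep only clearly prominent bins
        min_magnitude = spec.mean() + 2 * spec.std()
    pf, pt = neighborhood[0] // 2, neighborhood[1] // 2
    # Pad with -inf so border bins are compared only against real neighbours
    padded = np.pad(spec, ((pf, pf), (pt, pt)), constant_values=-np.inf)
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (2 * pf + 1, 2 * pt + 1)
    )
    local_max = windows.max(axis=(2, 3))  # max over each neighbourhood
    is_peak = (spec == local_max) & (spec >= min_magnitude)
    freqs, frames = np.nonzero(is_peak)
    return sorted(zip(frames.tolist(), freqs.tolist()))
```

For example, a pure 1 kHz tone sampled at 8 kHz should only produce peaks in frequency bin 128, since each bin spans 8000 / 1024 = 7.8125 Hz. Larger neighbourhoods give sparser constellation maps, which means fewer hashes to store but less redundancy when the query is noisy.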
In this implementation, we focus on selecting anchor points and defining corresponding target zones from the peaks of the constellation map. We then compute the hashes and store them in a dictionary for efficient matching. To validate our approach, we experimented with four different configurations. We first conducted small-scale tests using the dataset from Milestone 2.1 and then extended our evaluation with an additional 60 tarballs (~34 thousand songs).
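The anchor/target-zone hashing and dictionary lookup described above can be sketched as follows, assuming peaks are given as `(frame, freq_bin)` pairs. This is one possible variant, not necessarily any of the four configurations evaluated here: the target zone is defined only by a time span (`min_dt` to `max_dt` frames after the anchor) and a `fan_out` limit, all hypothetical parameters, and the hash is kept as a Python tuple `(f_anchor, f_target, delta_frames)` rather than packed into a 32-bit integer as in Wang's paper. Matching follows the offset-voting idea from the paper: a true match produces many hash hits that share the same time offset between database and query.

```python
from collections import Counter, defaultdict


def generate_hashes(peaks, fan_out=5, min_dt=1, max_dt=64):
    """Pair each anchor peak with up to `fan_out` later peaks in its target zone.

    `peaks` is a list of (frame, freq_bin); returns (hash, anchor_frame) pairs,
    where the hash is the tuple (f_anchor, f_target, delta_frames).
    """
    peaks = sorted(peaks)
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1 :]:
            dt = t2 - t1
            if dt > max_dt:  # past the end of the target zone
                break
            if dt < min_dt:  # too close to the anchor (e.g. same frame)
                continue
            hashes.append(((f1, f2, dt), t1))
            paired += 1
            if paired == fan_out:
                break
    return hashes


def build_index(songs):
    """songs: {song_id: peaks}; returns {hash: [(song_id, anchor_frame), ...]}."""
    index = defaultdict(list)
    for song_id, peaks in songs.items():
        for h, t in generate_hashes(peaks):
            index[h].append((song_id, t))
    return index


def match(index, query_peaks):
    """Vote on (song_id, time offset); the most consistent offset wins."""
    votes = Counter()
    for h, t_query in generate_hashes(query_peaks):
        for song_id, t_db in index.get(h, ()):
            votes[(song_id, t_db - t_query)] += 1
    if not votes:
        return None
    (song_id, offset), score = votes.most_common(1)[0]
    return song_id, offset, score
```

`match` returns the best song, the frame offset at which the query starts inside it, and the number of agreeing hashes, or `None` when no hash is found. Voting on the offset rather than on raw hash hits is what keeps random collisions with other songs from outscoring the true match.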
At the end of this notebook, you will find detailed visualizations, evaluations, and a comprehensive discussion of our thought process, results, and conclusions.
The audio tracks used in this notebook are available from Freesound MTG-Jamendo. We used tarballs 00-60 to build the database; together they comprise approximately 34,000 tracks. Once downloaded, place the tracks in the tracks/ directory and the queries to test in the queries/ directory.
Each query is derived from a 10-second segment of a track, in one of four variants:
- Original: The unmodified 10-second segment.
- Noise: The segment with added Gaussian noise.
- Coding: A strongly compressed version of the segment.
- Mobile: A version recorded outdoors using a mobile phone.
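The Original and Noise variants above can be approximated with a short NumPy sketch. The source does not state where segments start or how much noise was added, so `start_s` and the target signal-to-noise ratio `snr_db` are hypothetical parameters. The Coding and Mobile variants need an MP3 encoder and a real phone recording respectively, so they are not sketched here.

```python
import numpy as np


def extract_segment(signal, sr, start_s, duration_s=10.0):
    """Cut a segment of `duration_s` seconds starting at `start_s` seconds."""
    start = int(round(start_s * sr))
    return signal[start : start + int(round(duration_s * sr))]


def add_gaussian_noise(segment, snr_db, rng=None):
    """Add zero-mean white Gaussian noise at a target SNR in dB (hypothetical level)."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(segment ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / 10 ** (snr_db / 10)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=segment.shape)
    return segment + noise
```

Scaling the noise to the segment's own power, rather than using a fixed standard deviation, keeps the difficulty comparable across quiet and loud tracks.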
- Full Tracks: Stored in the tracks/ folder.
- Generated Queries: Stored in the queries/ folder.
Note: All audio files are saved in MP3 format.
