Music Retrieval (Shazam Implementation)

This repository contains an implementation of the audio identification algorithm described in the Original Shazam Paper (Wang, 2003). The algorithm utilizes audio fingerprint hashing for efficient and accurate audio retrieval.

Features

Extracts audio fingerprints from sound samples
Matches audio queries against a database of known fingerprints
Uses a robust hashing technique for fast retrieval
Implements the core principles of the Shazam algorithm

Installation

To run this notebook, you can install the required dependencies using Conda with the provided environment.yml file:

conda env create -f environment.yml
conda activate ir

Usage

Load an audio file into the system.
The algorithm extracts unique fingerprints from the audio signal.
The fingerprints are then stored and used for matching against new queries.
The system identifies and returns the best-matching audio sample.

How It Works

Preprocessing: The input audio is transformed into a spectrogram.
Feature Extraction: Key frequency peaks are detected and used to create unique fingerprints.
Database Matching: The fingerprints are compared against a stored database of known audio fingerprints.
Result Output: If a match is found, the corresponding audio file is identified and retrieved.

Implementation Details

In this implementation, we focus on selecting anchor points and defining corresponding target zones from the peaks of the constellation map. We then compute the hashes and store them in a dictionary for efficient matching. To validate our approach, we experimented with four different configurations. We first conducted small-scale tests using the dataset from Milestone 2.1 and then extended our evaluation with an additional 60 tarballs (~34 thousand songs).

At the end of this notebook, you will find detailed visualizations, evaluations, and a comprehensive discussion of our thought process, results, and conclusions.

Dataset

The data audio tracks used in this notebook can be found at Freesound MTG-Jamendo. We used Tarballs 00-60 for our database creation, which comprise approximately 34,000 tracks. Once downloaded, the tracks should be placed in the tracks/ directory. The queries to test should be placed in the queries/ directory.

Query Generation

Original: The unmodified 10-second segment.
Noise: The segment with added Gaussian noise.
Coding: A strongly compressed version of the segment.
Mobile: A version recorded outdoors using a mobile phone.

Data Organization

Full Tracks: Stored in the tracks/ folder.
Generated Queries: Stored in the queries/ folder.

Note: All audio files are saved in MP3 format.

References

Original Shazam Paper (Wang, 2003)

Name		Name	Last commit message	Last commit date
Latest commit History 57 Commits
queries		queries
MS2.1.html		MS2.1.html
MS2.1.ipynb		MS2.1.ipynb
MS2.2.ipynb		MS2.2.ipynb
README.md		README.md
environment.yml		environment.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Music Retrieval (Shazam Implementation)

Features

Installation

Usage

How It Works

Implementation Details

Dataset

Query Generation

Data Organization

References

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

Konstiu/Music-Information-Retrieval

Folders and files

Latest commit

History

Repository files navigation

Music Retrieval (Shazam Implementation)

Features

Installation

Usage

How It Works

Implementation Details

Dataset

Query Generation

Data Organization

References

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages