This project focuses on the BirdCLEF 2025 competition, a machine learning challenge for bird song classification. The goal is to develop algorithms that can identify bird species from audio recordings, with applications in biodiversity monitoring and ecological research.
The dataset consists of several components:
- train_audio/: Individual bird sound recordings
  - Clean, single-species recordings
  - Labeled with species information
  - Format: .ogg files at 32 kHz
  - Filenames: [collection][file_id_in_collection].ogg
- train_soundscapes/: Full 1-minute environmental recordings
  - Contains background noise and multiple species
  - Similar format to test data
  - Filenames: [site]_[date]_[local_time].ogg
- train.csv: Metadata for the training recordings
  - primary_label: Species code
  - secondary_labels: Additional species in the recording
  - latitude & longitude: Recording location
  - author: Recording provider
  - rating: Quality rating (1-5)
  - collection: Source collection (XC, iNat, or CSA)
- taxonomy.csv: Species information
  - iNaturalist taxon ID
  - Class name (Aves, Amphibia, Mammalia, Insecta)
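As a minimal sketch, the train.csv metadata described above can be loaded and filtered by quality rating with the standard library. The column names come from the description above; the function name and the rating threshold of 4 are illustrative assumptions, not part of the project:

```python
import csv

def load_high_quality(csv_path, min_rating=4):
    """Read train.csv and keep rows whose quality rating (1-5) meets the threshold."""
    rows = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # rating is stored as text in the CSV, so convert before comparing
            if float(row["rating"]) >= min_rating:
                rows.append(row)
    return rows
```

Filtering on rating like this is a common way to drop noisy, poorly labeled recordings before training.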
Our preprocessing pipeline follows the BirdNET paper approach:
- Spectrogram Generation
  - Mel-spectrograms with 64 bands
  - Frequency range: 150 Hz to 15 kHz
  - FFT window size: ~32 ms at 32 kHz
  - 25% overlap between frames
- Data Augmentation
  - Frequency shifts
  - Time shifts
  - Spectrogram warping
  - Ambient noise addition
- Signal Processing
  - 3-second chunks
  - Signal strength-based detection
  - Log scaling for magnitude
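The spectrogram and signal-processing steps above can be sketched in NumPy alone (in practice a library call such as librosa's melspectrogram would be the usual shortcut). The parameters follow the list above: a 1024-sample FFT window (~32 ms at 32 kHz), a 768-sample hop (25% overlap), 64 mel bands over 150 Hz to 15 kHz, 3-second chunks, and log-scaled magnitudes. The function names and the 0.01 RMS threshold for signal detection are illustrative assumptions, not values from the project's code:

```python
import numpy as np

SR = 32000
N_FFT = 1024          # ~32 ms window at 32 kHz
HOP = 768             # 25% overlap between frames
N_MELS = 64
FMIN, FMAX = 150.0, 15000.0
CHUNK = 3 * SR        # 3-second chunks

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sr=SR, fmin=FMIN, fmax=FMAX):
    # Triangular filters spaced evenly on the mel scale between fmin and fmax
    mels = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

def log_mel_spectrogram(y):
    # Frame the signal with a Hann window and 25% overlap, then FFT each frame
    n_frames = 1 + (len(y) - N_FFT) // HOP
    frames = np.stack([y[i * HOP:i * HOP + N_FFT] for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames * np.hanning(N_FFT), axis=1))
    mel = mag @ mel_filterbank().T
    return np.log(mel + 1e-6)          # log scaling for magnitude

def strong_chunks(y, threshold=0.01):
    # Split into 3-second chunks and keep those with enough signal energy (RMS)
    chunks = [y[i:i + CHUNK] for i in range(0, len(y) - CHUNK + 1, CHUNK)]
    return [c for c in chunks if np.sqrt(np.mean(c ** 2)) >= threshold]
```

Chunks that pass the energy check would then be converted to log-mel spectrograms and fed to training; low-energy chunks are discarded as likely silence.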
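The augmentation step above can be sketched as simple operations on a (frames x mel bands) spectrogram: time and frequency shifts via array rolls, plus additive noise standing in for ambient backgrounds. Spectrogram warping is omitted here as it needs an interpolation step. The shift sizes and noise level are illustrative defaults, not values from the BirdNET paper:

```python
import numpy as np

def augment_spectrogram(spec, rng, max_time_shift=10, max_freq_shift=3, noise_level=0.05):
    """Randomly shift a spectrogram in time and frequency and add noise."""
    out = spec.copy()
    # Time shift: roll frames along the time axis
    out = np.roll(out, rng.integers(-max_time_shift, max_time_shift + 1), axis=0)
    # Frequency shift: roll mel bands up or down slightly
    out = np.roll(out, rng.integers(-max_freq_shift, max_freq_shift + 1), axis=1)
    # Ambient noise addition: Gaussian noise as a stand-in for real background audio
    return out + rng.normal(0.0, noise_level, size=out.shape)
```

Applying a fresh random augmentation each epoch effectively multiplies the training data and makes the model more robust to recording conditions.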
Create a uv environment:

```bash
uv venv
```

Then activate it:

```bash
source .venv/bin/activate
```

Install the necessary packages from the pyproject.toml:

```bash
uv pip install -r pyproject.toml
```

Finally, run the scripts in order:

```bash
python preprocessing.py
python augmentation.py
python training.py
```
Licensed under the MIT License.