This repository contains my experiments and solutions for the BirdCLEF 2025 Kaggle Competition. The goal is to identify bird species from long soundscape recordings using machine learning and audio signal processing techniques.
- Custom PyTorch Model: Based on the `seresnext26t_32x4d` backbone using the TIMM library.
- Voice Activity Detection (VAD): Filters out non-bird audio segments to focus on relevant parts.
- Audio Preprocessing:
  - 10-second chunking of soundscape recordings for efficient inference.
  - Log-mel spectrograms computed and stored as `.npz` batches.
- Semi-Supervised Learning: Pseudo-labeling applied using confident predictions from unlabeled soundscapes.
- Efficient Data Handling: Custom `DataLoader` and `Dataset` with optional augmentations such as time/frequency masking and noise injection.
- Clean Label Management: Generated pseudo-labels saved as `pseudo_labels_soundscape.csv` for downstream training.
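
The optional augmentations listed above (time/frequency masking and noise injection) can be illustrated with a minimal NumPy sketch. This is not the repository's actual `Dataset` code: the function name `augment_spectrogram`, its parameters, the mask widths, and the choice to fill masked regions with the spectrogram mean are all assumptions made for illustration.

```python
import numpy as np

def augment_spectrogram(spec, rng, max_freq_mask=8, max_time_mask=20, noise_std=0.01):
    """Return an augmented copy of a (n_mels, n_frames) spectrogram.

    Hypothetical helper: applies one frequency mask, one time mask,
    and additive Gaussian noise. The input array is left unchanged.
    """
    out = spec.astype(np.float32, copy=True)
    n_mels, n_frames = out.shape
    fill = float(out.mean())  # assumption: masked regions are filled with the mean

    # Frequency masking: overwrite a random band of mel bins.
    f = int(rng.integers(0, min(max_freq_mask, n_mels) + 1))
    f0 = int(rng.integers(0, n_mels - f + 1))
    out[f0:f0 + f, :] = fill

    # Time masking: overwrite a random span of frames.
    t = int(rng.integers(0, min(max_time_mask, n_frames) + 1))
    t0 = int(rng.integers(0, n_frames - t + 1))
    out[:, t0:t0 + t] = fill

    # Noise injection: add zero-mean Gaussian noise to every bin.
    out += rng.normal(0.0, noise_std, size=out.shape).astype(np.float32)
    return out

# Example on a random stand-in for a log-mel spectrogram.
rng = np.random.default_rng(0)
spec = rng.normal(size=(128, 500)).astype(np.float32)
augmented = augment_spectrogram(spec, rng)
```

Passing the random generator in explicitly keeps the augmentation reproducible when a seed is fixed, which can help when comparing training runs.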
The dataset consists of long-form audio recordings of bird soundscapes, provided by the BirdCLEF 2025 Kaggle Competition. Audio files are processed into 10-second chunks, and mel spectrogram features are extracted for model input.
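
As a rough sketch of the 10-second chunking described above, the snippet below splits a 1-D waveform into fixed-length chunks with NumPy. The helper name `chunk_audio`, the 32 kHz sample rate, and the zero-padding of the final partial chunk are assumptions for illustration; the repository's preprocessing may handle these differently.

```python
import numpy as np

def chunk_audio(waveform, sample_rate, chunk_seconds=10.0, pad_last=True):
    """Split a 1-D waveform into an array of shape (n_chunks, chunk_len).

    Hypothetical helper. If pad_last is True, a trailing partial chunk is
    zero-padded to full length; otherwise it is dropped.
    """
    chunk_len = int(round(chunk_seconds * sample_rate))
    if chunk_len <= 0:
        raise ValueError("chunk_seconds * sample_rate must be positive")

    n_chunks, remainder = divmod(len(waveform), chunk_len)
    if remainder and pad_last:
        # Pad the tail with zeros so the last chunk has full length.
        waveform = np.pad(waveform, (0, chunk_len - remainder))
        n_chunks += 1
    else:
        # Drop any trailing samples that do not fill a whole chunk.
        waveform = waveform[: n_chunks * chunk_len]
    return waveform.reshape(n_chunks, chunk_len)

# Example: 25 seconds of silence at an assumed 32 kHz sample rate.
sample_rate = 32_000
waveform = np.zeros(25 * sample_rate, dtype=np.float32)
chunks = chunk_audio(waveform, sample_rate)
print(chunks.shape)  # (3, 320000)
```

Each chunk could then be turned into a log-mel spectrogram, for example with `librosa.feature.melspectrogram` followed by `librosa.power_to_db`, before being saved to `.npz`.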
- Clone the repository:

      git clone https://github.com/SheemaMasood381/BirdCLEF-2025.git
      cd BirdCLEF-2025

- Install the required Python packages:

      pip install -r requirements.txt

  Dependencies include: PyTorch, torchaudio, timm, librosa, numpy, pandas, scikit-learn
- Preprocess the audio recordings (chunking & spectrograms).
- Train the model:

      python train.py --config configs/train_config.yaml

- Run inference on new soundscape data:

      python inference.py --audio_path path/to/audio

- Run pseudo-labeling (optional, for semi-supervised learning):

      python pseudo_labeling.py --unlabeled_data path/to/unlabeled

- Custom `seresnext26t_32x4d` model achieved strong performance on BirdCLEF validation data.
- Pseudo-labeling improved accuracy by leveraging unlabeled soundscape recordings.
(Specific leaderboard scores can be included if permitted.)
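
To make the pseudo-labeling idea concrete, here is a minimal sketch of confidence-based selection. It is not the logic of `pseudo_labeling.py`: the function `select_pseudo_labels`, the 0.9 threshold, the chunk IDs, and the class names are hypothetical, and the sketch keeps only the single top class per chunk, which is a simplification.

```python
import numpy as np

def select_pseudo_labels(probs, chunk_ids, class_names, threshold=0.9):
    """Keep chunks whose highest predicted probability is >= threshold.

    Hypothetical helper. Returns one dict per retained chunk with the
    chunk ID, the top predicted class, and its probability.
    """
    probs = np.asarray(probs, dtype=np.float64)
    top_idx = probs.argmax(axis=1)   # index of the most likely class per chunk
    top_prob = probs.max(axis=1)     # probability of that class
    keep = top_prob >= threshold     # boolean mask of confident predictions
    return [
        {
            "chunk_id": chunk_ids[i],
            "label": class_names[top_idx[i]],
            "confidence": float(top_prob[i]),
        }
        for i in np.flatnonzero(keep)
    ]

# Example with made-up model outputs for three chunks and three classes.
probs = [
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.05, 0.92, 0.03],
]
chunk_ids = ["chunk_0", "chunk_1", "chunk_2"]            # hypothetical IDs
class_names = ["species_a", "species_b", "species_c"]    # hypothetical labels
rows = select_pseudo_labels(probs, chunk_ids, class_names)
print([r["label"] for r in rows])  # ['species_a', 'species_b']
```

The resulting rows could be written out with `csv.DictWriter` or a pandas `DataFrame` to produce a file for downstream training. A stricter threshold trades fewer pseudo-labels for less label noise.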
- Experiment with other transformer-based architectures for audio classification.
- Implement ensemble models combining multiple backbones for higher accuracy.
- Explore advanced data augmentation techniques to improve generalization.
- BirdCLEF 2025 Kaggle Competition
- PyTorch & TIMM library
- Open-source audio processing tools: `librosa`, `torchaudio`
This project is licensed under the MIT License – see the LICENSE file for details.