This repository is for fine-tuning DINOv2 on downstream tasks; it is not for self-supervised pre-training.
- Data Parallel
- Class-Balanced Loss
- Rare Class Sampling
- Optimizer selection
- Freeze/unfreeze backbone
Install the required packages using requirements.txt.
Because of xFormers, the requirements pin a recent version of PyTorch. However, you can use a different combination of xFormers and PyTorch versions.
```bash
pip install -r requirements.txt
```

The script requires a dataset formatted as below.
```
Data
├── ...
├── Class4
│   ├── Img1.png
│   ├── Img2.png
│   ├── ...
├── Class5
│   ├── Img1.png
│   ├── Img2.png
│   ├── ...
├── ...
```
Data preprocessing: please run the following script to generate `class_stats.json`.

```bash
python tools/preprocess.py /path/to/your/dataset
```
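The exact schema of `class_stats.json` depends on `tools/preprocess.py`; as an assumption, it likely records per-class image counts, which both the class-balanced loss and Rare Class Sampling need. A hypothetical example:

```json
{
  "Class4": 1024,
  "Class5": 37
}
```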
You can launch the training code with:

```bash
bash train.sh
```

You can set your training arguments in `config.py`.
There is a setting for Rare Class Sampling (RCS), a technique for long-tailed classification motivated by DAFormer.
It samples rare classes more often during training; however, this carries the risk that the model rarely sees some classes.
It is more suitable for multi-label classification. A minimal sketch of the idea follows.
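This sketch uses the DAFormer-style temperature-scaled sampling probability, assuming per-image labels and per-class counts are available (e.g. from `class_stats.json`). Names like `build_rcs_sampler` and `rcs_temperature` are illustrative, not the repository's actual API:

```python
# Rare Class Sampling sketch: rarer classes get a higher sampling probability
# P(c) ∝ exp((1 - f(c)) / T), following the DAFormer formulation.
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

def build_rcs_sampler(labels, class_counts, rcs_temperature=0.01):
    freq = np.asarray(class_counts, dtype=np.float64)
    freq /= freq.sum()                           # class frequency f(c)
    p = np.exp((1.0 - freq) / rcs_temperature)   # rarer class -> larger score
    p /= p.sum()                                 # sampling probability P(c)
    weights = p[np.asarray(labels)]              # one weight per training image
    return WeightedRandomSampler(
        weights=torch.as_tensor(weights),
        num_samples=len(labels),
        replacement=True,                        # rare images may repeat
    )
```

Pass the resulting sampler to your `DataLoader` instead of `shuffle=True`; a small temperature makes the sampling strongly favor rare classes, which is exactly why some frequent classes can go unseen within an epoch.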
Training arguments
- `batch_per_gpu` (int): Number of samples per GPU in each forward step (default: 16).
- `num_gpu` (int): Number of GPUs used for training (default: 1).
- `resize` (tuple): The size to which input images are resized (default: (224, 224)).
- `mean` (list): Mean normalization values for each channel in RGB format (default: [0.485, 0.456, 0.406]).
- `std` (list): Standard deviation normalization values for each channel in RGB format (default: [0.229, 0.224, 0.225]).
- `optimizer` (dict): Optimizer settings.
  - `type`: Optimizer type (default: 'SGD').
  - `params`: Additional optimizer parameters, such as momentum (default: 0.9).
  - `learning_rate`: Learning rates for different parts of the model.
    - `head_lr`: Learning rate for the head (default: 1e-3).
    - `backbone_lr`: Learning rate for the backbone (default: 1e-6).
- `scheduler` (dict): Learning rate scheduler settings.
  - `type`: Scheduler type (default: 'linear').
  - `params`: Additional scheduler parameters, like warmup ratio (default: 0.03).
- `do_eval` (bool): Whether to perform evaluation during training (default: False).
- `num_train_epoch` (int): Number of epochs for training (default: 100).
- `model` (dict): Model architecture settings.
  - `backbone`: Backbone model type (default: 'dinov2_l').
  - `head`: Classification head type (default: 'single').
  - `num_classes`: Number of output classes (default: 3).
  - `freeze_backbone`: Whether to freeze the backbone during training (default: False).
- `loss` (dict): Loss function settings (see the class-balanced loss sketch below).
  - `loss_type`: Type of loss function (default: 'CE_loss').
  - `beta`: Beta parameter for class-balanced loss (default: None).
  - `gamma`: Gamma parameter for focal loss (default: None).
- `dataset` (dict): Dataset paths.
  - `train`: Training dataset settings.
    - `data_root`: Root directory of the training dataset.
  - `eval`: Evaluation dataset settings.
    - `data_root`: Root directory of the evaluation dataset.
- `max_checkpoint` (int): Maximum number of checkpoints to keep (default: 1).
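For reference, a `config.py` following the defaults documented above might look like the sketch below; the key names are a best-effort reconstruction, not necessarily the repository's canonical file:

```python
# Sketch of config.py with the documented defaults; key names are assumed.
batch_per_gpu = 16
num_gpu = 1
resize = (224, 224)
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

optimizer = dict(
    type='SGD',
    params=dict(momentum=0.9),
    learning_rate=dict(head_lr=1e-3, backbone_lr=1e-6),
)
scheduler = dict(type='linear', params=dict(warmup_ratio=0.03))

do_eval = False
num_train_epoch = 100
model = dict(
    backbone='dinov2_l',
    head='single',
    num_classes=3,
    freeze_backbone=False,
)
loss = dict(loss_type='CE_loss', beta=None, gamma=None)
dataset = dict(
    train=dict(data_root='/path/to/train'),
    eval=dict(data_root='/path/to/eval'),
)
max_checkpoint = 1
```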
Note: The backbone learning rate is often set to be much smaller than the head learning rate to prevent overfitting the pretrained layers.
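On the loss side, the `beta` parameter suggests the class-balanced weighting of Cui et al. (2019), which reweights cross-entropy by the effective number of samples per class. A sketch assuming that formulation (the repository's implementation may differ):

```python
# Class-balanced cross-entropy (Cui et al., 2019) -- an assumed formulation:
# w_c = (1 - beta) / (1 - beta^{n_c}), normalized to sum to the class count.
import torch
import torch.nn.functional as F

def class_balanced_ce(logits, targets, samples_per_class, beta=0.9999):
    n = torch.as_tensor(samples_per_class, dtype=torch.float32,
                        device=logits.device)
    weights = (1.0 - beta) / (1.0 - beta ** n)
    weights = weights / weights.sum() * len(samples_per_class)
    return F.cross_entropy(logits, targets, weight=weights)
```

As `beta` approaches 1, the weights approach inverse class frequency; `beta = 0` recovers plain cross-entropy.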
You can evaluate your model with:

```bash
bash eval.sh
```

The evaluation also computes top-k accuracy.
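Top-k accuracy counts a prediction as correct when the true label appears among the k highest-scoring classes. A minimal sketch (the function name `topk_accuracy` is illustrative, not the repository's API):

```python
# Top-k accuracy: fraction of samples whose true label is in the k largest logits.
import torch

def topk_accuracy(logits, targets, k=5):
    topk = logits.topk(k, dim=1).indices            # (N, k) predicted classes
    hits = (topk == targets.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()
```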
TODO:
- Multi-label classification
- Segmentation
This project is licensed under the Apache-2.0 License. See the LICENSE file for details.
If you find this repository useful in your project, please consider giving a ⭐ and citing:
```bibtex
@misc{Dino-v2-Finetuning,
  author = {Yuwon Lee},
  title = {Dino-V2-Finetune},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/2U1/DINOv2-Finetune}
}
```

This project is based on