# clone the main repository together with its submodule
$ git clone https://github.com/ChadLin9596/Robust-Scene-Change-Detection --recursive
# or
$ git clone https://github.com/ChadLin9596/Robust-Scene-Change-Detection
$ cd <this repository>
$ git submodule init
$ git submodule update
# create a Python 3.9.6 virtual environment (or any other Python version that can run DinoV2)
$ python3 -m venv <directory of virtual environment>
$ source <directory of virtual environment>/bin/activate
(env)$ cd <this repository>
(env)$ pip install -r requirements.txt
# install the py_utils submodule and this package in editable mode
(env)$ pip install -e thirdparties/py_utils
(env)$ pip install -e .
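After installation, a quick way to confirm that the editable installs are visible to the active interpreter is to look the modules up with the standard library. This is a minimal sketch: `robust_scene_change_detect` is the import name used in the inference example below, while `py_utils` as the import name of `thirdparties/py_utils` is an assumption.

```python
import importlib.util


def check_installed(module_names):
    """Return {module_name: bool} telling whether each top-level module can be found.

    importlib.util.find_spec locates a module without executing it and
    returns None when the module is missing.
    """
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    # "py_utils" is an assumed import name for thirdparties/py_utils.
    status = check_installed(["robust_scene_change_detect", "py_utils", "torch"])
    for name, found in status.items():
        print(f"{name:30s} {'found' if found else 'MISSING'}")
```

Run it inside the activated environment; a `MISSING` entry usually means the corresponding `pip install` step was run in a different environment.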
- move both dataset directories (VL-CMU-CD and PSCD) into Robust-Scene-Change-Detection/data_factory
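To catch a misplaced dataset early, the layout can be checked with `pathlib`. This sketch assumes the two dataset folders are named `VL-CMU-CD` and `PSCD` directly under `data_factory`; those folder names are hypothetical, so adjust them to whatever the datasets module actually expects.

```python
from pathlib import Path

# Hypothetical folder names; adjust to what the datasets module expects.
EXPECTED_DATASETS = ("VL-CMU-CD", "PSCD")


def missing_datasets(repo_root, names=EXPECTED_DATASETS):
    """Return the dataset folders that are not present under <repo_root>/data_factory.

    Path.is_dir() follows symlinks, so a symlinked dataset directory counts as present.
    """
    data_factory = Path(repo_root) / "data_factory"
    return [name for name in names if not (data_factory / name).is_dir()]
```

An empty list from `missing_datasets("<this repository>")` means both folders were found.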
- unittest
$ cd <this repository>/src/unittest
$ python -m unittest
- load a model and run a test inference (see inference.ipynb)
import torch
import robust_scene_change_detect.models as models
B = 1
H = 504  # must be 14 * n (DinoV2 patch size is 14)
W = 504  # must be 14 * n
# load a pretrained model
model = models.get_model_from_pretrained("dino_2Cross_CMU")
model = model.cuda().eval()
model.module.upsample.size = (H, W)
# random tensors stand in for the two input images (t0, t1)
t0 = torch.rand(B, 3, H, W).cuda()
t1 = torch.rand(B, 3, H, W).cuda()
with torch.no_grad():
    pred = model(t0, t1)  # B, H, W, 2
pred = pred.argmax(dim=-1)  # B, H, W
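The example requires `H` and `W` to be multiples of 14, which matches the 14-pixel patch size of DinoV2. The sketch below snaps an arbitrary side length to a valid value before resizing; rounding down rather than to the nearest multiple is an assumption, chosen so that a crop never needs padding. The second helper reproduces `pred.argmax(dim=-1)` on nested lists to show what the `B, H, W, 2` to `B, H, W` step does; which of the two channels means "changed" is not stated here and should be checked in inference.ipynb.

```python
PATCH_SIZE = 14  # DinoV2 splits images into 14 x 14 patches


def snap_to_patch_multiple(size, patch=PATCH_SIZE):
    """Round an image side length down to the nearest multiple of `patch`."""
    if size < patch:
        raise ValueError(f"size {size} is smaller than one patch ({patch})")
    return (size // patch) * patch


def argmax_last_dim(pred):
    """Pure-Python equivalent of pred.argmax(dim=-1) for a nested B x H x W x C list.

    max() keeps the first index when two channels tie.
    """
    return [
        [
            [max(range(len(logits)), key=logits.__getitem__) for logits in row]
            for row in image
        ]
        for image in pred
    ]


# a 640 x 480 image would be resized or cropped to 630 x 476
print(snap_to_patch_multiple(640), snap_to_patch_multiple(480))
```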
- training
# modify the configuration in scripts/configs/train.yml
$ python <this repository>/src/scripts/train.py \
    <this repository>/src/scripts/configs/train.yml
- fine-tune
# modify the configuration in scripts/configs/fine_tune.yml
$ python <this repository>/src/scripts/fine_tune.py \
    <this repository>/src/scripts/configs/fine_tune.yml
- evaluation
$ python <this repository>/src/scripts/evaluate.py \
    <checkpoint directory>/<name>.pth
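The training, fine-tuning and evaluation commands each take a single positional argument: a YAML config for the first two and a `.pth` checkpoint for evaluation. When launching them from another Python program, a sketch like the following builds the same command lines. The helper names and the `SCRIPTS` mapping are hypothetical; only the script paths come from the commands above.

```python
import subprocess
import sys
from pathlib import Path

# Task name -> script under <this repository>/src/scripts (hypothetical keys).
SCRIPTS = {
    "train": "train.py",
    "fine_tune": "fine_tune.py",
    "evaluate": "evaluate.py",
}


def build_command(repo_root, task, argument):
    """Return the argv list for <repo_root>/src/scripts/<script> <argument>."""
    script = Path(repo_root) / "src" / "scripts" / SCRIPTS[task]
    # sys.executable reuses the interpreter of the active virtual environment.
    return [sys.executable, str(script), str(argument)]


def run(repo_root, task, argument):
    """Run the script; subprocess.CalledProcessError is raised on a non-zero exit."""
    subprocess.run(build_command(repo_root, task, argument), check=True)
```

For example, `run("<this repository>", "evaluate", "<checkpoint directory>/<name>.pth")` corresponds to the evaluation command.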
- qualitative results
$ python <this repository>/src/scripts/visualize.py \
    <checkpoint directory>/best.val.pth \
    --option <option> \
    --output <directory for qualitative results>
| option | comment |
|---|---|
| VL-CMU-CD | aligned |
| PSCD | aligned |
| VL-CMU-CD-diff_1 | unaligned (adjacent distance == 1) |
| VL-CMU-CD-diff_-1 | unaligned (adjacent distance == -1) |
| VL-CMU-CD-diff_2 | unaligned (adjacent distance == 2) |
| VL-CMU-CD-diff_-2 | unaligned (adjacent distance == -2) |
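The `--option` values follow a simple pattern: a bare dataset name is aligned, and `VL-CMU-CD-diff_<k>` is unaligned with adjacent distance `k`. The sketch below validates an option against the table and splits it accordingly; it is a hypothetical helper for scripting batches of visualizations, not the argument parsing inside visualize.py.

```python
import re

# Valid values, copied from the options table.
VALID_OPTIONS = (
    "VL-CMU-CD",
    "PSCD",
    "VL-CMU-CD-diff_1",
    "VL-CMU-CD-diff_-1",
    "VL-CMU-CD-diff_2",
    "VL-CMU-CD-diff_-2",
)

_DIFF_OPTION = re.compile(r"^(?P<dataset>.+)-diff_(?P<distance>-?\d+)$")


def parse_option(option):
    """Return (dataset, adjacent_distance); distance is None for aligned options."""
    if option not in VALID_OPTIONS:
        raise ValueError(f"unknown option {option!r}; expected one of {VALID_OPTIONS}")
    match = _DIFF_OPTION.match(option)
    if match is None:
        return option, None  # aligned pair
    return match.group("dataset"), int(match.group("distance"))
```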
- Train on VL-CMU-CD
| method | trained on VL-CMU-CD | trained on diff VL-CMU-CD | fine-tuned on PSCD |
|---|---|---|---|
| ours (DinoV2) | dinov2.2CrossAttn.CMU | dinov2.2CrossAttn.Diff-CMU | dinov2.2CrossAttn.PSCD |
| ours (Resnet-18) | resnet18.2CrossAttn.CMU | / | resnet18.2CrossAttn.PSCD |
| C-3PO | resnet18_id_4_deeplabv3_VL_CMU_CD | baseline.c3po.Diff-CMU | baseline.c3po.PSCD |
| DR-TANet | baseline.drtanet.CMU | baseline.drtanet.Diff-CMU | baseline.drtanet.PSCD |
| CDNet | baseline.cdnet.CMU | baseline.cdnet.Diff-CMU | / |
| TransCD | VL-CMU-CD -> Res-SViT_E1_D1_16.pth | / | / |
- backbone vs. comparator
| backbone | comparator | train on VL-CMU-CD |
|---|---|---|
| DinoV2 | Co-Attention | dinov2.CoAttn.CMU |
| DinoV2 | Temporal Attention | dinov2.TemporalAttn.CMU |
| DinoV2 | MTF | dinov2.MTF.CMU |
| DinoV2 | 1 CrossAttn | dinov2.1CrossAttn.CMU |
| DinoV2 | 2 CrossAttn | dinov2.2CrossAttn.CMU |
| Resnet-18 | 2 CrossAttn | resnet18.2CrossAttn.CMU |
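Most checkpoint names in the two tables read as `<backbone or "baseline">.<comparator or method>.<training data>`, for example `dinov2.CoAttn.CMU` or `baseline.drtanet.PSCD`; the C-3PO VL-CMU-CD and TransCD entries do not follow it. This pattern is inferred from the tables rather than documented, so treat the parser below as an assumption-based sketch for filtering checkpoints.

```python
from typing import NamedTuple


class CheckpointName(NamedTuple):
    family: str      # backbone ("dinov2", "resnet18") or "baseline"
    module: str      # comparator ("2CrossAttn", "CoAttn", ...) or baseline method ("c3po", ...)
    train_data: str  # "CMU", "Diff-CMU" or "PSCD"


def parse_checkpoint_name(name):
    """Split a dotted checkpoint name into its three inferred fields."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"{name!r} does not follow the <family>.<module>.<data> pattern")
    return CheckpointName(*parts)
```

For instance, `[n for n in names if parse_checkpoint_name(n).train_data == "CMU"]` keeps only the models trained on VL-CMU-CD.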
- Modularize the whole package into `robust_scene_change_detect` (installable via `pip install -e`)
- Remove relative path settings for easier inference
- Remove evaluation scripts for baselines
- Support torch hub loading to automatically download checkpoints
- Release datasets module
- Release models module
- Release train/fine-tune/evaluation/visualize scripts
- Release pretrained weights
- Examples of inference on new scenes
- release source code
- release datasets module
- release models module
- release train/fine-tune/evaluation/visualize scripts
- release pretrained weights
- examples of inference on new scenes
- support torch hub
- support Hugging Face (Q3 2025 or earlier...) (no longer needed)
- refactor to `master` and keep the baseline scripts in the `master-w-baselines` branch (like C3PO)
@inproceedings{lin2025robust,
title = {Robust scene change detection using visual foundation models and cross-attention mechanisms},
author = {Lin, Chun-Jung and Garg, Sourav and Chin, Tat-Jun and Dayoub, Feras},
booktitle = {2025 IEEE International Conference on Robotics and Automation (ICRA)},
pages = {8337--8343},
year = {2025},
organization = {IEEE}
}
