Robust Scene Change Detection Using Visual Foundation Models and Cross-Attention Mechanisms

Paper

An Introduction Video (3 minutes)

Installation

# clone main repo and corresponding submodule
$ git clone https://github.com/ChadLin9596/Robust-Scene-Change-Detection --recursive

# or
$ git clone https://github.com/ChadLin9596/Robust-Scene-Change-Detection
$ cd <this repository>
$ git submodule init
$ git submodule update

# create and activate a Python 3.9.6 virtual environment (any environment that can run DinoV2 also works)
$ python3 -m venv <directory of virtual environment>
$ source <directory of virtual environment>/bin/activate
(env)$ cd <this repository>
(env)$ pip install -r requirements.txt

# install the bundled utilities and this package in editable mode
(env)$ pip install -e thirdparties/py_utils
(env)$ pip install -e .

Datasets

Download the VL-CMU-CD and PSCD datasets.
Update both dataset directories in Robust-Scene-Change-Detection/data_factory.

Example usage

unittest

$ cd <this repository>/src/unittest
$ python -m unittest

load a model and run inference (see inference.ipynb)

import torch
import robust_scene_change_detect.models as models

B = 1
H = 504  # must be a multiple of 14
W = 504  # must be a multiple of 14

# load model
model = models.get_model_from_pretrained("dino_2Cross_CMU")
model = model.cuda().eval()
model.module.upsample.size = (H, W)

# dummy inputs; replace with real image tensors of shape (B, 3, H, W)
t0 = torch.rand(B, 3, H, W).cuda()
t1 = torch.rand(B, 3, H, W).cuda()

with torch.no_grad():
    pred = model(t0, t1)  # B, H, W, 2
    pred = pred.argmax(dim=-1)  # B, H, W
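
The `14 * n` constraint on `H` and `W` above presumably comes from the 14-pixel patch size of the DinoV2 ViT backbone. A minimal sketch of picking a valid input size for an arbitrary image follows; the helper names `snap_to_patch` and `model_input_size` are hypothetical and not part of `robust_scene_change_detect`.

```python
PATCH = 14  # assumed: DinoV2 ViT patch size, the reason H and W must be 14 * n


def snap_to_patch(size: int, patch: int = PATCH) -> int:
    """Round `size` down to the nearest positive multiple of `patch`."""
    if size < patch:
        raise ValueError(f"size {size} is smaller than one patch ({patch})")
    return (size // patch) * patch


def model_input_size(height: int, width: int, patch: int = PATCH) -> tuple:
    """Return an (H, W) pair in which both values are multiples of `patch`."""
    return snap_to_patch(height, patch), snap_to_patch(width, patch)


# Hypothetical 512 x 640 image: resize (or crop) it to this size before inference.
H, W = model_input_size(512, 640)
print(H, W)  # 504 630
```

The resulting `(H, W)` would then be used both when resizing the two input images and when setting `model.module.upsample.size`, as in the snippet above.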

training

# modify the configuration in scripts/configs/train.yml
$ python <this repository>/src/scripts/train.py \
    <this repository>/src/scripts/configs/train.yml

fine-tune

# modify the configuration in scripts/configs/fine_tune.yml
$ python <this repository>/src/scripts/fine_tune.py \
    <this repository>/src/scripts/configs/fine_tune.yml

evaluation

$ python <this repository>/src/scripts/evaluate.py \
    <checkpoint directory>/<name>.pth

qualitative results

$ python <this repository>/scripts/visualize.py \
    <checkpoint directory>/best.val.pth \
    --option <option> \
    --output <directory for qualitative results>

options	comments
VL-CMU-CD	aligned
PSCD	aligned
VL-CMU-CD-diff_1	unaligned (adjacent distance == 1)
VL-CMU-CD-diff_-1	unaligned (adjacent distance == -1)
VL-CMU-CD-diff_2	unaligned (adjacent distance == 2)
VL-CMU-CD-diff_-2	unaligned (adjacent distance == -2)
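
One possible reading of "adjacent distance" in the table above is that the t0 frame at index `i` of a sequence is paired with the t1 frame at index `i + k`, so the two views are no longer aligned. The sketch below illustrates only that reading. The function `pair_frames` is hypothetical, and the repository's datasets module remains the authority on how the unaligned pairs are actually built.

```python
def pair_frames(num_frames: int, offset: int) -> list:
    """Pair each t0 index i with t1 index i + offset, skipping out-of-range pairs.

    Assumption: "adjacent distance == k" means the t1 frame is taken k positions
    away from the t0 frame within the same sequence.
    """
    pairs = []
    for i in range(num_frames):
        j = i + offset
        if 0 <= j < num_frames:
            pairs.append((i, j))
    return pairs


print(pair_frames(5, 0))   # aligned: [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
print(pair_frames(5, 1))   # [(0, 1), (1, 2), (2, 3), (3, 4)]
print(pair_frames(5, -2))  # [(2, 0), (3, 1), (4, 2)]
```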

Pretrained Weights

Train on VL-CMU-CD

name	train on VL-CMU-CD	train on diff VL-CMU-CD	fine-tune on PSCD
ours (DinoV2)	dinov2.2CrossAttn.CMU	dinov2.2CrossAttn.Diff-CMU	dinov2.2CrossAttn.PSCD
ours (Resnet-18)	resnet18.2CrossAttn.CMU	/	resnet18.2CrossAttn.PSCD
C-3PO	resnet18_id_4_deeplabv3_VL_CMU_CD	baseline.c3po.Diff-CMU	baseline.c3po.PSCD
DR-TANet	baseline.drtanet.CMU	baseline.drtanet.Diff-CMU	baseline.drtanet.PSCD
CDNet	baseline.cdnet.CMU	baseline.cdnet.Diff-CMU	/
TransCD	VL-CMU-CD -> Res-SViT_E1_D1_16.pth	/	/

backbone vs. comparator

backbone	comparator	train on VL-CMU-CD
DinoV2	Co-Attention	dinov2.CoAttn.CMU
DinoV2	Temporal Attention	dinov2.TemporalAttn.CMU
DinoV2	MTF	dinov2.MTF.CMU
DinoV2	1 CrossAttn	dinov2.1CrossAttn.CMU
DinoV2	2 CrossAttn	dinov2.2CrossAttn.CMU
Resnet-18	2 CrossAttn	resnet18.2CrossAttn.CMU

Changelogs

v0.1.0

Modularize the whole package as robust_scene_change_detect (installable with pip install -e .)
Remove relative path settings for easier inference
Remove evaluation scripts for baselines
Support torch hub loading to automatically download checkpoints

v0.0.0

Release datasets module
Release models module
Release train/fine-tune/evaluation/visualize scripts
Release pretrained weights
Examples of inference on new scenes

TODO

release source code
- release datasets module
- release models module
- release train/fine-tune/evaluation/visualize scripts
release pretrained weights
examples of inference on new scenes
support torch hub
~~support Hugging Face (Q3 2025 or earlier...)~~ (no longer needed)
refactor master and move the baseline scripts to a master-w-baselines branch (like C3PO)

BibTeX

@inproceedings{lin2025robust,
  title        = {Robust scene change detection using visual foundation models and cross-attention mechanisms},
  author       = {Lin, Chun-Jung and Garg, Sourav and Chin, Tat-Jun and Dayoub, Feras},
  booktitle    = {2025 IEEE International Conference on Robotics and Automation (ICRA)},
  pages        = {8337--8343},
  year         = {2025},
  organization = {IEEE}
}
