
thanosDelatolas/diff-zvos


Studying Image Diffusion Features for Zero-Shot Video Object Segmentation

CVPRW 2025 – IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops

Thanos Delatolas · Vicky Kalogeiton · Dim Papadopoulos

Webpage · Paper


We study how pre-trained image diffusion models can be leveraged for Zero-Shot Video Object Segmentation by addressing four key challenges:
  • selecting the appropriate diffusion model
  • determining the optimal time step
  • identifying the best feature extraction layer
  • designing an effective strategy for computing the affinity matrix used to match features across frames
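
The affinity-matching step can be illustrated with the generic label-propagation scheme used by self-supervised correspondence baselines such as DINO. Per-location features of a target frame are compared with those of an annotated reference frame through a cosine-similarity affinity matrix, each target location keeps its top-k most similar reference locations, and the reference labels are propagated with softmax weights. The NumPy sketch below is an illustrative assumption, not this repository's MAGFilter; the function `propagate_mask` and its `top_k` and `temperature` defaults are hypothetical, and random vectors stand in for diffusion features.

```python
import numpy as np


def propagate_mask(ref_feats, tgt_feats, ref_mask, top_k=5, temperature=0.1):
    """Propagate labels from a reference frame to a target frame (illustrative sketch).

    ref_feats: (N_ref, C) reference-frame features, one row per spatial location
    tgt_feats: (N_tgt, C) target-frame features
    ref_mask:  (N_ref, K) one-hot labels (background + objects) at reference locations
    Returns an (N_tgt, K) array of soft label scores that sum to 1 per location.
    """
    # L2-normalise so that dot products are cosine similarities.
    ref = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    tgt = tgt_feats / np.linalg.norm(tgt_feats, axis=1, keepdims=True)
    affinity = tgt @ ref.T  # (N_tgt, N_ref) affinity matrix

    # Indices of the top-k most similar reference locations for each target location.
    idx = np.argpartition(-affinity, top_k - 1, axis=1)[:, :top_k]
    top_vals = np.take_along_axis(affinity, idx, axis=1)

    # Temperature-scaled softmax over the retained similarities (max-subtracted for stability).
    logits = top_vals / temperature
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)

    # Each target location takes a weighted vote of its neighbours' one-hot labels.
    return np.einsum("nj,njc->nc", weights, ref_mask[idx])


# Toy usage: random vectors stand in for flattened diffusion feature maps.
rng = np.random.default_rng(0)
ref_feats = rng.normal(size=(64, 16))
ref_labels = rng.integers(0, 3, size=64)  # 0 = background, 1-2 = objects
ref_mask = np.eye(3)[ref_labels]          # one-hot, shape (64, 3)
tgt_feats = ref_feats + 0.01 * rng.normal(size=ref_feats.shape)
pred = propagate_mask(ref_feats, tgt_feats, ref_mask).argmax(axis=1)
print("agreement with reference labels:", (pred == ref_labels).mean())
```

With a lightly perturbed copy of the reference features as the target, each location's own match dominates the softmax, so the propagated labels should reproduce the reference labels. In practice the rows would be flattened H×W feature maps extracted from the diffusion model at the chosen time step and layer.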

Installation

conda create -n diff-zvos python=3.10.8
conda activate diff-zvos
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
sh scripts/install_adm.sh
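
As a quick post-install sanity check, the sketch below compares the installed package versions against the pins used in the commands above, using only the standard library. It assumes the packages expose Python distribution metadata (pip and conda installs of PyTorch normally do); the function name `check_environment` is hypothetical and not part of this repository.

```python
import sys
from importlib import metadata

# Versions pinned by the installation commands above.
EXPECTED = {"torch": "2.1.0", "torchvision": "0.16.0", "torchaudio": "2.1.0"}


def check_environment(expected=EXPECTED, python=(3, 10)):
    """Return a list of human-readable problems; an empty list means the pins match."""
    problems = []
    if sys.version_info[:2] != tuple(python):
        have = f"{sys.version_info[0]}.{sys.version_info[1]}"
        problems.append(f"python {have} != {python[0]}.{python[1]}")
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} not installed (expected {want})")
            continue
        # Builds may carry a local suffix such as "2.1.0+cu118"; compare the public part only.
        if have.split("+")[0] != want:
            problems.append(f"{name} {have} != {want}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("environment OK" if not issues else "\n".join(issues))
```

To additionally confirm that the CUDA 11.8 build can see a GPU, `torch.cuda.is_available()` should return `True` inside the activated environment.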

Datasets

To download the datasets, run:

python scripts/download_datasets.py

To run inference, please follow the instructions in EVALUATION.md.

Citation

@article{delatolas2025studying,
  title={Studying Image Diffusion Features for Zero-Shot Video Object Segmentation},
  author={Delatolas, Thanos and Kalogeiton, Vicky and Papadopoulos, Dim P},
  journal={arXiv preprint arXiv:2504.05468},
  year={2025}
}

State-of-the-art Comparison in Zero-Shot Video Object Segmentation

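| Model | #Images | #Segmentations (Image) | #Frames | #Segmentations (Video) | Datasets | DAVIS-17 val (J&F) |
|---|---|---|---|---|---|---|
| Image + Video-level data | | | | | | |
| XMem | 1.02M | 27K | 150K | 210K | I+S+D+Y | 86.2 |
| Cutie | 1.02M | 27K | 150K | 210K | I+S+D+Y | 88.8 |
| SAM2 | 11M | 1.1B | 4.2M | 35.5M | SA+SAV | 90.7 |
| Image-level masks | | | | | | |
| SegIC | 1.3M | 1.8M | - | - | I+C+A+L | 73.7 |
| SegGPT | 147K | 1.62M | - | - | C+A+V | 75.6 |
| PerSAM-F | 11M | 1.1B | - | - | SA | 76.1 |
| Matcher | 11M | 1.1B | - | - | SA | 79.5 |
| No masks | | | | | | |
| FGVG | 1M | - | 116K | - | I+Y+FT | 72.4 |
| STT | 1M | - | 95K | - | I+Y | 74.1 |
| STC | - | - | 20M | - | K | 67.6 |
| INO | - | - | 20M | - | K | 72.5 |
| Mask-VOS | - | - | 95K | - | Y | 75.6 |
| MoCo | 1M | - | - | - | I | 65.4 |
| SHLS | 10K | - | - | - | M | 68.5 |
| DIFT-SD | 5B | - | - | - | LN | 70.0 |
| DINO | 1M | - | - | - | I | 71.4 |
| DIFT-ADM | 1M | - | - | - | I | 75.7 |
| Training-Free-VOS | 1M | - | - | - | I | 76.3 |
| Ours | | | | | | |
| SD-2.1 + Prompt Learning | 5B | - | - | - | LN | 70.5 |
| ADM + MAGFilter | 1M | - | - | - | I | 76.8 |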
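
DAVIS-17 results are conventionally reported as J&F: the mean of region similarity J (the Jaccard index, i.e. intersection over union of the predicted and ground-truth masks) and contour accuracy F (a boundary F-measure). The sketch below computes only per-object J for a single frame from DAVIS-style indexed label maps, where 0 is background and 1..N are object ids. The function name `per_object_jaccard` is hypothetical; reported numbers should come from the official DAVIS 2017 evaluation code, which also computes F and averages over frames and objects.

```python
import numpy as np


def per_object_jaccard(pred, gt):
    """Return {object_id: IoU} for every object id present in the ground truth.

    pred, gt: integer label maps of the same shape, 0 = background, 1..N = object ids.
    """
    scores = {}
    for obj_id in np.unique(gt):
        if obj_id == 0:
            continue  # background is not scored
        p = pred == obj_id
        g = gt == obj_id
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()  # > 0 because g is non-empty
        scores[int(obj_id)] = float(inter / union)
    return scores


# Toy 3x3 frame with two objects; one pixel of object 1 is mislabelled as object 2.
gt = np.array([[0, 1, 1],
               [0, 1, 2],
               [0, 0, 2]])
pred = np.array([[0, 1, 1],
                 [0, 2, 2],
                 [0, 0, 2]])
print(per_object_jaccard(pred, gt))
```

In the toy frame each object has 2 correctly labelled pixels out of a union of 3, so both objects get J = 2/3.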

Acknowledgements

We would like to thank the authors of DIFT, DINO and Cutie for making their code publicly available.
