Official Repo for “dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning”

SaFo-Lab/dVLM-AD

🚗 dVLM-AD

Diffusion-based Vision-Language Models for Autonomous Driving

dVLM-AD formulates autonomous driving decision-making as a conditional diffusion process over actions, enabling bidirectional context reasoning, improved robustness to uncertainty, and stronger reasoning–action consistency compared to autoregressive vision-language models.
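To make the formulation concrete, here is a toy sketch of a conditional reverse-diffusion (DDIM-style, deterministic) sampler over an action trajectory. Everything in it is illustrative and assumed, not taken from the dVLM-AD codebase: the function names, the cosine noise schedule, and the placeholder noise predictor (which stands in for the actual diffusion vision-language model conditioned on camera frames and text).

```python
import numpy as np

def cosine_alphas(T):
    # Cumulative signal-retention schedule abar_t (cosine schedule; an assumption,
    # not necessarily the schedule used by dVLM-AD).
    t = np.linspace(0.0, 1.0, T + 1)
    f = np.cos((t + 0.008) / 1.008 * np.pi / 2) ** 2
    return f[1:] / f[0]

def denoise_actions(eps_model, context, T=50, horizon=8, dim=2, seed=0):
    """Deterministic DDIM-style reverse process over an action trajectory
    (horizon x dim waypoints), conditioned on a context vector that would,
    in the real model, come from fused vision-language features."""
    rng = np.random.default_rng(seed)
    abar = cosine_alphas(T)
    x = rng.standard_normal((horizon, dim))  # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = eps_model(x, t, context)                              # predicted noise
        x0 = (x - np.sqrt(1 - abar[t]) * eps) / np.sqrt(abar[t])    # estimate of clean actions
        if t > 0:
            # DDIM update (eta = 0): re-noise the x0 estimate to level t-1
            x = np.sqrt(abar[t - 1]) * x0 + np.sqrt(1 - abar[t - 1]) * eps
        else:
            x = x0
    return x

# Placeholder noise predictor that ignores the context; the real predictor is
# the diffusion VLM itself.
traj = denoise_actions(lambda x, t, c: x, context=np.zeros(16))
print(traj.shape)  # (8, 2)
```

Because the reverse process refines the whole trajectory jointly at every step, each waypoint can attend to both earlier and later waypoints, which is the bidirectional-context property contrasted with left-to-right autoregressive decoding above.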

For motivation, qualitative examples, and benchmark evaluations, please refer to the project website:
👉 https://dvlm-ad.github.io/


Environment Setup

We recommend using conda to manage the environment.

Create and activate environment

conda create -n dvlm python=3.10 -y
conda activate dvlm
bash init_env.sh

Running Inference

Prepare model checkpoint

Download the checkpoint and place it under:

checkpoints/

Checkpoint download links will be provided on the project website.


Run inference

cd eval
python inference.py

This script will generate:

  • Driving action trajectories
  • Reasoning process associated with each trajectory

Citation

If you find this work useful, please consider citing:

@article{ma2025dvlm,
  title={dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning},
  author={Ma, Yingzi and Cao, Yulong and Ding, Wenhao and Zhang, Shuibai and Wang, Yan and Ivanovic, Boris and Jiang, Ming and Pavone, Marco and Xiao, Chaowei},
  journal={arXiv preprint arXiv:2512.04459},
  year={2025}
}
