Official Repo for “dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning”

SaFo-Lab/dVLM-AD

🚗 dVLM-AD

Diffusion-based Vision-Language Models for Autonomous Driving

dVLM-AD formulates autonomous driving decision-making as a conditional diffusion process over actions, enabling bidirectional context reasoning, improved robustness to uncertainty, and stronger reasoning–action consistency compared to autoregressive vision-language models.
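To make the formulation concrete, here is a toy sketch of a conditional reverse-diffusion (DDIM-style, deterministic) sampler over an action trajectory. Everything in it is illustrative and assumed, not taken from the dVLM-AD codebase: the function names, the cosine noise schedule, and the placeholder noise predictor (which stands in for the actual diffusion vision-language model conditioned on camera frames and text).

```python
import numpy as np

def cosine_alphas(T):
    # Cumulative signal-retention schedule abar_t (cosine schedule; an assumption,
    # not necessarily the schedule used by dVLM-AD).
    t = np.linspace(0.0, 1.0, T + 1)
    f = np.cos((t + 0.008) / 1.008 * np.pi / 2) ** 2
    return f[1:] / f[0]

def denoise_actions(eps_model, context, T=50, horizon=8, dim=2, seed=0):
    """Deterministic DDIM-style reverse process over an action trajectory
    (horizon x dim waypoints), conditioned on a context vector that would,
    in the real model, come from fused vision-language features."""
    rng = np.random.default_rng(seed)
    abar = cosine_alphas(T)
    x = rng.standard_normal((horizon, dim))  # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = eps_model(x, t, context)                              # predicted noise
        x0 = (x - np.sqrt(1 - abar[t]) * eps) / np.sqrt(abar[t])    # estimate of clean actions
        if t > 0:
            # DDIM update (eta = 0): re-noise the x0 estimate to level t-1
            x = np.sqrt(abar[t - 1]) * x0 + np.sqrt(1 - abar[t - 1]) * eps
        else:
            x = x0
    return x

# Placeholder noise predictor that ignores the context; the real predictor is
# the diffusion VLM itself.
traj = denoise_actions(lambda x, t, c: x, context=np.zeros(16))
print(traj.shape)  # (8, 2)
```

Because the reverse process refines the whole trajectory jointly at every step, each waypoint can attend to both earlier and later waypoints, which is the bidirectional-context property contrasted with left-to-right autoregressive decoding above.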

For motivation, qualitative examples, and benchmark evaluations, please refer to the project website:
👉 https://dvlm-ad.github.io/


Environment Setup

We recommend using conda to manage the environment.

Create and activate environment

conda create -n dvlm python=3.10 -y
conda activate dvlm
bash init_env.sh

Running Inference

Prepare model checkpoint

Download the checkpoint and place it under:

checkpoints/

Checkpoint download links will be provided on the project website.


Run inference

cd eval
python inference.py

This script will generate:

  • Driving action trajectories
  • Reasoning process associated with each trajectory

Citation

If you find this work useful, please consider citing:

@article{ma2025dvlm,
  title={dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning},
  author={Ma, Yingzi and Cao, Yulong and Ding, Wenhao and Zhang, Shuibai and Wang, Yan and Ivanovic, Boris and Jiang, Ming and Pavone, Marco and Xiao, Chaowei},
  journal={arXiv preprint arXiv:2512.04459},
  year={2025}
}
