dVLM-AD formulates autonomous driving decision-making as a conditional diffusion process over actions, enabling bidirectional context reasoning, improved robustness to uncertainty, and stronger reasoning–action consistency compared to autoregressive vision-language models.
For motivation, qualitative examples, and benchmark evaluations, please refer to the project website:
👉 https://dvlm-ad.github.io/
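To make the formulation concrete, here is a minimal sketch of DDPM-style conditional sampling over a whole action trajectory. Everything below (the `denoiser` callable, the linear noise schedule, the tensor shapes) is an illustrative assumption, not this repository's actual implementation:

```python
import torch

@torch.no_grad()
def sample_actions(denoiser, context, horizon=8, action_dim=2, steps=50):
    """Hypothetical DDPM-style sampler: iteratively denoise a full action
    trajectory conditioned on fused vision-language context. Every waypoint
    is refined jointly at each step, so the model can exploit bidirectional
    (past and future) context, unlike left-to-right autoregressive decoding."""
    betas = torch.linspace(1e-4, 0.02, steps)        # assumed noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # Start from pure Gaussian noise over the whole trajectory.
    x = torch.randn(1, horizon, action_dim)
    for t in reversed(range(steps)):
        eps = denoiser(x, t, context)                # predicted noise (assumed API)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t]) # posterior mean
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x  # denoised action trajectory (waypoints)
```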
We recommend using conda to manage the environment.
```bash
conda create -n dvlm python=3.10 -y
bash init_env.sh
```

Download the checkpoint and place it under:

```
checkpoints/
```
Checkpoint download links will be provided on the project website.
```bash
cd eval
python inference.py \
```

This script will generate:
- Driving action trajectories
- Reasoning process associated with each trajectory
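If you want to consume these outputs programmatically, a minimal sketch might look like the following. The output path and JSON schema (the `trajectory` and `reasoning` fields) are assumptions for illustration; check the actual files written by `inference.py`:

```python
import json

# Hypothetical output location and schema; adjust to what
# eval/inference.py actually writes.
with open("outputs/predictions.json") as f:
    predictions = json.load(f)

for sample in predictions:
    waypoints = sample["trajectory"]  # assumed: list of (x, y) waypoints
    reasoning = sample["reasoning"]   # assumed: reasoning text for this trajectory
    print(f"{len(waypoints)} waypoints -- {reasoning[:80]}...")
```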
If you find this work useful, please consider citing:
```bibtex
@article{ma2025dvlm,
  title={dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning},
  author={Ma, Yingzi and Cao, Yulong and Ding, Wenhao and Zhang, Shuibai and Wang, Yan and Ivanovic, Boris and Jiang, Ming and Pavone, Marco and Xiao, Chaowei},
  journal={arXiv preprint arXiv:2512.04459},
  year={2025}
}
```