DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer
🌐 Project Page | 📜 Arxiv | 🤗 Models |
Xu Guo *, Fulong Ye *, Xinghui Li *, Pengqi Tu, Pengze Zhang, Qichao Sun, Songtao Zhao †, Xiangwang Hou †, Qian He
* Equal contribution, † Corresponding author
Tsinghua University | Intelligent Creation Team, ByteDance
- [01/08/2026] 🔥 Thanks HM-RunningHub for supporting ComfyUI!
- [01/06/2026] 🔥 Our paper is released!
- [01/05/2026] 🔥 Our code is released!
- [12/17/2025] 🔥 Our project is released!
- [08/11/2025] 🎉 Our image version DreamID is accepted by SIGGRAPH Asia 2025!
- Reference Image Preparation: Please upload cropped face images (recommended resolution: 512x512) as reference. Avoid using full-body photos to ensure optimal identity preservation.
- Inference Steps: For simple scenes, you can reduce the sampling steps to 20 to significantly decrease inference time.
Note: Our internal model based on Seedance1.0 achieves high quality in under 8 steps. Feel free to experience it at CapCut.
- Best Quality: For the highest fidelity results, we recommend using a resolution of 1280x720.
- Known Issue (Pose Detection): You may encounter the error `no pose detected in the reference video` due to limitations in the current pose extractor. We are actively working on integrating a more robust solution. Pull requests are highly welcome!
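The reference-image tip above (cropped face, ~512x512) can be scripted. Below is a minimal sketch using Pillow that center-crops the largest square and resizes it to 512x512. It assumes the face is roughly centered; for full-body photos you would swap the center crop for a real face detector. The helper name is ours, not part of this repo.

```python
from PIL import Image

def prepare_reference(path_in: str, path_out: str, size: int = 512) -> None:
    """Center-crop the largest square region and resize to size x size.

    NOTE: this is only a rough stand-in for a proper face crop; it
    assumes the face sits near the image center.
    """
    img = Image.open(path_in).convert("RGB")
    w, h = img.size
    side = min(w, h)
    left = (w - side) // 2
    top = (h - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img = img.resize((size, size), Image.LANCZOS)
    img.save(path_out)
```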
| Models | Download Link | Notes |
|---|---|---|
| DreamID-V | 🤗 Huggingface | Supports 480P & 720P |
| Wan-2.1 | 🤗 Huggingface | VAE & Text encoder |
Install dependencies:

```shell
# Ensure torch >= 2.4.0
pip install -r requirements.txt
```

- Single-GPU inference

```shell
python generate_dreamidv.py \
    --size 832*480 \
    --ckpt_dir <wan2.1-1.3B path> \
    --dreamidv_ckpt <dreamidv.pth path> \
    --sample_steps 50 \
    --base_seed 42
```

- Multi-GPU inference using FSDP + xDiT USP
```shell
pip install "xfuser>=0.4.1"
torchrun --nproc_per_node=2 generate_dreamidv.py \
    --size 832*480 \
    --ckpt_dir <wan2.1-1.3B path> \
    --dreamidv_ckpt <dreamidv.pth path> \
    --sample_steps 50 \
    --dit_fsdp \
    --t5_fsdp \
    --ulysses_size 2 \
    --ring_size 1 \
    --base_seed 42
```

Our work builds upon and is greatly inspired by several outstanding open-source projects, including Wan2.1, Phantom, OpenHumanVid, and Follow-Your-Emoji. We sincerely thank the authors and contributors of these projects for generously sharing their excellent code and ideas.
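In Wan2.1-style xDiT USP setups, the sequence-parallel degrees are typically expected to satisfy `ulysses_size * ring_size == nproc_per_node`. A small pre-flight check (our own hypothetical helper, not part of this repo) can catch a mismatched launch before torchrun starts:

```python
def check_usp_config(nproc_per_node: int, ulysses_size: int, ring_size: int) -> None:
    """Sanity-check xDiT USP parallel degrees before launching torchrun.

    Assumption: the sequence-parallel world is factored into
    ulysses_size * ring_size, which must equal the process count.
    """
    if ulysses_size * ring_size != nproc_per_node:
        raise ValueError(
            f"ulysses_size ({ulysses_size}) * ring_size ({ring_size}) "
            f"must equal nproc_per_node ({nproc_per_node})"
        )

# Matches the 2-GPU command above: 2 Ulysses ranks x 1 ring rank.
check_usp_config(nproc_per_node=2, ulysses_size=2, ring_size=1)
```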
If you have any comments or questions regarding this open-source project, please open a new issue or contact Xu Guo and Fulong Ye.
If you find our work helpful, please consider citing our paper and giving the repo a star!
@misc{guo2026dreamidvbridgingimagetovideogaphighfidelity,
title={DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer},
author={Xu Guo and Fulong Ye and Xinghui Li and Pengqi Tu and Pengze Zhang and Qichao Sun and Songtao Zhao and Xiangwang Hou and Qian He},
year={2026},
eprint={2601.01425},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2601.01425},
}
