This repository provides an implementation of the methodology described in our paper, "Streaming Generation of Co-Speech Gestures via Accelerated Rolling Diffusion". Our approach adapts standard diffusion-based generative models into streaming-compatible setups.
This repository includes streaming adaptations of three popular gesture-generation diffusion models:

- DiffuseStyleGesture (in `DiffStyleGesture/`)
- DiffGesture / Taming (in `DiffGesture/`)
- PersonaGestor (in `Persona_Gestor/`)
Each model has its own dedicated folder with detailed setup and usage instructions.
Each adapted model supports two operation modes:
- Normal: Standard offline generation.
- Rolling: Modified streaming-compatible generation (see the schematic sketch below).
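To make the distinction concrete, here is a minimal schematic sketch of what streaming generation looks like operationally: poses are emitted chunk by chunk as audio arrives, each chunk conditioned on the last few previously generated frames, instead of producing the entire clip offline. All names and values (`generate_chunk`, `CHUNK_FRAMES`, `SEED_FRAMES`, `POSE_DIM`) are hypothetical placeholders rather than the repository's API or defaults, and the actual rolling-diffusion sampler schedules denoising across the window differently.

```python
import numpy as np

# Hypothetical placeholders -- not the repository's actual API or default values.
POSE_DIM = 75        # dimensionality of one pose frame
CHUNK_FRAMES = 80    # frames emitted per streaming step
SEED_FRAMES = 8      # trailing frames used to condition the next chunk

def generate_chunk(audio_chunk: np.ndarray, seed_poses: np.ndarray) -> np.ndarray:
    """Stand-in for one conditional generation pass of the adapted diffusion model."""
    return np.zeros((CHUNK_FRAMES, POSE_DIM)) + seed_poses.mean()

def stream_gestures(audio_stream):
    """Yield gesture frames chunk by chunk instead of generating the full clip at once."""
    seed_poses = np.zeros((SEED_FRAMES, POSE_DIM))  # neutral pose as the initial context
    for audio_chunk in audio_stream:
        poses = generate_chunk(audio_chunk, seed_poses)
        seed_poses = poses[-SEED_FRAMES:]   # roll the conditioning window forward
        yield poses                         # frames can be rendered immediately

# Usage with dummy audio chunks: three chunks of one second each (16 kHz).
dummy_audio = (np.zeros(16000) for _ in range(3))
total_frames = sum(len(chunk) for chunk in stream_gestures(dummy_audio))
print(total_frames)  # 240 frames, emitted incrementally
```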
For each model:
- Navigate into the model’s directory.
- Edit the configuration file to set the mode (`rolling` or `normal`).
- Follow the specific instructions provided within each model's directory for training and generation.
Below are examples for each adapted model.
Navigate to the DiffStyleGesture directory:
```bash
cd DiffStyleGesture
```

Edit the YAML configuration files as follows:
- ZEGGS dataset: `main/mydiffusion_zeggs/configs/DiffuseStyleGesture.yaml`
- BEAT dataset: `BEAT-TWH-main/mydiffusion_beat_twh/configs/DiffuseStyleGesture.yml`
Set the rolling mode:

```yaml
schedule_sampler_type: rolling
```

(Ensure `(n_poses - seed_frames)` divides evenly into the timesteps.)
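As a quick sanity check for this constraint, the snippet below may help; it is a hypothetical helper, not part of the repository, and the numeric values are placeholders rather than the shipped defaults. The same check applies, with the corresponding keys, to the other adapted models (`n_poses - n_pre_poses` for DiffGesture, `model.seq_length - model.context_length` for PersonaGestor).

```python
# Hypothetical helper (not part of this repository).
def check_rolling_config(timesteps: int, n_poses: int, seed_frames: int) -> None:
    """Verify that (n_poses - seed_frames) divides evenly into the diffusion timesteps."""
    stride = n_poses - seed_frames
    if stride <= 0 or timesteps % stride != 0:
        raise ValueError(
            f"timesteps={timesteps} must be a positive multiple of "
            f"n_poses - seed_frames = {stride}"
        )

# Placeholder values for illustration: an 88-frame window with 8 seed frames
# gives a stride of 80, so 960 timesteps is valid while 1000 would raise.
check_rolling_config(timesteps=960, n_poses=88, seed_frames=8)
```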
Training:

- ZEGGS:

  ```bash
  bash scripts/train_rolling.sh
  ```

- BEAT:

  ```bash
  cd BEAT-TWH-main/mydiffusion_beat_twh
  python end2end.py --config=./configs/DiffuseStyleGesture.yml --gpu 0
  ```
Generation:

- Single audio sample:

  ```bash
  python rolling_sample.py --config=./configs/DiffuseStyleGesture.yml --gpu 0 --model_path './model000450000.pt' --audiowavlm_path "./sample.wav"
  ```

- RLDA method (ZEGGS only):

  ```bash
  python rolling_sample.py --config=./configs/DiffuseStyleGesture.yml --gpu 0 --model_path './model000450000.pt' --audiowavlm_path "./sample.wav" --sample_type rlda
  ```

- Batch audio folder:

  ```bash
  python rolling_sample_folder.py --config=./configs/DiffuseStyleGesture.yml --gpu 0 --model_path './model000450000.pt' --test_wav_folder "./wav_folder" --result_bvh_folder "./outputs"
  ```

- BEAT dataset generation:

  ```bash
  cd ../mydiffusion_beat_twh
  python rolling_sample.py --config=./configs/DiffuseStyleGesture.yml --dataset BEAT --gpu 0 --model_path './BEAT_rolling_120/model001200000.pt' --wav_path ../data/tts.wav --txt_path ../data/tts_align_process.tsv --wavlm_path "path_to/WavLM-Large.pt" --word2vector_path "path_to/crawl-300d-2M.vec"
  ```
Navigate to Taming's directory:
```bash
cd DiffGesture
```

Edit `config/pose_diffusion_zeggs.yml`:
type_: "rolling"(Ensure (n_poses - n_pre_poses) divides evenly into timesteps.)
Training:

```bash
python scripts/train_zeggs.py --config=config/pose_diffusion_zeggs.yml
```

Generation:

```bash
python scripts/test_zeggs.py
```

(Supports both ZEGGS and BEAT datasets.)
Navigate to PersonaGestor's directory:
```bash
cd Persona_Gestor
```

Edit the experiment YAML files in `configs_diffmotion/experiment/`:
```yaml
model:
  type_: "rolling"
```

(Ensure `(model.seq_length - model.context_length)` divides evenly into the timesteps.)
Training:

```bash
python src/diffmotion/diffmotion_trainer/train_diffmotion.py -m experiment=xxx_WavLM_[base/large]_train.yaml
```

Generation:

```bash
python src/diffmotion/diffmotion_trainer/train_diffmotion.py -m experiment=xxx_WavLM_[base/large]_generate.yaml
```

If you find this repository helpful in your research, please consider citing:
```bibtex
@article{vu2025streaming,
  title={Streaming Generation of Co-Speech Gestures via Accelerated Rolling Diffusion},
  author={Vu, Evgeniia and Boiarov, Andrei and Vetrov, Dmitry},
  journal={arXiv preprint arXiv:2503.10488},
  year={2025}
}
```

This project builds upon excellent prior works and codebases, in particular the original DiffuseStyleGesture, DiffGesture (Taming), and PersonaGestor repositories.
We thank the authors of these repositories for open-sourcing their valuable contributions.