Streaming Generation of Co-Speech Gestures via Accelerated Rolling Diffusion

This repository provides an implementation of the methodology described in our paper, "Streaming Generation of Co-Speech Gestures via Accelerated Rolling Diffusion". Our approach adapts standard diffusion-based gesture-generation models to a streaming-compatible setup.


Models Included

This repository includes streaming adaptations of three popular gesture-generation diffusion models:

  • DiffuseStyleGesture
  • Taming (DiffGesture)
  • PersonaGestor

Each model has its own dedicated folder with detailed setup and usage instructions.


Getting Started

Each adapted model supports two operation modes:

  • Normal: Standard offline generation.
  • Rolling: Modified streaming-compatible generation (illustrated by the sketch below).
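
Conceptually, rolling mode replaces whole-clip denoising with a sliding window in which frames carry staggered noise levels: each update denoises the whole window slightly, emits the now-clean front frame, and appends fresh noise at the back. The sketch below is only an illustration of that idea; denoise, audio_frames, and all constants are hypothetical stand-ins, not this repository's API.

    import torch

    POSE_DIM = 64   # hypothetical pose feature dimension
    WINDOW = 8      # hypothetical window length; must divide STEPS evenly
    STEPS = 64      # hypothetical number of diffusion timesteps

    def rolling_generate(denoise, audio_frames):
        """Illustrative rolling-window sampler.

        denoise(x, t, cond) stands in for one reverse-diffusion update with
        per-frame timesteps t; audio_frames yields streaming conditioning.
        """
        stride = STEPS // WINDOW
        # Staggered noise levels: front frame nearly clean, back frame pure noise.
        t = torch.arange(1, WINDOW + 1) * stride - 1
        x = torch.randn(WINDOW, POSE_DIM)
        for cond in audio_frames:
            for _ in range(stride):      # advance every frame by `stride` steps
                x = denoise(x, t, cond)
                t = t - 1
            yield x[0]                   # front frame is fully denoised: emit it
            # Slide the window: drop the emitted frame, append fresh noise.
            x = torch.cat([x[1:], torch.randn(1, POSE_DIM)])
            t = torch.cat([t[1:], torch.tensor([WINDOW * stride - 1])])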

General Usage Instructions

For each model:

  1. Navigate into the model’s directory.
  2. Edit the configuration file to set the mode (rolling or normal).
  3. Follow specific instructions provided within each model's directory for training and generation.

Below are examples for each adapted model.


DiffStyleGesture

Navigate to the DiffStyleGesture directory:

cd DiffStyleGesture

Configuration

Edit the YAML configuration files as follows:

  • ZEGGS dataset: main/mydiffusion_zeggs/configs/DiffuseStyleGesture.yaml
  • BEAT dataset: BEAT-TWH-main/mydiffusion_beat_twh/configs/DiffuseStyleGesture.yml

Set the rolling mode:

schedule_sampler_type: rolling

(Ensure that (n_poses - seed_frames) divides the number of diffusion timesteps evenly.)
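
The same constraint recurs for every model below, with n_pre_poses or model.context_length taking the place of seed_frames. A quick sanity check before training, using hypothetical numbers in place of the values from your YAML config:

    # Hypothetical numbers; substitute the values from the model's YAML config.
    n_poses, seed_frames, timesteps = 88, 8, 800

    window = n_poses - seed_frames        # frames generated per rolling window
    assert timesteps % window == 0, (
        f"(n_poses - seed_frames) = {window} must divide timesteps = {timesteps}"
    )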

Training

  • ZEGGS:

    bash scripts/train_rolling.sh
  • BEAT:

    cd BEAT-TWH-main/mydiffusion_beat_twh
    python end2end.py --config=./configs/DiffuseStyleGesture.yml --gpu 0

Generation

  • Single audio sample:

    python rolling_sample.py --config=./configs/DiffuseStyleGesture.yml --gpu 0 --model_path './model000450000.pt' --audiowavlm_path "./sample.wav"
  • RLDA method (ZEGGS only):

    python rolling_sample.py --config=./configs/DiffuseStyleGesture.yml --gpu 0 --model_path './model000450000.pt' --audiowavlm_path "./sample.wav" --sample_type rlda
  • Batch audio folder:

    python rolling_sample_folder.py --config=./configs/DiffuseStyleGesture.yml --gpu 0 --model_path './model000450000.pt' --test_wav_folder "./wav_folder" --result_bvh_folder "./outputs"
  • BEAT dataset generation:

    cd ../mydiffusion_beat_twh
    python rolling_sample.py --config=./configs/DiffuseStyleGesture.yml --dataset BEAT --gpu 0 --model_path './BEAT_rolling_120/model001200000.pt' --wav_path ../data/tts.wav --txt_path ../data/tts_align_process.tsv --wavlm_path "path_to/WavLM-Large.pt" --word2vector_path "path_to/crawl-300d-2M.vec"

Taming (DiffGesture)

Navigate to the DiffGesture directory:

cd DiffGesture

Configuration

Edit config/pose_diffusion_zeggs.yml:

type_: "rolling"

(Ensure that (n_poses - n_pre_poses) divides the number of diffusion timesteps evenly; see the sanity check above.)

Training

python scripts/train_zeggs.py --config=config/pose_diffusion_zeggs.yml

Generation

python scripts/test_zeggs.py

(Supports both ZEGGS and BEAT datasets.)


PersonaGestor

Navigate to the Persona_Gestor directory:

cd Persona_Gestor

Configuration

Edit the experiment YAML files in configs_diffmotion/experiment/:

model:
  type_: "rolling"

(Ensure that (model.seq_length - model.context_length) divides the number of diffusion timesteps evenly; see the sanity check above.)

Training

python src/diffmotion/diffmotion_trainer/train_diffmotion.py -m experiment=xxx_WavLM_[base/large]_train.yaml

Generation

python src/diffmotion/diffmotion_trainer/train_diffmotion.py -m experiment=xxx_WavLM_[base/large]_generate.yaml

(In both commands, replace xxx with the experiment name and [base/large] with the WavLM variant you trained.)

Citation

If you find this repository helpful in your research, please consider citing:

@article{vu2025streaming,
  title={Streaming Generation of Co-Speech Gestures via Accelerated Rolling Diffusion},
  author={Vu, Evgeniia and Boiarov, Andrei and Vetrov, Dmitry},
  journal={arXiv preprint arXiv:2503.10488},
  year={2025}
}

Acknowledgments

This project builds upon excellent prior works and codebases:

  • DiffuseStyleGesture
  • Taming (DiffGesture)
  • PersonaGestor

We thank the authors of these repositories for open-sourcing their valuable contributions.
