This repository is part of my master's thesis project. It is based on the official OpenVPI DiffSinger implementation.
In addition to reproducing and adapting the core DiffSinger model, this repo includes all scripts and resources used throughout my research pipeline. These include external tools, dependent repositories, and custom scripts located in the user_script/ directory. While the main singing voice synthesis model comes from OpenVPI's DiffSinger, the end-to-end workflow and data processing were extended to suit the needs of my thesis experiments.
This work explores phoneme-mapped cross-lingual transfer learning for singing voice synthesis (SVS), focusing on adapting an English-trained DiffSinger model to German using minimal target-language data. We focus on the acoustic model (not the variance model), and investigate how data quality—particularly accent, vocal range, and recording conditions—impacts low-resource SVS performance. The full thesis can be found here.
Please follow the installation and dependency setup as described in the original DiffSinger repository. This fork maintains compatibility with the upstream environment and training pipeline.
The experimental pipeline includes the following key stages, with associated scripts and tools:
- Extract audio: mp4_to_wav
- Clean audio: fishaudio preprocess tools
- Auto-slice: AudioSlicer
- Manual adjustment (optional): slice_audio, trim_audio
- Automatic transcription: Whisper via fishaudio
- Manual annotation (optional): lyrics_to_lab, check_lab
- Convert GTSinger format to DiffSinger format: convert, cleanup
- Select wavs by target duration: filter_by_duration
- Calculate total corpus length: calculate_duration
- Clean up folder: clean up folder
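The duration-based selection steps above can be sketched as follows. This is a hypothetical reimplementation for illustration, not the actual filter_by_duration / calculate_duration code from user_script/: it assumes durations in seconds and uses a simple greedy selection (longest files first) until the target corpus length is reached.

```python
# Sketch of the duration utilities (hypothetical; the real scripts may differ).
import wave


def wav_duration(path: str) -> float:
    """Duration of a PCM wav file in seconds, using only the stdlib."""
    with wave.open(path, "rb") as f:
        return f.getnframes() / f.getframerate()


def filter_by_duration(durations: dict, target_s: float) -> list:
    """Greedily pick files (longest first) until the target total duration is met."""
    picked, total = [], 0.0
    for name, dur in sorted(durations.items(), key=lambda kv: -kv[1]):
        if total >= target_s:
            break
        picked.append(name)
        total += dur
    return picked


def total_duration(durations: dict) -> float:
    """Total corpus length in seconds."""
    return sum(durations.values())
```

For example, `filter_by_duration({p: wav_duration(p) for p in paths}, 600.0)` would select a roughly 10-minute training subset.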
- Check and fill missing words in lexicon: check_lexicon
- Automatic alignment using Montreal Forced Aligner
- Manual alignment using vLabeler
- Phoneme-to-phoneme mapping via IPA & PHOIBLE: phoneme_mapping
- Phoneme numbers (ph_num), English: colstone/ENG_dur_num
- Phoneme numbers (ph_num), German: switch to dur_num_dict.txt
- Note sequence: OpenVPI/SOME
- f0 and time-step: OpenVPI MakeDiffSinger
- Combine multiple ds files (optional): combine_ds.py
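The phoneme-to-phoneme mapping step above can be sketched as a simple substitution table. This is an illustrative sketch only: the actual phoneme_mapping script derives its table from IPA transcriptions and PHOIBLE feature distances, and the example entries below are hypothetical common L2 substitutions, not the mapping used in the thesis.

```python
# Hypothetical English-to-German phoneme substitution table (illustrative only).
ENG_TO_GER = {
    "dh": "d",  # English /dh/ has no German counterpart; map to nearest stop
    "th": "s",  # /th/ -> /s/, a common substitution
    "w":  "v",  # /w/ -> /v/
}


def map_phonemes(seq, table, keep_unmapped=True):
    """Map a phoneme sequence through the table; phonemes shared by
    both inventories pass through unchanged."""
    out = []
    for ph in seq:
        if ph in table:
            out.append(table[ph])
        elif keep_unmapped:
            out.append(ph)
    return out
```

Keeping unmapped phonemes as-is is a deliberate fallback: phonemes present in both inventories need no substitution, and silently dropping unknown symbols would corrupt the alignment between phonemes and durations.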
- FFE & MCD: user_script/06_objective_evaluation/FFE&MCD/
- Intelligibility transcription (Whisper): fishaudio transcribe
- Word Error Rate (WER): run_wer_eval.py
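The WER metric underlying the evaluation step above can be sketched as follows. This is a minimal reference implementation of the standard metric (word-level edit distance normalized by reference length), not the actual run_wer_eval.py code:

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: minimum edit distance (substitutions, insertions,
    deletions) between word sequences, divided by the reference length."""
    r, h = ref.split(), hyp.split()
    # DP table: d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, and the reference must be non-empty.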
- Paper: DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
- Implementation: OpenVPI/DiffSinger
- Denoising Diffusion Probabilistic Models (DDPM): paper, implementation
- DDIM for diffusion sampling acceleration
- PNDM for diffusion sampling acceleration
- DPM-Solver++ for diffusion sampling acceleration
- UniPC for diffusion sampling acceleration
- Rectified Flow (RF): paper, implementation
- RoPE for transformer encoder
- HiFi-GAN and NSF for waveform reconstruction
- pc-ddsp for waveform reconstruction
- RMVPE and yxlllc's fork for pitch extraction
- Vocal Remover and yxlllc's fork for harmonic-noise separation
The following repositories are used as part of the data preparation and evaluation pipeline described in the Workflow Overview:
- OpenVPI/AudioSlicer – Automatic audio slicing
- OpenVPI/MakeDiffSinger – Data preprocessing utilities
- OpenVPI/SOME – Note duration extraction
- fishaudio/audio-preprocess – Audio cleaning and Whisper-based lyric transcription
- PHOIBLE – Phonological feature database
- Montreal Forced Aligner (MFA) – Phoneme-level alignment
- vLabeler – Manual phoneme-level alignment
- colstone/ENG_dur_num – Duration-number mapping utilities
- GTSinger – Dataset
Any organization or individual is prohibited from using any functionality in this repository to generate anyone's voice without their consent, including but not limited to government leaders, political figures, and celebrities. Failure to comply may put you in violation of copyright law.
This forked DiffSinger repository is licensed under the Apache 2.0 License.