Singing Voice Synthesis via Shallow Diffusion Mechanism: explore phoneme-mapped cross-lingual transfer learning using minimal target language data (English to German)

DongJiashu/DiffSinger

About this Repository

This repository is part of my master's thesis project. It is based on the official OpenVPI DiffSinger implementation.

In addition to reproducing and adapting the core DiffSinger model, this repo includes all scripts and resources used throughout my research pipeline. These include external tools, dependent repositories, and custom scripts located in the user_script/ directory. While the main singing voice synthesis model comes from OpenVPI's DiffSinger, the end-to-end workflow and data processing were extended to suit the needs of my thesis experiments.

Thesis Topic

This work explores phoneme-mapped cross-lingual transfer learning for singing voice synthesis (SVS), focusing on adapting an English-trained DiffSinger model to German using minimal target-language data. We focus on the acoustic model (not the variance model) and investigate how data quality (particularly accent, vocal range, and recording conditions) impacts low-resource SVS performance. The thesis can be found here.
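As a rough illustration of what "phoneme-mapped" transfer could look like in practice, the sketch below rewrites a German phoneme sequence onto an English phoneme inventory so that a model trained on English symbols can consume it. Everything here is an assumption made for illustration: the inventory, the mapping table, the phoneme symbols, and the `map_phonemes` helper are hypothetical and are not taken from the scripts in user_script/ or from the thesis.

```python
from typing import Dict, List, Set

# Hypothetical English (source-language) phoneme inventory; illustrative only.
SOURCE_INVENTORY: Set[str] = {"aa", "iy", "uw", "eh", "k", "t", "s", "sh", "hh", "n"}

# Hypothetical German-to-English substitutions for phonemes that the
# source inventory lacks. The pairings are assumptions, not thesis results.
PHONEME_MAP: Dict[str, str] = {
    "y:": "uw",  # front rounded vowel -> a rounded vowel in the inventory
    "C": "sh",   # "ich" fricative -> a nearby sibilant
    "x": "hh",   # "ach" fricative -> glottal fricative
}


def map_phonemes(
    phonemes: List[str],
    mapping: Dict[str, str] = PHONEME_MAP,
    inventory: Set[str] = SOURCE_INVENTORY,
) -> List[str]:
    """Rewrite a target-language phoneme sequence using source-language symbols.

    Phonemes already in the source inventory pass through unchanged;
    others are replaced via the mapping table. Unmapped phonemes raise
    an error so that gaps in the table are noticed early.
    """
    mapped: List[str] = []
    for ph in phonemes:
        if ph in inventory:
            mapped.append(ph)
        elif ph in mapping:
            mapped.append(mapping[ph])
        else:
            raise KeyError(f"No mapping defined for phoneme {ph!r}")
    return mapped


if __name__ == "__main__":
    print(map_phonemes(["n", "C", "t"]))
```

Failing loudly on unmapped phonemes is a deliberate choice in this sketch: silently dropping or passing through unknown symbols would hide holes in the mapping table until training or inference time.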

Installation

Please follow the installation and dependency setup as described in the original DiffSinger repository. This fork maintains compatibility with the upstream environment and training pipeline.

Workflow Overview

The experimental pipeline includes the following key stages, with associated scripts and tools:

1. Audio Segmentation

2. Lyrics Annotation

3. Corpus Construction (optional)

4. Phonetic Dictionary Update

5. Phoneme Alignment (MFA)

6. Phoneme Mapping (Cross-lingual Transfer)

7. Inference Stimuli Preparation

8. Objective Evaluation

References

Original Paper & Implementation

Generative Models & Algorithms

Dependencies & Submodules

External Tools and Related Repositories

The following repositories are used as part of the data preparation and evaluation pipeline described in the Workflow Overview:

Disclaimer

Any organization or individual is prohibited from using any functionalities included in this repository to generate someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.

License

This forked DiffSinger repository is licensed under the Apache 2.0 License.
