
Official PyTorch implementation of (ICME2025) "AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis"


# AutoStyle-TTS

[📝 Paper] | [🤖 Model Zoo] | [📊 Dataset] | [🛠️ Issues]

[Project Page] | [Colab]

## ✨ Key Features

- **Dynamic Style Matching**
  - 🎯 Multi-modal retrieval with Llama / PER-LLM-Embedder / Moka embeddings
  - 🔍 Context-aware style selection from 1000+ curated speech samples
- **High-Quality Synthesis**
  - 🎧 24 kHz neural vocoder with prosody transfer
  - 🌐 Multilingual synthesis in English, Chinese, and Japanese
- **Developer Friendly**
  - 🚀 Inference in under 5 s on a single GPU
  - 📦 Pre-trained models and style database available
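Retrieval-based style matching of the kind listed above can be sketched as a cosine-similarity search over precomputed style embeddings. This is an illustrative toy example only, not the repository's actual retrieval code; the function name, embedding dimensions, and data are hypothetical:

```python
import numpy as np

def retrieve_style(query_emb, style_embs, top_k=3):
    """Return indices of the top_k style embeddings most similar
    to the query, ranked by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    s = style_embs / np.linalg.norm(style_embs, axis=1, keepdims=True)
    sims = s @ q                      # cosine similarity per style sample
    return np.argsort(-sims)[:top_k]  # indices, most similar first

# Toy database: 4 style vectors of dimension 8
rng = np.random.default_rng(0)
styles = rng.normal(size=(4, 8))
# Query nearly identical to style 2, so index 2 should rank first
query = styles[2] + 0.01 * rng.normal(size=8)
print(retrieve_style(query, styles, top_k=1))  # → [2]
```

In the actual system the query embedding would come from an LLM-based text encoder (e.g. Llama or PER-LLM-Embedder) and the database would hold the curated speech-sample embeddings.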

## 🛠️ Installation

```bash
# Clone the repository (includes ~2.5 GB of voice samples)
git clone https://github.com/Chengyuann/AutoStyle-TTS.git
cd AutoStyle-TTS

# Install core dependencies
conda create -n asttts python=3.9
conda activate asttts
pip install -r requirements.txt
```

## 🚀 Training

Data preprocessing scripts are provided in `utils/`. After preprocessing, start training with:

```bash
bash scripts/train.sh
```

## 🏋️ Pretrained Weights

Download the pretrained model weights from [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/d/a926bbefddfe40fbb041/).

## ⚠️ License

This implementation is for research purposes only. Commercial use requires written permission. See LICENSE for details.

## References

- LLaMA: GitHub
- Qwen2.5: Technical Report
