Dynamic Text-to-Speech Cloning

Demo video: AI.voice.cloning.Demo.mp4

This repository contains a dynamic text-to-speech cloning application built with Streamlit and the Bark TTS model. Users can upload an audio file, clone the voice in it, and generate speech from text input. The application is designed to be user-friendly, with controls to upload audio, generate speech, and download the generated audio file.

Table of Contents

  • Overview
  • Features
  • Installation
  • Usage
  • Explanation of Code
  • Diagram
  • Screenshot
  • Voice Samples
  • Contributing

Overview

The dynamic text-to-speech cloning application leverages the power of the Bark TTS model to synthesize speech from text input. Users can upload an audio file to clone the voice and generate speech in that voice. The application also allows users to download the generated audio file.

Features

  • Upload audio files (WAV, MP3) to clone the voice.
  • Convert text to speech using the cloned voice.
  • Download the generated speech as a WAV file.
  • Interactive and user-friendly Streamlit interface.

Installation

  1. Clone the repository:

    git clone https://github.com/Mercity-AI/Voice-Cloning-Demo.git
    cd Voice-Cloning-Demo
  2. Install the required packages:

    pip install -r requirements.txt

Usage

  1. Run the Streamlit application:

    streamlit run app.py
  2. Open your web browser and go to http://localhost:8501.

  3. Upload an audio file, enter the text you want to convert to speech, and click "Generate Speech".

  4. Download the generated speech by clicking the "Download Audio" button.

Explanation of Code

Building a voice cloning pipeline means setting up a system that takes an audio sample of a speaker's voice and generates new speech in that same voice from user-provided text.

Importing Libraries

from TTS.tts.configs.bark_config import BarkConfig
from TTS.tts.models.bark import Bark
from scipy.io.wavfile import write as write_wav
import os

The Coqui TTS library is a powerful tool for text-to-speech conversion. It supports multiple TTS models, including Bark. These imports bring in the configuration and model classes required to set up and use the Bark TTS model.

SciPy is a scientific computing library in Python. Here, it is used to save the generated speech waveform to an audio file. The write_wav function writes a NumPy array to a WAV file, which is a common format for storing audio data.
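
As a minimal standalone illustration of write_wav (unrelated to Bark), the snippet below saves a one-second 440 Hz sine tone at the same 24,000 Hz sample rate used later in this README:

import numpy as np
from scipy.io.wavfile import write as write_wav

sample_rate = 24000
t = np.linspace(0, 1.0, sample_rate, endpoint=False)  # 1 second of time steps
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)            # float waveform in [-1, 1]
write_wav("tone.wav", sample_rate, tone.astype(np.float32))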

The os module provides a way to interact with the operating system. It is used here for handling directory paths and file management.

Setting Up Configuration

config = BarkConfig()
model = Bark.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="bark/", eval=True)
  • Initializes the configuration for the Bark model. This configuration includes various parameters that control the model's behavior during speech synthesis.
  • Initializes the Bark TTS model using the specified configuration. This sets up the model architecture and prepares it for loading pre-trained weights.
  • Loads the pre-trained weights for the Bark model from the specified checkpoint directory. This is crucial for ensuring the model has learned to generate high-quality speech based on extensive training data.
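
Bark is a large model and inference is much faster on a GPU. Since the Coqui TTS Bark model is a PyTorch module, it can optionally be moved to a CUDA device when one is available (a minimal sketch; the demo also runs, more slowly, on CPU):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)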

Speech Synthesis

text = "Mercity ai is a leading AI innovator in India, with OpenAI planning collaboration."
voice_dirs = "/Users/username/Desktop/projects/AI voice Cloning/Speaker voice/"
  • Defines the text that will be converted into speech. This is the input that the TTS model will process to generate the corresponding audio output.
  • Specifies the directory containing the speaker's audio files. These files are used to extract speaker-specific characteristics (embeddings) for voice cloning (see the expected layout sketched below).
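
Under Coqui's documented Bark voice-cloning convention, the directory passed as voice_dirs contains one subdirectory per speaker, each holding one or more recordings of that voice; the subdirectory name is what later serves as speaker_id, and the computed speaker embedding is typically cached alongside the audio as an .npz file. A hypothetical layout (file names are illustrative):

Speaker voice/            # the directory passed as voice_dirs
└── speaker/              # subdirectory name doubles as speaker_id
    ├── sample1.wav       # one or more recordings of the target voice
    └── sample2.wav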

Synthesizing Speech

output_dict = model.synthesize(text, config, speaker_id='speaker', voice_dirs=voice_dirs, temperature=0.95)

Uses the Bark model to synthesize speech from the input text. The method combines the text with the speaker-specific embeddings extracted from the audio files in the voice_dirs directory.

Parameters:

  • text: The text to be converted to speech.
  • config: The model configuration.
  • speaker_id: The name of the speaker subdirectory inside voice_dirs; it selects which speaker's cloned embedding is used for synthesis.
  • voice_dirs: Directory containing the speaker's audio files.
  • temperature: Controls the randomness of the output. Lower values make the output more deterministic, while higher values introduce more variation (see the sweep sketched after this list).
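
To hear the effect of temperature, the same text can be synthesized at a few settings and compared side by side (a hypothetical sweep reusing the model, config, text, and voice_dirs defined above; output file names are illustrative):

for temp in (0.5, 0.75, 0.95):
    out = model.synthesize(text, config, speaker_id='speaker', voice_dirs=voice_dirs, temperature=temp)
    write_wav(f"sample_temp_{temp}.wav", 24000, out["wav"])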

Saving the Generated Speech

write_wav("SamAltman.wav", 24000, output_dict["wav"])

Saves the synthesized speech to a WAV file. The sample rate is set to 24,000 Hz, which matches Bark's native output rate.
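
The Streamlit front end ties these pieces together. The actual app.py may differ in its details; the sketch below shows one plausible wiring of the upload → synthesize → download flow using standard Streamlit widgets (st.file_uploader, st.text_area, st.audio, st.download_button). The bark_voices directory name, the "speaker" id, and all file names are chosen for illustration:

import io
import os

import numpy as np
import streamlit as st
from scipy.io.wavfile import write as write_wav
from TTS.tts.configs.bark_config import BarkConfig
from TTS.tts.models.bark import Bark

VOICE_DIR = "bark_voices"  # illustrative; the app may use another path

@st.cache_resource  # load the heavy model only once per session
def load_model():
    config = BarkConfig()
    model = Bark.init_from_config(config)
    model.load_checkpoint(config, checkpoint_dir="bark/", eval=True)
    return config, model

config, model = load_model()
st.title("Dynamic Text-to-Speech Cloning")

# 1. Upload a reference recording and store it where Bark looks for voices.
uploaded = st.file_uploader("Upload a voice sample", type=["wav", "mp3"])
if uploaded is not None:
    speaker_dir = os.path.join(VOICE_DIR, "speaker")
    os.makedirs(speaker_dir, exist_ok=True)
    with open(os.path.join(speaker_dir, uploaded.name), "wb") as f:
        f.write(uploaded.getbuffer())

# 2. Collect the text to synthesize.
text = st.text_area("Text to convert to speech")

# 3. Generate, play back, and offer the result for download.
if st.button("Generate Speech") and uploaded is not None and text:
    output_dict = model.synthesize(text, config, speaker_id="speaker", voice_dirs=VOICE_DIR, temperature=0.95)
    buffer = io.BytesIO()
    write_wav(buffer, 24000, np.asarray(output_dict["wav"]))
    st.audio(buffer.getvalue(), format="audio/wav")
    st.download_button("Download Audio", buffer.getvalue(), file_name="generated.wav", mime="audio/wav")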

Diagram

Flowchart of the voice cloning pipeline (image).

Screenshot

Screenshot of the Streamlit interface (image, 2024-05-23).

Voice Samples

To listen to the generated speech samples, click the play buttons below:

Sample-1: https://github.com/Mercity-AI/Voice-Cloning-Demo/assets/121884337/c44ec7ab-c91b-4888-9e5a-73b5860360ba

Sample-2: WhatIF.mp4 (embedded video)

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or suggestions.
