Skip to content

maxmelichov/blue-codec

Repository files navigation

BlueCodec

Speech autoencoder that compresses 44.1 kHz audio into a compact 24-dimensional continuous latent representation at ~86 Hz, then reconstructs the waveform.

Installation

uv add "bluecodec @ git+https://github.com/maxmelichov/blue-codec.git"

Usage

See examples/ folder for a full working example.

Pretrained Model

The model was trained on 2x NVIDIA RTX 3090 GPUs for 4 weeks for 1.5 million steps on 6 million files with different languages, totaling about 11,000 hours of audio.

Pretrained weights: notmax123/blue-codec on Hugging Face.

Training

For detailed instructions on how to train the Autoencoder, please refer to the Training Documentation.

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

Fast and Small codec for flow-matching models

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages