Speech autoencoder that compresses 44.1 kHz audio into a compact 24-dimensional continuous latent representation at ~86 Hz, then reconstructs the waveform.
uv add "bluecodec @ git+https://github.com/maxmelichov/blue-codec.git"See examples/ folder for a full working example.
The model was trained on 2x NVIDIA RTX 3090 GPUs for 4 weeks for 1.5 million steps on 6 million files with different languages, totaling about 11,000 hours of audio.
Pretrained weights: notmax123/blue-codec on Hugging Face.
For detailed instructions on how to train the Autoencoder, please refer to the Training Documentation.
This project is licensed under the MIT License - see the LICENSE file for details.