LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec

This repository provides the inference code and pretrained checkpoints of LSCodec.

paper demo

Environment

Our code is tested on Python 3.10. Please install the dependencies from requirements.txt:

conda create -n lscodec python=3.10
conda activate lscodec
pip install -r requirements.txt

Or, if you prefer Docker, you can directly use the image from vec2wav 2.0:

docker pull cantabilekwok511/vec2wav2.0:v0.2
docker run -it -v /path/to/vec2wav2.0:/workspace cantabilekwok511/vec2wav2.0:v0.2

Checkpoints

Checkpoints can be downloaded from HuggingFace or Modelscope.

We have two versions of LSCodec: 50Hz and 25Hz. You can use this script to automatically download them:

bash download_ckpt.sh 50hz
# or bash download_ckpt.sh 25hz

This will create pretrained/ (or pretrained_25hz/, respectively) and download the following files:

  • codebook.npy: the codebook, (1, 300, 64) for LSCodec-50Hz; (1, 1024, 64) for LSCodec-25Hz.
  • encoder_config.yml, vocoder_config.yml: configs for the encoder and vocoder, respectively.
  • lscodec_encoder.pt, lscodec_vocoder.pt: checkpoints for the encoder and vocoder, respectively.
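As a quick sanity check on these codebook sizes, the nominal bitrates follow from frame rate times bits per token. This is a minimal arithmetic sketch (the formula is the standard one for single-codebook codecs, not code from this repo):

```python
import math

# Nominal bitrate = frame rate (tokens/sec) x bits per token,
# where bits per token = log2(codebook size).
def nominal_bitrate_bps(frame_rate_hz, codebook_size):
    return frame_rate_hz * math.log2(codebook_size)

rate_50hz = nominal_bitrate_bps(50, 300)    # LSCodec-50Hz, 300-entry codebook
rate_25hz = nominal_bitrate_bps(25, 1024)   # LSCodec-25Hz, 1024-entry codebook
```

This puts LSCodec-50Hz at roughly 411 bps and LSCodec-25Hz at exactly 250 bps.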

The download script will also prompt you to download the WavLM checkpoint manually. Place that file under the pretrained model directory as well; if you already have a WavLM checkpoint, you can simply ln -s it there.

Encoding Waveform to Tokens

This codebase uses kaldiio to load and store data. First, prepare a wav.scp file listing the wav files:

utt-1 /path/to/utt_1.wav
utt-2 /path/to/utt_2.wav
...

You can also refer to example/wav.scp for an example.
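If your wav files sit in one directory, a wav.scp can be generated programmatically. This is a hypothetical helper (not part of the repo) that uses each file's stem as the utterance ID, which is one common convention:

```python
from pathlib import Path

# Hypothetical helper: build a Kaldi-style wav.scp from a directory of
# .wav files, one "utt-id /abs/path.wav" line per file, sorted by name.
def build_wav_scp(wav_dir, out_path):
    wavs = sorted(Path(wav_dir).glob("*.wav"))
    lines = [f"{w.stem} {w.resolve()}" for w in wavs]
    Path(out_path).write_text("\n".join(lines) + "\n")
    return len(lines)
```

Any scheme works as long as the utterance IDs are unique and match across the scp files you feed to the scripts below.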

Then, encoding can be done by

source path.sh
encode.py --wav-scp example/wav.scp \
          --outdir example/tokens/ \
          --pretrained-dir pretrained/
# specify pretrained_25hz for 25Hz version.

The tokens are stored in example/tokens/feats.ark and feats.scp. The feats.scp file should look like:

3570_5694_000009_000002 /path/to/example/tokens/feats.ark:24
8455_210777_000079_000002 /path/to/example/tokens/feats.ark:677

You can also look into lscodec/bin/encode.py if you want to save the tokens in a different format.
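The feats.scp format shown above can also be parsed without any Kaldi tooling. A minimal sketch, assuming the standard "utt-id ark_path:byte_offset" layout (kaldiio handles this for you in practice):

```python
# Each feats.scp line maps an utterance ID to a byte offset inside the
# binary feats.ark file, written as "utt-id /path/to/feats.ark:OFFSET".
def parse_feats_scp(text):
    entries = {}
    for line in text.strip().splitlines():
        utt, pointer = line.split(maxsplit=1)
        ark, offset = pointer.rsplit(":", 1)
        entries[utt] = (ark, int(offset))
    return entries
```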

Vocoding with Reference Prompts

Once encoded, LSCodec tokens can be vocoded into 24 kHz waveforms using

source path.sh
decode_wav_prompt.py --feats-scp example/tokens/feats.scp \
    --prompt-wav-scp example/prompt.scp \
    --outdir example/wav \
    --pretrained-dir pretrained/
# specify pretrained_25hz for 25Hz version.

Here, --prompt-wav-scp specifies a reference prompt wav for each utterance's token sequence. The prompt.scp file looks like:

utt-1 /path/to/reference_utt_1.wav
utt-2 /path/to/reference_utt_2.wav

Finally, the decoded waveforms can be found in example/wav.
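Since the vocoder pairs tokens and prompts by utterance ID, a mismatch between feats.scp and prompt.scp is an easy mistake. This hypothetical pre-flight check (not repo code) flags utterances that have tokens but no prompt:

```python
# Hypothetical sanity check: return the set of utterance IDs that appear
# in feats.scp but have no matching entry in prompt.scp.
def missing_prompts(feats_scp_text, prompt_scp_text):
    feats_ids = {line.split()[0] for line in feats_scp_text.strip().splitlines()}
    prompt_ids = {line.split()[0] for line in prompt_scp_text.strip().splitlines()}
    return feats_ids - prompt_ids
```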

Combining Encoding and Vocoding into One Step

To run encoding and vocoding in a single step, use:

source path.sh
recon_with_prompt.py --wav-scp example/wav.scp \
    --prompt-wav-scp example/prompt.scp \
    --outdir example/wav \
    --pretrained-dir pretrained/
# specify pretrained_25hz for 25Hz version.

Citation

@inproceedings{guo25_interspeech,
  title     = {{LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec}},
  author    = {Yiwei Guo and Zhihan Li and Chenpeng Du and Hankun Wang and Xie Chen and Kai Yu},
  year      = {2025},
  booktitle = {{Interspeech 2025}},
  pages     = {5018--5022},
  doi       = {10.21437/Interspeech.2025-1106},
  issn      = {2958-1796},
}
