Clarification on which PGSLM model is used in experiments #3

@dlion168

Description

Hi, thank you for releasing this great work!

I have a question regarding the PGSLM model used in your experiments.

In the default configuration of the repo, the PGSLM tokenizer section specifies:

```json
"tokenizer": {
    "dense_model_name": "mhubert-base-25hz",
    "quantizer_model_name": "kmeans",
    "encoder_vocab_size": 500,
    "deduplicate": true,
    "need_f0": true,
    "f0_func": "parselmouth",
    "log_f0": true,
    "mean_f0": false,
    "scale_f0": false,
    "f0_bins_path": "/cs/labs/oabend/avishai.elma/pgslm_data_my_bins/f0_bins.pt"
}
```
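For context, here is my reading of what the f0 settings above imply, as a minimal sketch: F0 is extracted with parselmouth, log-transformed (`log_f0: true`, with `mean_f0`/`scale_f0` disabled), and bucketed against the edges stored in `f0_bins.pt`. The bin edges below are made up purely for illustration, since the real file sits on your cluster path:

```python
import math

# Hypothetical log-F0 bin edges, illustrative only -- the actual edges
# would come from the tensor saved at f0_bins_path in the config.
f0_bin_edges = [4.0, 4.5, 5.0, 5.5, 6.0]  # log-Hz

def quantize_f0(f0_hz):
    """Map a raw F0 value in Hz (e.g. from parselmouth) to a bin index."""
    log_f0 = math.log(f0_hz)
    # Count edges at or below the value (bucketize-style binning).
    return sum(edge <= log_f0 for edge in f0_bin_edges)

print(quantize_f0(120.0))  # log(120) ~ 4.79, between edges 4.5 and 5.0 -> 2
```

Please correct me if the binning in your pipeline works differently.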

However, the official PGSLM release from [fairseq’s textless_nlp examples](https://github.com/facebookresearch/fairseq/tree/main/examples/textless_nlp/pgslm) only includes a model with 100 units and uses HuBERT-base-ls960 as the dense model.

Could you please clarify:

  • Which PGSLM model was actually used in your experiments?
  • If it’s a custom model, could you share more details or a pointer to where it can be obtained?

Thank you very much for your time and for sharing your work!
