
Changes to data.sh file

Utkarsh edited this page Nov 29, 2023 · 1 revision

In local/data.sh, change how the dev and eval sets are split off from the training data (lines 79-82):

Generally, 50-100 sentences are set aside for evaluation, and 10% of the remaining data is used for validation.

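For example, if you have a total of 1050 sentences in the training data, split off the last 150 (50 for evaluation plus 100 for validation, which is 10% of the remaining 1000) and train on the first 900: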

utils/subset_data_dir.sh --last data/train 150 data/deveval

utils/subset_data_dir.sh --last data/deveval 50 data/${eval_set}

utils/subset_data_dir.sh --first data/deveval 100 data/${train_dev}

n=$(( $(wc -l < data/train/wav.scp) - 150 ))

utils/subset_data_dir.sh --first data/train ${n} data/${train_set}
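
The numbers 150, 50 and 100 above can also be derived from the sentence count instead of being hard-coded. The following is a minimal sketch, assuming a fixed evaluation size of 50 and a 10% validation share. The variable names (n_total, n_eval, n_dev, n_deveval, n_train) are illustrative and are not part of the original data.sh.

```shell
# Hypothetical sketch: compute split sizes from the total sentence count.
# In data.sh the total would come from: wc -l < data/train/wav.scp
n_total=1050                              # example value from the text above
n_eval=50                                 # assumed: 50 sentences held out for evaluation
n_dev=$(( (n_total - n_eval) / 10 ))      # 10% of the remainder (integer division)
n_deveval=$(( n_dev + n_eval ))           # size of the combined data/deveval subset
n_train=$(( n_total - n_deveval ))        # what is left for training
echo "deveval=${n_deveval} eval=${n_eval} dev=${n_dev} train=${n_train}"
```

With these values, ${n_deveval}, ${n_eval}, ${n_dev} and ${n_train} could replace 150, 50, 100 and ${n} in the subset_data_dir.sh calls. Note that the integer division rounds the validation size down when the remainder is not a multiple of 10.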
