Changes to data.sh file
Utkarsh edited this page Nov 29, 2023 · 1 revision
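In local/data.sh, change how the dev and eval sets are split off from the training data (lines 79-82).
Typically 50-100 sentences are held out for evaluation, and 10% of the remaining data is used for validation.
For example, if the training data has 1050 sentences in total, hold out the last 50 for evaluation and the 100 before them (10% of the remaining 1000) for validation, leaving the first 900 for training: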
utils/subset_data_dir.sh --last data/train 150 data/deveval
utils/subset_data_dir.sh --last data/deveval 50 data/${eval_set}
utils/subset_data_dir.sh --first data/deveval 100 data/${train_dev}
n=$(( $(wc -l < data/train/wav.scp) - 150 ))
utils/subset_data_dir.sh --first data/train ${n} data/${train_set}
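The numbers 150, 50 and 100 above follow directly from the total sentence count, so they can be derived rather than hard-coded. The following is a minimal sketch of that arithmetic, not part of the recipe itself: the function name compute_split_sizes and its variable names are hypothetical, and it assumes a fixed eval size (default 50) with the dev set taken as 10% of the remainder, rounded down.

```shell
# Hypothetical helper (not part of local/data.sh): print the four split sizes
# as "deveval eval dev train" for a given total and eval-set size.
compute_split_sizes() {
    total=$1
    n_eval=${2:-50}                        # sentences held out for evaluation
    n_dev=$(( (total - n_eval) / 10 ))     # 10% of the remainder, rounded down
    n_deveval=$(( n_eval + n_dev ))        # combined size passed to --last
    n_train=$(( total - n_deveval ))       # what is left for training
    echo "${n_deveval} ${n_eval} ${n_dev} ${n_train}"
}

# Worked example for 1050 sentences. In data.sh the total would instead
# come from $(wc -l < data/train/wav.scp).
set -- $(compute_split_sizes 1050 50)
echo "deveval=$1 eval=$2 dev=$3 train=$4"
```

For 1050 sentences this gives deveval=150, eval=50, dev=100 and train=900, matching the values used in the commands above. For a different dataset size, the same four values would replace 150, 50, 100 and ${n} respectively.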