ParaRater is a data selection method that improves cross-lingual transfer by using two meta-learned raters to select the most valuable parallel pairs, forming high-impact parallel corpora.
This repository contains the meta-learning toolkit for training these raters.
Prepare the training and validation sets as Parquet files with the following columns:

- `text` (required): string
- `category` (optional): string label, used only for logging per-category losses
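A minimal sketch of preparing data in this format, assuming pandas with a Parquet engine (pyarrow or fastparquet) is installed. The helpers `validate_rows`, `write_parquet`, and `per_category_mean_loss` are hypothetical and not part of this repository. The last one only illustrates what per-category loss logging could look like; it is not ParaRater's actual logging code.

```python
from collections import defaultdict
from typing import Iterable, Mapping, Optional, Sequence


def validate_rows(rows: Iterable[Mapping[str, object]]) -> list:
    """Return a list of schema problems; an empty list means the rows look valid.

    Checks the documented schema: `text` is a required string and
    `category` is an optional string.
    """
    errors = []
    for i, row in enumerate(rows):
        if not isinstance(row.get("text"), str):
            errors.append(f"row {i}: 'text' is required and must be a string")
        category = row.get("category")
        if category is not None and not isinstance(category, str):
            errors.append(f"row {i}: 'category' must be a string if present")
    return errors


def write_parquet(rows: list, path: str) -> None:
    """Hypothetical helper: validate rows, then write them to Parquet via pandas.

    pandas is imported lazily so the other helpers work without it;
    DataFrame.to_parquet needs pyarrow or fastparquet installed.
    """
    import pandas as pd

    problems = validate_rows(rows)
    if problems:
        raise ValueError("; ".join(problems))
    pd.DataFrame(rows).to_parquet(path, index=False)


def per_category_mean_loss(
    losses: Sequence[float], categories: Sequence[Optional[str]]
) -> dict:
    """Hypothetical illustration: average per-example losses within each category.

    Rows without a category are grouped under the made-up key "uncategorized".
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for loss, cat in zip(losses, categories):
        key = cat if cat is not None else "uncategorized"
        sums[key] += loss
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}
```

For example, `write_parquet([{"text": "...", "category": "news"}, {"text": "..."}], "train.parquet")` writes a file where the second row simply has a null `category`.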
To train the raters, launch `pararater.py` with Accelerate:

accelerate launch --num_processes 7 pararater.py \
--trainset_path /path_to/train \
--valset_path /path_to/val
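Before running the command, it may help to confirm that both paths actually point at Parquet data. This is a minimal sketch, assuming each path is either a single `.parquet` file or a directory of `.parquet` files (the README does not say which); `find_parquet_files` and `check_paths` are hypothetical helpers, not part of `pararater.py`.

```python
from pathlib import Path


def find_parquet_files(path: str) -> list:
    """Return the Parquet files at `path`, which may be a file or a directory."""
    p = Path(path)
    if p.is_file():
        return [p] if p.suffix == ".parquet" else []
    if p.is_dir():
        # Non-recursive: only files directly inside the directory are considered.
        return sorted(f for f in p.glob("*.parquet") if f.is_file())
    return []


def check_paths(trainset_path: str, valset_path: str) -> list:
    """Return human-readable problems; an empty list means both paths look usable."""
    problems = []
    for flag, path in (("trainset_path", trainset_path), ("valset_path", valset_path)):
        if not find_parquet_files(path):
            problems.append(f"--{flag}: no .parquet files found at {path}")
    return problems
```

For example, `check_paths("/path_to/train", "/path_to/val")` returns an empty list when both locations hold Parquet files, and one message per flag otherwise. It only checks that files exist; column contents are not inspected.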