# CAPO: Confidence Aware Preference Optimization Learning for Multilingual Preferences

Paper accepted at IJCNLP-AACL 2025 Findings.
This repository contains the official implementation of CAPO (Confidence Aware Preference Optimization). CAPO is a post-training alignment method designed to improve the robustness of LLMs in multilingual settings. Unlike standard DPO, CAPO utilizes a dynamic loss scaling mechanism based on relative reward margins to handle noisy or low-margin comparisons often found in multilingual translation tasks.
Preference optimization is a critical post-training technique used to align large language models (LLMs) with human preferences, typically by fine-tuning on ranked response pairs. While methods like Direct Preference Optimization (DPO) have proven effective in English, they often fail to generalize robustly to multilingual settings. We propose a simple yet effective alternative, Confidence-Aware Preference Optimization (CAPO), which replaces DPO's fixed treatment of preference pairs with a dynamic loss scaling mechanism based on a relative reward. By modulating the learning signal according to the confidence in each preference pair, CAPO enhances robustness to noisy or low-margin comparisons, typically encountered in multilingual text. Empirically, CAPO outperforms existing preference optimization baselines by at least 16% in reward accuracy, and improves alignment by widening the gap between preferred and dispreferred responses across languages.
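The core idea, down-weighting low-margin (low-confidence) preference pairs instead of treating all pairs equally as DPO does, can be sketched in framework-free Python. This is an illustrative approximation, not the paper's exact objective: the sigmoid-based confidence weight is an assumption made for the sketch.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def capo_style_loss(policy_chosen_logps, policy_rejected_logps,
                    ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Confidence-aware DPO-style loss over a batch of preference pairs.

    Illustrative sketch only (not the paper's exact formulation): each
    pair's DPO loss is scaled by a confidence weight derived from its
    relative reward margin, so low-margin (noisy) pairs contribute less.
    """
    losses = []
    for pc, pr, rc, rr in zip(policy_chosen_logps, policy_rejected_logps,
                              ref_chosen_logps, ref_rejected_logps):
        # Implicit reward margin between chosen and rejected, as in DPO
        margin = beta * ((pc - rc) - (pr - rr))
        # Confidence weight in (0, 1): larger margins -> higher confidence
        confidence = sigmoid(margin)
        # Standard per-pair DPO loss: -log sigmoid(margin), scaled by confidence
        losses.append(confidence * -math.log(sigmoid(margin)))
    return sum(losses) / len(losses)
```

In this sketch a near-zero margin yields a confidence near 0.5, halving that pair's gradient contribution relative to plain DPO, while clearly separated pairs keep most of their learning signal.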
- Add the CAPO objective to Hugging Face
## Getting Started

- Clone the repository.
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
## Data

The training pipeline combines two datasets: DIVEMT and MLQE-PE. Alternatively, you can use your own data.
### DIVEMT

No action needed. The training script automatically downloads this dataset from the Hugging Face Hub.
### MLQE-PE (Manual Setup)

Place your MLQE-PE dataset files in the `Data/mlqepe/` directory. The data loader expects the following three files:

- `Data/mlqepe/mlqepe_train.csv`
- `Data/mlqepe/mlqepe_test.csv`
- `Data/mlqepe/mlqepe_dev.csv`

**CSV format:** each CSV should contain (at minimum) the following columns:

- `source`: the source sentence.
- `pe`: the post-edited (chosen/preferred) translation.
- `mt`: the machine translation (rejected/dispreferred).
- `direction`: the language direction (e.g., `eng-deu`, `eng-zh`).
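Assuming the column layout above, turning each CSV row into a (prompt, chosen, rejected) preference triple might look like the following. This is a hedged sketch, not the repo's actual loader, and the prompt template is a hypothetical choice.

```python
import csv
import io

def rows_to_preference_pairs(csv_text):
    """Convert MLQE-PE-style CSV rows into preference triples.

    Assumes the columns documented above: source, pe (preferred),
    mt (dispreferred), direction. The prompt template is hypothetical.
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        pairs.append({
            "prompt": f"Translate ({row['direction']}): {row['source']}",
            "chosen": row["pe"],     # post-edited translation is preferred
            "rejected": row["mt"],   # raw machine translation is dispreferred
        })
    return pairs

sample = (
    "source,pe,mt,direction\n"
    "Hello world,Hallo Welt,Hallo Welten,eng-deu\n"
)
pairs = rows_to_preference_pairs(sample)
```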
## Training

You can run training using a YAML configuration file or by passing arguments via the command line.
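As an illustration, a minimal `configs/capo_config.yaml` might look like this. The field names mirror the CLI flags accepted by `train.py`; treat them as assumptions rather than the repo's exact schema.

```yaml
# Hypothetical config sketch; field names mirror the train.py CLI flags
model_name_or_path: meta-llama/Llama-3.1-8B-Instruct
output_dir: ./capo-run
learning_rate: 2.0e-6
num_train_epochs: 3
```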
Modify `configs/capo_config.yaml` to set your desired hyperparameters, then run:

```bash
python train.py configs/capo_config.yaml
```

Alternatively, pass arguments directly on the command line:

```bash
python train.py \
    --model_name_or_path "meta-llama/Llama-3.1-8B-Instruct" \
    --output_dir "./capo-manual-run" \
    --learning_rate 2e-6 \
    --num_train_epochs 3
```

## Citation

If you find our research helpful, please cite:
```bibtex
@article{pokharel2025capo,
  title={CAPO: Confidence Aware Preference Optimization Learning for Multilingual Preferences},
  author={Pokharel, Rhitabrat and Tao, Yufei and Agrawal, Ameeta},
  journal={arXiv preprint arXiv:2511.07691},
  year={2025}
}
```