Fix ZeroDivisionError and NaN propagation in reward model metrics #3779

Mr-Neutr0n · 2026-02-11T18:38:41Z

Problem

The reward model metric functions in model/model_training/metrics.py have three related bugs that can cause training to crash or silently produce corrupt evaluation results:

ZeroDivisionError in kendall_tau() and spearmanr(): Both functions divide by bsize at the end without checking whether it is zero. When all labels in a batch are padding (-100), no label groups are found, bsize stays at 0, and the division raises ZeroDivisionError.
NaN propagation: When a label group contains fewer than 2 ranked items, scipy.stats.kendalltau and scipy.stats.spearmanr return NaN (correlation is undefined for single-element arrays). This NaN gets added to the running sum and silently corrupts the final metric value, making it NaN for the entire evaluation step.
Empty array in reward_accuracy(): If no valid label groups are found, pos_scores and neg_scores remain empty lists. Calling np.mean on an empty array produces a RuntimeWarning and returns NaN.
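
The three failure modes listed above can be reproduced in isolation. The snippet below is a minimal sketch and does not call the code in metrics.py; the variable names are illustrative, and warnings are suppressed only to keep the output quiet.

```python
import math
import warnings

import numpy as np
from scipy import stats

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # Correlation is undefined for a single ranked item, so scipy returns NaN.
    single_item_tau, _ = stats.kendalltau([0.7], [0])
    # The mean of an empty array is NaN (NumPy also emits a RuntimeWarning).
    empty_mean = np.mean(np.array([]))

# One NaN term is enough to turn the whole running sum into NaN.
running_sum = 0.5 + single_item_tau

# With no valid label groups the group counter stays at 0.
bsize = 0
try:
    running_sum / bsize
    raised = False
except ZeroDivisionError:
    raised = True

print(math.isnan(single_item_tau), math.isnan(empty_mean), math.isnan(running_sum), raised)
```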

Fix

Skip label groups with fewer than 2 items in kendall_tau() and spearmanr().
Only accumulate results that are not NaN, and only increment bsize for valid results.
Return 0.0 instead of dividing by zero when bsize is 0.
Return zeroed metrics dict early in reward_accuracy() when no scores were collected.
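
The guards described above could look roughly like the following. This is a sketch under stated assumptions, not the code from this PR: mean_kendall_tau is a hypothetical helper, it takes flat 1-D arrays, and it assumes the items of each label group are stored best-first. The real kendall_tau() in metrics.py may be structured differently.

```python
import numpy as np
from scipy.stats import kendalltau

PAD_LABEL = -100  # padding value for labels, as described in the PR


def mean_kendall_tau(logits, labels):
    """Average Kendall tau over label groups, skipping undefined cases.

    Hypothetical helper: `logits` and `labels` are 1-D arrays of equal
    length, and items within a group are assumed to be ordered best-first.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    total, bsize = 0.0, 0
    for group in np.unique(labels):
        if group == PAD_LABEL:
            continue
        group_logits = logits[labels == group]
        if group_logits.size < 2:
            continue  # correlation is undefined for fewer than 2 items
        # Target scores that decrease along the assumed best-first order.
        target = np.arange(group_logits.size - 1, -1, -1)
        tau, _ = kendalltau(group_logits, target)
        if not np.isnan(tau):  # e.g. constant logits still give NaN
            total += tau
            bsize += 1  # count only groups with a valid result
    return total / bsize if bsize > 0 else 0.0
```

The same three guards (minimum group size, NaN check, zero-count fallback) would apply unchanged to a spearmanr-based variant.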

Testing

These edge cases arise during reward model evaluation when:

A batch contains only padding tokens (all labels are -100)
A label group has only a single ranked item (e.g., only one response for a given prompt)

The fix ensures that metric computation completes without errors and returns deterministic fallback values (0.0) rather than crashing or returning NaN.
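
The fallback behaviour for reward_accuracy() can be illustrated with a small stand-in. The function name summarize_scores and the dict keys below are assumptions made for this example; the actual keys returned by reward_accuracy() are not shown in this description.

```python
import numpy as np


def summarize_scores(pos_scores, neg_scores):
    """Return zeroed metrics when no scores were collected (hypothetical keys)."""
    if len(pos_scores) == 0 or len(neg_scores) == 0:
        # Early return avoids np.mean on an empty array, which would yield NaN.
        return {"pos_score": 0.0, "neg_score": 0.0, "accuracy": 0.0}
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    return {
        "pos_score": float(pos.mean()),
        "neg_score": float(neg.mean()),
        # Fraction of pairs where the preferred response scored higher.
        "accuracy": float(np.mean(pos > neg)),
    }


empty_metrics = summarize_scores([], [])
pair_metrics = summarize_scores([2.0, 1.0], [1.0, 3.0])
```

An all-padding batch then yields a dict of zeros rather than NaN values, which matches the deterministic fallback described above.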

The kendall_tau() and spearmanr() functions divide by bsize without checking if it is zero. This causes a ZeroDivisionError when all labels are padding (-100) or when the input batch has no valid label groups. Additionally, when a label group has fewer than 2 ranked items, scipy's kendalltau/spearmanr return NaN, which silently propagates through the accumulated score and corrupts the final metric value.

Changes:

- Skip label groups with fewer than 2 items (correlation is undefined for single-element arrays)
- Only increment bsize for groups that produce a valid (non-NaN) correlation result
- Return 0.0 instead of dividing by zero when bsize is 0
- Guard reward_accuracy() against empty score arrays, which would cause np.mean to return NaN on an empty array

Mr-Neutr0n requested review from andreaskoepf, dvruette, jordiclive, sanagno, shahules786, theblackcat102 and yk as code owners February 11, 2026 18:38
