BLEU: improve input validation and document edge-case behavior #729

@Vangmay

Hi! I noticed a few small, backward-compatible improvements that could clarify and harden the BLEU metric implementation.

  • Support a simple string alias for the default tokenizer (e.g. tokenizer="13a") in addition to passing a callable.
  • Add explicit validation for length mismatches between predictions and references.
  • Document and add tests for the current behavior when predictions are empty strings (BLEU evaluates to 0.0 implicitly today).

These changes don’t alter default behavior and aim to improve usability, robustness, and reproducibility.
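To make the first two items concrete, here is a minimal sketch of what the alias lookup and the length check might look like. Every name in it (`resolve_tokenizer`, `validate_lengths`, `_TOKENIZER_ALIASES`) is hypothetical and not taken from the current implementation, and the whitespace splitter is only a stand-in for the real 13a tokenizer so that the snippet runs on its own.

```python
from typing import Callable, Dict, List, Sequence, Union

Tokenizer = Callable[[str], List[str]]


def _placeholder_13a(text: str) -> List[str]:
    # Stand-in only: plain whitespace splitting, NOT the real 13a rules.
    # In the metric this would be the existing default tokenizer.
    return text.split()


# Hypothetical alias table mapping string names to tokenizer callables.
_TOKENIZER_ALIASES: Dict[str, Tokenizer] = {"13a": _placeholder_13a}


def resolve_tokenizer(tokenizer: Union[str, Tokenizer]) -> Tokenizer:
    """Accept either a callable (current behavior) or a known string alias."""
    if callable(tokenizer):
        return tokenizer
    if isinstance(tokenizer, str):
        if tokenizer in _TOKENIZER_ALIASES:
            return _TOKENIZER_ALIASES[tokenizer]
        known = ", ".join(sorted(_TOKENIZER_ALIASES))
        raise ValueError(f"Unknown tokenizer alias {tokenizer!r}; known aliases: {known}")
    raise TypeError("tokenizer must be a callable or a string alias")


def validate_lengths(predictions: Sequence[str], references: Sequence) -> None:
    """Fail early, with a clear message, when the inputs do not line up."""
    if len(predictions) != len(references):
        raise ValueError(
            f"Got {len(predictions)} predictions but {len(references)} references; "
            "the two must have the same length."
        )


# Usage: passing a callable still works, and the alias resolves to the default.
tok = resolve_tokenizer("13a")
validate_lengths(["the cat sat"], [["the cat sat down"]])
```

With this stand-in, an empty prediction tokenizes to an empty list, so there are no n-grams to match; that is the case the third item proposes to document and test.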
