Broken error messages in token normalization: {token!r} shown literally instead of actual token value (#579) #580

Open
markknoffler wants to merge 1 commit into google-deepmind:main from markknoffler:fix/fix-fstring-token-bug
Conversation

@markknoffler
Fix for Issue #579

Root cause

In _normalize_token, when a stop_token or forbidden_token string maps to multiple token IDs, the code raises a ValueError using a plain string instead of an f-string:

raise ValueError(
    'Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must'
    ' map to single token ids in the vocab.'
)

Because the string has no `f` prefix, Python treats `{token!r}` as literal text rather than interpolating it. Users see:

ValueError: Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must map to single token ids in the vocab.

instead of the actual invalid token value, which makes debugging harder.
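The difference is easy to reproduce in isolation. The snippet below (a standalone sketch; the `token` value is invented for illustration) shows how the same literal behaves with and without the `f` prefix:

```python
token = '<end_of_turn>'  # hypothetical token value for illustration

# Without the `f` prefix, the placeholder is kept as literal text:
msg_broken = 'Invalid token: {token!r}.'

# With the `f` prefix, {token!r} is replaced by repr(token):
msg_fixed = f'Invalid token: {token!r}.'

print(msg_broken)  # Invalid token: {token!r}.
print(msg_fixed)   # Invalid token: '<end_of_turn>'.
```

Note that `!r` applies `repr()` to the value, so the token appears quoted in the fixed message, which makes stray whitespace or special characters visible.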

Fix summary

Use an f-string so that `{token!r}` is interpolated and the real token value appears in the error message. The same one-character change applies to both affected files.

Patch sketch

1. gemma/gm/text/_sampler.py

raise ValueError(
    f'Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must'
    ' map to single token ids in the vocab.'
)

2. gemma/research/t5gemma/sampling.py

raise ValueError(
    f'Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must'
    ' map to single token ids in the vocab.'
)
