uniq: fix -w to count bytes in C locale #11061

Open
aguimaraes wants to merge 3 commits into uutils:main from aguimaraes:uniq-fix-w-locale-bytes
aguimaraes (Contributor) commented Feb 23, 2026

Summary

uniq -w N should count bytes in the C/POSIX locale and characters in a UTF-8 locale. Currently it always counts UTF-8 characters, regardless of locale.

Changes

  • Added is_c_locale() helper that checks LC_ALL, LC_CTYPE, LANG in order
  • Modified key_end_index() to use byte counting when in C locale
  • Added test for C locale byte counting behavior
  • Fixed test_stdin_w1_multibyte to explicitly set UTF-8 locale (it was implicitly relying on character counting)
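
For readers without the diff at hand, the idea behind the first two changes can be sketched roughly as follows. This is an illustration, not the code from this PR: the function signatures, the pure helper is_c_locale_from, and the choice to treat an entirely unset environment as the C locale are all assumptions made for the example. Only the names is_c_locale and key_end_index and the LC_ALL, LC_CTYPE, LANG lookup order come from the description above.

```rust
use std::env;

/// Hypothetical pure helper (not from the PR): decide whether the effective
/// locale is C/POSIX given the values of LC_ALL, LC_CTYPE and LANG.
/// The first non-empty variable wins, in that order.
fn is_c_locale_from(lc_all: Option<&str>, lc_ctype: Option<&str>, lang: Option<&str>) -> bool {
    let effective = [lc_all, lc_ctype, lang]
        .iter()
        .copied()
        .flatten()
        .find(|v| !v.is_empty());
    match effective {
        // Exact match only: "C.UTF-8" is a UTF-8 locale, not the C locale.
        Some(v) => v == "C" || v == "POSIX",
        // Assumption: with no locale variable set, fall back to the POSIX default.
        None => true,
    }
}

/// Reads the process environment and applies the rule above.
fn is_c_locale() -> bool {
    let lc_all = env::var("LC_ALL").ok();
    let lc_ctype = env::var("LC_CTYPE").ok();
    let lang = env::var("LANG").ok();
    is_c_locale_from(lc_all.as_deref(), lc_ctype.as_deref(), lang.as_deref())
}

/// Byte offset at which the comparison key ends for `-w width`.
/// In the C locale the width is a byte count; otherwise it is a count of
/// UTF-8 characters. The signature is an assumption for this sketch.
fn key_end_index(line: &str, width: usize, c_locale: bool) -> usize {
    if c_locale {
        // The result may fall inside a multibyte character, so callers
        // should compare `line.as_bytes()[..end]` rather than slice the &str.
        width.min(line.len())
    } else {
        // Start offset of the character at position `width`, or the end of the line.
        line.char_indices().nth(width).map_or(line.len(), |(i, _)| i)
    }
}

fn main() {
    let line = "été"; // 5 bytes, 3 characters ('é' is 2 bytes in UTF-8)
    println!("C locale key end:  {}", key_end_index(line, 1, true));
    println!("UTF-8 key end:     {}", key_end_index(line, 1, false));
    println!("is_c_locale() now: {}", is_c_locale());
}
```

Under this sketch, -w 1 in the C locale compares only the first byte, so lines starting with "é" (0xC3 0xA9) and "è" (0xC3 0xA8) would be treated as having the same key, while in a UTF-8 locale they would differ.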

Considerations

I chose to inline the locale check (~9 lines) rather than adding the i18n feature dependency. The check is simple enough that duplicating it seemed better than pulling in ICU dependencies just for this.

If you'd prefer I use uucore::i18n instead, let me know and I'll update.

Fixes #10184

@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/pr/bounded-memory (passes in this run but fails in the 'main' branch)

Development

Successfully merging this pull request may close these issues.

uniq: -w counts UTF-8 characters instead of bytes

2 participants