@Observer-GGboy commented Jan 25, 2026

Summary

Fix CJK (Chinese, Japanese, Korean) character display issues in Windows terminals by adding proper UTF-8 encoding defaults.

Problem

Windows consoles default to a legacy OEM code page rather than UTF-8 (code page 65001), so CJK text emitted as UTF-8 is displayed as gibberish or question marks. This affects:

  • Terminal output with CJK characters
  • Python scripts with CJK strings
  • Git operations with CJK filenames
  • Any CLI tool outputting CJK text
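
To make the failure mode concrete, the following sketch shows what happens when UTF-8 bytes are read with a one-byte-per-character decoding. The one-byte decoder here is only a stand-in for a legacy code page (an assumption for illustration; real Windows consoles use code pages such as 437 or 936, which produce different garbage), and `decodeOneBytePerChar` is a hypothetical helper, not part of this PR.

```typescript
// Illustrative only: a simplified model of mojibake, not code from this PR.
const original = "中文";

// A UTF-8 producer (for example, a Python script) emits these bytes.
const utf8Bytes = new TextEncoder().encode(original);

// Correct consumer: decodes the bytes as UTF-8 and recovers the text.
const decodedAsUtf8 = new TextDecoder("utf-8").decode(utf8Bytes);

// Incorrect consumer: treats every byte as its own character, the way a
// single-byte legacy code page would (hypothetical stand-in decoder).
function decodeOneBytePerChar(bytes: Uint8Array): string {
  let out = "";
  bytes.forEach((b) => {
    out += String.fromCharCode(b);
  });
  return out;
}

const garbled = decodeOneBytePerChar(utf8Bytes);

console.log(decodedAsUtf8 === original); // true
console.log(garbled.length); // 6: each of the two characters became 3 separate symbols
```

The bytes are identical in both cases; only the decoder's assumption about the encoding differs, which is why the fix targets the console code page and locale settings rather than the programs producing the text.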

Solution

This PR adds platform-specific UTF-8 defaults for Windows:

Environment Variables

  • LANG=en_US.UTF-8 - Sets locale for Unix-style tools
  • LC_ALL=en_US.UTF-8 - Overrides all locale categories
  • PYTHONUTF8=1 - Enables UTF-8 mode for Python 3.7+
  • PYTHONIOENCODING=utf-8 - Sets Python stdin/stdout/stderr encoding
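
The defaults above could be merged into a child-process environment roughly as follows. This is a minimal sketch, not the PR's actual diff: `withUtf8Defaults`, `WINDOWS_UTF8_DEFAULTS`, and the `platform` argument are hypothetical names, and the platform is passed in (rather than read from `process.platform`) to keep the example self-contained.

```typescript
// Hypothetical helper; names are illustrative and not taken from the PR.
type Env = Record<string, string>;

// The four variables listed in the PR description.
const WINDOWS_UTF8_DEFAULTS: Env = {
  LANG: "en_US.UTF-8",
  LC_ALL: "en_US.UTF-8",
  PYTHONUTF8: "1",
  PYTHONIOENCODING: "utf-8",
};

function withUtf8Defaults(platform: string, userEnv: Env = {}): Env {
  // Non-Windows platforms are left untouched.
  if (platform !== "win32") {
    return { ...userEnv };
  }
  // Defaults are spread first so caller-supplied values take precedence.
  return { ...WINDOWS_UTF8_DEFAULTS, ...userEnv };
}

// Usage: a caller-supplied LANG overrides the default; the rest are filled in.
const env = withUtf8Defaults("win32", { LANG: "ja_JP.UTF-8" });
console.log(env.LANG); // "ja_JP.UTF-8"
console.log(env.PYTHONUTF8); // "1"
```

Spreading the defaults before the caller's values is what keeps the change backward compatible: an explicit setting always takes precedence.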

Shell Startup Arguments

  • CMD.exe: Runs chcp 65001 to set code page to UTF-8
  • PowerShell: Sets OutputEncoding and InputEncoding to UTF-8
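
One way the shell startup arguments might be chosen is sketched below. The function name `utf8StartupArgs` and the shell-detection logic are hypothetical, and the exact PowerShell properties are an assumption: the description says only "OutputEncoding and InputEncoding", which this sketch takes to mean `[Console]::OutputEncoding` and `[Console]::InputEncoding`. The `/K`, `-NoExit`, and `-Command` switches are standard options of `cmd.exe` and PowerShell.

```typescript
// Hypothetical sketch; the PR's real implementation may differ.
function utf8StartupArgs(shellPath: string): string[] {
  // Take the file name from a Windows or POSIX style path, case-insensitively.
  const name = shellPath.toLowerCase().split(/[\\/]/).pop() ?? "";

  if (name === "cmd.exe" || name === "cmd") {
    // /K runs the command and keeps the session open;
    // ">nul" hides the "Active code page" message printed by chcp.
    return ["/K", "chcp 65001 >nul"];
  }

  if (["powershell.exe", "powershell", "pwsh.exe", "pwsh"].indexOf(name) !== -1) {
    // Assumed meaning of "OutputEncoding and InputEncoding" in the description.
    const setEncoding =
      "[Console]::OutputEncoding = [System.Text.Encoding]::UTF8; " +
      "[Console]::InputEncoding = [System.Text.Encoding]::UTF8";
    // -NoExit keeps the session open after the command runs.
    return ["-NoExit", "-Command", setEncoding];
  }

  // Unknown shells get no extra arguments.
  return [];
}

// Usage:
console.log(utf8StartupArgs("C:\\Windows\\System32\\cmd.exe")); // [ "/K", "chcp 65001 >nul" ]
```

Returning an empty array for unrecognized shells keeps the behavior unchanged for anything other than CMD and PowerShell.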

Testing

Tested on Windows with:

  • ✅ Terminal displaying Chinese characters correctly
  • ✅ Python scripts printing CJK text without errors
  • ✅ Git operations with CJK filenames
  • ✅ CMD and PowerShell shells

Impact

  • Platform: Windows only (no impact on macOS/Linux)
  • Backward Compatible: Yes (sets defaults only; they can be overridden via the env parameter)
  • User-Visible: Yes (fixes CJK display issues)

Related Issues

This PR addresses the root cause of several Windows encoding issues:

Closes #10491

Replace placeholder path with real typings so typecheck passes on Windows.
Add platform-specific UTF-8 environment variables and shell startup args for better CJK character support on Windows.

- Set LANG and LC_ALL to en_US.UTF-8
- Set PYTHONUTF8 and PYTHONIOENCODING for Python scripts
- Add chcp 65001 for CMD.exe
- Configure PowerShell UTF-8 encoding
@github-actions
Contributor

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@github-actions
Contributor

The following comment was made by an LLM and may be inaccurate:

Based on my search, I found one potentially related PR:

Potential Related PR

PR #10381: "fix: specify UTF-8 encoding for Buffer.toString() on Windows - Issue #10341"

Why it's related: This PR also addresses UTF-8 encoding issues on Windows. While PR #10489 focuses on PTY and environment variables for CJK character display, PR #10381 appears to address UTF-8 encoding at the Buffer level. Both are tackling Windows encoding problems and may have overlapping solutions or could benefit from coordination.
