feat(core): optimize Windows character encoding detection #19061

Open
shorinversion wants to merge 2 commits into google-gemini:main from shorinversion:feat/18533-optimize-windows-encoding

@shorinversion

Summary

This PR optimizes character encoding detection for shell tool outputs on Windows. It resolves the common issue where non-ASCII output (e.g., Cyrillic characters) appears garbled because the system's active code page was not correctly detected or prioritized over UTF-8.

Details

Implemented a robust 4-level encoding detection hierarchy:

  1. GEMINI_CLI_ENCODING Override: Users can now force a specific encoding via an environment variable.
  2. Strict UTF-8 Detection: The buffer is decoded as UTF-8 in fatal mode, so any invalid byte sequence rules out UTF-8 and detection moves on to the next level.
  3. Heuristic Analysis: Uses chardet for heuristic detection with a high confidence threshold (>= 90%).
  4. Improved Windows chcp Parsing: Fixed the regex to reliably capture numeric code pages on localized Windows systems.

This ensures that the CLI correctly interprets output regardless of whether the terminal is set to UTF-8 (65001), Cyrillic (866/1251), or other regional encodings.
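
The four levels above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `detectEncodingSketch`, `isStrictUtf8`, and the `Heuristic` callback type are hypothetical names, the heuristic is passed in as a callback so the block stays self-contained, and the system encoding is assumed to have been resolved elsewhere (for example from `chcp`).

```typescript
// Hypothetical sketch of the 4-level order described in this PR; names are illustrative.
type Heuristic = (
  bytes: Uint8Array,
) => { name: string; confidence: number } | undefined;

function isStrictUtf8(bytes: Uint8Array): boolean {
  try {
    // With fatal: true, decode() throws on any invalid UTF-8 byte sequence.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

function detectEncodingSketch(
  bytes: Uint8Array,
  envOverride: string | undefined, // e.g. the value of GEMINI_CLI_ENCODING
  heuristic: Heuristic, // stand-in for a chardet-style detector
  systemEncoding: string, // assumed to be resolved elsewhere (e.g. from chcp)
): string {
  // Level 1: an explicit user override wins over everything else.
  const forced = envOverride?.trim();
  if (forced) return forced.toLowerCase();

  // Level 2: accept UTF-8 only if the whole buffer decodes without errors.
  if (isStrictUtf8(bytes)) return "utf-8";

  // Level 3: trust the heuristic only at or above 90% confidence.
  const guess = heuristic(bytes);
  if (guess && guess.confidence >= 90) return guess.name.toLowerCase();

  // Level 4: fall back to the system encoding.
  return systemEncoding;
}
```

Passing the override and the system encoding as parameters, rather than reading them inside the function, keeps the sketch easy to test on any platform.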

Related Issues

Addresses Windows encoding issues (Related to #18533).

How to Validate

  1. On a Windows machine, run a command that outputs non-ASCII text (e.g., powershell -Command "echo 'Привет мир'").
  2. Observe that the model correctly understands and repeats the text instead of showing replacement characters.
  3. Run npm run test -w @google/gemini-cli-core -- src/utils/systemEncoding.test.ts to verify the 44 new unit tests.
  4. Optionally, set $env:GEMINI_CLI_ENCODING='utf-8' to verify the manual override.
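
For reviewers without a Windows machine, the garbling that steps 1 and 2 check for can be reproduced in isolation. The sketch below is only an illustration, not part of the PR's test suite: it hard-codes the code page 866 bytes for "Привет" and assumes the runtime's `TextDecoder` supports the `ibm866` label (as recent Node.js builds with full ICU do).

```typescript
// Bytes for "Привет" as encoded in code page 866 (DOS Cyrillic).
const cp866Bytes = new Uint8Array([0x8f, 0xe0, 0xa8, 0xa2, 0xa5, 0xe2]);

// Lenient UTF-8 decoding never throws; invalid sequences become U+FFFD,
// which is the replacement-character garbling described in this PR.
const garbled = new TextDecoder("utf-8").decode(cp866Bytes);

// Decoding with the matching legacy encoding recovers the original text.
// Assumption: the runtime's TextDecoder knows the "ibm866" label.
const recovered = new TextDecoder("ibm866").decode(cp866Bytes);

console.log(garbled.includes("\uFFFD"));
console.log(recovered);
```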

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • macOS (Logic verified via tests)
    • Windows
      • npm run
      • npx
    • Linux (Logic verified via tests)

@shorinversion shorinversion requested review from a team as code owners February 14, 2026 06:41

google-cla bot commented Feb 14, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist

Summary of Changes

Hello @shorinversion, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Gemini CLI's ability to correctly interpret shell command outputs on Windows, particularly for non-ASCII characters. It establishes a robust, multi-tiered encoding detection system, including user-definable overrides and improved system-level parsing, to prevent garbled text and ensure accurate display and processing of diverse character sets.

Highlights

  • Environment Variable Override: Introduced GEMINI_CLI_ENCODING environment variable for users to force a specific character encoding, allowing manual control over detection.
  • Strict UTF-8 Detection: Implemented an initial strict UTF-8 decoding attempt for output buffers, prioritizing correct UTF-8 interpretation.
  • Heuristic Encoding Analysis: Integrated chardet for heuristic detection with a high confidence threshold (>= 90%) to identify encodings when strict UTF-8 fails.
  • Improved Windows Code Page Parsing: Fixed the regex used to parse chcp output on Windows, ensuring reliable capture of numeric code pages, especially on localized systems.
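
The confidence threshold from the heuristic highlight can be illustrated with a small filter over detector results. This is a sketch under assumptions: `EncodingMatch` is a hypothetical type loosely modeled on the name and confidence pairs a chardet-style detector reports, and `pickConfidentEncoding` is an illustrative helper, not a function from this PR or from chardet.

```typescript
// Hypothetical match shape: an encoding name plus a 0-100 confidence score.
interface EncodingMatch {
  name: string;
  confidence: number;
}

// Returns the best-scoring encoding name, or undefined when even the best
// candidate falls below the threshold (90 by default, as in the PR summary).
function pickConfidentEncoding(
  matches: readonly EncodingMatch[],
  minConfidence = 90,
): string | undefined {
  let best: EncodingMatch | undefined;
  for (const match of matches) {
    if (!best || match.confidence > best.confidence) best = match;
  }
  return best && best.confidence >= minConfidence ? best.name : undefined;
}
```

Returning `undefined` rather than a low-confidence guess lets the caller fall through to the system code page, which matches the ordering described in the highlights.
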

Changelog

  • docs/troubleshooting.md
    • Added a new troubleshooting section detailing Windows encoding issues and how to use the GEMINI_CLI_ENCODING override.
  • package-lock.json
    • Updated dependency metadata by adding peer: true to several packages, reflecting changes in peer dependency declarations.
  • packages/core/src/utils/systemEncoding.test.ts
    • Added chardetAnalyse import and mock.
    • Included GEMINI_CLI_ENCODING in environment variable cleanup for tests.
    • Introduced numerous new unit tests to validate the multi-level encoding detection logic, including strict UTF-8, heuristic analysis, and environment variable overrides.
  • packages/core/src/utils/systemEncoding.ts
    • Imported chardetAnalyse for heuristic encoding detection.
    • Implemented a 4-level encoding detection hierarchy in getCachedEncodingForBuffer, incorporating environment variable override, strict UTF-8 check, and chardet heuristic analysis.
    • Updated the regex for parsing chcp output on Windows to improve reliability and added logging for non-UTF-8 code pages.
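
To illustrate the kind of parsing the chcp change targets, here is a minimal sketch that pulls the numeric code page out of `chcp` output regardless of the localized label text. The regex and the name `parseChcpOutputSketch` are assumptions for illustration, not the PR's actual implementation, and the localized strings in the usage lines are only examples of what such output may look like.

```typescript
// Hypothetical helper: extract the numeric code page from chcp output.
// It keys on the colon and the digits rather than on the (localized) label.
function parseChcpOutputSketch(output: string): number | undefined {
  const match = /:\s*(\d+)/.exec(output);
  const digits = match?.[1];
  if (digits === undefined) return undefined;
  const codePage = Number.parseInt(digits, 10);
  return Number.isNaN(codePage) ? undefined : codePage;
}

// Illustrative inputs; the exact wording varies by Windows display language.
console.log(parseChcpOutputSketch("Active code page: 65001"));
console.log(parseChcpOutputSketch("Текущая кодовая страница: 866"));
console.log(parseChcpOutputSketch("Aktive Codepage: 850."));
```
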

Activity

  • No human activity was detected in the provided context.

@gemini-cli gemini-cli bot added the area/core Issues related to User Interface, OS Support, Core Functionality label Feb 14, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a more robust character encoding detection mechanism, particularly for Windows systems, to prevent garbled output from shell commands. It implements a 4-level detection hierarchy: the new GEMINI_CLI_ENCODING environment variable takes priority, followed by a strict UTF-8 check and then heuristic analysis, before falling back to system encoding detection. The logic for parsing the Windows chcp command has also been made more reliable on localized systems. The changes are well tested, with new unit tests covering the new detection logic. My main feedback is to refactor some duplicated code to improve maintainability.
