feat(core): optimize Windows character encoding detection #19061
shorinversion wants to merge 2 commits into google-gemini:main.
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Summary of Changes

Hello @shorinversion, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Gemini CLI's ability to correctly interpret shell command outputs on Windows, particularly for non-ASCII characters. It establishes a robust, multi-tiered encoding detection system, including user-definable overrides and improved system-level parsing, to prevent garbled text and ensure accurate display and processing of diverse character sets.
Code Review
This pull request introduces a more robust character encoding detection mechanism, particularly for Windows systems, to prevent garbled output from shell commands. It implements a 4-level detection hierarchy that prioritizes a new GEMINI_CLI_ENCODING environment variable, followed by strict and heuristic UTF-8 checks, before falling back to system encoding detection. The logic for parsing the Windows chcp command has also been improved to be more reliable on localized systems. The changes are well-tested with new unit tests covering the new detection logic. My main feedback is to refactor some duplicated code to improve maintainability.
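The detection order described in this review can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the PR's code. The names `detectEncoding`, `DetectOptions`, and `Guess` are hypothetical. The heuristic step and the system-encoding lookup are injected callbacks, standing in for a `chardet` call and for `chcp` parsing, so the order of the four levels can be exercised off Windows. The override is validated by checking whether `TextDecoder` recognizes the label.

```typescript
import { TextDecoder } from 'node:util';

// Hypothetical shape of a heuristic guess (e.g. adapted from a detector such
// as chardet); confidence is expressed as a percentage from 0 to 100.
export interface Guess {
  encoding: string;
  confidence: number;
}

// Hypothetical options object; injected so each level can be tested in isolation.
export interface DetectOptions {
  env?: Record<string, string | undefined>;
  heuristic?: (bytes: Uint8Array) => Guess | null;
  systemEncoding?: () => string | null;
}

// Mirrors the ">= 90%" threshold stated in the PR description.
const HEURISTIC_THRESHOLD = 90;

// TextDecoder throws a RangeError for labels it does not support.
function isSupportedLabel(label: string): boolean {
  try {
    new TextDecoder(label);
    return true;
  } catch {
    return false;
  }
}

// A fatal decoder throws on the first malformed UTF-8 sequence.
function isStrictUtf8(bytes: Uint8Array): boolean {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

export function detectEncoding(bytes: Uint8Array, opts: DetectOptions = {}): string {
  // Level 1: an explicit, recognized user override always wins.
  const override = opts.env?.GEMINI_CLI_ENCODING?.trim();
  if (override && isSupportedLabel(override)) {
    return override.toLowerCase();
  }
  // Level 2: bytes that are well-formed UTF-8 are treated as UTF-8.
  if (isStrictUtf8(bytes)) {
    return 'utf-8';
  }
  // Level 3: accept a heuristic guess only at or above the threshold.
  const guess = opts.heuristic?.(bytes);
  if (guess && guess.confidence >= HEURISTIC_THRESHOLD) {
    return guess.encoding.toLowerCase();
  }
  // Level 4: fall back to the injected system encoding, or UTF-8 if none.
  return opts.systemEncoding?.() ?? 'utf-8';
}
```

Passing the environment and both fallbacks as parameters is a choice made for testability here; production code would more likely read `process.env` directly and run the platform lookup lazily, only when levels 1 to 3 do not decide.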
Summary
This PR optimizes character encoding detection for shell tool outputs on Windows. It resolves the common issue where non-ASCII output (e.g., Cyrillic characters) appears garbled because the system's active code page was not correctly detected or prioritized over UTF-8.
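To illustrate the failure mode, here is a small TypeScript sketch that decodes the Russian word "Привет" ("hello") as a console on code page 866 would emit it. The helper names are hypothetical and not taken from this PR. A lenient UTF-8 decoder silently substitutes U+FFFD replacement characters, which produces the garbling described above. A decoder created with `{ fatal: true }` throws instead, which gives detection code a clear signal to try another encoding.

```typescript
import { TextDecoder } from 'node:util';

// Returns true only if `bytes` is well-formed UTF-8. A fatal decoder throws a
// TypeError on the first malformed sequence instead of inserting U+FFFD.
export function isStrictUtf8(bytes: Uint8Array): boolean {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// Lenient decoding: malformed sequences become U+FFFD (the garbled output).
export function decodeLenientUtf8(bytes: Uint8Array): string {
  return new TextDecoder('utf-8').decode(bytes);
}

// "Привет" encoded in code page 866 (IBM866), one byte per character.
export const cp866Privet = new Uint8Array([0x8f, 0xe0, 0xa8, 0xa2, 0xa5, 0xe2]);
```

On Node.js builds with full ICU (the default for official releases), `new TextDecoder('ibm866').decode(cp866Privet)` recovers "Привет". This is why falling back to the console's active code page fixes the output.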
Details
Implemented a robust 4-level encoding detection hierarchy:
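1. User override: the new `GEMINI_CLI_ENCODING` environment variable, when set, takes precedence over detection.
2. Strict UTF-8 check: output that decodes as valid UTF-8 is treated as UTF-8.
3. Heuristic check: uses `chardet` for heuristic detection with a high confidence threshold (>= 90%).
4. System encoding fallback: otherwise, the system encoding is used.

Improved `chcp` parsing: fixed the regex to reliably capture numeric code pages on localized Windows systems.

This ensures that the CLI correctly interprets output regardless of whether the terminal is set to UTF-8 (65001), Cyrillic (866/1251), or other regional encodings.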
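The corrected regex is not shown in this description, so the following is only a sketch of one way to make `chcp` parsing locale-independent: take the last run of ASCII digits in the output instead of matching an English label such as "Active code page:". The functions `parseChcpOutput` and `codePageToEncoding` and the code-page table are hypothetical. The table maps to WHATWG encoding labels that `TextDecoder` accepts, and it deliberately covers only the code pages mentioned here plus a few common ones.

```typescript
// Extracts the numeric code page from `chcp` output. Localized Windows builds
// change the label text and punctuation, so this takes the last run of
// digits rather than matching an English prefix.
export function parseChcpOutput(output: string): number | null {
  const match = output.trim().match(/(\d+)\D*$/);
  if (!match) return null;
  const cp = Number.parseInt(match[1], 10);
  return Number.isFinite(cp) && cp > 0 ? cp : null;
}

// Hypothetical, intentionally small mapping from code page to WHATWG label.
const CODE_PAGE_TO_ENCODING: ReadonlyMap<number, string> = new Map([
  [65001, 'utf-8'],
  [866, 'ibm866'],
  [1251, 'windows-1251'],
  [1252, 'windows-1252'],
  [932, 'shift_jis'],
  [936, 'gbk'],
]);

// Returns null for unknown code pages so the caller can choose its own fallback.
export function codePageToEncoding(cp: number | null): string | null {
  return cp === null ? null : CODE_PAGE_TO_ENCODING.get(cp) ?? null;
}
```

For example, sample strings modeled on English, German, French, and Russian output ("Active code page: 65001", "Aktive Codepage: 850.", "Page de codes active : 850", "Текущая кодовая страница: 866") all yield their numeric code page. An unmapped page such as 437 returns `null` rather than a guess.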
Related Issues
Addresses Windows encoding issues (Related to #18533).
How to Validate
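1. On Windows, run a command that prints non-ASCII text (e.g. `powershell -Command "echo 'Привет мир'"`) and confirm the output is not garbled.
2. Run `npm run test -w @google/gemini-cli-core -- src/utils/systemEncoding.test.ts` to verify the 44 new unit tests.
3. Set `$env:GEMINI_CLI_ENCODING='utf-8'` to verify the manual override.

Pre-Merge Checklist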