Add non-English language support to FinalResponseMatchV2Evaluator #3503
base: main
Enhanced the LLM-as-judge prompt to explicitly handle non-English languages, including Chinese, Thai, Japanese, Korean, Arabic, Hebrew, Hindi, and other non-Latin scripts. The evaluator now:
- Recognizes identical strings in any language as valid matches
- Handles Unicode and character encoding differences
- Accepts language-specific punctuation variations (e.g., 。 vs . in Chinese)
- Treats all languages with equal evaluation standards

Fixes google#3111
Fixes google#3162
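This PR changes only the judge prompt. To make concrete what "Unicode differences" and "punctuation variations" mean in the list above, here is a minimal deterministic sketch in Python. It assumes both responses are already decoded `str` values, so byte encodings such as UTF-8 versus GBK are out of scope. The helpers `normalize_response` and `is_exact_match` and the table `_PUNCT_MAP` are hypothetical and not part of ADK. The sketch shows that two strings can render identically yet differ at the byte level, and that NFC normalization plus a small punctuation map makes them compare equal.

```python
import unicodedata

# Hypothetical, illustrative mapping of full-width / CJK punctuation to ASCII.
# It is NOT taken from ADK or from the _FINAL_RESPONSE_MATCH_V2_PROMPT text.
_PUNCT_MAP = str.maketrans({
    "\u3002": ".",  # 。 ideographic full stop
    "\uff0c": ",",  # ， full-width comma
    "\u3001": ",",  # 、 ideographic comma
    "\uff01": "!",  # ！ full-width exclamation mark
    "\uff1f": "?",  # ？ full-width question mark
    "\uff1a": ":",  # ： full-width colon
})


def normalize_response(text: str) -> str:
    """Canonicalize a decoded response string for an equality check."""
    # NFC composes sequences such as "e" + U+0301 (combining acute) into "é".
    text = unicodedata.normalize("NFC", text)
    # Map the language-specific punctuation listed above to ASCII equivalents.
    text = text.translate(_PUNCT_MAP)
    # Collapse runs of whitespace into single spaces and trim both ends.
    return " ".join(text.split())


def is_exact_match(actual: str, expected: str) -> bool:
    """Return True if both responses are equal after normalization."""
    return normalize_response(actual) == normalize_response(expected)


if __name__ == "__main__":
    # Identical Thai strings match trivially.
    print(is_exact_match("สวัสดีครับ", "สวัสดีครับ"))
    # Chinese full stop versus ASCII period.
    print(is_exact_match("你好。", "你好."))
    # Precomposed "é" versus "e" + combining accent: different bytes, same text.
    print("caf\u00e9".encode("utf-8") == "cafe\u0301".encode("utf-8"))
    print(is_exact_match("caf\u00e9", "cafe\u0301"))
```

A check like this could short-circuit to a match before the LLM judge is called. That is a design option shown for illustration, not something this PR implements.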
Summary of Changes
Hello @G26karthik, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the …
Highlights
Code Review
This pull request is a great enhancement to the FinalResponseMatchV2Evaluator prompt, adding explicit support for non-English languages. The new instructions are comprehensive and should effectively address the reported issues with evaluating strings in languages like Thai and Chinese. I have one minor suggestion to further improve the clarity of the prompt for the LLM.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Hi @G26karthik, thank you for your contribution! We appreciate you taking the time to submit this pull request.
Thanks for the update!
Hi @G26karthik, your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share.
Hi @ankursharmas, can you please review this? LGTM.
@ankursharmas
Summary
Fixes #3111
Fixes #3162
Enhanced the `FinalResponseMatchV2Evaluator` LLM-as-judge prompt to explicitly support non-English languages, addressing evaluation failures for Thai, Chinese, and other non-Latin scripts.
Problem
The evaluator was returning `score=0` for identical strings in non-English languages (Thai, Chinese, Japanese, Korean, Arabic, etc.), even when the agent response and expected response were byte-for-byte identical. This occurred because the LLM judge was not explicitly instructed to handle Unicode characters and language-specific conventions.
Solution
Enhanced the evaluation prompt with:
Changes
- `src/google/adk/evaluation/final_response_match_v2.py`: enhanced the `_FINAL_RESPONSE_MATCH_V2_PROMPT` template with i18n guidance
Testing Plan
This fix adds explicit i18n instructions to the LLM-as-judge prompt that tell the evaluator how to handle non-English text.
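The actual wording of `_FINAL_RESPONSE_MATCH_V2_PROMPT` is not reproduced in this description. The following is therefore only a sketch of one way to attach i18n guidance to a `str.format`-style judge prompt, paraphrasing the behaviors listed in the PR. The names `_BASE_JUDGE_PROMPT`, `_I18N_GUIDANCE`, and `build_judge_prompt`, the placeholder names, and all prompt text are hypothetical and do not reproduce ADK's template.

```python
# Hypothetical base template; ADK's real _FINAL_RESPONSE_MATCH_V2_PROMPT differs.
_BASE_JUDGE_PROMPT = """You are an expert rater. Decide whether the agent response is valid
given the reference response.

User prompt:
{prompt}

Agent response:
{response}

Reference response:
{golden_response}
"""

# Hypothetical i18n guidance paraphrasing the behaviors listed in the PR.
# It must not contain literal braces, because the combined template is
# passed through str.format (literal braces would need to be doubled).
_I18N_GUIDANCE = """Language handling:
- Responses may be in any language or script (e.g., Chinese, Thai, Japanese, Korean, Arabic, Hebrew, Hindi).
- If the agent response and the reference response are identical strings, treat them as a match regardless of language.
- Ignore differences caused only by Unicode normalization or character encoding.
- Treat language-specific punctuation (for example "。" versus ".") as equivalent.
- Apply the same evaluation standard to every language.
"""


def build_judge_prompt(prompt: str, response: str, golden_response: str) -> str:
    """Fill the hypothetical template, placing the i18n guidance before the inputs."""
    template = _I18N_GUIDANCE + "\n" + _BASE_JUDGE_PROMPT
    # Substituted values are not re-parsed by str.format, so braces inside
    # the user or agent text are safe.
    return template.format(
        prompt=prompt, response=response, golden_response=golden_response
    )


if __name__ == "__main__":
    print(build_judge_prompt("用中文打招呼", "你好。", "你好。"))
```

Keeping the guidance in a separate constant leaves the English instructions untouched. This matches the stated goal of preserving existing English evaluation behavior while adding i18n support.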
Manual Testing:
The fix can be verified by reproducing the original issues:
- `score=1.0` (previously `score=0.0`)
- `score=1.0` (previously `score=0.0`)
Unit Tests:
The existing test suite in `tests/unittests/evaluation/test_final_response_match_v2.py` verifies the evaluator's core functionality. The prompt enhancement preserves existing English evaluation behavior while adding i18n support.
Impact