
Conversation

@shijinpjlab (Collaborator)

No description provided.

pekopoke and others added 30 commits October 29, 2025 17:49
* 📚 Auto-update metrics documentation

* add OCR prompt

* 📚 Auto-update metrics documentation

* fix pylint

* Update document_parsing_quality_ocr_train.py

* add new ocr prompt

---------

Co-authored-by: GitHub Action <[email protected]>
Co-authored-by: quyuan <[email protected]>
feat: add 5 RAG eval metrics
* feat: temp

* feat: move merge_result_info

* feat: rules now supported

* feat: fix bug

* feat: temporary prompt code

* feat: change field from list to dict; rule path now runs end-to-end

* feat: rename to map_data

* feat: merge prompt and llm

* feat: batch merging

* feat: delete evaldata

* feat: rename to evalpipline

* feat: adjust map_data

* feat: merge evaluate_rule and evaluate_prompt

* feat: concurrency v3

* feat: merge evaluate_single_data and evaluate_by_type

* feat: merge execute and evaluate

* feat: fix config-overwrite bug caused by concurrency

* feat: adjust code placement

* feat: update local files for the new result_info and the modelres error_type (summary module still to be updated)

* feat: summary module

* feat: change error_type values from a list of reasons to a dict with two keys: metric and reason

* feat: update

* feat: add ResTypeInfo class

* feat: update return values in rule_common.py

* feat: update return values in the 4 rule files

* feat: update llm (except type, which is a list)

* feat: move files

* feat: fix imports after the file moves

* feat: remove one nesting level from error_type

* feat: judgment logic for result_save.good

* feat: update

* feat: update res in rule_common.py, add label

* feat: update res, add label

* feat: update res, add label

* feat: fix lint

* feat: 4 kinds of base convertor

* feat: plaintext case

* feat: plaintext save

* feat: fix json

* feat: fix jsonl

* feat: fix listjson

* feat: fix hf_plaintext.json

* feat: fix hf_json

* feat: fix hf_jsonl

* feat: fix hf_listjson

* feat: fix bug with abnormal multi-rule results

* feat: fix bug with abnormal multi-rule results

* feat: fix custom config rule

* feat: fix test_local.py

* feat: fix test_local.py

* feat: fix test_continue.py

* feat: fix test_write.py, repair complex rules

* feat: fix test_rule_common.py

* feat: ImageConverter

* feat: fix lint

* feat: handle the case where label is an array

* feat: folder naming

* feat: update examples

* feat: remove special prompts

* feat: remove prompt classes

* feat: fix lint

* feat: optimize ModelRes assignment; remove prompt-related code from Model

* feat: fix lint

* feat: ignore

* feat: ModelRes with fixed fields

* update res b_box overlap and visual rule

* update res b_box overlap and visual rule

* feat: complete evaluate for spark

* feat: update summarize for spark

* feat: fix bug

* feat: update model

* feat: fix lint

* feat: TestModelRes

* feat: special scenario for chupei

* feat: fix bug

* update res b_box overlap and visual rule

* feat: delete old convertor

* feat: optimize local; change prompt to llm

* feat: add sql source

* feat: fix lint

* feat: LLMHtmlExtractCompareEn

* feat: fix lint

* feat: change name

* feat: remove the fields feature from DatasetArgs

* feat: fix bug plaintext

* feat: fix lint

* feat: test ignore rag

---------

Co-authored-by: pekopoke <[email protected]>
e06084 and others added 27 commits December 23, 2025 18:19
* feat: add Instruction Quality Evaluation

* feat: add examples in metrics

* 📚 Auto-update metrics documentation

---------

Co-authored-by: GitHub Action <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* feat: use_browser

* feat: fix gemini

* feat: lint
…-tests

Feat/update all examples and tests
* feat: init agent&tool architecture

* feat: agent&tool docs/tests/examples

* fix bugs
docs: update wechat doc
@gemini-code-assist (Contributor)

Summary of Changes

Hello @shijinpjlab, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request marks a significant architectural and functional upgrade, moving to version 2.0.0. The core evaluation engine has been redesigned for greater flexibility, allowing users to define complex evaluation pipelines with dynamic field mappings and a rich set of new LLM-based evaluators. This release also broadens data source compatibility and introduces an agent framework for advanced, tool-augmented evaluations, all while enhancing the user interface for better data visualization and interaction.

Highlights

  • Core Evaluation Architecture Refactor: The evaluation pipeline configuration has been completely overhauled, moving from a rigid object structure to a flexible array of evaluation pipelines (EvalPipline). The new structure allows defining multiple evaluation groups, each with its own field mappings and a list of evaluators (rules or LLMs), significantly enhancing configurability and extensibility (see the config sketch after this list).
  • Introduction of Agent Framework: A new agent framework enables LLM evaluators to perform multi-step reasoning and use external tools. It includes a BaseAgent class and concrete implementations such as AgentHallucination, which can run web searches for fact-checking, making evaluations more dynamic and robust (see the agent sketch after this list).
  • Expanded Data Source Support: The local data source now supports a wider range of file formats, including Excel (.xlsx, .xls), CSV, and Parquet, alongside the existing JSONL/JSON/TXT formats. Additionally, a new SQL data source allows direct streaming from databases such as PostgreSQL, MySQL, and SQLite (see the SQL-source sketch after this list).
  • Standardized Evaluation Result Format: A new EvalDetail Pydantic model standardizes the output of all evaluators (both rules and LLMs). It captures the metric name, status (pass/fail), score, labels, and reasons, replacing the previous ModelRes and simplifying result processing and aggregation (see the model sketch after this list).
  • Comprehensive Suite of New LLM Evaluators: A broad set of new LLM-based evaluators has been added across categories including RAG evaluation (Answer Relevancy, Context Precision, Recall, Relevancy, Faithfulness), SFT instruction quality (Clarity, Task Difficulty), Meta-rater dimensions (Professionalism, Readability, Reasoning, Cleanliness), minor-language detection, resume optimization, and VLM-based document parsing.
  • Enhanced UI and Gradio App Functionality: The Electron UI now dynamically displays evaluation results, with new filtering by JSONL file paths and improved pie-chart visualizations. The Gradio application has been significantly refactored to support the new flexible evaluation pipeline configuration and dynamic field mapping.
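
To make the new pipeline structure concrete, here is a minimal sketch of what such a config could look like. The key names (eval_pipeline, map_data, evaluator) and the evaluator names are assumptions inferred from the commit history and this summary, not a verified schema.

# Hypothetical config sketch; key and evaluator names are assumptions,
# not the confirmed Dingo 2.0 schema.
config = {
    "input_path": "data/sample.jsonl",
    "eval_pipeline": [
        {
            # map dataset columns onto the fields this group's evaluators read
            "map_data": {"content": "answer", "prompt": "question"},
            # mix rule-based and LLM-based evaluators in one group
            "evaluator": ["RuleColonEnd", "LLMAnswerRelevancy"],
        },
        {
            "map_data": {"content": "context"},
            "evaluator": ["RuleContentNull"],
        },
    ],
}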
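A rough sketch of how a tool-using evaluator might be shaped, assuming BaseAgent exposes a tools list and a run entry point; the actual class layout in the PR may differ.

class BaseAgent:
    """Drives an LLM through multi-step reasoning, optionally calling tools."""

    def __init__(self, tools=None):
        self.tools = tools or []

    def run(self, data):
        raise NotImplementedError


class AgentHallucination(BaseAgent):
    """Fact-checks a response with tool output (e.g. a web search) before scoring."""

    def run(self, data):
        # gather external evidence for the claim, e.g. search-engine snippets
        evidence = [tool(data["content"]) for tool in self.tools]
        # a real implementation would prompt the LLM with the response plus
        # the evidence and parse its verdict; the returned shape is illustrative
        return {"metric": "Hallucination", "status": "pass", "evidence": evidence}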
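For the new SQL data source, usage might look roughly like the following; the source name, connection keys, and streaming behavior shown here are illustrative assumptions, not the confirmed API.

# Hypothetical SQL-source settings; key names are assumptions.
sql_source = {
    "source": "sql",
    "connection": "postgresql://user:password@localhost:5432/corpus",
    "query": "SELECT id, question, answer FROM qa_pairs",  # rows stream into the pipeline
}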
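Based on the fields listed above, the standardized result model is presumably something along these lines; the exact field names and types are assumptions, not the definition shipped in the PR.

from typing import List, Optional

from pydantic import BaseModel


class EvalDetail(BaseModel):
    """Sketch of the standardized evaluator result; field names are assumed."""

    metric: str                    # name of the rule or LLM evaluator
    status: str                    # "pass" or "fail"
    score: Optional[float] = None  # numeric score, when the evaluator emits one
    label: List[str] = []          # quality labels, e.g. "QUALITY_BAD_COMPLETENESS"
    reason: List[str] = []         # human-readable explanations for the verdict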
Ignored Files
  • Ignored by pattern: .github/workflows/** (3)
    • .github/workflows/IntegrationTest.yml
    • .github/workflows/lint.yml
    • .github/workflows/metrics-validation.yml

@gemini-code-assist (bot) left a comment

Code Review
Code Review

This pull request introduces a significant and impressive update to version 2.0.0, fundamentally refactoring the configuration and evaluation pipeline. The move from a rigid eval_group system to a flexible evaluator pipeline is a major architectural improvement, allowing for more complex and multi-faceted evaluations. The introduction of the agent framework with tool support, SQL data sources, and expanded file format support (Excel, CSV, Parquet) greatly enhances the capabilities of Dingo. The codebase refactoring, especially moving prompts into their respective LLM classes and simplifying the model registry, improves maintainability and clarity. The updates to the Gradio app and the Electron-based GUI also provide a much better user experience. Overall, this is a very strong update that makes Dingo a much more powerful and flexible tool.

#!/usr/bin/env python3
"""检查所有Python文件是否可以成功编译和导入"""

import os
Severity: medium

The os module is imported but not used in this script. It can be safely removed to keep the imports clean.

Comment on lines +92 to 94:

return Object.keys(data.type_ratio?.content || {}).some(key =>
  key.startsWith(firstLevelType + '-')
);

Severity: medium

The logic for hasSecondLevel and the corresponding getSecondLevelData function seems to be broken after the refactoring of the summary.json structure. The type_ratio now has a nested structure, and the keys are in the format QUALITY_BAD_COMPLETENESS.RuleColonEnd, using . as a separator instead of -. The current implementation still checks for key.startsWith(firstLevelType + '-'), which will likely never be true, breaking the drill-down functionality in the pie chart legend. This should be updated to correctly handle the new data structure and restore the drill-down feature, or the related code for drill-down should be removed if the feature is no longer intended to be supported.

@shijinpjlab merged commit 176e192 into main on Dec 25, 2025
5 checks passed