
Add evals support #120

@rohan-at-sentry

Description

Product Context

Evals are increasingly becoming the preferred way for developers to test the validity of their LLM applications. Imagine wiring evals up to CI - bringing the same rapid feedback as test failures within a PR context.

How would we do it

Sentry Vitest Evals (getsentry/vitest-evals#13) will output eval results. Each result contains a score (see the block below):

```json
{
    "meta": {
        "eval": {
            "scores": [
                {
                    "score": 0.6,
                    "metadata": {
                        "rationale": "The submitted answer is a superset of the expert answer and is fully consistent with it. The expert answer identifies the root cause as a mismatch in the bottle ID passed to the `bottleById` function, which results in a 'Bottle not found' error. The submitted answer includes this same root cause but provides additional details, such as the specific IDs involved (3216 and 16720), and offers a comprehensive proposed solution and implementation strategy. This includes steps to inspect and correct the client-side code, verify parameter mapping, and test the fix. Therefore, the submission expands upon the expert's analysis without contradicting it."
                    },
                    "name": "Factuality2"
                }
            ],
            "avgScore": 0.6
        }
    }
}
```
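A consumer of this output could read the scores along these lines - a minimal sketch that assumes only the JSON shape in the sample above (the `meta.eval` path and field names come from the sample, not a published schema):

```typescript
// Shape of the eval result embedded in a test's metadata,
// inferred from the sample output above.
interface EvalScore {
  score: number;
  name: string;
  metadata?: { rationale?: string };
}

interface EvalMeta {
  eval: {
    scores: EvalScore[];
    avgScore: number;
  };
}

// Recompute the average from the individual scores, e.g. to
// sanity-check the reported avgScore before displaying it.
function averageScore(meta: EvalMeta): number {
  const scores = meta.eval.scores;
  if (scores.length === 0) return 0;
  return scores.reduce((sum, s) => sum + s.score, 0) / scores.length;
}

const sample: EvalMeta = {
  eval: {
    scores: [
      { score: 0.6, name: "Factuality2", metadata: { rationale: "…" } },
    ],
    avgScore: 0.6,
  },
};

console.log(averageScore(sample)); // 0.6
```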

Display the score name and metadata on the PR (alongside test results) for every commit that uploads test results and eval output.
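The PR surface could be as simple as a markdown table per eval score. A hypothetical formatter (none of these names come from an existing Sentry API; this only shows how the sample's fields might map onto a PR comment body):

```typescript
// One row per eval score, mirroring the fields in the sample output.
interface ScoreRow {
  name: string;
  score: number;
  rationale?: string;
}

// Render eval scores as a markdown comment body for the PR.
function formatEvalComment(avgScore: number, rows: ScoreRow[]): string {
  const header =
    `### Eval results (avg score: ${avgScore})\n\n` +
    `| Name | Score | Rationale |\n| --- | --- | --- |`;
  const body = rows
    .map((r) => `| ${r.name} | ${r.score} | ${r.rationale ?? ""} |`)
    .join("\n");
  return `${header}\n${body}`;
}

console.log(
  formatEvalComment(0.6, [
    { name: "Factuality2", score: 0.6, rationale: "Superset of expert answer" },
  ])
);
```

Posting the rendered body would go through whatever mechanism already reports test results on the commit.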
