Description
I've been thinking about how Evalite might best integrate with systems that already run their own evaluations. These apps tend to run their evaluations in some kind of script which spits out a report.
It occurs to me that Evalite really does three things for you:
1. Handles bundling and running your files
2. Registers 'evals' and runs them concurrently
3. Provides a reporting page
But what about users who already have their own scripts and manage their own concurrency? Users, in other words, who already have a hand-rolled solution.
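Option 1: evalite.report()
For those users, we could provide a solution that only does 1 and 3 (bundling and running the files, plus the reporting page) and leaves running the evals to their own code: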
```ts
const results = await customHandRolledScript();

evalite.report({
  suiteName: 'foo',
  rows: results,
  columns: [...],
  score: 0.5,
})
```
I think this would make Evalite a lot more attractive for lots of companies, since it would really be just a UI on top of their logic. All they'd need to do would be to change their file names to .eval.ts and add a single function call.
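To make the proposal a bit more concrete, here is a rough sketch of what a converted script might look like. Everything in it is hypothetical: `evalite.report` does not exist yet, so `report` below is an in-memory stand-in, and the `EvalReport` shape, the column format, and the sample rows are assumptions made up purely for illustration.

```ts
// Hypothetical payload shape for the proposed evalite.report.
// None of these names exist in Evalite today.
interface ReportColumn {
  key: string;
  label: string;
}

interface EvalReport<Row extends Record<string, unknown>> {
  suiteName: string;
  rows: Row[];
  columns: ReportColumn[];
  score: number; // assumed to be a fraction between 0 and 1
}

// Stand-in for evalite.report: validates the score and keeps reports in memory.
const collectedReports: EvalReport<Record<string, unknown>>[] = [];

function report<Row extends Record<string, unknown>>(r: EvalReport<Row>): void {
  if (r.score < 0 || r.score > 1) {
    throw new RangeError(`score must be between 0 and 1, got ${r.score}`);
  }
  collectedReports.push(r);
}

// Hypothetical hand-rolled script: the user owns execution and concurrency.
// The rows are invented sample data.
async function customHandRolledScript() {
  return [
    { input: "2 + 2", output: "4", expected: "4", passed: true },
    { input: "capital of France", output: "Lyon", expected: "Paris", passed: false },
  ];
}

async function main(): Promise<void> {
  const results = await customHandRolledScript();
  // The user computes the overall score however they like.
  const score = results.filter((r) => r.passed).length / results.length;

  report({
    suiteName: "foo",
    rows: results,
    columns: [
      { key: "input", label: "Input" },
      { key: "output", label: "Output" },
      { key: "expected", label: "Expected" },
    ],
    score,
  });
}

void main();
```

The point of the sketch is that the user's script stays untouched apart from the final `report` call; how Evalite would actually persist and render the payload is left open.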
Option 2: evalite.run()
Option one might not end up being feasible, but there is another simple option:
```ts
evalite.run('Eval Name', async () => {
  // The user's own script still produces the results
  const results = await customHandRolledScript();

  return {
    suiteName: 'foo',
    rows: results,
    columns: [...],
    score: 0.5,
  }
});
```
This would allow us to do a bit of concurrency management with Vitest (wrapping the code inside evalite.run in an it.concurrent) while still adding very little overhead.
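As a minimal sketch of how that wrapping might work: `run` below hands each callback to a registrar function and stores whatever report the callback returns. All names here (`createRun`, `TestRegistrar`, `runAll`, the `EvalReport` shape) are hypothetical. To keep the example self-contained, an in-memory registrar stands in for Vitest; inside Evalite one would presumably pass something like `(name, body) => it.concurrent(name, body)` instead.

```ts
// Hypothetical report shape, simplified for this sketch.
interface EvalReport {
  suiteName: string;
  rows: unknown[];
  columns: string[];
  score: number;
}

// Anything that can register a named async test body.
// Vitest's it.concurrent(name, fn) fits this shape.
type TestRegistrar = (name: string, body: () => Promise<void>) => void;

// Builds a `run` function that delegates scheduling to the registrar
// and forwards each returned report to `onReport`.
function createRun(
  register: TestRegistrar,
  onReport: (name: string, report: EvalReport) => void,
) {
  return function run(name: string, fn: () => Promise<EvalReport>): void {
    register(name, async () => {
      onReport(name, await fn());
    });
  };
}

// In-memory stand-in for Vitest: collect bodies, then start them all at once.
const pending: Array<{ name: string; body: () => Promise<void> }> = [];
const reports = new Map<string, EvalReport>();

const run = createRun(
  (name, body) => {
    pending.push({ name, body });
  },
  (name, report) => {
    reports.set(name, report);
  },
);

async function runAll(): Promise<void> {
  // Every body is started before any is awaited, so they overlap.
  await Promise.all(pending.map((p) => p.body()));
}

run("Eval Name", async () => ({ suiteName: "foo", rows: [], columns: [], score: 0.5 }));
run("Another Eval", async () => ({ suiteName: "bar", rows: [], columns: [], score: 1 }));
```

Calling `await runAll()` afterwards executes both callbacks concurrently and fills `reports`. Keeping the registrar injectable means the user-facing `run` signature would not need to change if the scheduling strategy does.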