evalite.report() for the simplest-possible integration story #308

@mattpocock

I've been thinking about how Evalite might best integrate with systems that already run their own evaluations. These apps tend to run their evaluations in some kind of script which spits out a report.

It occurs to me that Evalite really does three things for you:

  1. Handles bundling and running your files
  2. Registers 'evals' and runs them concurrently
  3. Provides a reporting page

But what about users who already have their own scripts and manage their own concurrency? Users, in other words, who already have a hand-rolled solution.

Option 1: evalite.report

It seems like we could then provide a solution that only does 1 and 3, leaving the registration and concurrency in step 2 to the user's own script:

const results = await customHandRolledScript();

evalite.report({
  suiteName: 'foo',
  rows: results,
  columns: [...],
  score: 0.5,
})

I think this would make Evalite a lot more attractive for lots of companies, since it would really just be a UI on top of their logic. All they'd need to do is rename their files to .eval.ts and add a single function call.
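To make the shape of that single call more concrete, here is a minimal sketch of what the payload and a stand-in for evalite.report could look like. Nothing here is an existing Evalite API: the ReportColumn and ReportInput types, the key/label column fields, the assumption that score is normalised to 0..1, and the sample rows are all hypothetical and only illustrate the idea.

```typescript
// Hypothetical shapes: none of these names exist in Evalite today.
interface ReportColumn {
  key: string; // property to read from each row
  label: string; // header shown in the UI
}

interface ReportInput<Row extends Record<string, unknown>> {
  suiteName: string;
  rows: Row[];
  columns: ReportColumn[];
  score: number; // assumed here to be normalised to the range 0..1
}

// Stand-in for evalite.report(): validates the payload and returns the table
// a reporting page could render. A real implementation would store it instead.
function report<Row extends Record<string, unknown>>(input: ReportInput<Row>) {
  if (Number.isNaN(input.score) || input.score < 0 || input.score > 1) {
    throw new RangeError(`score must be between 0 and 1, got ${input.score}`);
  }
  return {
    suiteName: input.suiteName,
    score: input.score,
    headers: input.columns.map((column) => column.label),
    table: input.rows.map((row) =>
      input.columns.map((column) => String(row[column.key] ?? "")),
    ),
  };
}

// Stand-in for a team's existing evaluation script (sample data only).
async function customHandRolledScript() {
  return [
    { input: "2 + 2", output: "4", correct: true },
    { input: "capital of France", output: "Lyon", correct: false },
  ];
}

async function main() {
  const results = await customHandRolledScript();
  const score = results.filter((r) => r.correct).length / results.length;
  return report({
    suiteName: "foo",
    rows: results,
    columns: [
      { key: "input", label: "Input" },
      { key: "output", label: "Output" },
      { key: "correct", label: "Correct" },
    ],
    score,
  });
}

main().then((summary) => console.log(summary.headers, summary.table));
```

The point of the sketch is that the user's script keeps full control over how rows and the score are produced; the reporting call only receives finished data.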

Option 2: evalite.run()

Option 1 might not end up being feasible, but there is another simple option:

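evalite.run('Eval Name', async () => {
  // The hand-rolled script now runs inside the callback
  const results = await customHandRolledScript();

  return {
    suiteName: 'foo',
    rows: results,
    columns: [...],
    score: 0.5,
  }
});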

This would allow us to do a bit of concurrency management with Vitest (wrapping the code inside evalite.run in an it.concurrent) while still adding very little overhead for the user.
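The register-then-run-concurrently idea can be sketched without Vitest. In the version below, a plain Map and Promise.all stand in for Vitest's it.concurrent, so this is not how Evalite would actually wire it up. The run, runAll, EvalReport and EvalOutcome names are hypothetical, and the columns field is left out of the result shape for brevity.

```typescript
// Hypothetical result shape, loosely mirroring the object returned above.
interface EvalReport {
  suiteName: string;
  rows: unknown[];
  score: number;
}

type EvalFn = () => Promise<EvalReport>;

type EvalOutcome =
  | { name: string; ok: true; report: EvalReport }
  | { name: string; ok: false; error: unknown };

// Registered evals, keyed by name. In the proposal this bookkeeping would be
// delegated to Vitest; a Map stands in for it here.
const registry = new Map<string, EvalFn>();

// Stand-in for evalite.run(): records the callback instead of executing it.
function run(name: string, fn: EvalFn): void {
  if (registry.has(name)) {
    throw new Error(`Duplicate eval name: ${name}`);
  }
  registry.set(name, fn);
}

// Starts every registered eval before awaiting any of them, so they run
// concurrently. Each failure is captured per eval rather than rejecting the batch.
async function runAll(): Promise<EvalOutcome[]> {
  const pending = Array.from(
    registry,
    async ([name, fn]): Promise<EvalOutcome> => {
      try {
        return { name, ok: true, report: await fn() };
      } catch (error) {
        return { name, ok: false, error };
      }
    },
  );
  return Promise.all(pending);
}

run("Eval Name", async () => ({ suiteName: "foo", rows: [], score: 0.5 }));
run("Failing Eval", async () => {
  throw new Error("model unavailable");
});

runAll().then((outcomes) => {
  for (const outcome of outcomes) {
    console.log(outcome.name, outcome.ok ? outcome.report.score : "failed");
  }
});
```

Deferring execution until runAll is what gives the framework room to manage concurrency: the callback is only a description of the work, and the runner decides when to start it.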
