Quesma
Making AI agents production-ready through independent evaluation and training.
Repositories
Showing 10 of 21 repositories
- terminal-bench (Public, forked from harbor-framework/terminal-bench)
  A benchmark for LLMs on complicated tasks in the terminal
- BinaryAudit (Public)
  An open-source benchmark for evaluating AI agents' ability to find backdoors hidden in compiled binaries.
- terminal-bench-science (Public, forked from harbor-framework/terminal-bench-science)
  Terminal Bench for Science