Anurag461

laude-institute/terminal-bench Public

A benchmark for LLMs on complicated tasks in the terminal

Python, 1.6k stars, 480 forks
lm-evaluation-harness Public

Forked from EleutherAI/lm-evaluation-harness

A framework for few-shot evaluation of language models.

Python
harbor Public

Forked from laude-institute/harbor

Harbor is a framework for running agent evaluations and creating and using RL environments.

Python
polybench-parsers Public

Test output parsers for various programming languages and testing frameworks

Python