lm-evaluation-harness

Here are 2 public repositories matching this topic...

burcgokden / lm-evaluation-harness-with-PLDR-LLM-kvg-cache

Fork of the LM Evaluation Harness suite for evaluating the benchmarks in the paper titled "PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference"

nlp natural-language-processing pytorch large-language-models llm lm-evaluation-harness pldr-llm

Updated Feb 25, 2025
Python

sangstar / comparator

Evaluate models and compare their scores

benchmarking machine-learning ai evaluation evaluation-metrics lm-evaluation-harness lm-evaluation

Updated Nov 22, 2025
C++
