gsm8k

Here are 19 public repositories matching this topic...

An evaluation of prompting techniques (Zero-Shot CoT, Few-Shot, Self-Consistency) on the Mistral-7B model for mathematical reasoning. This project systematically benchmarks 7 distinct methods on the GSM8K dataset.

  • Updated Nov 2, 2025
  • Python

An autonomous prompt-selection system for Large Language Models (LLMs) that aims to improve performance across tasks by balancing generality and specificity. The project automates the creation and selection of diverse, high-quality prompts, reducing manual intervention.

  • Updated Dec 10, 2024
  • Jupyter Notebook

Dataset management library for ML experiments: loaders for SciFact, FEVER, GSM8K, HumanEval, MMLU, TruthfulQA, HellaSwag; git-like versioning with lineage tracking; transformation pipelines; quality validation with schema checks and duplicate detection; GenStage streaming for large datasets. Built for reproducible AI research.

  • Updated Dec 29, 2025
  • Elixir
