GitHub - lac-dcc/Makara: A suite of tools to evaluate performance counters in different architectures.

Goals

The MAKARA project (Multi-Architecture Knowledge Analysis and Report Assessment) tests how deeply Large Language Models (LLMs) understand the foundational principles of computer architecture.

The core goal is to determine if an LLM can accurately predict performance behavior across different hardware. We challenge an LLM with a program's source code and its performance counter metrics from an initial architecture (Architecture A). The central objective is to query the LLM to predict how those same performance counters would appear if the identical program were executed on a distinct target architecture (Architecture B).
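As a rough illustration of the query described above, the sketch below assembles a prompt from a program's source and its counters on Architecture A. The function name `build_prompt` and the prompt wording are hypothetical; the actual prompt format used by the project is not documented here.

```python
def build_prompt(source_code, arch_a, counters_a, arch_b):
    """Assemble a cross-architecture prediction query (hypothetical format)."""
    # One "- name: value" line per counter, sorted for a stable layout.
    counter_lines = "\n".join(
        f"- {name}: {value}" for name, value in sorted(counters_a.items())
    )
    return (
        f"Program source:\n{source_code}\n\n"
        f"Counters measured on {arch_a}:\n{counter_lines}\n\n"
        f"Predict the same counters for this program on {arch_b}."
    )
```

Sorting the counters keeps the prompt identical across runs for the same input, which makes responses easier to compare.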

This cross-domain performance projection serves as a crucial metric for assessing an LLM's true architectural reasoning, moving beyond pattern recognition to validate its grasp of hardware-software interaction.

Help Us Build a Global Performance Dataset!

We are building a comprehensive dataset of performance counters, and we need your help! This dataset will map programs to their performance metrics on specific hardware.

By contributing, you'll help create a robust, open resource for researchers worldwide. As a collaborator, you will receive full access to the entire dataset!

Contributing is Easy: We've created an automated script that handles everything:

It downloads and compiles our benchmark collection (Makara/Jotai).
It runs the benchmarks and efficiently collects performance counter values.
It captures system architecture details (CPU, caches, etc.).
It automatically packages all the results for easy submission.
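One possible way to capture the CPU and cache details mentioned above is to parse the output of `lscpu`. This is a minimal sketch under that assumption, not necessarily what `collect_data.py` does; the helper names `parse_lscpu` and `collect_cpu_info` are hypothetical.

```python
import subprocess


def parse_lscpu(text):
    """Turn `lscpu` 'Key: value' lines into a dictionary."""
    info = {}
    for line in text.splitlines():
        # Split at the first colon only, so values containing ':' survive.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def collect_cpu_info():
    """Run lscpu (part of util-linux) and parse its output."""
    result = subprocess.run(
        ["lscpu"], capture_output=True, text=True, check=True
    )
    return parse_lscpu(result.stdout)
```

Keeping the parser separate from the subprocess call means the parsing logic can be checked on any machine, including ones without `lscpu`.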

Ready to contribute? Just run the script (see instructions below) and send your data via this Google Form. Running the script takes about five to six minutes on a standard machine.

Overview

The Makara Data Pipeline automates the process of compiling, executing, and profiling programs using perf, aggregating the results into a structured dataset. It supports distributed execution, making it suitable for large-scale performance analysis or research in compiler optimization and program behavior characterization.
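To make the profiling step concrete, here is a minimal sketch of running one program under `perf stat` and reading the counters back. It relies on perf's CSV mode (`-x ,`), which writes one line per event to stderr. The function names and the default event list are assumptions for illustration; the events the pipeline actually records may differ.

```python
import subprocess


def parse_perf_csv(stderr_text):
    """Extract {event: value} from `perf stat -x ,` output."""
    counters = {}
    for line in stderr_text.splitlines():
        fields = line.split(",")
        if line.startswith("#") or len(fields) < 3:
            continue  # comments, blank lines, or unrelated stderr output
        value, event = fields[0], fields[2]
        try:
            counters[event] = float(value)
        except ValueError:
            # perf prints "<not counted>" or "<not supported>" here.
            continue
    return counters


def run_with_perf(command, events=("instructions", "cycles")):
    """Run `command` (a list of arguments) under perf stat."""
    proc = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", ",".join(events), "--", *command],
        capture_output=True,
        text=True,
    )
    return parse_perf_csv(proc.stderr)
```

A call such as `run_with_perf(["./a.out"])` would return a dictionary keyed by event name. Note that the profiled program's own stderr output shares the stream with perf's report, and event names can carry suffixes or prefixes on some machines, so a production parser would need to be stricter.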

Motivation

We can use the data in the Makara project in several ways. For instance, this data lets us build cost models that guide compiler optimization heuristics. It also lets us predict the performance of a program on one architecture, given its measured performance on another. The figure below illustrates this, showing that the number of instructions executed on an Intel Core i7-1355U has a strong linear correlation with the number of instructions executed on an AMD Ryzen 5 3500U:
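A linear relationship of this kind can be quantified with an ordinary least-squares fit and the Pearson correlation coefficient. The sketch below is a generic implementation, not the project's analysis code; `linear_fit` is a hypothetical name, and `xs` and `ys` stand for per-program instruction counts on the two machines.

```python
def linear_fit(xs, ys):
    """Return (slope, intercept, pearson_r) for paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("constant input has no defined correlation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r
```

With a fit in hand, a count measured on one machine can be projected onto the other as `slope * x + intercept`; a value of `r` close to 1 indicates that such a projection is likely to be reliable.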

Results and Dataset

You can browse and download the available datasets for different architectures here:

Dependencies

Ensure the following dependencies are installed on all machines participating in data collection:

GCC – for compiling C/C++ programs
Python 3.x – for orchestration and automation
Linux perf – for collecting performance metrics

Installation

You can install the dependencies using your package manager. For Ubuntu/Debian-based systems:

sudo apt update
sudo apt install build-essential python3 linux-tools-common linux-tools-$(uname -r)

(If perf is not found after installation, you may need to create a symbolic link to /usr/bin/perf.)

You can check whether perf is installed with:

perf --version
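The same check can be done for all three dependencies from Python before starting a run. This is a small sketch using `shutil.which`; the helper name `missing_tools` and the tool list are assumptions based on the dependency list above.

```python
import shutil

# Executables assumed from the dependency list: GCC, Python 3, Linux perf.
REQUIRED_TOOLS = ("gcc", "python3", "perf")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]
```

An empty list from `missing_tools()` means every tool was found; otherwise the returned names indicate what still needs to be installed.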

Configuration

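To let perf collect counters for your programs without root privileges, lower the kernel's perf_event_paranoid level with the command below. The setting takes effect immediately but is reset at the next reboot, so run it again after restarting: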

sudo sysctl -w kernel.perf_event_paranoid=1
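To confirm that the setting took effect, the current level can be read back from `/proc/sys/kernel/perf_event_paranoid`. The sketch below assumes a Linux system; the helper names and the threshold of 1 (matching the command above) are illustrative choices.

```python
from pathlib import Path

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def read_paranoid_level(path=PARANOID_PATH):
    """Read the current perf_event_paranoid level as an integer."""
    return int(Path(path).read_text().strip())


def paranoid_ok(level, max_level=1):
    """True if the level is at most max_level (lower is more permissive)."""
    return level <= max_level
```

For example, `paranoid_ok(read_paranoid_level())` returning `False` suggests the `sysctl` command above still needs to be run.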

▶️ How to Run

To start the data collection process, simply execute:

python3 collect_data.py

Note: On a typical machine, such as an Intel i7 at 2.8 GHz, this script takes 5 to 6 minutes to run.

The Makara Data Pipeline will:

Compile all source programs;
Execute each one while recording performance statistics with perf;
Store the results in a structured results/ directory;
Automatically compress the collected data into a .zip archive for easy sharing.
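The final packaging step can be sketched with the standard library's `shutil.make_archive`. This is an illustration of the idea rather than the code in `collect_data.py`; the function name `package_results` and the archive name in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def package_results(results_dir, archive_stem):
    """Zip the contents of results_dir into <archive_stem>.zip."""
    results_dir = Path(results_dir)
    if not results_dir.is_dir():
        raise FileNotFoundError(f"no results directory at {results_dir}")
    # make_archive appends ".zip" itself and returns the archive's path.
    return shutil.make_archive(
        str(archive_stem), "zip", root_dir=str(results_dir)
    )
```

A call such as `package_results("results", "makara_results")` would write the archive next to the results directory rather than inside it, which avoids the archive trying to include itself.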

Once the .zip file is created, please submit your results via the Makara - Results Submission form.
