SPHINX is a research project focused on transferring machine learning-based optimization predictors across MLIR dialects by representing programs as ProGraML graphs. This repository contains both the core C++ graph generation infrastructure and the Python-based machine learning experiments.
This monorepo is divided into two main components:
- mlir-to-programl/ (C++): The core tool that translates MLIR code into ProGraML graph representations.
- ml-experiments/ (Python): Machine learning models, training scripts, and reproduction experiments.
The C++ tool is located in mlir-to-programl/. It reads MLIR files and outputs the corresponding ProGraML graphs.
Prerequisites:
- CMake >= 3.20
- C++20 Compiler (GCC 10+ or Clang 10+)
- LLVM/MLIR (Installed and configured)
- Google Protobuf
- Google Abseil (Abseil-cpp)
Build Instructions:
- Configure LLVM Path: The build system expects to find your LLVM installation. By default, it looks in $HOME/llvm_install. If your LLVM is installed elsewhere, export the path before building:
  export CMAKE_PREFIX_PATH=/path/to/your/llvm/lib/cmake:$CMAKE_PREFIX_PATH
- Build:
  cd mlir-to-programl
  mkdir build && cd build
  cmake ..
  make -j8
Usage:
- Single File Mode: Converts a single MLIR file. If the output path is omitted, it defaults to the input path with its extension replaced by .ProgramGraph.pb.
  ./mlir-to-programl <input.mlir> [output.ProgramGraph.pb]
- Dataset Mode: Processes an entire directory. It detects if the input is a folder and automatically converts all contained MLIR files.
  ./mlir-to-programl <dataset_folder>
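The two invocation modes above can be sketched in Python as follows. This is only an illustration of the described behavior, not the tool's actual C++ implementation. The helper names `default_output_path` and `plan_conversions` are hypothetical, and the sketch assumes that dataset mode picks up files ending in `.mlir` directly inside the folder (whether subfolders are searched is not stated here).

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Suffix that replaces the input extension when no output path is given.
GRAPH_SUFFIX = ".ProgramGraph.pb"


def default_output_path(input_path: Path) -> Path:
    """Return the input path with its final extension replaced by GRAPH_SUFFIX."""
    return input_path.with_name(input_path.stem + GRAPH_SUFFIX)


def plan_conversions(
    input_path: Path, output_path: Optional[Path] = None
) -> List[Tuple[Path, Path]]:
    """List (source, destination) pairs for single-file or dataset mode.

    Assumption: dataset mode is non-recursive and selects files by the
    ".mlir" extension. The real tool may behave differently.
    """
    if input_path.is_dir():
        # Dataset mode: one pair per MLIR file found in the folder.
        sources = sorted(
            p for p in input_path.iterdir() if p.is_file() and p.suffix == ".mlir"
        )
        return [(src, default_output_path(src)) for src in sources]
    # Single-file mode: use the explicit output path if one was provided.
    return [(input_path, output_path or default_output_path(input_path))]
```

A script that drives the converter over many inputs could use a plan like this to predict where each graph will be written before invoking the binary.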
The Python experiments are located in ml-experiments/, which contains GNN models and training/evaluation scripts.
Prerequisites:
- Python 3.8+
- CUDA (optional but recommended for training)
Setup:
We provide a unified environment for all experiments:
cd ml-experiments
# 1. Create a virtual environment
python3 -m venv venv
source venv/bin/activate
# 2. Install dependencies (PyTorch, PyG, etc.)
pip install -r requirements.txt
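After installation, a quick sanity check along these lines can confirm that the interpreter meets the Python 3.8+ prerequisite and that the main packages are importable. It is a sketch under assumptions: `torch` and `torch_geometric` are the usual import names for PyTorch and PyG, but the exact contents of requirements.txt are not listed here, and `check_environment` is a hypothetical helper, not part of the repository.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "torch_geometric")):
    """Report whether the Python version and the given packages look usable.

    Packages are only located, not imported, so the check stays fast and
    does not require a GPU.
    """
    report = {"python_ok": sys.version_info >= (3, 8)}
    for name in packages:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

Run it inside the activated virtual environment so that it inspects the same interpreter and packages the experiments will use.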