This repository contains the artifact for the paper "Thinking Fast and Correct: Automated Rewriting of Numerical Code through Compiler Augmentation" (CGO 2026) by Siyuan Brant Qian, Vimarsh Sathia, Ivan R. Ivanov, Jan Hückelheim, Paul Hovland, and William S. Moses.
The latest version of this artifact is available here.
- Build From Source
- Docker Image (Recommended)
- Reproducing Main Results
- Reusing This Artifact
- Miscellaneous
sudo apt install build-essential cmake ninja-build libmpfr-dev
pip install lit numpy matplotlib tqdm
Additionally, install Racket and Rust.
git clone https://github.com/PRONTOLab/Poseidon.git
cd Poseidon
git submodule update --init --recursive llvm-project Enzyme
cd llvm-project
mkdir build && cd build
cmake -G Ninja \
-DLLVM_ENABLE_PROJECTS="clang" \
-DLLVM_ENABLE_LLD=ON \
-DLLVM_TARGETS_TO_BUILD="X86" \
-DCMAKE_BUILD_TYPE=Release \
../llvm
ninja
cd ../..
cd Enzyme
mkdir build && cd build
cmake -G Ninja ../enzyme/ \
-DLLVM_DIR=<...>/Poseidon/llvm-project/build/lib/cmake/llvm \
-DLLVM_EXTERNAL_LIT=$(which lit) \
-DCMAKE_BUILD_TYPE=Release \
-DENABLE_POSEIDON=ON \
-DCMAKE_C_COMPILER=<...>/Poseidon/llvm-project/build/bin/clang \
-DCMAKE_CXX_COMPILER=<...>/Poseidon/llvm-project/build/bin/clang++
ninja
cd ../..
Replace <...> with the path to your Poseidon clone.
We provide a pre-built Docker image sbrantq/poseidon. To start the container:
sudo docker run -it sbrantq/poseidon:latest /bin/bash
To copy results from the container to the host machine:
sudo docker ps -a # Find container ID
sudo docker cp <container_id>:/root/Poseidon/<path_to_file> <host_destination>
Note: For convenience, the container ships cached outputs from external tools: Herbie (eig/cache-* and lulesh/cache) and RAPTOR (dquat/dquat_gold.txt, eig/eig_gold.txt). These are auxiliary inputs, not Poseidon's results. Remove eig/cache-* and lulesh/cache to rerun from scratch; recomputation can take several hours.
Note: Results are hardware-dependent. The Docker image includes a cost model generated on our machine (AMD Ryzen Threadripper PRO 7995WX). For best performance and reproducibility on different hardware, please try to build from source and regenerate the cost model.
cd $HOME/Poseidon/FPBench/ablations
python3 ablation.py
The plot will be saved to plots/fptaylor-extra-ex11-ablation.png.
To copy the plot to the host machine:
sudo docker cp <container_id>:/root/Poseidon/FPBench/ablations/plots/fptaylor-extra-ex11-ablation.png .
cd $HOME/Poseidon/dquat
python3 run_ablation.py
The plot will be saved to dquat.png.
To copy the plot to the host machine:
sudo docker cp <container_id>:/root/Poseidon/dquat/dquat.png .
cd $HOME/Poseidon/lulesh
python3 ablation.py
The plot will be saved to lulesh.png.
To copy the plot to your host machine:
sudo docker cp <container_id>:/root/Poseidon/lulesh/lulesh.png .
Note: The 0-ULP result is hardware-dependent. To find the optimal configuration for your hardware, first regenerate the cost model, then run:
cd $HOME/Poseidon/lulesh
make && python3 run.py
python3 benchmark.py --sample-percent 10
This samples 10% of the optimized programs and prints a summary of all budgets that achieve a ULP error below 5 (configurable via --ulp-threshold). The best budget reported by the script yields the optimal rewrites for your hardware; use it in lulesh/Makefile.
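The ULP threshold above compares program outputs in units in the last place. The exact metric used by benchmark.py is not described here, so the following is only a minimal sketch of one common definition for doubles, the number of representable values between two results. The function names ulp_distance and max_ulp_error are hypothetical and not part of the artifact.

```python
import struct


def _ordered_bits(x: float) -> int:
    """Map a double to an integer that increases monotonically with x."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative doubles have the sign bit set; mirror them below zero so that
    # +0.0 and -0.0 both map to 0 and ordering matches the real line.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable doubles between a and b (hypothetical helper)."""
    if a != a or b != b:
        raise ValueError("ULP distance is undefined for NaN")
    return abs(_ordered_bits(a) - _ordered_bits(b))


def max_ulp_error(reference, candidate) -> int:
    """Largest element-wise ULP distance between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    return max((ulp_distance(r, c) for r, c in zip(reference, candidate)), default=0)


if __name__ == "__main__":
    # Illustrative values only; real runs would read program outputs instead.
    ref = [1.0, -2.5, 0.0]
    out = [1.0 + 2**-52, -2.5, -0.0]
    print(max_ulp_error(ref, out))
```

A budget would pass a threshold of 5 under this definition when max_ulp_error stays below 5 against the reference outputs.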
cd $HOME/Poseidon/eig
python3 run_cases.py
To copy results to the host machine:
sudo docker cp <container_id>:/root/Poseidon/eig/biased.txt .
sudo docker cp <container_id>:/root/Poseidon/eig/equal.txt .
TABLE I entries can be found in these output files. See eig/README.md for details on how to interpret the output.
This section describes how to reuse this artifact to apply Poseidon to a new benchmark.
Configure the paths to custom builds of LLVM and Enzyme:
export CLANG_PATH=<...>/llvm-project/build/bin
export ENZYME_PATH=<...>/Enzyme/build/Enzyme/ClangEnzyme-X.so
export PROFILER_PATH=<...>/Enzyme/build/Enzyme
First, compile your program with floating-point profiling enabled to collect runtime information:
$CLANG_PATH/clang++ -O3 -ffast-math -march=native \
-fplugin=$ENZYME_PATH \
-mllvm --fpprofile-generate \
-L $PROFILER_PATH -lEnzymeFPProfile \
your_program.cc -o your_program_prof
Run the profiled executable on representative inputs (small surrogate inputs may suffice) to generate floating-point profiles:
./your_program_prof <your_arguments>
This creates an fpprofile directory.
Now compile with Poseidon's optimization pass enabled:
$CLANG_PATH/clang++ -O3 -ffast-math -march=native \
-fplugin=$ENZYME_PATH \
-mllvm --fpprofile-use=./fpprofile \
-mllvm --fpopt-cost-model-path=$HOME/Poseidon/cost-model/cm.csv \
your_program.cc -o your_program_opt
This produces an optimized program (your_program_opt) that attempts to improve numerical accuracy while preserving performance.
The first compilation calls external tools (e.g., Herbie) and performs a full dynamic-programming solve; the results are cached (in the cache directory by default). Subsequent compilations reuse the cached results to reduce compile time.
The first compilation generates cache/budgets.txt containing all achievable cost budgets from the dynamic-programming solve. To explore other performance/accuracy trade-offs:
- Compile with different budgets: Recompile with varying --fpopt-comp-cost-budget values from cache/budgets.txt. Each budget produces a differently optimized binary.
- Benchmark: Run each binary and compare outputs against a reference (e.g., the original program) to evaluate its performance and accuracy.
Please see lulesh/run.py and lulesh/benchmark.py for an example of automating this process.
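The budget sweep described above can be sketched as follows. This is an illustration under stated assumptions, not the logic of lulesh/run.py: it assumes cache/budgets.txt holds whitespace-separated integers, that --fpopt-comp-cost-budget is passed through -mllvm like the other flags shown earlier, and the helper names and the output naming scheme are hypothetical.

```python
from pathlib import Path
from typing import List


def parse_budgets(text: str) -> List[int]:
    """Parse budgets, ASSUMING whitespace-separated integers in budgets.txt."""
    return sorted({int(tok) for tok in text.split()})


def build_command(budget: int, source: str, clang_dir: str, plugin: str,
                  profile_dir: str, cost_model: str) -> List[str]:
    """Assemble one clang++ invocation for a given cost budget (hypothetical helper)."""
    return [
        f"{clang_dir}/clang++", "-O3", "-ffast-math", "-march=native",
        f"-fplugin={plugin}",
        "-mllvm", f"--fpprofile-use={profile_dir}",
        "-mllvm", f"--fpopt-cost-model-path={cost_model}",
        # Assumption: the budget flag is forwarded via -mllvm like the others.
        "-mllvm", f"--fpopt-comp-cost-budget={budget}",
        source,
        # Hypothetical naming scheme: one binary per budget.
        "-o", f"{Path(source).stem}_opt_{budget}",
    ]


if __name__ == "__main__":
    # Inline sample stands in for Path("cache/budgets.txt").read_text().
    sample = "120\n-40\n0\n120\n"
    for b in parse_budgets(sample):
        cmd = build_command(b, "your_program.cc", "$CLANG_PATH", "$ENZYME_PATH",
                            "./fpprofile", "$HOME/Poseidon/cost-model/cm.csv")
        print(" ".join(cmd))
```

The sketch only prints the commands; each could be handed to subprocess.run once the paths are resolved, followed by a timing and accuracy comparison against the reference program.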
The cost model (cost-model/cm.csv) is hardware-specific. To regenerate it for your machine:
cd $HOME/Poseidon/cost-model
python3 microbm.py
cp results.csv cm.csv
To apply Poseidon to all FPBench programs and view statistics, run:
cd $HOME/Poseidon/FPBench/experiments
python3 run-all.py
python3 run.py --analytics
This will display maximum speedups for each error threshold and accuracy improvement statistics.
The following commands produce all optimized LULESH programs and perform the full performance/accuracy measurements:
cd $HOME/Poseidon/lulesh
make
python3 run.py
python3 benchmark.py
These commands can take several hours and are not required to reproduce Figure 13.
| Flag | Description |
|---|---|
| --fpprofile-use=<path> | Path to the generated FP profile directory |
| --fpopt-enable-herbie | Enable Herbie for expression rewriting |
| --fpopt-enable-pt | Enable precision tuning |
| --fpopt-comp-cost-budget=<N> | Cost budget for the optimization pass; overrides enzyme_err_tol annotation |
| --fpopt-num-samples=<N> | Number of samples for accuracy estimation |
| --fpopt-early-prune | Enable optional pruning steps in the solver |
| --herbie-num-threads=<N> | Number of threads for Herbie |
| --fpopt-cache-path=<path> | Directory for cached results (default: cache) |
| --fpopt-cost-model-path=<path> | Path to the cost model CSV |