A high-performance Mixed Boolean-Arithmetic (MBA) expression synthesis tool that generates formulas satisfying given input-output examples. The project provides three implementations with increasing levels of parallelization: sequential, multi-threaded CPU (OpenMP), and GPU (CUDA).
This tool performs program synthesis by enumerating and testing formulas of increasing complexity until finding one that satisfies all provided input-output examples. It supports both arithmetic operations (addition, subtraction, multiplication, division, modulo) and bitwise operations (AND, OR, XOR, NOT, shift operations).
.
├── seq/ # Sequential implementation
│ ├── main.cpp # Sequential formula generation
│ ├── formula.cpp # Formula evaluation and representation
│ ├── formula.h # Formula class definitions
│ ├── read.cpp # JSON input parsing
│ └── read.h # Input reading utilities
│
├── parallelcpu/ # Parallel CPU implementation (OpenMP)
│ ├── main.cpp # Parallel formula generation with OpenMP
│ ├── formula.cpp # Formula evaluation
│ ├── formula.h # Formula class definitions
│ ├── read.cpp # JSON input parsing
│ └── read.h # Input reading utilities
│
└── gpu/ # GPU implementation (CUDA)
├── mba.cu # Main CUDA implementation
├── mba_alt.cu # Alternative CUDA implementation
├── mba_rpn.cu # Reverse Polish Notation version
├── mbalight.cu # Lightweight CUDA version
├── mbalight_alt.cu # Alternative lightweight version
├── json.hpp # JSON library (nlohmann/json)
└── modified_libraries/ # External dependencies
├── helpers/ # CUDA helper utilities
└── warpcore/ # Hash table implementation
Unary Operations:
~(NOT) - Bitwise NOT-(NEG) - Negation (GPU only)
Binary Operations:
+(Plus) - Addition-(Minus) - Subtraction*(Mult) - Multiplication/(Div) - Division (GPU only)%(Mod) - Modulo (GPU only)&(And) - Bitwise AND|(Or) - Bitwise OR^(Xor) - Bitwise XOR<<(LShift) - Left shift (GPU only)>>(RShift) - Right shift (GPU only)
- Sequential (
seq/): Basic enumeration-based synthesis for baseline comparison - Parallel CPU (
parallelcpu/): OpenMP-accelerated parallel exploration - GPU (
gpu/): CUDA-accelerated with hash-based deduplication using warpcore hash sets
The tool reads benchmark files in JSON format containing input-output examples:
{
"initial": {
"inputs": {
"0": {"value": "0x12345678", "size": 32}
},
"outputs": {
"0": {"value": "0xabcdef00", "size": 32}
}
},
"sampling": [
{
"inputs": [
{"value": "0xaaaaaaaa", "size": 32}
],
"outputs": {
"0": {"value": "0xbbbbbbbb", "size": 32}
}
}
]
}Place benchmark files in a benchmarks/ directory (not tracked in git).
cd seq
g++ -std=c++17 -O3 main.cpp formula.cpp read.cpp -o mainRequires OpenMP support:
cd parallelcpu
g++ -std=c++17 -O3 -fopenmp main.cpp formula.cpp read.cpp -o mainRequires NVIDIA CUDA Toolkit (tested with CUDA 11.0+):
cd gpu
nvcc -std=c++17 -O3 mba.cu -o mba
# Or for alternative versions:
nvcc -std=c++17 -O3 mba_alt.cu -o mba_alt
nvcc -std=c++17 -O3 mba_rpn.cu -o mba_rpn
nvcc -std=c++17 -O3 mbalight.cu -o mbalightcd seq # or parallelcpu
./mainThe program will:
- Read examples from
../benchmarks/test.json - Generate and test formulas of increasing size (up to size 10-11)
- Print the first matching formula
- Display execution time
cd gpu
./mbaThe GPU version uses:
- Parallel formula generation across CUDA threads
- Hash-based deduplication to avoid testing duplicate formulas
- Constant memory for output samples (limited to 100 samples)
The synthesis algorithm follows an enumerative approach:
- Initialization: Start with variable formulas (size 1)
- Generation: For each size n:
- Apply unary operators to formulas of size n-1
- Combine formulas of sizes k and n-k-1 with binary operators (1 ≤ k ≤ n-2)
- Testing: Evaluate each generated formula against all examples
- Termination: Stop when a formula matching all examples is found
The GPU version optimizes this by:
- Storing formulas as evaluation results (Characteristic Sets)
- Using hash sets to deduplicate equivalent formulas
- Parallelizing across multiple CUDA blocks/threads
- Sequential: Baseline reference, explores formulas systematically
- Parallel CPU: Leverages multi-core CPUs with OpenMP parallelization
- GPU: Massive parallelization with thousands of threads, optimal for large search spaces
The GPU version includes several optimizations:
- Memory coalescing for efficient global memory access
- Hash-based deduplication to reduce redundant work
- Atomic operations for safe concurrent updates
- Constant memory for frequently accessed data
- C++17 compiler (GCC/Clang)
- OpenMP (for parallelcpu)
- NVIDIA CUDA Toolkit (for gpu, version 11.0+)
- nlohmann/json library (included in
json.hpp) - warpcore hash table library (included in
modified_libraries/warpcore/) - CUDA helpers (included in
modified_libraries/helpers/)
- Maximum number of samples: 100 (GPU version)
- Formula sizes tested: up to 10-11 nodes
- Measurement mode can be enabled by defining
MEASUREMENT_MODEin GPU code