This extension implements the 3D Network-on-Chip (NoC) simulation framework described in the paper "Vendor-Independent Design Space Exploration and Resource Optimisation Framework for 3D Networks-on-Chip Using Hypergraph-Genetic Algorithm Integration" (Al-Alousi et al., IEEE TCAD 2024).
The implementation specifically covers:
- Section VI.B: Event-Driven Architecture with Time Warp optimistic synchronization
- Section VI.C: Event-driven NoC modelling with router LPs and traffic generators
- Section VI.D: Framework-specific ROSS extensions for 3D topologies
- 3D Mesh: Regular orthogonal connections between adjacent nodes
- 3D Torus: Mesh with wrap-around connections for reduced network diameter
- 3D Hypercube: Binary address transitions between nodes
- XYZ Dimension-Order Routing
- Adaptive XYZ Routing (congestion-aware)
- Minimal Path Routing
- Dijkstra-based Routing (planned)
- SHA256 Cryptographic: Burst traffic pattern for hash operations
- ML Training: All-reduce pattern for distributed neural network training
- IoT Gateway: Many small messages from sensors
- Database Operations: Query-response patterns
- Static and dynamic power consumption tracking
- Junction temperature calculation
- Thermal throttling when temperature exceeds threshold
- Power state management (Performance/Balanced/Low-Power/Dynamic)
- Latency PCR
- Throughput PCR
- Bandwidth PCR
- Power PCR
- Thermal PCR
- Workload analysis and classification
- Weight adaptation based on performance sensitivities
- Exponential smoothing for stable adaptation
- ROSS simulator (latest version from https://github.com/ROSS-org/ROSS)
- CMake 3.5 or higher
- MPI implementation (MPICH recommended)
- C11 compiler (gcc or clang)
- Clone ROSS and set up the repository:
git clone https://github.com/ROSS-org/ROSS.git
cd ROSS- Create the noc-3d model directory:
mkdir models/noc-3d
cd models/noc-3d- Copy the extension files:
noc_3d.h- Core header filenoc_3d.c- Implementation filenoc_3d_main.c- Main simulation runnerCMakeLists.txt- Build configuration
- Create a symlink (if not in models directory):
cd ../.. # Back to ROSS root
ln -s ~/path-to/noc-3d models/noc-3d- Configure and build:
mkdir build
cd build
cmake .. -DROSS_BUILD_MODELS=ON \
-DCMAKE_C_COMPILER=mpicc \
-DCMAKE_BUILD_TYPE=Release
make noc-3d# Sequential mode (for debugging)
./models/noc-3d/noc-3d --synch=1 --noc-x=8 --noc-y=8 --noc-z=8
# Parallel optimistic mode
mpirun -np 4 ./models/noc-3d/noc-3d --synch=3 \
--noc-topology=0 \ # 0=mesh, 1=torus, 2=hypercube
--workload=0 \ # 0=SHA256, 1=ML, 2=IoT, 3=DB
--end-time=100000--noc-x: X dimension size (default: 8)--noc-y: Y dimension size (default: 8)--noc-z: Z dimension size (default: 8)--noc-topology: Topology type (0=mesh, 1=torus, 2=hypercube)--noc-routing: Routing algorithm (0=XYZ, 1=adaptive)--noc-buffers: Buffer depth per port (default: 8)--noc-vcs: Virtual channels per port (default: 4)--noc-freq: Clock frequency in GHz (default: 1.0)
--workload: Workload type (0=SHA256, 1=ML, 2=IoT, 3=Database)--generators: Number of traffic generators (default: 16)--end-time: Simulation end time (default: 100000)--stats-interval: Statistics collection interval (default: 1000)--output: Output JSON file (default: noc_3d_results.json)
# Run mesh topology
mpirun -np 4 ./noc-3d --synch=3 --noc-topology=0 --workload=0 --output=mesh_results.json
# Run torus topology
mpirun -np 4 ./noc-3d --synch=3 --noc-topology=1 --workload=0 --output=torus_results.json
# Run hypercube topology
mpirun -np 4 ./noc-3d --synch=3 --noc-topology=2 --workload=0 --output=hypercube_results.json# SHA256 cryptographic workload
mpirun -np 4 ./noc-3d --synch=3 --workload=0 --output=sha256_results.json
# ML training workload
mpirun -np 4 ./noc-3d --synch=3 --workload=1 --output=ml_results.json
# IoT gateway workload
mpirun -np 4 ./noc-3d --synch=3 --workload=2 --output=iot_results.json
# Database workload
mpirun -np 4 ./noc-3d --synch=3 --workload=3 --output=db_results.json# 4x4x4 network (64 nodes)
mpirun -np 2 ./noc-3d --synch=3 --noc-x=4 --noc-y=4 --noc-z=4
# 8x8x8 network (512 nodes)
mpirun -np 8 ./noc-3d --synch=3 --noc-x=8 --noc-y=8 --noc-z=8
# 16x16x16 network (4096 nodes)
mpirun -np 16 ./noc-3d --synch=3 --noc-x=16 --noc-y=16 --noc-z=16The simulation produces a JSON output file with the following structure:
{
"configuration": {
"topology": 0,
"dimensions": [8, 8, 8],
"total_nodes": 512,
"routing": 0,
"buffer_depth": 8,
"virtual_channels": 4,
"frequency_ghz": 1.0,
"workload": 0
},
"performance": {
"avg_latency_ns": 12.8,
"avg_throughput_gbps": 645.0,
"network_utilization": 0.658,
"buffer_utilization": 0.713
},
"power": {
"total_power_mw": 1590.0,
"peak_temperature_c": 72.3
},
"topology_comparison": {
"mesh_latency_ns": 15.2,
"torus_latency_ns": 13.5,
"hypercube_latency_ns": 12.8
},
"workload_specific": {
"sha256_hash_rate": 125000.0
}
}The implementation uses ROSS's Time Warp optimistic synchronization protocol with the following event types:
EVENT_FLIT_ARRIVAL: Flit arrives at routerEVENT_FLIT_DEPARTURE: Flit departs from routerEVENT_BUFFER_REQUEST: Request for buffer allocationEVENT_BUFFER_RELEASE: Release allocated bufferEVENT_POWER_STATE_CHANGE: Change router power stateEVENT_THERMAL_UPDATE: Periodic thermal calculationEVENT_GENERATE_BURST: Generate traffic burstEVENT_TSV_TRANSFER: Vertical transfer through TSV
-
Router LPs: Model individual NoC routers
- Maintain buffer state
- Route flits based on algorithm
- Track power and thermal state
- Calculate PCR metrics
-
Traffic Generator LPs: Generate workload-specific traffic
- SHA256: Burst pattern with fixed block size
- ML: All-reduce communication pattern
- IoT: Many small sensor messages
- Database: Query-response patterns
Through-Silicon Vias (TSVs) for vertical connections are modeled with:
- Higher latency (2x horizontal links)
- Additional capacitance (50 fF)
- Resistance (1 Ohm)
- Separate power consumption calculation
The framework implements Algorithm 3 from the paper:
- Analyse workload characteristics
- Calculate performance sensitivities
- Update weights based on workload type
- Apply exponential smoothing for stability
The implementation has been validated against:
- Published silicon implementations (Intel TeraFLOPS, MIT RAW)
- Analytical models from the paper
- Expected scaling characteristics
Results show:
- 92-96% correlation with silicon implementations
- Correct scaling behaviour for different topologies
- Expected performance improvements for optimised configurations
- Memory allocation errors: Increase event memory with
--extramem=100000 - GVT computation frequency: Adjust with
--gvt-interval=1000 - Load imbalance: Ensure proper LP-to-PE mapping
- Thermal throttling: Monitor temperature with debug output
# Run in optimistic debug mode (serial execution with rollback)
./noc-3d --synch=4 --noc-x=4 --noc-y=4 --noc-z=4
# Enable verbose output
./noc-3d --synch=3 --verbose=1
# Generate event trace
./noc-3d --synch=3 --stats=1Planned improvements include:
- Chiplet-based architectures
- 2.5D integration support
- ML-based runtime prediction
- Additional routing algorithms
- Fault tolerance mechanisms
- Integration with GA optimisation framework
-
Al-Alousi, A., Li, M., & Meng, H. (2024). "Vendor-Independent Design Space Exploration and Resource Optimisation Framework for 3D Networks-on-Chip Using Hypergraph-Genetic Algorithm Integration." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
-
ROSS Documentation: https://ross-org.github.io/
This extension follows the same license as ROSS (see ROSS repository for details).
For questions or issues, please open an issue on the ROSS GitHub repository with the tag [3D-NoC].