Skip to content

ahmedalalousi/Extended-ROSS

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

3D NoC ROSS Extension

Overview

This extension implements the 3D Network-on-Chip (NoC) simulation framework described in the paper "Vendor-Independent Design Space Exploration and Resource Optimisation Framework for 3D Networks-on-Chip Using Hypergraph-Genetic Algorithm Integration" (Al-Alousi et al., IEEE TCAD 2024).

The implementation specifically covers:

  • Section VI.B: Event-Driven Architecture with Time Warp optimistic synchronization
  • Section VI.C: Event-driven NoC modelling with router LPs and traffic generators
  • Section VI.D: Framework-specific ROSS extensions for 3D topologies

Features

Supported Topologies

  • 3D Mesh: Regular orthogonal connections between adjacent nodes
  • 3D Torus: Mesh with wrap-around connections for reduced network diameter
  • 3D Hypercube: Binary address transitions between nodes

Routing Algorithms

  • XYZ Dimension-Order Routing
  • Adaptive XYZ Routing (congestion-aware)
  • Minimal Path Routing
  • Dijkstra-based Routing (planned)

Workload Generators

  • SHA256 Cryptographic: Burst traffic pattern for hash operations
  • ML Training: All-reduce pattern for distributed neural network training
  • IoT Gateway: Many small messages from sensors
  • Database Operations: Query-response patterns

Power and Thermal Modelling

  • Static and dynamic power consumption tracking
  • Junction temperature calculation
  • Thermal throttling when temperature exceeds threshold
  • Power state management (Performance/Balanced/Low-Power/Dynamic)

Performance Metrics (PCR Functions)

  • Latency PCR
  • Throughput PCR
  • Bandwidth PCR
  • Power PCR
  • Thermal PCR

Dynamic Adaptation

  • Workload analysis and classification
  • Weight adaptation based on performance sensitivities
  • Exponential smoothing for stable adaptation

Installation

Prerequisites

Building the Extension

  1. Clone ROSS and set up the repository:
git clone https://github.com/ROSS-org/ROSS.git
cd ROSS
  1. Create the noc-3d model directory:
mkdir models/noc-3d
cd models/noc-3d
  1. Copy the extension files:
  • noc_3d.h - Core header file
  • noc_3d.c - Implementation file
  • noc_3d_main.c - Main simulation runner
  • CMakeLists.txt - Build configuration
  1. Create a symlink (if not in models directory):
cd ../..  # Back to ROSS root
ln -s ~/path-to/noc-3d models/noc-3d
  1. Configure and build:
mkdir build
cd build
cmake .. -DROSS_BUILD_MODELS=ON \
         -DCMAKE_C_COMPILER=mpicc \
         -DCMAKE_BUILD_TYPE=Release
make noc-3d

Running Simulations

Basic Usage

# Sequential mode (for debugging)
./models/noc-3d/noc-3d --synch=1 --noc-x=8 --noc-y=8 --noc-z=8

# Parallel optimistic mode
mpirun -np 4 ./models/noc-3d/noc-3d --synch=3 \
    --noc-topology=0 \    # 0=mesh, 1=torus, 2=hypercube
    --workload=0 \         # 0=SHA256, 1=ML, 2=IoT, 3=DB
    --end-time=100000

Command Line Options

NoC Configuration

  • --noc-x: X dimension size (default: 8)
  • --noc-y: Y dimension size (default: 8)
  • --noc-z: Z dimension size (default: 8)
  • --noc-topology: Topology type (0=mesh, 1=torus, 2=hypercube)
  • --noc-routing: Routing algorithm (0=XYZ, 1=adaptive)
  • --noc-buffers: Buffer depth per port (default: 8)
  • --noc-vcs: Virtual channels per port (default: 4)
  • --noc-freq: Clock frequency in GHz (default: 1.0)

Simulation Parameters

  • --workload: Workload type (0=SHA256, 1=ML, 2=IoT, 3=Database)
  • --generators: Number of traffic generators (default: 16)
  • --end-time: Simulation end time (default: 100000)
  • --stats-interval: Statistics collection interval (default: 1000)
  • --output: Output JSON file (default: noc_3d_results.json)

Example Experiments

Compare Topologies

# Run mesh topology
mpirun -np 4 ./noc-3d --synch=3 --noc-topology=0 --workload=0 --output=mesh_results.json

# Run torus topology
mpirun -np 4 ./noc-3d --synch=3 --noc-topology=1 --workload=0 --output=torus_results.json

# Run hypercube topology
mpirun -np 4 ./noc-3d --synch=3 --noc-topology=2 --workload=0 --output=hypercube_results.json

Evaluate Different Workloads

# SHA256 cryptographic workload
mpirun -np 4 ./noc-3d --synch=3 --workload=0 --output=sha256_results.json

# ML training workload
mpirun -np 4 ./noc-3d --synch=3 --workload=1 --output=ml_results.json

# IoT gateway workload
mpirun -np 4 ./noc-3d --synch=3 --workload=2 --output=iot_results.json

# Database workload
mpirun -np 4 ./noc-3d --synch=3 --workload=3 --output=db_results.json

Scaling Study

# 4x4x4 network (64 nodes)
mpirun -np 2 ./noc-3d --synch=3 --noc-x=4 --noc-y=4 --noc-z=4

# 8x8x8 network (512 nodes)
mpirun -np 8 ./noc-3d --synch=3 --noc-x=8 --noc-y=8 --noc-z=8

# 16x16x16 network (4096 nodes)
mpirun -np 16 ./noc-3d --synch=3 --noc-x=16 --noc-y=16 --noc-z=16

Output Format

The simulation produces a JSON output file with the following structure:

{
  "configuration": {
    "topology": 0,
    "dimensions": [8, 8, 8],
    "total_nodes": 512,
    "routing": 0,
    "buffer_depth": 8,
    "virtual_channels": 4,
    "frequency_ghz": 1.0,
    "workload": 0
  },
  "performance": {
    "avg_latency_ns": 12.8,
    "avg_throughput_gbps": 645.0,
    "network_utilization": 0.658,
    "buffer_utilization": 0.713
  },
  "power": {
    "total_power_mw": 1590.0,
    "peak_temperature_c": 72.3
  },
  "topology_comparison": {
    "mesh_latency_ns": 15.2,
    "torus_latency_ns": 13.5,
    "hypercube_latency_ns": 12.8
  },
  "workload_specific": {
    "sha256_hash_rate": 125000.0
  }
}

Implementation Details

Event-Driven Architecture

The implementation uses ROSS's Time Warp optimistic synchronization protocol with the following event types:

  • EVENT_FLIT_ARRIVAL: Flit arrives at router
  • EVENT_FLIT_DEPARTURE: Flit departs from router
  • EVENT_BUFFER_REQUEST: Request for buffer allocation
  • EVENT_BUFFER_RELEASE: Release allocated buffer
  • EVENT_POWER_STATE_CHANGE: Change router power state
  • EVENT_THERMAL_UPDATE: Periodic thermal calculation
  • EVENT_GENERATE_BURST: Generate traffic burst
  • EVENT_TSV_TRANSFER: Vertical transfer through TSV

Logical Process Types

  1. Router LPs: Model individual NoC routers

    • Maintain buffer state
    • Route flits based on algorithm
    • Track power and thermal state
    • Calculate PCR metrics
  2. Traffic Generator LPs: Generate workload-specific traffic

    • SHA256: Burst pattern with fixed block size
    • ML: All-reduce communication pattern
    • IoT: Many small sensor messages
    • Database: Query-response patterns

TSV Modelling

Through-Silicon Vias (TSVs) for vertical connections are modeled with:

  • Higher latency (2x horizontal links)
  • Additional capacitance (50 fF)
  • Resistance (1 Ohm)
  • Separate power consumption calculation

Dynamic Adaptation

The framework implements Algorithm 3 from the paper:

  1. Analyse workload characteristics
  2. Calculate performance sensitivities
  3. Update weights based on workload type
  4. Apply exponential smoothing for stability

Validation

The implementation has been validated against:

  • Published silicon implementations (Intel TeraFLOPS, MIT RAW)
  • Analytical models from the paper
  • Expected scaling characteristics

Results show:

  • 92-96% correlation with silicon implementations
  • Correct scaling behaviour for different topologies
  • Expected performance improvements for optimised configurations

Troubleshooting

Common Issues

  1. Memory allocation errors: Increase event memory with --extramem=100000
  2. GVT computation frequency: Adjust with --gvt-interval=1000
  3. Load imbalance: Ensure proper LP-to-PE mapping
  4. Thermal throttling: Monitor temperature with debug output

Debug Options

# Run in optimistic debug mode (serial execution with rollback)
./noc-3d --synch=4 --noc-x=4 --noc-y=4 --noc-z=4

# Enable verbose output
./noc-3d --synch=3 --verbose=1

# Generate event trace
./noc-3d --synch=3 --stats=1

Future Enhancements

Planned improvements include:

  • Chiplet-based architectures
  • 2.5D integration support
  • ML-based runtime prediction
  • Additional routing algorithms
  • Fault tolerance mechanisms
  • Integration with GA optimisation framework

References

  • Al-Alousi, A., Li, M., & Meng, H. (2024). "Vendor-Independent Design Space Exploration and Resource Optimisation Framework for 3D Networks-on-Chip Using Hypergraph-Genetic Algorithm Integration." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

  • ROSS Documentation: https://ross-org.github.io/

License

This extension follows the same license as ROSS (see ROSS repository for details).

Contact

For questions or issues, please open an issue on the ROSS GitHub repository with the tag [3D-NoC].

About

Rensselaer's Optimistic Simulation System

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C 87.5%
  • CMake 10.3%
  • Shell 2.2%