Abstract: Efficient transmission of 3D point cloud data is critical for advanced perception in centralized and decentralized multi-agent robotic systems, especially with the growing reliance on edge and cloud-based processing. However, the large and complex nature of point clouds creates challenges under bandwidth constraints and intermittent connectivity, often degrading system performance. We propose a deep compression framework based on semantic scene graphs. The method decomposes point clouds into semantically coherent patches and encodes them into compact latent representations with semantic-aware encoders conditioned via Feature-wise Linear Modulation (FiLM). A folding-based decoder, guided by latent features and graph node attributes, enables structurally accurate reconstruction. Experiments on the SemanticKITTI and nuScenes datasets show that the framework achieves state-of-the-art compression rates, reducing data size by up to 98% while preserving both structural and semantic fidelity. In addition, it supports downstream applications such as multi-robot pose graph optimization and map merging, achieving trajectory accuracy and map alignment comparable to those obtained with raw LiDAR scans.
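For readers unfamiliar with FiLM conditioning, the sketch below illustrates the general idea of modulating encoder features with per-channel scale and shift parameters predicted from a conditioning vector (e.g., semantic node attributes). It is a minimal, self-contained PyTorch example, not the encoder implemented in this repository; all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class FiLMLayer(nn.Module):
    """Minimal FiLM block: scales and shifts features using a conditioning vector."""
    def __init__(self, feature_dim: int, cond_dim: int):
        super().__init__()
        # Predict per-channel gamma (scale) and beta (shift) from the condition.
        self.film = nn.Linear(cond_dim, 2 * feature_dim)

    def forward(self, features: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.film(cond).chunk(2, dim=-1)
        return gamma * features + beta

# Example: modulate per-point features with a semantic attribute vector.
points_feat = torch.randn(1, 1024, 64)   # (batch, points, feature_dim)
node_attr = torch.randn(1, 1, 16)        # (batch, 1, cond_dim), e.g. a class embedding
film = FiLMLayer(feature_dim=64, cond_dim=16)
modulated = film(points_feat, node_attr)  # gamma/beta broadcast over the point dimension
print(modulated.shape)  # torch.Size([1, 1024, 64])
```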
- Introduction
- Setup: Prerequisites • Installation • Environment Setup • Dataset Setup • Configuration
- Pre-trained Weights
- Train: Generate Semantic Scene Graphs • Train the Autoencoder
- Test: Running Inference • Encode-Only Workflow • Decode-Only Workflow
- Citation
Tested with Python 3.8; it should work with newer versions as long as you install compatible versions of PyTorch, torch-geometric, and Open3D.
- Python 3.8 or higher
- CUDA-compatible GPU (recommended for training)
- Clone the repository to your desired working directory:

```bash
git clone https://github.com/LTU-RAI/sga-dpcc.git
cd sga-dpcc
```

Using a virtual environment is strongly recommended. You can use any virtual environment manager you prefer. For example, with conda:

```bash
conda create -n sga-dpcc python=3.8
conda activate sga-dpcc
pip install -r requirements.txt
```

Or with Python's built-in venv:

```bash
python -m venv sga-dpcc-env
source sga-dpcc-env/bin/activate
pip install -r requirements.txt
```

- Download the SemanticKITTI dataset from the official website
- Extract the dataset to your desired directory
- Ensure the dataset follows this directory structure:
```
path_to_dataset/
└── SemanticKitti/
    ├── semantic-kitti.yaml
    ├── calib.txt
    └── sequences/
        ├── 00/
        │   ├── poses/            # 4x4 transformation matrix
        │   │   └── 00.txt
        │   ├── velodyne/         # Point cloud files
        │   │   ├── 000000.bin
        │   │   └── ...
        │   └── labels/           # Semantic labels
        │       ├── 000000.label
        │       └── ...
        ├── 01/
        │   └── ...
        └── ...
```
Note: Additional data structures (semantic scene graphs, etc.) will be generated in a later step during the training setup; see the Train section below.
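As a quick sanity check before moving on, a small script like the following can verify that the expected files are in place. The dataset root path is an example; adjust it to your own setup.

```python
import os

# Example path; replace with your own dataset root.
root = os.path.expanduser("~/datasets/SemanticKitti")

for seq in ["00", "01"]:
    seq_dir = os.path.join(root, "sequences", seq)
    for sub in ["velodyne", "labels", "poses"]:
        path = os.path.join(seq_dir, sub)
        print(f"{path}: {'OK' if os.path.isdir(path) else 'MISSING'}")

# The calibration and label-mapping files live next to sequences/.
for f in ["semantic-kitti.yaml", "calib.txt"]:
    path = os.path.join(root, f)
    print(f"{path}: {'OK' if os.path.isfile(path) else 'MISSING'}")
```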
- **Main Configuration**: Update `config/config-latest.yaml`
  - Modify all file paths to point to your corresponding directories
  - Set the correct path to your SemanticKITTI dataset
  - Adjust output directories as needed

- **Autoencoder Configuration**: The `config/autoencoder.yaml` file controls:
  - Parameters for all compression layers
  - Training and testing hyperparameters
  - Model architecture settings
**Important Note:** In `config-latest.yaml`, the layer classes are defined based on the SemanticKITTI label classes. Layers 3 and 4 are reversed compared to the paper.
- Download the pre-trained weights from Google Drive
- Extract and place them in the `weights/checkpoints/` directory
- Ensure the directory structure matches the existing checkpoint folders

If you just want to test with the pre-trained weights, skip ahead to the Test section.
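The exact folder names inside `weights/checkpoints/` depend on the release you download; assuming the pre-trained weights follow the same naming scheme as training checkpoints (see the Train section), the layout would look roughly like this, with the timestamp folder name being illustrative:

```
weights/checkpoints/
└── 20250105-12/            # example timestamp folder
    ├── autoencoder_layer_1.torch
    ├── autoencoder_layer_2.torch
    ├── autoencoder_layer_3.torch
    └── autoencoder_layer_4.torch
```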
Before training, generate a semantic scene graph (SSG) for each scan; the SSG contains the point patches for each layer.
```bash
python modules/GenSSG.py --ssg_config /path/to/config-latest.yaml
```

Arguments:

- `--ssg_config` - Path to your main config YAML file (required; defaults to built-in path)

This generates `.pickle` files in a `graph/` subdirectory for each sequence, one file per scan's SSG.
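To inspect one of the generated files, you can load it with `pickle`; the exact structure of the stored object is defined by `modules/GenSSG.py`, so the snippet below (with an example path) only prints a high-level summary.

```python
import pickle

# Example path; the graph/ subdirectory and per-scan files follow the note above.
ssg_path = "/path/to/SemanticKitti/sequences/00/graph/000000.pickle"

with open(ssg_path, "rb") as f:
    ssg = pickle.load(f)

# What you can inspect next depends on the object GenSSG.py stores
# (e.g., node attributes and per-layer point patches).
print(type(ssg))
```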
Train layer-specific autoencoders using the generated SSGs and point cloud patches.
```bash
python train/train.py --layer 1 --num_epochs 100 --batch_size 8
```

Key Arguments:

- `--layer` (required) - Layer to train: `1`, `2`, `3`, or `4`
- `--batch_size` - Batch size (default: 8)
- `--num_epochs` - Number of training epochs (default: 100)
- `--num_workers` - Workers for data loading (default: 8)
- `--lr` - Learning rate (default: 5e-4)
- `--weight_decay` - Weight decay for optimizer (default: 1e-6)
- `--max_range` - Max point cloud range in meters (default: 50.0)
- `--config_path` - Path to main config YAML
- `--root` - Root directory of SemanticKITTI dataset
- `--save_path` - Directory to save checkpoints
- `--device` - Device for training: `cuda` or `cpu` (default: auto-detected)
- `--cuda_visible_devices` - Comma-separated GPU IDs (e.g., `0,1,2,3`, default: `0`)
- `--checkpoint` - Resume from checkpoint (path fragment, e.g., `20250105-12`)
Example: Train layer 1 with custom paths
```bash
python train/train.py \
    --layer 1 \
    --num_epochs 100 \
    --batch_size 16 \
    --config_path /home/user/config-latest.yaml \
    --root /home/user/SemanticKitti \
    --save_path /home/user/checkpoints
```

**Multi-GPU Training**

For distributed training across multiple GPUs, use torchrun:

```bash
torchrun --nproc_per_node=4 train/train.py --layer 1 --num_epochs 100 --cuda_visible_devices 0,1,2,3
```

The script automatically detects and handles distributed setup. Checkpoints are saved with a timestamp in `{save_path}/{YYYYMMDD-HH}/autoencoder_layer_{layer}.torch`.
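If you later resume training with `--checkpoint`, the fragment to pass is the name of one of these timestamped folders. A small helper to list what is available (the save path is an example) might look like this:

```python
import glob
import os

save_path = "/home/user/checkpoints"  # example; use your own --save_path

# Training writes {save_path}/{YYYYMMDD-HH}/autoencoder_layer_{layer}.torch,
# so the folder name is the fragment to pass to --checkpoint.
for ckpt in sorted(glob.glob(os.path.join(save_path, "*", "autoencoder_layer_*.torch"))):
    timestamp = os.path.basename(os.path.dirname(ckpt))
    print(timestamp, "->", ckpt)
```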
The `test/predict.py` script performs point cloud compression inference using the pre-trained autoencoder models. It encodes point clouds into compact semantic scene graphs with latent representations, reconstructs them, and computes compression metrics.
```bash
cd test
python predict.py
```

This runs inference with default parameters on sequence 00, scan 000000.
| Argument | Type | Default | Description |
|---|---|---|---|
| `--root` | str | `~/Documents/datasets_2/SemanticKitti` | Root directory of SemanticKITTI dataset |
| `--config_path` | str | `~/python_projects/sga-dpcc/config/config-latest.yaml` | Path to main configuration file |
| `--selected_layers` | int(s) | `1 2 3 4` | Layer indices to process (space-separated) |
| `--max_range` | float | `50.0` | Maximum range for point cloud (meters) |
| `--device` | str | `cuda` or `cpu` | Device for inference (`cuda` or `cpu`) |
| `--sequence` | str | `00` | Sequence number from dataset (e.g., `06`) |
| `--scan_id` | str | `000000` | Scan ID to process (e.g., `000100`) |
Note: Latent dimensions are automatically loaded from `config/autoencoder.yaml` based on the selected layers.
Process a specific sequence and scan:

```bash
python predict.py --sequence 06 --scan_id 000500
```

Use only layers 3 and 4:

```bash
python predict.py --selected_layers 3 4
```

Run on CPU:

```bash
python predict.py --device cpu
```

The script generates:
- `predicted.pcd` - Reconstructed point cloud with semantic colors
- `gt.pcd` - Ground truth point cloud for comparison
- Console metrics:
  - Compressed size (MB)
  - Original size (MB)
  - Number of reconstructed points
  - Bits per point (BPP)
  - Compression ratio
  - Size reduction percentage
  - Chamfer distance (reconstructed vs. ground truth)
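If you want to double-check the reported Chamfer distance offline, a simple symmetric version can be computed from the two output files with Open3D. This is a minimal sketch; the exact definition used by `predict.py` may differ (e.g., squared vs. unsquared distances).

```python
import numpy as np
import open3d as o3d

pred = o3d.io.read_point_cloud("predicted.pcd")
gt = o3d.io.read_point_cloud("gt.pcd")

# Nearest-neighbor distances in both directions, averaged and summed.
d_pred_to_gt = np.asarray(pred.compute_point_cloud_distance(gt))
d_gt_to_pred = np.asarray(gt.compute_point_cloud_distance(pred))
chamfer = d_pred_to_gt.mean() + d_gt_to_pred.mean()
print(f"Chamfer distance: {chamfer:.4f} m")
```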
You can visualize the `.pcd` files using PCL tools or other point cloud viewers:

```bash
# Using PCL Viewer (if installed)
pcl_viewer predicted.pcd

# Using Open3D (Python)
python -c "import open3d as o3d; o3d.visualization.draw_geometries([o3d.io.read_point_cloud('predicted.pcd')])"
```

For transmitting compressed data, use `test/encode.py` to encode a scan and save the compressed SSG with latents:
```bash
cd test
python encode.py --sequence 06 --scan_id 000500
```

This creates `06_000500.pkl` (the compressed semantic scene graph with latent representations).
Key Options:
- `--output_bytes` - Custom output path for the encoded bytes (defaults to `{sequence}_{scan_id}.pkl`)
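To get a feel for the achieved compression on a given scan, you can compare the encoded file against the original `.bin` scan. The paths below are examples matching the commands above; adjust them to your setup.

```python
import os

# Example paths; adjust to your dataset root and encode.py output location.
original = os.path.expanduser("~/datasets/SemanticKitti/sequences/06/velodyne/000500.bin")
encoded = "06_000500.pkl"

orig_mb = os.path.getsize(original) / 1e6
enc_mb = os.path.getsize(encoded) / 1e6
print(f"Original: {orig_mb:.2f} MB, encoded: {enc_mb:.2f} MB, "
      f"reduction: {100 * (1 - enc_mb / orig_mb):.1f}%")
```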
Reconstruct a point cloud from previously encoded bytes using `test/decode.py`:
```bash
cd test
python decode.py --sequence 06 --scan_id 000500 --input_bytes 06_000500.pkl
```

This reads the encoded SSG and reconstructs `06_000500.pcd`.
Key Options:
- `--input_bytes` - Path to the encoded bytes (defaults to `{sequence}_{scan_id}.pkl`)
- `--output_pcd` - Custom output path for the PCD (defaults to `{sequence}_{scan_id}.pcd`)
- `--skip_gt` - Skip loading ground truth (for decoding without access to the original dataset)
- `--compute_chamfer` - Compute the Chamfer distance metric (requires ground truth)
If you found this work useful, please cite the following publication:
```bibtex
@article{stathoulopoulos2025sgadpcc,
  author={Stathoulopoulos, Nikolaos and Kanellakis, Christoforos and Nikolakopoulos, George},
  journal={IEEE Robotics and Automation Letters},
  title={{Have We Scene It All? Scene Graph-Aware Deep Point Cloud Compression}},
  year={2025},
  volume={10},
  number={12},
  pages={12477-12484},
  doi={10.1109/LRA.2025.3623045}
}
```


