Problem Description
I am experiencing a severe output collapse when performing post-training quantization (PTQ) on the SEM (Spatial Enhanced Manipulation) decoder model using the Horizon Robotics toolchain. The quantized HBM model produces only 3 unique output values, while the original ONNX model generates 895.
Environment Information
- Hardware Platform: RDK S100
- Toolchain Version: Open Explorer (hb_compile)
- Python Environment: Python 3.10 + miniconda
- Model Architecture: SEM Robotwin (Encoder-Decoder separated structure)
- Affected Model: Decoder (encoder quantization works normally)
Reproduction Steps
- Model Export
Following the official tutorial at https://forum.d-robotics.cc/t/topic/32657 for ONNX export:
# Quantization-friendly modifications
data["joint_relative_pos"] = data["joint_relative_pos"].to(torch.int8)
timestep = timestep.to(torch.int16)
# Changed float("-inf") to -15 (see the sketch after this step)
# Export ONNX
python3 onnx_scripts/export_onnx.py \
--config config_sem_robotwin.py \
--model /path/to/model \
--output_path /path/to/onnx \
--num_joint 14 \
--validate
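For context, the float("-inf") → -15 change refers to the attention-mask fill value. A minimal sketch of the idea; the tensor names and shapes here are illustrative, not from the actual SEM source:
import torch

# Illustrative mask and scores only; the real SEM decoder tensors differ
pad_mask = torch.tensor([[False, False, True]])
scores = torch.zeros(1, 3)

# Before: scores = scores.masked_fill(pad_mask, float("-inf"))
# -inf forces the calibrator to cover an unbounded activation range.
# After: a finite fill keeps the range bounded, while exp(-15) ~ 3e-7
# still effectively zeroes the masked positions after softmax.
scores = scores.masked_fill(pad_mask, -15.0)
probs = torch.softmax(scores, dim=-1)
print(probs)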
- Calibration Data Preparation
I prepared 100 calibration samples following the tutorial, preprocessed them, and passed them through the encoder to generate the decoder input features (a dump sketch follows the shape listing):
# Calibration data shape validation
noisy_action: (1, 64, 14, 8) float32 range: [-3.005, 3.252]
image_feature: (1, 3, 400, 256) float32 range: [-4.295, 4.137]
robot_feature: (1, 14, 1, 256) float32 range: [-3.518, 3.075]
timestep: (1,) int16 value: [999]
joint_relative_pos: (1, 14, 14) int8 range: [0, 13]
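The calibration set was written out per input, roughly as sketched below. The directory layout and file naming are assumptions based on a typical hb_compile calibration workflow, and random arrays stand in for the real encoder outputs:
import numpy as np
from pathlib import Path

cal_root = Path("./calibration_data_dir")  # assumed layout: one subdirectory per input name

def save_sample(idx, sample):
    # sample maps input name -> numpy array with the dtypes validated above
    for name, arr in sample.items():
        out_dir = cal_root / name
        out_dir.mkdir(parents=True, exist_ok=True)
        arr.tofile(out_dir / f"{idx:03d}.bin")  # raw binary, no header

# One synthetic sample matching the validated shapes/dtypes; the real script
# uses encoder outputs from preprocessed robot data instead of random values
save_sample(0, {
    "noisy_action": np.random.randn(1, 64, 14, 8).astype(np.float32),
    "image_feature": np.random.randn(1, 3, 400, 256).astype(np.float32),
    "robot_feature": np.random.randn(1, 14, 1, 256).astype(np.float32),
    "timestep": np.array([999], dtype=np.int16),
    "joint_relative_pos": np.random.randint(0, 14, size=(1, 14, 14), dtype=np.int8),
})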
- Attempted Quantization Configurations
I have tried multiple quantization configurations, all of which failed:
Configuration 1: Int16 Quantization (Recommended by Official Tutorial)
calibration_parameters:
cal_data_dir: ./onnx_cup_42k/calibration_data_dir/...
quant_config: {
"model_config": {
"all_node_type": "int16"
},
"op_config": {
"Resize": {"qtype": "float16"}
}
}
compiler_parameters:
optimize_level: O2
compile_mode: latency
Result: Failed - 3 unique values
Issue: Input quantization completely saturated the int16 range [-32768, 32767], causing severe clipping
Configuration 2: Int8 Quantization with O0 Optimization
quant_config: {
"model_config": {
"all_node_type": "int8"
},
"op_config": {
"Resize": {"qtype": "float16"}
}
}
compiler_parameters:
optimize_level: O0 # Changed from O2 to O0 to avoid over-optimization
compile_mode: latency
Result: Failed - 3 unique values
Improvement: the input quantization range is normal ([-128, 127], no clipping), but the output still collapsed
Configuration 3: Int8 with Layerwise Search and Bias Correction (Optimal Configuration)
quant_config: {
"model_config": {
"all_node_type": "int8",
"activation": {
"calibration_type": ["max", "kl"],
"max_percentile": [0.99995, 0.99999, 1.0],
"num_bin": [2048, 4096],
"asymmetric": [false, true]
},
"weight": {
"bias_correction": {
"num_sample": 10,
"metric": "cosine-similarity"
}
},
"layerwise_search": {
"metric": "cosine-similarity"
}
},
"op_config": {
"Resize": {"qtype": "float16"},
"MatMul": {"qtype": "float16"},
"Gemm": {"qtype": "float16"}
}
}
compiler_parameters:
optimize_level: O0
compile_mode: bandwidth # Changed to bandwidth mode, prioritizing accuracy
Result: Still failed - 3 unique values
Test Results Comparison
I created a detailed comparison script that runs both the ONNX and HBM models on identical input data:
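The ONNX half of that script is essentially the sketch below. The model path, input names, and the uniqueness metric are illustrative assumptions, with random inputs standing in for the real test data:
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("decoder.onnx")  # path is illustrative
inputs = {
    "noisy_action": np.random.randn(1, 64, 14, 8).astype(np.float32),
    "image_feature": np.random.randn(1, 3, 400, 256).astype(np.float32),
    "robot_feature": np.random.randn(1, 14, 1, 256).astype(np.float32),
    "timestep": np.array([999], dtype=np.int16),
    "joint_relative_pos": np.random.randint(0, 14, size=(1, 14, 14), dtype=np.int8),
}
(pred,) = sess.run(None, inputs)

# One way to compute the "unique position values" metric: count distinct
# 8-dim action vectors across the 64 x 14 = 896 positions
rows = pred.reshape(-1, pred.shape[-1])
unique_rows = np.unique(rows.round(6), axis=0)
print(f"Shape: {pred.shape}, Range: [{pred.min():.6f}, {pred.max():.6f}]")
print(f"Unique position values: {len(unique_rows)}/{rows.shape[0]}")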
Test Output
======================================================================
TESTING ONNX MODEL
======================================================================
Output Analysis:
Shape: (1, 64, 14, 8)
Dtype: float32
Range: [-0.971276, 1.467759]
Unique position values: 895/896
Status: GOOD DIVERSITY
======================================================================
TESTING HBM MODEL (Int8 + Layerwise Search)
======================================================================
Input Information:
noisy_action: dtype=S8, scale=1.90911591e-02
image_feature: dtype=S8, scale=3.66429798e-02
robot_feature: dtype=S8, scale=3.07448395e-02
Quantizing inputs:
noisy_action (S8): range=[-128, 127] (no clipping)
image_feature (S8): range=[-117, 113] (no clipping)
robot_feature (S8): range=[-114, 100] (no clipping)
Running HBM inference...
Output Analysis:
Shape: (1, 64, 14, 8)
Dtype: float32
Range: [-6.037731, 1.848850]
Unique position values: 3/896
LOW DIVERSITY - All unique values:
[-2.934769, -2.449742, -0.26831594]
======================================================================
COMPARISON SUMMARY
======================================================================
Output Diversity:
ONNX unique values: 895/896
HBM unique values: 3/896
Diversity loss: 892 values
Numerical Difference:
Mean absolute error: 1.534658
Max absolute error: 6.541608
Median absolute error: 0.916659
Conclusion:
HBM model has SEVERE output collapse
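For reference, the HBM inputs above were quantized with the standard symmetric per-tensor scheme using the scales the model reports. A sketch; the helper name is mine and the actual test script may differ:
import numpy as np

def quantize_s8(x, scale):
    # Symmetric per-tensor quantization to signed int8, using the scales
    # reported in the HBM input information above
    q = np.round(x / scale)
    if q.min() < -128 or q.max() > 127:
        print("warning: values clipped")
    return np.clip(q, -128, 127).astype(np.int8)

# e.g. noisy_action with its reported scale
x = np.random.randn(1, 64, 14, 8).astype(np.float32)
xq = quantize_s8(x, 1.90911591e-02)
print("quantized range:", xq.min(), xq.max())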
Eliminated Issues
- Input data issue: Using identical calibration data, ONNX model outputs normally
- Quantization scale issue: Int8 quantization scale is reasonable, no clipping
- Compilation configuration issue: Tried O0/O1/O2, latency/bandwidth multiple combinations
- Calibration method issue: Tried max/kl multiple calibration methods and parameter combinations
- Output format issue: Output correctly configured as float32 (removed Dequantize via remove_node_type)
- Compilation log: No warnings or errors, compilation successful
Debug Information
HBM Model Input/Output Information
# hbm_runtime query results
Model: decoder_opt_resize
Inputs:
- noisy_action: [1, 64, 14, 8] INT8
- image_feature: [1, 3, 400, 256] INT8
- robot_feature: [1, 14, 1, 256] INT8
- timestep: [1] INT16
- joint_relative_pos: [1, 14, 14] INT8
Output:
- pred_action: [1, 64, 14, 8] FLOAT32
Quant type: NONE
Scale: []
Compilation Log Key Information
2025-10-23 21:54:10 hb_compile completes running
remove_node_type: Quantize;Cast;Softmax
pred_action output [1, 64, 14, 8] FLOAT32
No warnings or errors