SEM Decoder Model PTQ Quantization Results in Severe Output Collapse with Only 3 Unique Values #31

@tanjiong233

Description

Problem Description

I am experiencing a severe output-collapse issue when performing PTQ quantization on the SEM (Spatial Enhanced Manipulation) decoder model using the Horizon Robotics toolchain. The quantized HBM model produces only 3 unique values, while the original ONNX model generates 895 unique values.

Environment Information

  • Hardware Platform: RDK S100
  • Toolchain Version: Open Explorer (hb_compile)
  • Python Environment: Python 3.10 + miniconda
  • Model Architecture: SEM Robotwin (Encoder-Decoder separated structure)
  • Affected Model: Decoder (encoder quantization works normally)

Reproduction Steps

  1. Model Export

Following the official tutorial at https://forum.d-robotics.cc/t/topic/32657 for ONNX export:

  # Quantization-friendly modifications
  data["joint_relative_pos"] = data["joint_relative_pos"].to(torch.int8)
  timestep = timestep.to(torch.int16)
  # Replaced float("-inf") with a finite -15

  # Export ONNX
  python3 onnx_scripts/export_onnx.py \
      --config config_sem_robotwin.py \
      --model /path/to/model \
      --output_path /path/to/onnx \
      --num_joint 14 \
      --validate
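
For context, the float("-inf") replacement presumably concerns an additive attention mask; a minimal sketch of the idea with hypothetical names (the real SEM code differs):

  import torch

  # Hypothetical illustration: float("-inf") in an additive mask quantizes
  # badly, while a large finite negative value maps to ~0 after softmax
  # just the same.
  def build_additive_mask(valid: torch.Tensor) -> torch.Tensor:
      mask = torch.zeros(valid.shape, dtype=torch.float32)
      # mask = mask.masked_fill(~valid, float("-inf"))  # original, quantization-hostile
      mask = mask.masked_fill(~valid, -15.0)            # finite stand-in used for export
      return mask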
  2. Calibration Data Preparation

I prepared 100 calibration samples following the tutorial; each sample was preprocessed and passed through the encoder to generate the decoder's input features, as sketched below:

  # Calibration data shape validation
  noisy_action:       (1, 64, 14, 8)   float32  range: [-3.005, 3.252]
  image_feature:      (1, 3, 400, 256) float32  range: [-4.295, 4.137]
  robot_feature:      (1, 14, 1, 256)  float32  range: [-3.518, 3.075]
  timestep:           (1,)             int16    value: [999]
  joint_relative_pos: (1, 14, 14)      int8     range: [0, 13]
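
For reference, the dump step looked roughly like this; the encoder output names, file paths, and per-input subdirectory layout under cal_data_dir are assumptions from my setup, not toolchain requirements:

  import numpy as np
  import onnxruntime as ort

  # Sketch: run each preprocessed sample through the encoder and write the
  # resulting decoder inputs as raw float32 binaries for calibration.
  enc = ort.InferenceSession("encoder.onnx")
  for i, sample in enumerate(samples):  # `samples`: 100 preprocessed dicts (hypothetical)
      image_feat, robot_feat = enc.run(None, sample)
      image_feat.astype(np.float32).tofile(f"calibration_data_dir/image_feature/{i:04d}.bin")
      robot_feat.astype(np.float32).tofile(f"calibration_data_dir/robot_feature/{i:04d}.bin")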
  3. Attempted Quantization Configurations

I have tried multiple quantization configurations, all of which failed:

Configuration 1: Int16 Quantization (Recommended by Official Tutorial)

  calibration_parameters:
    cal_data_dir: ./onnx_cup_42k/calibration_data_dir/...
    quant_config: {
      "model_config": {
        "all_node_type": "int16"
      },
      "op_config": {
        "Resize": {"qtype": "float16"}
      }
    }
  compiler_parameters:
    optimize_level: O2
    compile_mode: latency

Result: Failed - 3 unique values

Issue: the quantized input values completely saturated the int16 range [-32768, 32767], causing severe clipping
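
The saturation is easy to confirm offline: quantize the float calibration tensors with the model's reported scale and count the values pinned at the type limits (a plain numpy check, not toolchain code):

  import numpy as np

  # Fraction of values pinned at the integer limits after quantization;
  # ~0 is healthy, values near 1 mean the scale clips almost everything.
  def saturation_ratio(x: np.ndarray, scale: float,
                       qmin: int = -32768, qmax: int = 32767) -> float:
      q = np.clip(np.round(x / scale), qmin, qmax)
      return float(np.mean((q == qmin) | (q == qmax)))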


Configuration 2: Int8 Quantization with O0 Optimization


  quant_config: {
    "model_config": {
      "all_node_type": "int8"
    },
    "op_config": {
      "Resize": {"qtype": "float16"}
    }
  }
  compiler_parameters:
    optimize_level: O0  # Changed from O2 to O0 to avoid over-optimization
    compile_mode: latency

Result: Failed - 3 unique values

Improvement: the input quantization range was normal ([-128, 127], no clipping), but the output still collapsed


Configuration 3: Int8 with Layerwise Search and Bias Correction (Most Thorough Configuration)

  quant_config: {
    "model_config": {
      "all_node_type": "int8",
      "activation": {
        "calibration_type": ["max", "kl"],
        "max_percentile": [0.99995, 0.99999, 1.0],
        "num_bin": [2048, 4096],
        "asymmetric": [false, true]
      },
      "weight": {
        "bias_correction": {
          "num_sample": 10,
          "metric": "cosine-similarity"
        }
      },
      "layerwise_search": {
        "metric": "cosine-similarity"
      }
    },
    "op_config": {
      "Resize": {"qtype": "float16"},
      "MatMul": {"qtype": "float16"},
      "Gemm": {"qtype": "float16"}
    }
  }
  compiler_parameters:
    optimize_level: O0
    compile_mode: bandwidth  # Changed to bandwidth mode, prioritizing accuracy

Result: Still failed - 3 unique values
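
For reference, the cosine-similarity metric that bias_correction and layerwise_search maximize reduces to the following (a plain numpy restatement, not the toolchain's implementation):

  import numpy as np

  # Cosine similarity between float and quantized outputs, flattened;
  # 1.0 means identical direction, so higher is better per layer.
  def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
      a, b = a.ravel(), b.ravel()
      return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))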


Test Results Comparison

I wrote a detailed comparison script that runs both the ONNX and HBM models on identical input data:
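
A condensed version of the script (file names are from my setup; the HBM output is dumped to .npy on the board by a separate hbm_runtime script, which I omit here):

  import numpy as np
  import onnxruntime as ort

  # Both backends see identical saved inputs; the HBM inference itself runs
  # on-device and its output is loaded from a dump.
  npz = np.load("test_inputs.npz")                      # hypothetical input dump
  inputs = {k: npz[k] for k in npz.files}
  onnx_out = ort.InferenceSession("decoder.onnx").run(None, inputs)[0]
  hbm_out = np.load("hbm_output.npy")                   # dumped on-device by the board script

  for name, out in (("ONNX", onnx_out), ("HBM ", hbm_out)):
      print(f"{name}: range=[{out.min():.6f}, {out.max():.6f}] "
            f"unique={np.unique(out).size}/{out.size}")
  print(f"Mean abs error: {np.abs(onnx_out - hbm_out).mean():.6f}")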

Test Output

  ======================================================================
  TESTING ONNX MODEL
  ======================================================================
  Output Analysis:
     Shape: (1, 64, 14, 8)
     Dtype: float32
     Range: [-0.971276, 1.467759]
     Unique position values: 895/896
     Status: GOOD DIVERSITY

  ======================================================================
  TESTING HBM MODEL (Int8 + Layerwise Search)
  ======================================================================
  Input Information:
     noisy_action: dtype=S8, scale=1.90911591e-02
     image_feature: dtype=S8, scale=3.66429798e-02
     robot_feature: dtype=S8, scale=3.07448395e-02

  Quantizing inputs:
     noisy_action (S8): range=[-128, 127]  (no clipping)
     image_feature (S8): range=[-117, 113] (no clipping)
     robot_feature (S8): range=[-114, 100] (no clipping)

  Running HBM inference...
  Output Analysis:
     Shape: (1, 64, 14, 8)
     Dtype: float32
     Range: [-6.037731, 1.848850]
     Unique position values: 3/896
     LOW DIVERSITY - All unique values:
     [-2.934769, -2.449742, -0.26831594]

  ======================================================================
  COMPARISON SUMMARY
  ======================================================================
  Output Diversity:
     ONNX unique values: 895/896
     HBM  unique values: 3/896
     Diversity loss: 892 values

  Numerical Difference:
     Mean absolute error: 1.534658
     Max absolute error: 6.541608
     Median absolute error: 0.916659

  Conclusion:
     HBM model has SEVERE output collapse
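
For completeness, the "Quantizing inputs" step above is plain symmetric per-tensor quantization with the model-reported scales (e.g. 1.90911591e-02 for noisy_action); restated in numpy:

  import numpy as np

  # Symmetric per-tensor int8 quantization with the scale reported by the
  # HBM model; this reproduces the input ranges shown above.
  def quantize_s8(x: np.ndarray, scale: float) -> np.ndarray:
      return np.clip(np.round(x / scale), -128, 127).astype(np.int8)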

Eliminated Issues

  1. Not an input data issue: with identical input data, the ONNX model produces normal output
  2. Not a quantization scale issue: the int8 quantization scales are reasonable, with no clipping
  3. Not a compilation configuration issue: tried O0/O1/O2 and latency/bandwidth in multiple combinations
  4. Not a calibration method issue: tried max/kl and multiple parameter combinations
  5. Not an output format issue: the output is correctly configured as float32 (Dequantize removed via remove_node_type)
  6. Compilation log is clean: no warnings or errors, compilation succeeded

Debug Information

HBM Model Input/Output Information

  # hbm_runtime query results
  Model: decoder_opt_resize
  Inputs:
    - noisy_action:      [1, 64, 14, 8]   INT8
    - image_feature:     [1, 3, 400, 256] INT8
    - robot_feature:     [1, 14, 1, 256]  INT8
    - timestep:          [1]              INT16
    - joint_relative_pos: [1, 14, 14]     INT8

  Output:
    - pred_action:       [1, 64, 14, 8]   FLOAT32
      Quant type: NONE
      Scale: []

Compilation Log Key Information

  2025-10-23 21:54:10 hb_compile completes running
  remove_node_type: Quantize;Cast;Softmax
  pred_action output [1, 64, 14, 8] FLOAT32
  No warnings or errors
