This repository provides a step-by-step guide and code for optimizing PIDNet, a state-of-the-art semantic segmentation model, using TorchScript, ONNX, and TensorRT.
- CUDA: 12.0 (driver: 525)
- cuDNN: 8.9
- TensorRT: 8.6
- PyCUDA
- JetPack: 4.6.2
- PyCUDA
- Clone this repository and download the pretrained model from the official PIDNet repository.
For TorchScript:

    python tools/export.py --a pidnet-s --p ./pretrained_models/cityscapes/PIDNet_S_Cityscapes_test.pt --f torchscript

For ONNX:

    python tools/export.py --a pidnet-s --p ./pretrained_models/cityscapes/PIDNet_S_Cityscapes_test.pt --f onnx

For TensorRT (using the above ONNX model):

    trtexec --onnx=path/to/onnx/model --saveEngine=path/to/engine

To run inference with the PyTorch model:

    python tools/inference.py --f pytorch

- Measure the inference speed of PIDNet-S for Cityscapes:

    python models/speed/pidnet_speed.py --f all

| Format | FPS | % increase |
|---|---|---|
| PyTorch | 24.72 | - |
| TorchScript | 27.09 | 9.59 |
| ONNX (with TensorRT EP) | 33.52 | 35.60 |
| TensorRT | 32.93 | 33.21 |
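The "% increase" column appears to be the relative FPS gain over the PyTorch baseline. The short sketch below recomputes it from the FPS values in the table; the `percent_increase` helper and the `fps` dictionary are illustrative names, not part of this repository.

```python
# FPS values copied from the table above.
fps = {
    "PyTorch": 24.72,
    "TorchScript": 27.09,
    "ONNX (with TensorRT EP)": 33.52,
    "TensorRT": 32.93,
}


def percent_increase(value: float, baseline: float) -> float:
    """Relative gain of `value` over `baseline`, in percent (hypothetical helper)."""
    return (value - baseline) / baseline * 100.0


# Assumption: the PyTorch row is the baseline for every other row.
baseline = fps["PyTorch"]
increases = {
    name: round(percent_increase(value, baseline), 2)
    for name, value in fps.items()
    if name != "PyTorch"
}

for name, gain in increases.items():
    print(f"{name}: {gain:.2f}%")
```

Under that assumption the recomputed values match the table to two decimal places.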
The speed test was performed on a single NVIDIA GeForce RTX 3050 GPU.
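For readers who want to reproduce an FPS figure on their own hardware, the following is a minimal sketch of a timing loop with warm-up iterations. It is not the code in `models/speed/pidnet_speed.py`; `measure_fps` and `dummy_inference` are hypothetical names, and the dummy workload is a CPU stand-in for a real forward pass.

```python
import time
from typing import Callable


def measure_fps(run_once: Callable[[], object], warmup: int = 10, iterations: int = 100) -> float:
    """Return the average number of `run_once` calls per second (hypothetical helper)."""
    # Warm-up calls are excluded from timing so one-time setup costs do not skew the result.
    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    elapsed = time.perf_counter() - start
    return iterations / elapsed


def dummy_inference() -> int:
    # Stand-in workload (assumption); replace with a real model forward pass.
    # On a GPU, kernels run asynchronously, so a real benchmark should
    # synchronize the device (e.g. torch.cuda.synchronize() in PyTorch)
    # before each clock reading.
    return sum(i * i for i in range(10_000))


fps_value = measure_fps(dummy_inference, warmup=5, iterations=50)
print(f"{fps_value:.2f} FPS")
```

The printed number depends entirely on the machine and the workload, so it is not comparable to the table above.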