Issues Related to TensorRT Accelerated Inference #37

@bfloat16

Description

After separating STFT and ISTFT out of the BSRoformer class, I was able to export the model to ONNX, and trtexec converted the ONNX model to a TensorRT engine. However, TensorRT did not accelerate inference; it was roughly twice as slow as the Torch implementation.
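
For context, the export looked roughly like this (a minimal sketch; `SpecModelStub` is a stand-in for the modified BSRoformer whose `forward()` consumes the spectrogram directly, with STFT/ISTFT handled outside the graph; shapes and opset version are illustrative):

```python
import torch
import torch.nn as nn

class SpecModelStub(nn.Module):
    # Stand-in for the modified BSRoformer: forward() takes the
    # spectrogram directly, so STFT/ISTFT stay outside the ONNX graph.
    def forward(self, spec):
        return spec  # the real model predicts masks over the spectrogram

model = SpecModelStub().eval()

# Dummy spectrogram packed as real/imag pairs:
# (batch, channels, freq_bins, time_frames, 2) -- illustrative shape.
dummy = torch.randn(1, 2, 1025, 256, 2)

torch.onnx.export(
    model, dummy, "bs_roformer_core.onnx",
    input_names=["spec"], output_names=["mask"],
    opset_version=17,
)
```

The engine was then built with trtexec along the lines of `trtexec --onnx=bs_roformer_core.onnx --saveEngine=bs_roformer_core.engine --fp16` (the exact flags may differ from what I used).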

Torch takes approximately 0.13 seconds to infer a slice, while TensorRT takes 0.27 seconds for the same slice (tested on an RTX 4090). Preliminary profiling with NVIDIA Nsight suggests that the slowdown is caused by the Tile operation. Is there any way to alleviate this in TensorRT without retraining the model?

Nsight result: [profiler screenshot attached]
Modified source code:
https://github.com/bfloat16/Music-Source-Separation-Training
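
One direction I'm exploring (unverified): wherever the exported graph's Tile nodes come from `Tensor.repeat()` calls, rewriting them as broadcasts with `Tensor.expand()` so the exporter emits ONNX Expand instead of Tile, which TensorRT may be able to fold into the consuming kernel rather than launching a separate copy. A minimal sketch of the rewrite:

```python
import torch

x = torch.randn(1, 64, 128)

# repeat() materializes copies and exports as ONNX Tile.
tiled = x.repeat(8, 1, 1)       # (8, 64, 128), real memory copy

# expand() is a zero-copy broadcast view and exports as ONNX Expand;
# downstream ops can often consume the broadcast without a copy.
expanded = x.expand(8, -1, -1)  # (8, 64, 128), no copy

assert torch.equal(tiled, expanded)
```

Whether this actually removes the Tile kernels from the TensorRT engine is something I still need to verify with Nsight.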
