Issues Related to TensorRT Accelerated Inference #37

@bfloat16

Description

After separating STFT and ISTFT out of the BSRoformer class, I was able to export the model to ONNX, and trtexec converted the ONNX model to a TensorRT engine. However, TensorRT did not accelerate inference; it was roughly twice as slow as the Torch implementation.
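
For context, the export looked roughly like this (a minimal sketch; `SpecModelStub` is a stand-in for the modified BSRoformer whose `forward()` consumes the spectrogram directly, with STFT/ISTFT handled outside the graph; shapes and opset version are illustrative):

```python
import torch
import torch.nn as nn

class SpecModelStub(nn.Module):
    # Stand-in for the modified BSRoformer: forward() takes the
    # spectrogram directly, so STFT/ISTFT stay outside the ONNX graph.
    def forward(self, spec):
        return spec  # the real model predicts masks over the spectrogram

model = SpecModelStub().eval()

# Dummy spectrogram packed as real/imag pairs:
# (batch, channels, freq_bins, time_frames, 2) -- illustrative shape.
dummy = torch.randn(1, 2, 1025, 256, 2)

torch.onnx.export(
    model, dummy, "bs_roformer_core.onnx",
    input_names=["spec"], output_names=["mask"],
    opset_version=17,
)
```

The engine was then built with trtexec along the lines of `trtexec --onnx=bs_roformer_core.onnx --saveEngine=bs_roformer_core.engine --fp16` (the exact flags may differ from what I used).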

Torch takes approximately 0.13 seconds to infer a slice, while TensorRT takes 0.27 seconds for the same slice (tested on an RTX 4090). Preliminary profiling with NVIDIA Nsight suggests that the slowdown is caused by the Tile operation. Is there any way to alleviate this in TensorRT without retraining the model?

Nsight result: [profiler screenshot attached]
Modified source code:
https://github.com/bfloat16/Music-Source-Separation-Training
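
One direction I'm exploring (unverified): wherever the exported graph's Tile nodes come from `Tensor.repeat()` calls, rewriting them as broadcasts with `Tensor.expand()` so the exporter emits ONNX Expand instead of Tile, which TensorRT may be able to fold into the consuming kernel rather than launching a separate copy. A minimal sketch of the rewrite:

```python
import torch

x = torch.randn(1, 64, 128)

# repeat() materializes copies and exports as ONNX Tile.
tiled = x.repeat(8, 1, 1)       # (8, 64, 128), real memory copy

# expand() is a zero-copy broadcast view and exports as ONNX Expand;
# downstream ops can often consume the broadcast without a copy.
expanded = x.expand(8, -1, -1)  # (8, 64, 128), no copy

assert torch.equal(tiled, expanded)
```

Whether this actually removes the Tile kernels from the TensorRT engine is something I still need to verify with Nsight.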
