tiling_optimization_level (str): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for a better tiling strategy. We currently support ["none", "fast", "moderate", "full"].
l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default is -1, which means no limit).
use_distributed_mode_trace (bool): Use aot_autograd to trace the graph. This should be enabled when DTensors or distributed tensors are present in the distributed model.
cpu_memory_budget (int): The maximum amount of CPU memory to use for compilation. If compilation requires more memory than this budget, it will fail. If set to -1, compilation will use all available CPU memory.
**kwargs: Any,
Returns:
torch.fx.GraphModule: Compiled FX module; when run, it will execute via TensorRT
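The new cpu_memory_budget option sits alongside the existing tiling and distributed-trace settings. Below is a minimal sketch of how these settings might be passed through the dynamo path; the toy model, inputs, and the availability of each keyword in a given Torch-TensorRT release are assumptions, so treat this as illustrative rather than the canonical API.

```python
import torch
import torch_tensorrt

# Toy model and inputs, used only for illustration.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).cuda().eval()
inputs = [torch.randn(8, 64).cuda()]

# Hypothetical call: keyword names mirror the documented settings above, but
# support for each one depends on the installed Torch-TensorRT version.
trt_module = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=inputs,
    tiling_optimization_level="moderate",  # "none" | "fast" | "moderate" | "full"
    l2_limit_for_tiling=-1,                # -1 means no L2 cache usage limit
    use_distributed_mode_trace=False,      # enable when DTensors are present
    cpu_memory_budget=-1,                  # -1 means use all available CPU memory
)

# The returned torch.fx.GraphModule executes via TensorRT when called.
out = trt_module(*inputs)
```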
f"Subgraph size {sum([sizefor_, sizeinsizes])} is too large to break. Size budget: {size_budget}"
f"CPU memory budget or available memory is too small to compile the model. CPU memory budget: {self.cpu_memory_budget// (1024*1024) ifself.cpu_memory_budget!=-1else"All available memory"} MB, Model size: {sum([sizefor_, sizeinsizes]) // (1024*1024)} MB. "
+"Consider setting cpu_memory_budget to a larger value or disable offload_module_to_cpu to save more CPU memory."