Description
Describe the bug
`UTILITY::Clean_name()` in `SOFIE_common.cxx` (line 513) erases dot characters (`.`) from tensor names instead of replacing them with a safe character (e.g., `_`). This causes tensor name collisions when parsing TorchScript models whose ONNX graphs contain dotted intermediate names.
The Collision
TorchScript generates intermediate tensor names using a sequential dotted pattern:
`input.1` → `input0` → `input1` → `input2` → ...
When `Clean_name()` processes these:
| Original Name | After `Clean_name()` | Intended Role |
|---------------|----------------------|---------------|
| `input.1` | `input1` | Model input tensor |
| `input1` | `input1` | Intermediate tensor (output of first op) |
Both map to the same C++ variable name `input1`, causing the generated inference code to contain duplicate declarations or silently overwrite tensor data.
Root Cause
```cpp
// SOFIE_common.cxx, lines 513-518
std::string UTILITY::Clean_name(std::string input_tensor_name){
   std::string s (input_tensor_name);
   std::replace( s.begin(), s.end(), '-', '_');
   s.erase(std::remove_if(s.begin(), s.end(),
      []( char const& c ) -> bool { return !std::isalnum(c) && c != '_'; } ), s.end());
   return s;
}
```
The function replaces `-` → `_` explicitly, but all other non-alphanumeric characters (including `.`) are erased rather than replaced. This is inconsistent: `-` gets a safe replacement while `.` gets deleted.
Affected Parsers
- PyTorch parser (`RModelParser_PyTorch.cxx`): TorchScript ONNX graphs consistently use dotted names (`input.1`, `x.1`, etc.).
- ONNX parser (`RModelParser_ONNX.cxx`): Some ONNX models from external tools also use dotted tensor names.
- Keras parser: Not directly affected (Keras uses `_` in names natively).
Why Existing Tests Pass
The existing PyTorch parser tests (`SEQUENTIAL_MODEL`, `MODULE_MODEL`, `CONVOLUTION_MODEL`) pass by coincidence: TorchScript happens to generate non-colliding name sequences for those specific architectures. Models using operators like `Tanh`, `Softmax`, or `BatchNormalization` trigger the collision because TorchScript assigns sequential dotted names without gaps.
Proposed Fix
Replace dots with underscores before erasing, consistent with the existing `-` → `_` treatment:

```cpp
std::string UTILITY::Clean_name(std::string input_tensor_name){
   std::string s (input_tensor_name);
   std::replace( s.begin(), s.end(), '-', '_');
   std::replace( s.begin(), s.end(), '.', '_'); // <-- add this line
   s.erase(std::remove_if(s.begin(), s.end(),
      []( char const& c ) -> bool { return !std::isalnum(c) && c != '_'; } ), s.end());
   return s;
}
```
This makes `input.1` → `input_1` (distinct from `input1`), eliminating the collision.
Alternative (parser-level workaround): Normalize dotted names on the Python side before they reach `Clean_name()`. This is what PR #21528 (PyTorch parser extension) implements as a workaround, but the root cause remains in `Clean_name()` and affects all parsers.
Reproducer
- Create a PyTorch model with `nn.Tanh()` or any activation that causes TorchScript to assign sequential dotted tensor names:
```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 8),
    nn.Tanh(),
    nn.Linear(8, 6),
)
model.eval()
m = torch.jit.script(model)
torch.jit.save(m, "test_collision.pt")
```
- Parse with SOFIE:
```cpp
using namespace TMVA::Experimental;
std::vector<size_t> shape{2, 4};
SOFIE::RModel model = SOFIE::PyTorch::Parse("test_collision.pt", {shape});
model.Generate();
```
- Inspect the generated code: the model input `input.1` and the intermediate output of Gemm (`input1`) both become `input1`, producing a name collision.
Root cause in `SOFIE_common.cxx` line 513:

```cpp
std::string UTILITY::Clean_name(std::string input_tensor_name){
   std::string s (input_tensor_name);
   std::replace( s.begin(), s.end(), '-', '_');
   // erases '.' instead of replacing it
   s.erase(std::remove_if(s.begin(), s.end(),
      []( char const& c ) -> bool { return !std::isalnum(c) && c != '_'; } ), s.end());
   return s;
}
```
Fix: Add `std::replace( s.begin(), s.end(), '.', '_');` before the erase, consistent with the existing `-` → `_` treatment.
ROOT version
master
Installation method
build from source
Operating system
Linux (Ubuntu 24.04)
Additional context
Affected parsers:
- PyTorch parser (`RModelParser_PyTorch.cxx`): TorchScript consistently uses dotted names
- ONNX parser (`RModelParser_ONNX.cxx`): some external ONNX tools also produce dotted names
Why existing tests pass: The current `SEQUENTIAL_MODEL`, `MODULE_MODEL`, and `CONVOLUTION_MODEL` tests use architectures where TorchScript happens to generate non-colliding name sequences (e.g., `input`, `input.3`, `input.5`, skipping numbers). Models with `Tanh`, `Softmax`, or `BatchNormalization` trigger the collision because TorchScript assigns consecutive names without gaps.
Workaround: Normalize dots to underscores on the Python side before names reach `Clean_name()`, e.g., `x.debugName().replace('.','_')`. This is implemented in PR #21528 as a parser-level workaround.
cc: @lmoneta @sanjibansg @guitargeek