Commit 2dd1684

Cross-Platform Torch Save / Load Support, Doc Updates, Release Prep (#141)

* Torch save / load implemented.
* Linted.
* Updated documentation and changelog.
* Updated documentation.

1 parent 3fb5703, commit 2dd1684

File tree: 6 files changed, +125 / -2 lines

CHANGELOG.md

Lines changed: 54 additions & 0 deletions

@@ -0,0 +1,54 @@

## Latest Changes

### v0.3.0 (2025-06-22)

This release includes bugfixes and new opaque operations that compose with `torch.compile` for PT2.4-2.7. These will be unnecessary for PT2.8+.

**Added**:

1. Opaque variants of major operations via PyTorch `custom_op` declarations. These functions cannot be traced through and fail for JITScript / AOTI. They are shims that enable composition with `torch.compile` pre-PT2.8.
2. `torch.load` / `torch.save` functionality that, without `torch.compile`, is portable across GPU architectures.
3. `.to()` support to move `TensorProduct` and `TensorProductConv` between devices or change datatypes.

**Fixed**:

1. Gracefully records an error if `libpython.so` is not linked against the C++ extension.
2. Resolves Kahan summation and various other bugs for HIP at the O3 compiler-optimization level.
3. Stops multiple contexts from being spawned on GPU 0 when multiple devices are used.
4. Zero-initializes gradient buffers to prevent garbage accumulation in the backward pass.

### v0.2.0 (2025-06-09)

Our first stable release, **v0.2.0**, introduces several new features. Highlights include:

1. Full HIP support for all kernels.
2. Support for `torch.compile`, JITScript, and export; preliminary support for AOTI.
3. Faster double-backward performance for training.
4. Ability to install versioned releases from PyPI.
5. Support for CUDA streams and multiple devices.
6. An extensive test suite and newly released [documentation](https://passionlab.github.io/OpenEquivariance/).

If you successfully run OpenEquivariance on a GPU model not listed [here](https://passionlab.github.io/OpenEquivariance/tests_and_benchmarks/), let us know! We can add your name to the list.

---

**Known issues:**

- Kahan summation is broken on HIP; a fix is planned.
- FX + Export + Compile has trouble with PyTorch Dynamo; a fix is planned.
- AOTI is broken on PT < 2.8; you need a nightly build due to incomplete support for TorchBind in prior versions.

### v0.1.0 (2025-01-23)

Initial GitHub release with preprint.

docs/supported_ops.rst

Lines changed: 20 additions & 1 deletion

@@ -55,6 +55,26 @@ We do not (yet) support:

 If you have a use case for any of the unsupported features above, let us know.

+Torch Save / Load
+---------------------------------------------------
+OpenEquivariance's ``TensorProduct`` / ``TensorProductConv`` modules
+can be saved via ``torch.save`` and restored via ``torch.load``.
+You must call ``import openequivariance`` before attempting to load, i.e.
+
+.. code-block:: python
+
+   import torch
+   import openequivariance
+
+   module = torch.load("my_module_with_tp.pt")
+
+If you do NOT use ``torch.compile`` or ``torch.export``, these modules
+can be loaded on a platform with a GPU architecture distinct from that of
+the saving platform. In this case, kernels are recompiled dynamically. After
+compilation, a module may only be used on a platform with a GPU architecture
+identical to that of the machine that saved it.
+
 Compilation with JITScript, Export, and AOTInductor
 ---------------------------------------------------

@@ -72,7 +92,6 @@ unless you are using a Nightly
 build of PyTorch past 4/10/2025 due to incomplete support for
 TorchBind in earlier versions.

-
 Multiple Devices and Streams
 ----------------------------
 OpenEquivariance compiles kernels based on the compute capability of the

openequivariance/__init__.py

Lines changed: 23 additions & 1 deletion

@@ -1,5 +1,7 @@
 # ruff: noqa: F401
 import sys
+import torch
+import numpy as np

 try:
     import openequivariance.extlib

@@ -8,7 +10,13 @@
 from pathlib import Path
 from importlib.metadata import version

-from openequivariance.implementations.e3nn_lite import TPProblem, Irreps
+from openequivariance.implementations.e3nn_lite import (
+    TPProblem,
+    Irrep,
+    Irreps,
+    _MulIr,
+    Instruction,
+)
 from openequivariance.implementations.TensorProduct import TensorProduct
 from openequivariance.implementations.convolution.TensorProductConv import (
     TensorProductConv,

@@ -41,6 +49,20 @@ def torch_ext_so_path():
     return openequivariance.extlib.torch_module.__file__


+torch.serialization.add_safe_globals(
+    [
+        TensorProduct,
+        TensorProductConv,
+        TPProblem,
+        Irrep,
+        Irreps,
+        _MulIr,
+        Instruction,
+        np.float32,
+        np.float64,
+    ]
+)
+
 LINKED_LIBPYTHON = openequivariance.extlib.LINKED_LIBPYTHON
 LINKED_LIBPYTHON_ERROR = openequivariance.extlib.LINKED_LIBPYTHON_ERROR
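The `add_safe_globals` registration above is what lets `torch.load` reconstruct these classes when PyTorch's `weights_only` loading mode (the default in recent releases) restricts unpickling to an allowlist. A minimal stdlib sketch of the same allowlisting idea, using `pickle.Unpickler.find_class` with a hypothetical `Point` class (no torch dependency, so this is an analogy rather than the library's actual mechanism):

```python
import io
import pickle


class Point:
    """Toy class standing in for an allowlisted type such as TPProblem."""

    def __init__(self, x, y):
        self.x, self.y = x, y


# Allowlist keyed by (module, name), the lookup pickle performs on load.
SAFE_GLOBALS = {(Point.__module__, "Point"): Point}


class AllowlistUnpickler(pickle.Unpickler):
    # Resolve only allowlisted classes and reject everything else,
    # mirroring the policy of torch.load with weights_only=True.
    def find_class(self, module, name):
        try:
            return SAFE_GLOBALS[(module, name)]
        except KeyError:
            raise pickle.UnpicklingError(f"{module}.{name} is not allowlisted")


payload = pickle.dumps(Point(1, 2))
restored = AllowlistUnpickler(io.BytesIO(payload)).load()
```

Any class not registered in the allowlist fails to load with an `UnpicklingError`, which is why the package registers its public types (and the numpy dtypes they reference) at import time.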

openequivariance/implementations/TensorProduct.py

Lines changed: 8 additions & 0 deletions

@@ -52,6 +52,14 @@ def to(self, *args, **kwargs):
         torch.nn.Module.to(self, *args, **kwargs)
         return self

+    def __getstate__(self):
+        return self.input_args
+
+    def __setstate__(self, state):
+        torch.nn.Module.__init__(self)
+        self.input_args = state
+        self._init_class()
+
     @staticmethod
     def name():
         return LoopUnrollTP.name()
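The `__getstate__` / `__setstate__` pair persists only the constructor arguments and rebuilds the compiled state on load, which is what makes the saved module portable across GPU architectures. A self-contained sketch of this pattern with stdlib `pickle`; the class name `ConfiguredModule` and the string "kernel" are hypothetical stand-ins, not OpenEquivariance APIs:

```python
import pickle


class ConfiguredModule:
    """Hypothetical stand-in for a module whose heavy state is rebuilt on load."""

    def __init__(self, irreps, dtype="float32"):
        self.input_args = (irreps, dtype)
        self._init_class()

    def _init_class(self):
        # Stand-in for expensive, non-picklable setup (e.g. JIT kernel
        # compilation); derived entirely from the saved constructor args.
        self.kernel = f"compiled<{self.input_args[0]}, {self.input_args[1]}>"

    def __getstate__(self):
        # Persist only the lightweight constructor arguments, not the kernel.
        return self.input_args

    def __setstate__(self, state):
        # Restore the arguments, then recompile for the current platform.
        self.input_args = state
        self._init_class()


m = pickle.loads(pickle.dumps(ConfiguredModule("32x1e", "float64")))
```

Because the pickle payload never contains the compiled artifact, the load side is free to regenerate it for whatever hardware it runs on, matching the cross-platform behavior described in the docs above.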

openequivariance/implementations/convolution/TensorProductConv.py

Lines changed: 8 additions & 0 deletions

@@ -84,6 +84,14 @@ def to(self, *args, **kwargs):
         torch.nn.Module.to(self, *args, **kwargs)
         return self

+    def __getstate__(self):
+        return self.input_args
+
+    def __setstate__(self, state):
+        torch.nn.Module.__init__(self)
+        self.input_args = state
+        self._init_class()
+
     def forward(
         self,
         X: torch.Tensor,

tests/export_test.py

Lines changed: 12 additions & 0 deletions

@@ -86,6 +86,18 @@ def tp_and_inputs(request, problem_and_irreps):
     )


+def test_torch_load(tp_and_inputs):
+    tp, inputs = tp_and_inputs
+    original_result = tp.forward(*inputs)
+
+    with tempfile.NamedTemporaryFile(suffix=".pt") as tmp_file:
+        torch.save(tp, tmp_file.name)
+        loaded_tp = torch.load(tmp_file.name)
+
+    reloaded_result = loaded_tp(*inputs)
+    assert torch.allclose(original_result, reloaded_result, atol=1e-5)
+
+
 def test_jitscript(tp_and_inputs):
     tp, inputs = tp_and_inputs
     uncompiled_result = tp.forward(*inputs)
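The test above uses the save-to-temp-file, load-by-path round trip. The same pattern with stdlib `pickle` in place of `torch.save` / `torch.load` (a sketch only, so it runs without torch; the payload dict is invented for illustration):

```python
import pickle
import tempfile

payload = {"weights": [0.1, 0.2, 0.3], "version": "0.3.0"}

# Save to a named temp file, then load it back by path, mirroring the
# round trip in test_torch_load. Note that reopening a NamedTemporaryFile
# by name while it is still open is POSIX-only behavior.
with tempfile.NamedTemporaryFile(suffix=".pkl") as tmp_file:
    with open(tmp_file.name, "wb") as f:
        pickle.dump(payload, f)
    with open(tmp_file.name, "rb") as f:
        reloaded = pickle.load(f)
```

The context manager deletes the file on exit, so the test leaves no artifacts behind while still exercising a real on-disk round trip.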
