Add TurboSSDInferenceModule for HSTU serving integration by goldcoderZ · Pull Request #5560 · pytorch/FBGEMM

goldcoderZ · 2026-03-31T15:54:32Z

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2525

Add a serving-friendly wrapper module that enables Video Retrieval HSTU
models (VDD, New2, IFU) to use FBGEMM SSD TBE with TurboSSD v2 features
instead of EmbeddingDB.

TurboSSDInferenceModule provides:

1. Single-call forward: auto-prefetch + lookup (vs. EmbeddingDB's synchronous SSD reads). Uses a GPU HBM cache with LRU eviction.
2. streaming_update() + load_snapshot(): streaming delta updates and zero-downtime snapshot transitions, matching EmbeddingDB's streaming delta table feature.
3. Factory method from_embedding_specs(): auto-sizes the HBM cache based on a target hit rate and an optional HBM budget. Useful for capacity planning on H100 (96 GB) and MI350X (288 GB).
4. estimate_hbm_gb(): static method for HBM capacity planning without instantiating the module.
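
The PR does not show the sizing logic behind from_embedding_specs() or estimate_hbm_gb(). The following is only a back-of-the-envelope sketch of the kind of arithmetic that hit-rate-driven sizing involves, under loudly stated assumptions. It assumes Zipf(alpha) row popularity, treats the LRU cache as if it held exactly the hottest rows (a static-cache approximation), and uses a hypothetical 4-byte per-row overhead for quantization metadata. The names zipf_mass, rows_for_hit_rate and estimate_cache_gb are hypothetical and are not part of the FBGEMM API.

```python
import math
from typing import Optional, Tuple


def zipf_mass(n: int, alpha: float) -> float:
    """Approximate H_{n,alpha} = sum_{i=1..n} i**-alpha as 1 + integral_1^n x**-alpha dx.

    Rough, but monotone in n and O(1), so it scales to tables with billions of rows.
    """
    if n <= 0:
        return 0.0
    if alpha == 1.0:
        return 1.0 + math.log(n)
    return 1.0 + (n ** (1.0 - alpha) - 1.0) / (1.0 - alpha)


def rows_for_hit_rate(num_rows: int, target_hit_rate: float, alpha: float = 1.0) -> int:
    """Smallest k such that caching the k hottest rows reaches target_hit_rate (Zipf model)."""
    total = zipf_mass(num_rows, alpha)
    lo, hi = 0, num_rows
    while lo < hi:  # binary search: zipf_mass(k) / total is non-decreasing in k
        mid = (lo + hi) // 2
        if zipf_mass(mid, alpha) / total >= target_hit_rate:
            hi = mid
        else:
            lo = mid + 1
    return lo


def estimate_cache_gb(
    num_rows: int,
    dim: int,
    bits_per_weight: int = 8,
    row_overhead_bytes: int = 4,  # hypothetical per-row metadata (e.g. scale/bias)
    target_hit_rate: float = 0.9,
    alpha: float = 1.0,
    hbm_budget_gb: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (cache size in decimal GB, predicted hit rate) for one table."""
    row_bytes = math.ceil(dim * bits_per_weight / 8) + row_overhead_bytes
    k = rows_for_hit_rate(num_rows, target_hit_rate, alpha)
    if hbm_budget_gb is not None:  # never exceed the budget, even if the hit rate drops
        k = min(k, int(hbm_budget_gb * 1e9 // row_bytes))
    predicted = zipf_mass(k, alpha) / zipf_mass(num_rows, alpha) if num_rows > 0 else 0.0
    return k * row_bytes / 1e9, predicted
```

When hbm_budget_gb is set, the sketch caps the row count first and then reports the hit rate the Zipf model predicts for the capped cache. Measured access traces would replace the Zipf assumption in any real capacity plan.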

This is the integration layer between SSD TBE and the TGIF serving
framework. The module can replace SSDEmbeddingDBSplitTableBatchedEmbeddingBagsCodegen
in the model graph via DIShardingPass.

Differential Revision: D98830743

Summary:
X-link: facebookresearch/FBGEMM#2521

Add streaming delta update support to the SSD TBE inference operator
(SSDIntNBitTableBatchedEmbeddingBags), closing the gap with EmbeddingDB's
streaming delta table feature.

Two new public methods:

1. streaming_update(indices, weights): writes updated embedding rows to RocksDB and invalidates the corresponding HBM cache entries so subsequent prefetch() calls reload them from SSD. Uses vectorized set-associative cache invalidation.
2. load_snapshot(ssd_storage_directory, ...): flushes the current RocksDB, opens a new instance at a fresh directory, and fully invalidates the HBM cache. Enables zero-downtime snapshot transitions.

Also adds AMD/ROCm awareness:

- IS_ROCM detection flag
- Constructor warns when running on ROCm (the streaming API works, but the prefetch/forward C++ kernels are NVIDIA-only due to the ASSOC=32 vs. kWarpSize=64 mismatch)
- Docstring documents AMD support status per method

These are the minimal primitives needed for online-training models with
streaming embedding updates (e.g., 45-minute publish intervals).

Differential Revision: D98827795
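
The summary names vectorized set-associative cache invalidation without showing it. As a rough illustration only, the NumPy sketch below models the HBM cache as a tag array of shape (num_sets, ASSOC) holding cached row indices, with -1 marking an empty way. The modulo set mapping, the naive insert helper, and all function names are assumptions for illustration; they are not FBGEMM's actual hashing, LRU bookkeeping, or GPU kernels. For streaming_update, invalidation gathers each updated row's set, compares all ASSOC tags at once, and scatters the invalid marker into matching ways. For load_snapshot, full invalidation reduces to resetting every tag.

```python
import numpy as np

ASSOC = 32     # ways per set, taken from the ASSOC=32 mentioned in the summary
INVALID = -1   # tag for an empty or invalidated way (row indices are >= 0)


def set_of(indices: np.ndarray, num_sets: int) -> np.ndarray:
    # Hypothetical set mapping (plain modulo); the real cache uses its own hash.
    return indices % num_sets


def insert(cache_tags: np.ndarray, idx: int) -> None:
    # Demo-only helper: place idx in the first free way of its set (no LRU logic).
    s = idx % cache_tags.shape[0]  # same modulo mapping as set_of
    free = np.flatnonzero(cache_tags[s] == INVALID)
    cache_tags[s, free[0] if free.size else 0] = idx


def invalidate(cache_tags: np.ndarray, updated: np.ndarray) -> int:
    """Invalidate every cached copy of the updated rows; return how many ways were cleared."""
    updated = np.unique(updated)                  # duplicate updates map to the same way
    sets = set_of(updated, cache_tags.shape[0])   # (U,) set for each updated row
    hit = cache_tags[sets] == updated[:, None]    # (U, ASSOC) compare all ways at once
    rows, ways = np.nonzero(hit)
    cache_tags[sets[rows], ways] = INVALID        # scatter the invalid marker
    return int(hit.sum())


def invalidate_all(cache_tags: np.ndarray) -> None:
    # Full invalidation, as described for load_snapshot() when switching RocksDB instances.
    cache_tags.fill(INVALID)


if __name__ == "__main__":
    tags = np.full((4, ASSOC), INVALID, dtype=np.int64)
    for row in (3, 7, 11, 8, 100):
        insert(tags, row)
    print(invalidate(tags, np.array([7, 11, 11, 42])))  # prints 2
```

Because the compare and scatter operate on whole arrays, the work per update batch is a single gather over U x ASSOC tags rather than a Python loop over rows, which is the property the word "vectorized" suggests.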


meta-codesync · 2026-03-31T15:54:50Z

@goldcoderZ has exported this pull request. If you are a Meta employee, you can view the originating Diff in D98830743.

goldcoderZ added 2 commits March 31, 2026 08:54

meta-cla bot added the cla signed label Mar 31, 2026

meta-codesync bot added fb-exported meta-exported labels Mar 31, 2026

goldcoderZ wants to merge 2 commits into pytorch:main from goldcoderZ:export-D98830743
