Add embedding cache support to oneflow base model (#5552)
Closed
EddyLXJ wants to merge 1 commit into pytorch:main
Conversation
Contributor
f1abc7e to f365226 (Compare)
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request on Apr 6, 2026

Summary:
X-link: facebookresearch/FBGEMM#2519

CONTEXT: Port embedding cache changes from blue_reels_umia_v1_exp_hstu_simplified_v5_base_model.py to the oneflow base model. All embedding-cache-related configs are gated behind SID_INJECTION_MODE_V5 == PRETRAIN_MAP_EMBEDDING_CACHE, so the base model remains unchanged when the mode is not active.

WHAT:
- Gate all embedding cache configs behind PRETRAIN_MAP_EMBEDDING_CACHE:
  - sparse_object_id_for_embedding_cache feature and replication config
  - zch_embedding_cache_item_fb_public embedding table (MX4-safe dibit encoding, 32x2bit)
  - emb_cache_item_ec EmbeddingCollection initialization
  - KVZCHTBEConfig with OneFlow feature store enrichment
  - TBE SSD/cache params (prefetch_pipeline, rocksdb, l2_cache, etc.)
  - QComms fp8_quantize_dim=32
- Simplify the forward path using build_embedding_cache_write_kjt from kvzch_utils
- Extract the decode_dibits_to_int64 utility to kvzch_utils.py for reuse
- Update the C++ encode/decode from FP8 nibbles (16x4bit) to MX4-safe dibits (32x2bit)

Reviewed By: xinzhang-nac
Differential Revision: D98399416
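The "MX4-safe dibit encoding, 32x2bit" mentioned in the summary packs a 64-bit ID into 32 two-bit chunks, each small enough to survive low-precision quantization. The sketch below is an illustration of that idea only, not the actual FBGEMM/kvzch_utils code; the function names echo decode_dibits_to_int64 from the summary, but the encoder name and all implementation details are assumptions.

```python
def encode_int64_to_dibits(value):
    """Split a 64-bit integer into 32 two-bit chunks ("dibits"),
    least-significant dibit first. Each chunk is an integer in [0, 3],
    so it can be stored exactly even in a very low-precision format.
    (Hypothetical sketch; not the real FBGEMM encoder.)"""
    u = value & 0xFFFFFFFFFFFFFFFF  # reinterpret as unsigned 64-bit
    return [(u >> (2 * i)) & 0b11 for i in range(32)]


def decode_dibits_to_int64(dibits):
    """Inverse of the encoder: round each lane back to an integer in
    [0, 3] and reassemble the original 64-bit value, reinterpreting
    the top bit as the sign."""
    u = 0
    for i, d in enumerate(dibits):
        u |= (int(round(float(d))) & 0b11) << (2 * i)
    return u - (1 << 64) if u >= (1 << 63) else u
```

Rounding in the decoder is what makes the scheme quantization-tolerant: as long as each stored lane stays within ±0.5 of its integer value, the original ID round-trips exactly.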
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request on Apr 6, 2026 (same summary as above)
4d59a9a to a83c23f (Compare)
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request on Apr 7, 2026 (same summary)
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request on Apr 7, 2026 (same summary)
a83c23f to c8b88b7 (Compare)
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request on Apr 7, 2026 (same summary, with "Pull Request resolved: pytorch#5552")
c8b88b7 to 170aa67 (Compare)
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request on Apr 7, 2026 (same summary)
170aa67 to 823d714 (Compare)
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request on Apr 7, 2026 (same summary)
823d714 to 6bcef33 (Compare)
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request on Apr 7, 2026 (same summary)
ef6ec98 to 8aeec3a (Compare)
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request on Apr 7, 2026 (same summary)
8aeec3a to f69e314 (Compare)
f69e314 to 78f21f7 (Compare)
Contributor
This pull request has been merged in 6de075d.
Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2519

CONTEXT: Port embedding cache changes from blue_reels_umia_v1_exp_hstu_simplified_v5_base_model.py to the oneflow base model. All embedding-cache-related configs are gated behind SID_INJECTION_MODE_V5 == PRETRAIN_MAP_EMBEDDING_CACHE, so the base model remains unchanged when the mode is not active.

WHAT:
- Gate all embedding cache configs behind PRETRAIN_MAP_EMBEDDING_CACHE:
  - sparse_object_id_for_embedding_cache feature and replication config
  - zch_embedding_cache_item_fb_public embedding table (MX4-safe dibit encoding, 32x2bit)
  - emb_cache_item_ec EmbeddingCollection initialization
  - KVZCHTBEConfig with OneFlow feature store enrichment
  - TBE SSD/cache params (prefetch_pipeline, rocksdb, l2_cache, etc.)
  - QComms fp8_quantize_dim=32
- Simplify the forward path using build_embedding_cache_write_kjt from kvzch_utils
- Extract the decode_dibits_to_int64 utility to kvzch_utils.py for reuse
- Update the C++ encode/decode from FP8 nibbles (16x4bit) to MX4-safe dibits (32x2bit)

Reviewed By: xinzhang-nac
Differential Revision: D98399416
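The gating described in CONTEXT, where every embedding-cache config is added only when the injection mode matches so the base model stays byte-identical otherwise, can be sketched as below. This is a hypothetical illustration: the dict-based config builder, its keys, and the string mode value are all assumptions, not the actual oneflow base model code; only the flag name and the gated config items come from the summary.

```python
# Assumed mode flag; in the real model this comes from experiment config.
SID_INJECTION_MODE_V5 = "PRETRAIN_MAP_EMBEDDING_CACHE"


def build_model_config(base_config):
    """Return the model config, attaching embedding-cache settings only
    when the injection mode is active. When the mode is anything else,
    the returned config equals the base config, which is what keeps the
    base model unchanged. (Hypothetical sketch.)"""
    config = dict(base_config)
    if SID_INJECTION_MODE_V5 == "PRETRAIN_MAP_EMBEDDING_CACHE":
        config["embedding_cache"] = {
            # Items gated behind the flag, per the PR summary:
            "table": "zch_embedding_cache_item_fb_public",
            "encoding": "mx4_safe_dibits_32x2bit",
            "qcomms_fp8_quantize_dim": 32,
            "tbe_params": {"prefetch_pipeline": True},  # plus rocksdb, l2_cache, etc.
        }
    return config
```

The design point is that the gate wraps config construction, not runtime branches: a model built with the mode off has no embedding-cache entries at all, so its behavior and serialized config are provably unaffected.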