
Add embedding cache support to oneflow base model (#5552)#5552

Closed
EddyLXJ wants to merge 1 commit intopytorch:mainfrom
EddyLXJ:export-D98399416

Conversation

Contributor

@EddyLXJ EddyLXJ commented Mar 30, 2026

Summary:

X-link: https://github.com/facebookresearch/FBGEMM/pull/2519

CONTEXT: Port the embedding cache changes from blue_reels_umia_v1_exp_hstu_simplified_v5_base_model.py to the oneflow base model. All embedding-cache-related configs are gated behind SID_INJECTION_MODE_V5 == PRETRAIN_MAP_EMBEDDING_CACHE, so the base model remains unchanged when the mode is not active.
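The gating described above is a plain config-guard pattern. A minimal sketch of the idea (illustrative only; `build_sparse_configs` and the config names below are stand-ins, not the model's actual API):

```python
# Illustrative sketch of mode-gated config assembly: when the mode flag is
# not set to the embedding-cache value, the returned config list is
# byte-identical to the base model's, so default behavior cannot change.
PRETRAIN_MAP_EMBEDDING_CACHE = "PRETRAIN_MAP_EMBEDDING_CACHE"

def build_sparse_configs(sid_injection_mode_v5: str) -> list[str]:
    # Base configs, always present regardless of mode.
    configs = ["base_feature_a", "base_feature_b"]
    if sid_injection_mode_v5 == PRETRAIN_MAP_EMBEDDING_CACHE:
        # Embedding-cache configs are appended only when the mode is active.
        configs += [
            "sparse_object_id_for_embedding_cache",
            "zch_embedding_cache_item_fb_public",
        ]
    return configs
```

With any other mode value, the function returns exactly the base list, which is what keeps the base model unchanged.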

WHAT:

  • Gate all embedding cache configs behind PRETRAIN_MAP_EMBEDDING_CACHE:
    • sparse_object_id_for_embedding_cache feature and replication config
    • zch_embedding_cache_item_fb_public embedding table (MX4-safe dibit encoding, 32x2bit)
    • emb_cache_item_ec EmbeddingCollection initialization
    • KVZCHTBEConfig with OneFlow feature store enrichment
    • TBE SSD/cache params (prefetch_pipeline, rocksdb, l2_cache, etc.)
    • QComms fp8_quantize_dim=32
  • Simplify forward path using build_embedding_cache_write_kjt from kvzch_utils
  • Extract decode_dibits_to_int64 utility to kvzch_utils.py for reuse
  • Update C++ encode/decode from FP8 nibbles (16x4bit) to MX4-safe dibits (32x2bit)
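The summary refers to a `decode_dibits_to_int64` utility and a move from 16×4-bit FP8 nibbles to 32×2-bit MX4-safe dibits. A minimal roundtrip sketch of that idea (an assumption for illustration, not the actual kvzch_utils or C++ implementation, whose packing layout may differ) could look like:

```python
# Illustrative sketch: a 64-bit id split into 32 dibits (2-bit chunks),
# least-significant dibit first, and reassembled losslessly. Small 2-bit
# values are the motivation for "MX4-safe": they survive low-precision
# storage better than 4-bit nibbles would.
def encode_int64_to_dibits(value: int) -> list[int]:
    """Split a signed 64-bit integer into 32 dibits (values 0..3)."""
    u = value & 0xFFFFFFFFFFFFFFFF  # reinterpret as unsigned 64-bit
    return [(u >> (2 * i)) & 0b11 for i in range(32)]

def decode_dibits_to_int64(dibits: list[int]) -> int:
    """Reassemble 32 dibits back into a signed 64-bit integer."""
    u = 0
    for i, d in enumerate(dibits):
        u |= (d & 0b11) << (2 * i)
    # Restore two's-complement sign.
    return u - (1 << 64) if u >= (1 << 63) else u
```

The key property is that encode followed by decode is the identity on the full signed 64-bit range, so ids round-trip through the embedding table exactly.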

Reviewed By: xinzhang-nac

Differential Revision: D98399416

@meta-cla meta-cla bot added the cla signed label Mar 30, 2026
Contributor

meta-codesync bot commented Mar 30, 2026

@EddyLXJ has exported this pull request. If you are a Meta employee, you can view the originating Diff in D98399416.

@meta-codesync meta-codesync bot changed the title Add embedding cache support to oneflow base model Add embedding cache support to oneflow base model (#5552) Apr 6, 2026
@EddyLXJ EddyLXJ force-pushed the export-D98399416 branch from f1abc7e to f365226 Compare April 6, 2026 23:27
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Apr 6, 2026
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Apr 6, 2026
@EddyLXJ EddyLXJ force-pushed the export-D98399416 branch 2 times, most recently from 4d59a9a to a83c23f Compare April 7, 2026 19:43
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Apr 7, 2026
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Apr 7, 2026
@EddyLXJ EddyLXJ force-pushed the export-D98399416 branch from a83c23f to c8b88b7 Compare April 7, 2026 19:44
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Apr 7, 2026
@EddyLXJ EddyLXJ force-pushed the export-D98399416 branch from c8b88b7 to 170aa67 Compare April 7, 2026 19:47
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Apr 7, 2026
@EddyLXJ EddyLXJ force-pushed the export-D98399416 branch from 170aa67 to 823d714 Compare April 7, 2026 19:52
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Apr 7, 2026
@EddyLXJ EddyLXJ force-pushed the export-D98399416 branch from 823d714 to 6bcef33 Compare April 7, 2026 20:57
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Apr 7, 2026
@EddyLXJ EddyLXJ force-pushed the export-D98399416 branch 2 times, most recently from ef6ec98 to 8aeec3a Compare April 7, 2026 22:33
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Apr 7, 2026
@EddyLXJ EddyLXJ force-pushed the export-D98399416 branch from 8aeec3a to f69e314 Compare April 7, 2026 22:35
Contributor

meta-codesync bot commented Apr 8, 2026

This pull request has been merged in 6de075d.
