@chenghao-guo
Contributor

@chenghao-guo chenghao-guo commented Oct 31, 2025

Closes #4723

Key Changes:
The distributed index creation leverages the existing IVF framework while adding coordination mechanisms for multi-node execution. The index merger component now handles distributed fragment consolidation and metadata synchronization. This work enables scalable vector index creation for large-scale datasets, significantly reducing index build time.

  • Implemented distributed IVF index building infrastructure for parallel index construction across multiple nodes
  • Enhanced the index merger component for distributed operations
  • For IVF_HNSW, the HNSW graph is built locally within each shard as a sub-index of the partition; there is no cross-shard graph merging and there are no cross-shard edges. IVF_HNSW variants are supported, but distribution happens only at the IVF level.
  • CPU only: the torch accelerator is not supported in distributed mode, and index creation falls back to single-node IVF when it is requested.
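
To illustrate the flow described in the bullets above, here is a minimal pure-Python sketch, not the code in this PR. It assumes the IVF centroids are trained once and shared with every shard; each shard then assigns its own rows to partitions, and a merge step consolidates the per-shard partition lists. The names `build_shard_index` and `merge_shard_indices` are hypothetical and do not correspond to Lance APIs.

```python
from collections import defaultdict


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vector, centroids):
    """Index of the centroid closest to `vector`."""
    return min(range(len(centroids)), key=lambda i: squared_l2(vector, centroids[i]))


def build_shard_index(rows, centroids):
    """Hypothetical per-shard step: map partition id -> row ids for one shard.

    `rows` is a list of (row_id, vector) pairs held by this shard.
    """
    partitions = defaultdict(list)
    for row_id, vector in rows:
        partitions[nearest_centroid(vector, centroids)].append(row_id)
    return dict(partitions)


def merge_shard_indices(shard_indices):
    """Hypothetical merge step: concatenate per-shard lists for each partition."""
    merged = defaultdict(list)
    for shard in shard_indices:
        for partition_id, row_ids in shard.items():
            merged[partition_id].extend(row_ids)
    return {pid: sorted(ids) for pid, ids in merged.items()}


# Assumed setup: two shared centroids and two shards with two rows each.
centroids = [(0.0, 0.0), (10.0, 10.0)]
shard_a = [(0, (0.1, 0.2)), (1, (9.5, 10.1))]
shard_b = [(2, (0.3, -0.1)), (3, (10.2, 9.8))]

merged = merge_shard_indices(
    [build_shard_index(shard_a, centroids), build_shard_index(shard_b, centroids)]
)
print(merged)
```

Because every shard uses the same centroids, the merged result matches what a single-node pass over all rows would produce, which is why no cross-shard coordination is needed beyond sharing the centroids and merging the outputs.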

Current Status in this PR:
• FLAT/SQ: Should work now; currently in active testing to validate distributed performance and accuracy.
• PQ (Product Quantization): Currently depends on a globally trained codebook, so training must be centralized before distributed deployment.
• RQ (Residual Quantization): I didn't consider this when designing this PR. It is probably not yet supported in distributed mode; planned for a future implementation.
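
As a rough illustration of why PQ needs a globally trained codebook, the sketch below encodes vectors from two shards against one shared codebook, so the resulting codes refer to the same sub-centroids everywhere. This is an assumption-labeled toy in pure Python, not the PR's implementation; `pq_encode` and `global_codebook` are hypothetical names.

```python
def pq_encode(vector, codebook):
    """Encode `vector` as one centroid index per subvector.

    `codebook` is a list of sub-codebooks (one per subvector), each a list
    of centroid tuples. The vector length is assumed to divide evenly.
    """
    sub_dim = len(vector) // len(codebook)
    codes = []
    for j, sub_codebook in enumerate(codebook):
        sub = vector[j * sub_dim:(j + 1) * sub_dim]
        best = min(
            range(len(sub_codebook)),
            key=lambda k: sum((x - y) ** 2 for x, y in zip(sub, sub_codebook[k])),
        )
        codes.append(best)
    return codes


# Assumed: trained once, centrally, then shipped to every shard.
global_codebook = [
    [(0.0, 0.0), (1.0, 1.0)],  # sub-codebook for dimensions 0-1
    [(0.0, 0.0), (5.0, 5.0)],  # sub-codebook for dimensions 2-3
]

shard_a_codes = [pq_encode(v, global_codebook) for v in [(0.1, 0.0, 4.9, 5.2)]]
shard_b_codes = [pq_encode(v, global_codebook) for v in [(0.9, 1.1, 0.2, -0.1)]]
print(shard_a_codes, shard_b_codes)
```

If each shard trained its own codebook instead, code `1` on one shard and code `1` on another would point at different centroids, and the merged index could not compare them.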

Once I finish all testing on my side for performance and recall accuracy, I will mark it ready for review.

@github-actions github-actions bot added the enhancement (New feature or request) and python labels Oct 31, 2025
@chenghao-guo chenghao-guo force-pushed the ivf_distribute_builder branch 8 times, most recently from fd9c15e to d571f55 Compare November 10, 2025 06:46
@chenghao-guo chenghao-guo force-pushed the ivf_distribute_builder branch 9 times, most recently from e544b47 to dc8d4bf Compare November 13, 2025 07:00
@chenghao-guo chenghao-guo force-pushed the ivf_distribute_builder branch from 8ee88df to e0dbea4 Compare November 17, 2025 09:29
@codecov-commenter
codecov-commenter commented Nov 17, 2025

@chenghao-guo chenghao-guo force-pushed the ivf_distribute_builder branch 3 times, most recently from b591569 to d627974 Compare November 20, 2025 11:09
@yanghua yanghua force-pushed the ivf_distribute_builder branch 3 times, most recently from 6f08001 to c37bbb8 Compare November 25, 2025 02:27
@chenghao-guo chenghao-guo force-pushed the ivf_distribute_builder branch 2 times, most recently from e3d15f3 to 204d3f0 Compare November 28, 2025 10:25
@chenghao-guo chenghao-guo force-pushed the ivf_distribute_builder branch from b12b3e4 to af8249b Compare January 4, 2026 02:57
@yanghua yanghua merged commit 08e3360 into lance-format:main Jan 4, 2026
28 of 29 checks passed
@yanghua
Collaborator

yanghua commented Jan 4, 2026

@BubbleCal Thanks for reviewing! I have filed some tickets to track further work: #5621 and #5622.

Labels

enhancement (New feature or request), python

Development

Successfully merging this pull request may close these issues.

support build IVF_FLAT/PQ/SQ vector index distributedly

4 participants