feat: add hnsw-rabitq support #69

Open
egolearner wants to merge 37 commits into main from feat/rabitq

Conversation

@egolearner
Collaborator

resolve #42

@egolearner
Collaborator Author

Depends on VectorDB-NTU/RaBitQ-Library#36

@egolearner egolearner marked this pull request as ready for review February 10, 2026 13:31
@egolearner egolearner changed the title feat: add hnsw-rabitq support WIP feat: add hnsw-rabitq support Feb 10, 2026
@Cuiyus
Collaborator

Cuiyus commented Feb 27, 2026

@greptile

@greptile-apps
greptile-apps bot commented Feb 27, 2026

Greptile Summary

This PR adds comprehensive HNSW-RaBitQ support to the zvec project, implementing a vector search index that combines Hierarchical Navigable Small World (HNSW) graph structure with RaBitQ quantization for memory-efficient approximate nearest neighbor search.

Major changes:

  • New HNSW-RaBitQ algorithm implementation with builder, searcher, and streamer components
  • RaBitQ vector quantization with configurable bits (default 7-bit) and k-means clustering
  • Integration with existing zvec framework including Python bindings and protocol buffers
  • Comprehensive test coverage at both C++ and Python levels
  • Added RaBitQ-Library as git submodule dependency
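
To make the quantization bullet concrete, here is a deliberately simplified sketch of centroid-based residual quantization at a configurable bit width. It is not the RaBitQ-Library algorithm (which, per the class diagram, also applies a rotator) and not this PR's code; `nearest_centroid`, `QuantizedResidual`, and `quantize_residual` are hypothetical names, and the uniform min/max scalar code is an assumption chosen for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: index of the centroid closest to `v` (squared L2).
std::size_t nearest_centroid(const std::vector<float>& v,
                             const std::vector<std::vector<float>>& centroids) {
  std::size_t best = 0;
  float best_dist = std::numeric_limits<float>::max();
  for (std::size_t c = 0; c < centroids.size(); ++c) {
    assert(centroids[c].size() == v.size());
    float d = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) {
      const float diff = v[i] - centroids[c][i];
      d += diff * diff;
    }
    if (d < best_dist) {
      best_dist = d;
      best = c;
    }
  }
  return best;
}

// Hypothetical container for one quantized vector.
struct QuantizedResidual {
  std::size_t centroid = 0;         // index of the assigned centroid
  float lo = 0.0f;                  // smallest residual component
  float step = 1.0f;                // width of one quantization level
  std::vector<std::uint8_t> codes;  // one code per dimension
};

// Assign `v` to its nearest centroid, then map each residual component to
// one of 2^bits uniform levels. Assumes 1 <= bits <= 8 and non-empty input.
QuantizedResidual quantize_residual(
    const std::vector<float>& v,
    const std::vector<std::vector<float>>& centroids, unsigned bits = 7) {
  assert(bits >= 1 && bits <= 8);
  assert(!v.empty() && !centroids.empty());

  QuantizedResidual q;
  q.centroid = nearest_centroid(v, centroids);

  std::vector<float> residual(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    residual[i] = v[i] - centroids[q.centroid][i];
  }

  const auto mm = std::minmax_element(residual.begin(), residual.end());
  const float lo = *mm.first;
  const float hi = *mm.second;
  const unsigned max_code = (1u << bits) - 1;  // 127 for 7 bits

  q.lo = lo;
  q.step = (hi > lo) ? (hi - lo) / static_cast<float>(max_code) : 1.0f;
  q.codes.resize(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    q.codes[i] =
        static_cast<std::uint8_t>(std::lround((residual[i] - q.lo) / q.step));
  }
  return q;
}
```

With 7 bits each dimension occupies one byte here; a real implementation would pack the codes more tightly and store per-vector correction factors for distance estimation.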

Issues found:

  • Variable-length arrays (VLAs) used in algorithm files are not standard C++ and will cause compilation failures on MSVC
  • Typo in .gitmodules with duplicate "thirdparty/" prefix in submodule name

Confidence Score: 3/5

  • This PR requires fixes for compilation issues before merging
  • The implementation is comprehensive and has good test coverage, but it uses VLAs, which will not compile on MSVC, and the submodule name in .gitmodules is misconfigured. Apart from these two issues, the code appears well structured.
  • Pay close attention to src/core/algorithm/hnsw-rabitq/hnsw_rabitq_algorithm.cc and hnsw_rabitq_query_algorithm.cc which have VLA issues, and .gitmodules with the submodule name typo

Important Files Changed

| Filename | Overview |
|---|---|
| .gitmodules | Added RaBitQ-Library submodule with typo in name (duplicate thirdparty/) |
| src/core/algorithm/hnsw-rabitq/hnsw_rabitq_algorithm.cc | Implements HNSW graph algorithm with node insertion and search; contains multiple VLAs (non-standard C++) |
| src/core/algorithm/hnsw-rabitq/hnsw_rabitq_query_algorithm.cc | Query algorithm implementation with VLA that needs replacement with std::vector |
| src/core/algorithm/hnsw-rabitq/hnsw_rabitq_builder.cc | Builder implementation with proper parameter validation and initialization |
| src/core/algorithm/hnsw-rabitq/rabitq_converter.cc | RaBitQ converter with k-means clustering for vector quantization training |
| src/include/zvec/db/index_params.h | Added HnswRabitqIndexParams class with proper integration into type system |
| python/tests/test_collection_hnsw_rabitq.py | Comprehensive Python tests for HNSW-RaBitQ collection operations |
| src/core/algorithm/hnsw-rabitq/hnsw_rabitq_searcher.cc | Search implementation with proper parameter validation and reformer initialization |

Class Diagram

%%{init: {'theme': 'neutral'}}%%
classDiagram
    class HnswRabitqIndex {
        +build()
        +search()
        +stream()
    }
    
    class HnswRabitqBuilder {
        +init()
        +train()
        +build()
        -rabitq_converter_
    }
    
    class HnswRabitqSearcher {
        +init()
        +search()
        -entity_
        -reformer_
    }
    
    class HnswRabitqStreamer {
        +init()
        +add()
        +search()
        -entity_
        -reformer_
    }
    
    class HnswRabitqAlgorithm {
        +add_node()
        +search()
        -entity_
    }
    
    class HnswRabitqEntity {
        +get_neighbors()
        +update_neighbors()
        -graph_structure_
    }
    
    class RabitqConverter {
        +train()
        +transform()
        +to_reformer()
        -rotator_
        -centroids_
    }
    
    class RabitqReformer {
        +reform()
        +get_quantized()
        -rotator_
        -centroids_
    }
    
    class HnswRabitqContext {
        +dist_calculator()
        +visit_filter()
    }
    
    HnswRabitqIndex --> HnswRabitqBuilder
    HnswRabitqIndex --> HnswRabitqSearcher
    HnswRabitqIndex --> HnswRabitqStreamer
    
    HnswRabitqBuilder --> RabitqConverter
    HnswRabitqBuilder --> HnswRabitqAlgorithm
    HnswRabitqBuilder --> HnswRabitqEntity
    
    HnswRabitqSearcher --> HnswRabitqAlgorithm
    HnswRabitqSearcher --> HnswRabitqEntity
    HnswRabitqSearcher --> RabitqReformer
    
    HnswRabitqStreamer --> HnswRabitqAlgorithm
    HnswRabitqStreamer --> HnswRabitqEntity
    HnswRabitqStreamer --> RabitqReformer
    
    HnswRabitqAlgorithm --> HnswRabitqEntity
    HnswRabitqAlgorithm --> HnswRabitqContext
    
    RabitqConverter --> RabitqReformer

Last reviewed commit: 42a30d6

@greptile-apps greptile-apps bot left a comment

93 files reviewed, 5 comments

path = thirdparty/magic_enum/magic_enum-0.9.7
url = https://github.com/Neargye/magic_enum.git
ignore = all
[submodule "thirdparty/thirdparty/RaBitQ-Library/RaBitQ-Library-0.1"]

Submodule name has duplicate thirdparty/ prefix

Suggested change
- [submodule "thirdparty/thirdparty/RaBitQ-Library/RaBitQ-Library-0.1"]
+ [submodule "thirdparty/RaBitQ-Library/RaBitQ-Library-0.1"]

Comment on lines +139 to +140
float dists[size];
const void *neighbor_vecs[size];

Variable-length arrays (VLAs) are a compiler extension rather than standard C++, and they won't compile with MSVC. Use std::vector instead.

Suggested change
- float dists[size];
- const void *neighbor_vecs[size];
+ std::vector<float> dists(size);
+ std::vector<const void*> neighbor_vecs(size);
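
One follow-up to keep in mind when applying this suggestion: any call site that previously received the arrays as raw pointers now needs `.data()`. The sketch below illustrates that pattern in isolation. `batch_dist` is a hypothetical stand-in for whatever pointer-based distance routine the surrounding code calls, and its placeholder body is not the PR's distance logic.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a batch routine that takes raw pointers.
// The "distance" here is just the first component of each vector.
void batch_dist(const void* const* vecs, std::size_t n, float* out) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = *static_cast<const float*>(vecs[i]);
  }
}

// Same shape as the reviewed snippet, with std::vector replacing the VLAs.
std::vector<float> score_neighbors(const std::vector<const float*>& neighbors) {
  const std::size_t size = neighbors.size();
  std::vector<float> dists(size);
  std::vector<const void*> neighbor_vecs(size);
  for (std::size_t i = 0; i < size; ++i) {
    neighbor_vecs[i] = neighbors[i];
  }
  // Arrays decay to pointers implicitly; vectors need an explicit .data().
  batch_dist(neighbor_vecs.data(), size, dists.data());
  return dists;
}
```

A size of zero is also well defined with std::vector, whereas a zero-length VLA is not.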

(*ctx->mutable_stats_get_neighbors())++;
}

node_id_t neighbor_ids[neighbors.size()];

This VLA is not standard C++ and won't compile with MSVC.

Suggested change
- node_id_t neighbor_ids[neighbors.size()];
+ std::vector<node_id_t> neighbor_ids(neighbors.size());
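
If the extra heap allocation on each call turns out to matter in this search loop, one option is to keep a scratch vector in a per-search context and reuse it, since `clear()` leaves the capacity in place. The sketch below is only an illustration of that idea: `SearchScratch` and `collect_unvisited` are hypothetical names, and the `node_id_t` alias is an assumption because the real typedef is not shown in this diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using node_id_t = std::uint32_t;  // assumption: the real typedef is not shown

// Hypothetical scratch space owned by a per-thread or per-search context.
struct SearchScratch {
  std::vector<node_id_t> neighbor_ids;
};

// Collect the neighbors that have not been visited yet into the reusable
// buffer and return how many were collected.
std::size_t collect_unvisited(const std::vector<node_id_t>& neighbors,
                              const std::vector<bool>& visited,
                              SearchScratch& scratch) {
  scratch.neighbor_ids.clear();                    // size -> 0, capacity kept
  scratch.neighbor_ids.reserve(neighbors.size());  // no-op once warmed up
  for (node_id_t id : neighbors) {
    if (!visited[id]) {
      scratch.neighbor_ids.push_back(id);
    }
  }
  return scratch.neighbor_ids.size();
}
```

After the first few calls the buffer stops reallocating, which gets close to the cost profile of the original stack array while staying within standard C++.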

Comment on lines +256 to +257
float dists[size];
const void *neighbor_vecs[size];

These VLAs are not standard C++ and won't compile with MSVC.

Suggested change
- float dists[size];
- const void *neighbor_vecs[size];
+ std::vector<float> dists(size);
+ std::vector<const void*> neighbor_vecs(size);

(*ctx->mutable_stats_get_neighbors())++;
}

node_id_t neighbor_ids[neighbors.size()];

This VLA is not standard C++ and won't compile with MSVC.

Suggested change
- node_id_t neighbor_ids[neighbors.size()];
+ std::vector<node_id_t> neighbor_ids(neighbors.size());

Development

Successfully merging this pull request may close these issues.

[Feature]: Integrate RaBitQ Quantization into HNSW Index

6 participants