
Conversation

@benwtrent (Member)

This adds bulk scoring to the diversity check. While this means the diversity check cannot exit super early (e.g., when it only needs to check 2 docs), I continually see the diversity check as the most expensive part of HNSW graph merging.

This tells me that typically, it isn't just one doc that is checked.

I ran 1M 768-dim Cohere vectors, force-merging with 4 threads (force-merge times in seconds):

baseline: 128.01
candidate: 92.15
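
At a high level, the per-neighbor scoring calls in the diversity check are replaced by chunked calls to the scorer. A minimal sketch of the initial approach, with fixed chunks of 8 (simplified, not the exact Lucene signatures; it assumes a bulkScore method that scores count nodes in one call and returns the maximum of those scores, as in the snippets below):

```java
// Sketch only: plain arrays stand in for NeighborArray, and BulkScorer stands in for
// the RandomVectorScorer used in the PR.
interface BulkScorer {
  float bulkScore(int[] nodes, float[] scores, int count);
}

final class DiversityCheckSketch {
  private static final int BULK_SIZE = 8;
  private final int[] bulkScoreNodes = new int[BULK_SIZE]; // scratch for node ids
  private final float[] bulkScores = new float[BULK_SIZE]; // scratch for their scores

  /** Returns true if no existing neighbor scores at least `score` against the candidate. */
  boolean diversityCheck(float score, int[] neighborNodes, int neighborCount, BulkScorer scorer) {
    for (int start = 0; start < neighborCount; start += BULK_SIZE) {
      int length = Math.min(neighborCount - start, BULK_SIZE);
      System.arraycopy(neighborNodes, start, bulkScoreNodes, 0, length);
      if (scorer.bulkScore(bulkScoreNodes, bulkScores, length) >= score) {
        return false; // an already-selected neighbor is at least as close: not diverse
      }
    }
    return true;
  }
}
```

The early exit now happens per chunk rather than per neighbor, which is the trade-off described above.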

```diff
-        float neighborSimilarity = scorer.score(neighbors.nodes()[i]);
-        if (neighborSimilarity >= score) {
-          return false;
-        }
+        bulkScoreNodes[bulkCount++] = neighbors.nodes()[i];
```
@benwtrent (Member, Author)

Thinking about this more, always doing chunks of 8, even when there are only 8 connections, seems foolish. I will benchmark with Math.min((neighbors.nodes().length + 1) / 2, bulkScoreNodes.length).
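
For example, with a 32-slot neighbor array this gives min((32 + 1) / 2, 8) = 8, while a 6-slot array gives min(3, 8) = 3, so a short neighbor list is scored as two chunks of 3 (keeping a chance to exit after the first half) instead of all in one go.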

@benwtrent benwtrent removed this from the 10.4.0 milestone Jan 27, 2026
@kaivalnp (Contributor) left a comment

Looks like a nice optimization!

```java
  private boolean diversityCheck(float score, NeighborArray neighbors, RandomVectorScorer scorer)
      throws IOException {
    int bulkCount = 0;
    final int bulkScoreChunk = Math.min((neighbors.nodes().length + 1) / 2, bulkScoreNodes.length);
```
@kaivalnp (Contributor)

IIUC neighbors.nodes() returns the internal array used to store neighbor nodes, which can be larger than the actual number of neighbors and grows exponentially -- can we hit edge cases where the length is about twice the actual number of neighbors, causing us to bulk-score everything?

I wonder if we should use neighbors.size() instead?
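
For example, if size() is 9 but nodes().length has grown to 16, the chunk comes out as min((16 + 1) / 2, 8) = 8, so almost the whole neighbor list goes into the first bulk call rather than being split into two halves.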

@benwtrent (Member, Author)

Good call! And good call on the array copy, just needed to handle the tail. New benchmarks show even nicer perf ;)

Comment on lines +70 to +71
```java
  private final int[] bulkScoreNodes; // for bulk scoring
  private final float[] bulkScores; // for bulk scoring
```
@kaivalnp (Contributor)

I was worried about thread-safety of these arrays (given we have concurrent merging), but from this comment it looks like instances of this class are not shared across threads, but rather multiple instances of this class (across different threads) can operate on a single HnswGraph?

@benwtrent (Member, Author)

@kaivalnp the scorer itself isn't thread-safe. I assumed that since we were using a scorer, we were OK.

I had the same threading concerns and looked it up: each thread has a unique builder object instance (the worker of the thread), and they all work on the same graph.
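
In other words, the ownership looks roughly like this (a simplified sketch, not the actual Lucene merge classes): each worker thread owns its own builder and with it its own scratch arrays, while all workers write into one shared graph.

```java
import java.util.List;

// Simplified sketch: per-thread workers with private scratch arrays, one shared graph.
final class SharedGraph {
  // node/edge updates coordinated here (omitted)
}

final class GraphWorker implements Runnable {
  private final SharedGraph graph;                 // shared across all workers
  private final List<Integer> myNodes;             // nodes this worker inserts
  private final int[] bulkScoreNodes = new int[8]; // per-worker scratch
  private final float[] bulkScores = new float[8]; // per-worker scratch

  GraphWorker(SharedGraph graph, List<Integer> myNodes) {
    this.graph = graph;
    this.myNodes = myNodes;
  }

  @Override
  public void run() {
    for (int node : myNodes) {
      // insert `node` into `graph`, running the diversity check with this
      // worker's own scratch arrays, never another thread's
    }
  }
}
```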

```java
      throws IOException {
    int bulkCount = 0;
    final int bulkScoreChunk = Math.min((neighbors.nodes().length + 1) / 2, bulkScoreNodes.length);
    for (int i = 0; i < neighbors.size(); i++) {
```
@kaivalnp (Contributor)

Looks like we don't need to perform any per-node work in this loop; can we reduce some iterations using something like:

```java
private boolean diversityCheck(float score, NeighborArray neighbors, RandomVectorScorer scorer)
    throws IOException {
  final int chunk = Math.min(Math.ceilDiv(neighbors.size(), 2), bulkScoreNodes.length);
  for (int start = 0; start < neighbors.size(); start += chunk) {
    int length = Math.min(neighbors.size() - start, chunk);
    System.arraycopy(neighbors.nodes(), start, bulkScoreNodes, 0, length);
    if (scorer.bulkScore(bulkScoreNodes, bulkScores, length) >= score) {
      return false;
    }
  }
  return true;
}
```
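
For example, with neighbors.size() == 9 this picks chunk = min(Math.ceilDiv(9, 2), 8) = 5, so the loop bulk-scores nodes 0-4 and then 5-8; the Math.min on length already covers the tail without a separate branch.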

@github-actions github-actions bot added this to the 10.4.0 milestone Jan 29, 2026
@kaivalnp (Contributor) left a comment

Thanks @benwtrent! Curious if you were able to run knnPerfTest?

Comment on lines 488 to 494
```java
      // handle a tail
      if (scored < neighbors.size()) {
        int chunkSize = neighbors.size() - scored;
        System.arraycopy(neighbors.nodes(), scored, bulkScoreNodes, 0, chunkSize);
        if (scorer.bulkScore(bulkScoreNodes, bulkScores, chunkSize) >= score) {
          return false;
        }
```
@kaivalnp (Contributor)

I don't think we need this tail -- we're doing Math.min(bulkScoreChunk, neighbors.size() - scored) in the above loop, which automatically bulk-scores the tail (using the second value)

@benwtrent (Member, Author)

Yep, good call 🤦 it's reflex.

Comment on lines 163 to 164
```java
    this.bulkScoreNodes = new int[8];
    this.bulkScores = new float[8];
```
@kaivalnp (Contributor)

nit: maybe hoist the 8 up into a private static final constant?

@benwtrent (Member, Author)

@kaivalnp I did run with 768-dim vectors (see PR description).

I ran it again and got force-merge times ranging from about 77.04s to 90s. Force-merge performance is pretty variable, but across all my runs this change provides a measurable speed improvement.

@kaivalnp (Contributor) left a comment

LGTM!
