-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Avoid unnecessary DisjunctionMaxBulkScorer overhead #15659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
56cf380 to
ecc05df
Compare
|
I realized that we cannot call ScorerSupplier.get() more than once. Need another solutions. |
a3b772a to
160a235
Compare
This change inspects clause bulk scorers up front and only uses DisjunctionMaxBulkScorer if at least one clause provides a non-default BulkScorer. Otherwise, we fall back to the scorer-based path. c88f933 made DisjunctionMaxQuery use DisjunctionMaxBulkScorer when tieBreakerMultiplier == 0 and scoreMode == TOP_SCORES. However, this bulk path primarily pays off when at least one clause implements a specialized BulkScorer. When all clauses return DefaultBulkScorer, the bulk windowing and replay logic adds overhead while preventing effective use of minCompetitiveScore and block-max optimizations that the scorer-based DisjunctionMaxScorer supports in TOP_SCORES mode. In such cases, falling back to the scorer-based path typically results in better performance and restores competitive-score-based skipping. Fixes: apache#15658 Related PR: apache#14040
160a235 to
6231d37
Compare
I updated the code not to call ScorerSupplier.get() more than once. |
|
Hi, it would be nice to not always force push, this makes reviewing hard. I have no idea if you have fixed the linter warnings. Squashing the commits is done on our side while merging, please don't do it yourself. Otherwise the change looks fine to me. but I leave it to @jpountz as he understand the code much better. |
|
Please fix the linter warning by running "./gradlew tidy" and commit, but don't sqash! |
It's true that the windowing+replay logic isn't free but I remember that it was still better than using a lucene/lucene/core/src/java/org/apache/lucene/search/DisjunctionMaxBulkScorer.java Line 81 in 7ebdb93
Let's benchmark this change with luceneutil and our dismax tasks (https://github.com/mikemccand/luceneutil/blob/main/tasks/wikinightly.tasks#L257-L276)? |
This change inspects clause bulk scorers up front and only uses DisjunctionMaxBulkScorer if at least one clause provides a non-default BulkScorer. Otherwise, we fall back to the scorer-based path.
c88f933 made DisjunctionMaxQuery use DisjunctionMaxBulkScorer when tieBreakerMultiplier == 0 and scoreMode == TOP_SCORES. However, this bulk path primarily pays off when at least one clause implements a specialized BulkScorer.
When all clauses return DefaultBulkScorer, the bulk windowing and replay logic adds overhead while preventing effective use of minCompetitiveScore and block-max optimizations that the scorer-based DisjunctionMaxScorer supports in TOP_SCORES mode.
In such cases, falling back to the scorer-based path typically results in better performance and restores competitive-score-based skipping.
Fixes: #15658
Related PR: #14040