Add Roaring64NavigableMap support for bitmap filtering on long fields#20598
Add Roaring64NavigableMap support for bitmap filtering on long fields#20598pdeaudney wants to merge 2 commits intoopensearch-project:mainfrom
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR introduces 64-bit bitmap filtering support for long fields via Roaring64NavigableMap. It adds two new query classes, extends NumberFieldMapper to support bitmap queries for numeric types, includes integration and unit tests, and updates test specifications and documentation. Changes
Sequence DiagramsequenceDiagram
participant Client as Client/Search API
participant Mapper as NumberFieldMapper
participant QueryType as Bitmap64IndexQuery/<br/>Bitmap64DocValuesQuery
participant Index as BKD Index/<br/>Doc Values
participant Bitmap as Roaring64NavigableMap
Client->>Mapper: Send bitmap query request
Mapper->>Mapper: Deserialize bitmap (portable 8-byte format)
Mapper->>QueryType: Create query instance
QueryType->>Bitmap: Initialize with bitmap values
QueryType->>Index: Intersect bitmap values with<br/>indexed/stored values
Index->>Index: Scan BKD points or<br/>iterate doc values
Index->>Bitmap: Check value membership
Bitmap-->>Index: Return containment result
Index-->>QueryType: Yield matching documents
QueryType-->>Client: Return filtered results
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Extend bitmap-based term filtering to long fields using 64-bit roaring bitmaps with portable serialization. This mirrors the existing 32-bit RoaringBitmap support for integer fields. Key changes: - Add Bitmap64IndexQuery (BKD tree merge-join for long point values) - Add Bitmap64DocValuesQuery (doc values filtering for long fields) - Add LONG.bitmapQuery() override in NumberFieldMapper - Graceful cross-format fallback: long fields accept 32-bit bitmaps (up-convert) and integer fields accept 64-bit bitmaps (down-convert with range validation) - Unit tests, integration tests, and REST API YAML tests Signed-off-by: Paul De Audney <pdeaudney@gmail.com> Signed-off-by: Paul De Audney <paul.deaudney@safetyculture.io>
Update changelog with PR number. Signed-off-by: Paul De Audney <pdeaudney@gmail.com> Signed-off-by: Paul De Audney <paul.deaudney@safetyculture.io>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@CHANGELOG.md`:
- Line 8: Update the changelog entry that currently contains the placeholder PR
number by replacing
([`#XXXXX`](https://github.com/opensearch-project/OpenSearch/pull/XXXXX)) with the
actual pull request number and URL for the change described in the line "Add
Roaring64NavigableMap support for bitmap filtering on long fields"; ensure the
link text and href both use the real PR number (e.g.,
[`#12345`](https://github.com/opensearch-project/OpenSearch/pull/12345)).
🧹 Nitpick comments (9)
server/src/test/java/org/opensearch/search/query/Bitmap64DocValuesQueryTests.java (2)
33-50: Test coverage is limited compared toBitmap64IndexQueryTests.
Bitmap64IndexQueryTestsincludes tests for negative values, boundary values (Long.MIN_VALUE/Long.MAX_VALUE), empty bitmap rewrite, argument validation, and random documents. This suite only has three tests. Consider adding at least a negative-values test and a boundary-values test to ensureBitmap64DocValuesQueryhandles the full signed long range correctly via the doc-values path — the two-phase iterator'smin/maxoptimization andbitmap.contains()behavior may differ from the BKD intersection path.
99-114:getMatchingValueshardcodes"product_id"and lacks null-check on scorer.The static helper imported from
Bitmap64IndexQueryTestshardcodes the field name"product_id"(Line 102 of that file) and does not guard againstweight.scorer(leaf)returningnull. This is acceptable for the current tests but will NPE if a leaf has no matching docs. Consider parameterizing the field name and adding a null guard for broader reuse.server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java (3)
91-129: BytesRef reuse inBitmap64Iteratoris safe but worth a brief inline note.
next()overwrites the sameencodedbacking array on every call. This is safe here becausenextQueryPointis compared (inmatches/compare) beforenext()is called again. However, callers who store the returnedBytesReffor later use would get stale data — consider adding a one-line comment on Line 94 (e.g.,// reused; callers must not hold a reference across calls) to prevent future misuse.
147-166:MergePointVisitorandDocIdSetBuilderare eagerly allocated inscorerSupplier().The
resultDocIdSetBuilder andMergePointVisitorare instantiated whenscorerSupplier()is called (Lines 150-151), but only used insideget(leadCost)(Line 155). If the scorer is neverget()-ed (e.g., because the query planner decides to skip this segment), the allocation is wasted. Consider moving these intoget(). This mirrors how some Lucene built-in queries defer heavy work toget().
159-165: Cost estimatecardinality * 20— may overflow for very large bitmaps.If
cardinalityis large (e.g., > ~4.6 × 10^17),cardinality * 20overflowslong. While this is unlikely in practice, defensive coding would cap it:Suggested defensive change
`@Override` public long cost() { if (cost == -1) { - cost = cardinality * 20; + cost = Math.min(cardinality, Long.MAX_VALUE / 20) * 20; } return cost; }server/src/main/java/org/opensearch/search/query/Bitmap64DocValuesQuery.java (1)
66-120:scorerSupplier()never returnsnull, even when no documents have values for the field.
DocValues.getSortedNumeric()always returns a non-null (possibly empty) iterator, so this will always create aTwoPhaseIteratorthat simply never matches. This is safe and consistent with the 32-bitBitmapDocValuesQuery, but returningnullfromscorerSupplier()when the field has no doc values in the segment would let the query planner skip the segment entirely.server/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java (3)
277-288:testCheckArgsNullBitmapandtestCheckArgsWithNullBitmapare duplicates.Both test the same code path:
checkArgs(<non-null field>, null). One of them can be removed.Suggested removal
public void testCheckArgsNullBitmap() { assertThrows(IllegalArgumentException.class, () -> Bitmap64IndexQuery.checkArgs("field", null)); } - public void testCheckArgsNullField() { - Roaring64NavigableMap bitmap = new Roaring64NavigableMap(true); - assertThrows(IllegalArgumentException.class, () -> Bitmap64IndexQuery.checkArgs(null, bitmap)); - } - - public void testCheckArgsWithNullBitmap() { - assertThrows(IllegalArgumentException.class, () -> { Bitmap64IndexQuery.checkArgs("product_id", null); }); - } + public void testCheckArgsNullField() { + Roaring64NavigableMap bitmap = new Roaring64NavigableMap(true); + assertThrows(IllegalArgumentException.class, () -> Bitmap64IndexQuery.checkArgs(null, bitmap)); + }
99-114:getMatchingValueshardcodes field name and doesn't guard against null scorer.This public static helper is reused by
Bitmap64DocValuesQueryTests. Two minor concerns:
- The field name
"product_id"is hardcoded (Line 102). If any test uses a different field name, the helper will silently return wrong values.weight.scorer(leaf)could returnnullfor segments without matching points, which would NPE on Line 104. Current tests don't hit this, but it's fragile for reuse.Suggested improvement
- static List<Long> getMatchingValues(Weight weight, IndexReader reader) throws IOException { + static List<Long> getMatchingValues(Weight weight, IndexReader reader, String field) throws IOException { List<Long> actual = new LinkedList<>(); for (LeafReaderContext leaf : reader.leaves()) { - SortedNumericDocValues dv = DocValues.getSortedNumeric(leaf.reader(), "product_id"); + SortedNumericDocValues dv = DocValues.getSortedNumeric(leaf.reader(), field); Scorer scorer = weight.scorer(leaf); + if (scorer == null) { + continue; + } DocIdSetIterator disi = scorer.iterator();
149-184: Random test could produce fewer than 5 query values due toHashSetdedup.
queryValuesis aHashSetwith up to 5 random values, but duplicates are silently absorbed. The test still works (it's just less random than it could be), but if the intent is "5 distinct values," the loop should repeat until the set has 5 elements. This is a nitpick — the test is still valid.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
CHANGELOG.mdrest-api-spec/src/main/resources/rest-api-spec/test/search/380_bitmap_filtering.ymlserver/src/internalClusterTest/java/org/opensearch/search/query/SearchQueryIT.javaserver/src/main/java/org/opensearch/index/mapper/NumberFieldMapper.javaserver/src/main/java/org/opensearch/search/query/Bitmap64DocValuesQuery.javaserver/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.javaserver/src/test/java/org/opensearch/index/mapper/NumberFieldTypeTests.javaserver/src/test/java/org/opensearch/search/query/Bitmap64DocValuesQueryTests.javaserver/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.
Applied to files:
server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.javaserver/src/main/java/org/opensearch/search/query/Bitmap64DocValuesQuery.javaserver/src/main/java/org/opensearch/index/mapper/NumberFieldMapper.java
📚 Learning: 2025-12-02T22:44:14.799Z
Learnt from: prudhvigodithi
Repo: opensearch-project/OpenSearch PR: 20112
File: server/src/internalClusterTest/java/org/opensearch/search/slice/SearchSliceIT.java:73-81
Timestamp: 2025-12-02T22:44:14.799Z
Learning: In OpenSearch integration tests extending OpenSearchIntegTestCase, using `LuceneTestCase.SuppressCodecs("*")` triggers special handling that selects a random production codec from the CODECS array, while `SuppressCodecs("Asserting")` or other specific codec suppressions still allow Lucene's default codec randomization which may include the asserting codec. Use `SuppressCodecs("*")` when you need to completely avoid asserting codecs (e.g., for cross-thread StoredFieldsReader usage) while maintaining production codec test coverage.
Applied to files:
server/src/internalClusterTest/java/org/opensearch/search/query/SearchQueryIT.java
🧬 Code graph analysis (5)
server/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java (1)
server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java (1)
Bitmap64IndexQuery(63-298)
server/src/test/java/org/opensearch/index/mapper/NumberFieldTypeTests.java (2)
server/src/main/java/org/opensearch/search/query/Bitmap64DocValuesQuery.java (1)
Bitmap64DocValuesQuery(46-161)server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java (1)
Bitmap64IndexQuery(63-298)
server/src/internalClusterTest/java/org/opensearch/search/query/SearchQueryIT.java (1)
server/src/main/java/org/opensearch/index/query/TermsQueryBuilder.java (1)
TermsQueryBuilder(89-771)
server/src/main/java/org/opensearch/search/query/Bitmap64DocValuesQuery.java (2)
server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java (1)
Bitmap64IndexQuery(63-298)server/src/main/java/org/opensearch/search/query/BitmapDocValuesQuery.java (1)
BitmapDocValuesQuery(41-156)
server/src/main/java/org/opensearch/index/mapper/NumberFieldMapper.java (2)
server/src/main/java/org/opensearch/search/query/Bitmap64DocValuesQuery.java (1)
Bitmap64DocValuesQuery(46-161)server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java (1)
Bitmap64IndexQuery(63-298)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: check-files
- GitHub Check: Mend Security Check
🔇 Additional comments (19)
server/src/internalClusterTest/java/org/opensearch/search/query/SearchQueryIT.java (3)
87-108: LGTM.
1203-1234: Good coverage for long bitmap terms query.
1236-1269: Large-value bitmap scenario covered.rest-api-spec/src/main/resources/rest-api-spec/test/search/380_bitmap_filtering.yml (1)
185-295: Nice addition to long-field bitmap coverage.server/src/test/java/org/opensearch/index/mapper/NumberFieldTypeTests.java (3)
80-101: LGTM.
1026-1033: Good adjustment to the randomized type selection.
1035-1060: Long bitmap query coverage looks solid.server/src/main/java/org/opensearch/index/mapper/NumberFieldMapper.java (4)
77-101: LGTM.
1166-1201: Good cross-format bitmap handling for integer fields.
1466-1510: Long bitmap query override looks good.
1517-1519: Clear UNSIGNED_LONG guidance.server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java (4)
39-62: Thorough Javadoc — well documented.The documentation clearly explains the signed-mode requirement, cross-language interoperability considerations, and serialization format. This is valuable context for maintainers.
118-127:advancedoes not skip to or pasttargetVal— only past values strictly less than it.The contract of
advanceis documented as "advance as long as the next value is smaller than target." After the loop,buffered >= targetVal(or exhausted). This is correct for the merge-join: the subsequentnext()inmatches()will return the first value ≥ target. Just confirming the signed comparisonbuffered < targetValis sound here because both sides are decoded signed longs and the bitmap iterates in signed order (signedLongs=true).
219-232:matches()does not advance past a matched value — intentional for duplicate doc values.After returning
trueon an exact match (cmp == 0),nextQueryPointstill points to the matched encoded value. This is correct: multiple documents can share the same BKD point value, and subsequent calls tomatches()with that same value will match again. The iterator only advances when the current query point falls behind (cmp < 0).
234-256:compare()may returnCELL_INSIDE_QUERYonly when the cell is a single point — verify BKD invariant.
CELL_INSIDE_QUERY(Line 249) is returned only whencmpMin == 0 && cmpMax == 0, meaningmin == max == queryPoint. This is correct: if the cell's min and max are identical, all documents in that cell hold that single value, and it matches the bitmap. For cells with a range of values,CELL_CROSSES_QUERYis returned, which is conservative but correct.One thing to be aware of: this doesn't attempt to detect the case where
min != maxbut every value in the cell's range is present in the bitmap — that's a possible optimization but not a correctness issue.server/src/main/java/org/opensearch/search/query/Bitmap64DocValuesQuery.java (3)
46-64: Clean constructor with min/max optimization — consistent with 32-bitBitmapDocValuesQuery.The min/max bounds computed from
bitmap.first()/bitmap.last()provide an efficient range pre-check before the heavierbitmap.contains()call. This requires signed mode (signedLongs=true) so thatfirst()/last()return correct signed bounds — which is documented in the class Javadoc.
87-110: Multi-value path relies on ascending doc-value order for early exit — correct.
SortedNumericDocValuesguarantees values are returned in ascending order within a document, so thevalue > max → return falseshort-circuit on Line 96-97 is valid: once a value exceeds the bitmap's maximum, no subsequent value can match.
112-112:DefaultScorerSupplieris a legitimate Apache Lucene utility class.The class
org.apache.lucene.search.Weight.DefaultScorerSupplieris an internal Lucene wrapper that implementsScorerSupplierand provides the expected semantics:get()returns the wrapped scorer, andcost()is delegated appropriately. The usage in this code is correct and follows standard Lucene patterns.server/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java (1)
66-97: Solid test coverage across value ranges and edge cases.The test suite covers basic matching, multi-value documents, large values beyond 32-bit range, negative longs, Long.MIN_VALUE/MAX_VALUE boundaries, random fuzz, argument validation, rewrite semantics, and BKD point-visitor integration. This is thorough for a new query type.
Also applies to: 116-147, 186-217, 219-246, 248-275, 298-317
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
CHANGELOG.md
Outdated
|
|
||
| ## [Unreleased 3.x] | ||
| ### Added | ||
| - Add Roaring64NavigableMap support for bitmap filtering on long fields ([#XXXXX](https://github.com/opensearch-project/OpenSearch/pull/XXXXX)) |
There was a problem hiding this comment.
Replace placeholder PR number.
The changelog entry should link to the actual PR.
📝 Suggested update
-- Add Roaring64NavigableMap support for bitmap filtering on long fields ([`#XXXXX`](https://github.com/opensearch-project/OpenSearch/pull/XXXXX))
+- Add Roaring64NavigableMap support for bitmap filtering on long fields ([`#20598`](https://github.com/opensearch-project/OpenSearch/pull/20598))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - Add Roaring64NavigableMap support for bitmap filtering on long fields ([#XXXXX](https://github.com/opensearch-project/OpenSearch/pull/XXXXX)) | |
| - Add Roaring64NavigableMap support for bitmap filtering on long fields ([`#20598`](https://github.com/opensearch-project/OpenSearch/pull/20598)) |
🤖 Prompt for AI Agents
In `@CHANGELOG.md` at line 8, Update the changelog entry that currently contains
the placeholder PR number by replacing
([`#XXXXX`](https://github.com/opensearch-project/OpenSearch/pull/XXXXX)) with the
actual pull request number and URL for the change described in the line "Add
Roaring64NavigableMap support for bitmap filtering on long fields"; ensure the
link text and href both use the real PR number (e.g.,
[`#12345`](https://github.com/opensearch-project/OpenSearch/pull/12345)).
|
❌ Gradle check result for 7acd66b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Description
Extend bitmap-based term filtering to long fields using 64-bit roaring bitmaps with portable serialization. This mirrors the existing 32-bit RoaringBitmap support for integer fields.
Key changes:
Signed-off-by: Paul De Audney pdeaudney@gmail.com
Related Issues
Resolves #[20597]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.