Skip to content

Add Roaring64NavigableMap support for bitmap filtering on long fields#20598

Open
pdeaudney wants to merge 2 commits intoopensearch-project:mainfrom
pdeaudney:main
Open

Add Roaring64NavigableMap support for bitmap filtering on long fields#20598
pdeaudney wants to merge 2 commits intoopensearch-project:mainfrom
pdeaudney:main

Conversation

@pdeaudney
Copy link

@pdeaudney pdeaudney commented Feb 11, 2026

Description

Extend bitmap-based term filtering to long fields using 64-bit roaring bitmaps with portable serialization. This mirrors the existing 32-bit RoaringBitmap support for integer fields.

Key changes:

  • Add Bitmap64IndexQuery (BKD tree merge-join for long point values)
  • Add Bitmap64DocValuesQuery (doc values filtering for long fields)
  • Add LONG.bitmapQuery() override in NumberFieldMapper
  • Graceful cross-format fallback: long fields accept 32-bit bitmaps (up-convert) and integer fields accept 64-bit bitmaps (down-convert with range validation)
  • Unit tests, integration tests, and REST API YAML tests

Signed-off-by: Paul De Audney pdeaudney@gmail.com

Related Issues

Resolves #[20597]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 11, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR introduces 64-bit bitmap filtering support for long fields via Roaring64NavigableMap. It adds two new query classes, extends NumberFieldMapper to support bitmap queries for numeric types, includes integration and unit tests, and updates test specifications and documentation.

Changes

Cohort / File(s) Summary
Documentation
CHANGELOG.md
Added entry for Roaring64NavigableMap bitmap filtering support on long fields.
REST API Tests
rest-api-spec/src/main/resources/rest-api-spec/test/search/380_bitmap_filtering.yml
Added two new test scenarios for bitmap filtering on long fields: terms query and terms lookup with binary field, including version constraints and test fixtures.
Integration Tests
server/src/internalClusterTest/java/org/opensearch/search/query/SearchQueryIT.java
Added two test methods (testTermsQueryWithBitmapLongField, testTermsQueryWithBitmapLongFieldLargeValues) for end-to-end bitmap query validation on long fields with standard and large values.
Query Implementation
server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java, server/src/main/java/org/opensearch/search/query/Bitmap64DocValuesQuery.java
Introduced two new query classes for 64-bit bitmap filtering: Bitmap64IndexQuery integrates with Lucene BKD point values for indexed long fields, while Bitmap64DocValuesQuery uses doc values with TwoPhaseIterator and singleton optimization.
Field Mapper Updates
server/src/main/java/org/opensearch/index/mapper/NumberFieldMapper.java
Added bitmapQuery method overrides for INTEGER and LONG types, enabling portable serialization of Roaring64NavigableMap with range checks and fallback support for 32-bit values.
Unit Tests
server/src/test/java/org/opensearch/index/mapper/NumberFieldTypeTests.java, server/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java, server/src/test/java/org/opensearch/search/query/Bitmap64DocValuesQueryTests.java
Added test coverage for long bitmap queries in field type tests, comprehensive unit tests for Bitmap64IndexQuery (11 test methods including edge cases and argument validation), and unit tests for Bitmap64DocValuesQuery (3 test methods covering basic, multi-value, and large-value scenarios).

Sequence Diagram

sequenceDiagram
    participant Client as Client/Search API
    participant Mapper as NumberFieldMapper
    participant QueryType as Bitmap64IndexQuery/<br/>Bitmap64DocValuesQuery
    participant Index as BKD Index/<br/>Doc Values
    participant Bitmap as Roaring64NavigableMap
    
    Client->>Mapper: Send bitmap query request
    Mapper->>Mapper: Deserialize bitmap (portable 8-byte format)
    Mapper->>QueryType: Create query instance
    QueryType->>Bitmap: Initialize with bitmap values
    QueryType->>Index: Intersect bitmap values with<br/>indexed/stored values
    Index->>Index: Scan BKD points or<br/>iterate doc values
    Index->>Bitmap: Check value membership
    Bitmap-->>Index: Return containment result
    Index-->>QueryType: Yield matching documents
    QueryType-->>Client: Return filtered results
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

Search:Performance, enhancement, Indexing & Search

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: adding 64-bit roaring bitmap support for long fields, which is the primary objective of the PR.
Description check ✅ Passed The description covers all key aspects: what was changed (Bitmap64IndexQuery, Bitmap64DocValuesQuery, LONG.bitmapQuery override), cross-format fallback behavior, test coverage, related issue reference, and proper DCO sign-off.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Extend bitmap-based term filtering to long fields using 64-bit roaring
bitmaps with portable serialization. This mirrors the existing 32-bit
RoaringBitmap support for integer fields.

Key changes:
- Add Bitmap64IndexQuery (BKD tree merge-join for long point values)
- Add Bitmap64DocValuesQuery (doc values filtering for long fields)
- Add LONG.bitmapQuery() override in NumberFieldMapper
- Graceful cross-format fallback: long fields accept 32-bit bitmaps
  (up-convert) and integer fields accept 64-bit bitmaps (down-convert
  with range validation)
- Unit tests, integration tests, and REST API YAML tests

Signed-off-by: Paul De Audney <pdeaudney@gmail.com>
Signed-off-by: Paul De Audney <paul.deaudney@safetyculture.io>
Update changelog with PR number.

Signed-off-by: Paul De Audney <pdeaudney@gmail.com>
Signed-off-by: Paul De Audney <paul.deaudney@safetyculture.io>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@CHANGELOG.md`:
- Line 8: Update the changelog entry that currently contains the placeholder PR
number by replacing
([`#XXXXX`](https://github.com/opensearch-project/OpenSearch/pull/XXXXX)) with the
actual pull request number and URL for the change described in the line "Add
Roaring64NavigableMap support for bitmap filtering on long fields"; ensure the
link text and href both use the real PR number (e.g.,
[`#12345`](https://github.com/opensearch-project/OpenSearch/pull/12345)).
🧹 Nitpick comments (9)
server/src/test/java/org/opensearch/search/query/Bitmap64DocValuesQueryTests.java (2)

33-50: Test coverage is limited compared to Bitmap64IndexQueryTests.

Bitmap64IndexQueryTests includes tests for negative values, boundary values (Long.MIN_VALUE/Long.MAX_VALUE), empty bitmap rewrite, argument validation, and random documents. This suite only has three tests. Consider adding at least a negative-values test and a boundary-values test to ensure Bitmap64DocValuesQuery handles the full signed long range correctly via the doc-values path — the two-phase iterator's min/max optimization and bitmap.contains() behavior may differ from the BKD intersection path.


99-114: getMatchingValues hardcodes "product_id" and lacks null-check on scorer.

The static helper imported from Bitmap64IndexQueryTests hardcodes the field name "product_id" (Line 102 of that file) and does not guard against weight.scorer(leaf) returning null. This is acceptable for the current tests but will NPE if a leaf has no matching docs. Consider parameterizing the field name and adding a null guard for broader reuse.

server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java (3)

91-129: BytesRef reuse in Bitmap64Iterator is safe but worth a brief inline note.

next() overwrites the same encoded backing array on every call. This is safe here because nextQueryPoint is compared (in matches / compare) before next() is called again. However, callers who store the returned BytesRef for later use would get stale data — consider adding a one-line comment on Line 94 (e.g., // reused; callers must not hold a reference across calls) to prevent future misuse.


147-166: MergePointVisitor and DocIdSetBuilder are eagerly allocated in scorerSupplier().

The result DocIdSetBuilder and MergePointVisitor are instantiated when scorerSupplier() is called (Lines 150-151), but only used inside get(leadCost) (Line 155). If the scorer is never get()-ed (e.g., because the query planner decides to skip this segment), the allocation is wasted. Consider moving these into get(). This mirrors how some Lucene built-in queries defer heavy work to get().


159-165: Cost estimate cardinality * 20 — may overflow for very large bitmaps.

If cardinality is large (e.g., > ~4.6 × 10^17), cardinality * 20 overflows long. While this is unlikely in practice, defensive coding would cap it:

Suggested defensive change
                     `@Override`
                     public long cost() {
                         if (cost == -1) {
-                            cost = cardinality * 20;
+                            cost = Math.min(cardinality, Long.MAX_VALUE / 20) * 20;
                         }
                         return cost;
                     }
server/src/main/java/org/opensearch/search/query/Bitmap64DocValuesQuery.java (1)

66-120: scorerSupplier() never returns null, even when no documents have values for the field.

DocValues.getSortedNumeric() always returns a non-null (possibly empty) iterator, so this will always create a TwoPhaseIterator that simply never matches. This is safe and consistent with the 32-bit BitmapDocValuesQuery, but returning null from scorerSupplier() when the field has no doc values in the segment would let the query planner skip the segment entirely.

server/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java (3)

277-288: testCheckArgsNullBitmap and testCheckArgsWithNullBitmap are duplicates.

Both test the same code path: checkArgs(<non-null field>, null). One of them can be removed.

Suggested removal
     public void testCheckArgsNullBitmap() {
         assertThrows(IllegalArgumentException.class, () -> Bitmap64IndexQuery.checkArgs("field", null));
     }

-    public void testCheckArgsNullField() {
-        Roaring64NavigableMap bitmap = new Roaring64NavigableMap(true);
-        assertThrows(IllegalArgumentException.class, () -> Bitmap64IndexQuery.checkArgs(null, bitmap));
-    }
-
-    public void testCheckArgsWithNullBitmap() {
-        assertThrows(IllegalArgumentException.class, () -> { Bitmap64IndexQuery.checkArgs("product_id", null); });
-    }
+    public void testCheckArgsNullField() {
+        Roaring64NavigableMap bitmap = new Roaring64NavigableMap(true);
+        assertThrows(IllegalArgumentException.class, () -> Bitmap64IndexQuery.checkArgs(null, bitmap));
+    }

99-114: getMatchingValues hardcodes field name and doesn't guard against null scorer.

This public static helper is reused by Bitmap64DocValuesQueryTests. Two minor concerns:

  1. The field name "product_id" is hardcoded (Line 102). If any test uses a different field name, the helper will silently return wrong values.
  2. weight.scorer(leaf) could return null for segments without matching points, which would NPE on Line 104. Current tests don't hit this, but it's fragile for reuse.
Suggested improvement
-    static List<Long> getMatchingValues(Weight weight, IndexReader reader) throws IOException {
+    static List<Long> getMatchingValues(Weight weight, IndexReader reader, String field) throws IOException {
         List<Long> actual = new LinkedList<>();
         for (LeafReaderContext leaf : reader.leaves()) {
-            SortedNumericDocValues dv = DocValues.getSortedNumeric(leaf.reader(), "product_id");
+            SortedNumericDocValues dv = DocValues.getSortedNumeric(leaf.reader(), field);
             Scorer scorer = weight.scorer(leaf);
+            if (scorer == null) {
+                continue;
+            }
             DocIdSetIterator disi = scorer.iterator();

149-184: Random test could produce fewer than 5 query values due to HashSet dedup.

queryValues is a HashSet with up to 5 random values, but duplicates are silently absorbed. The test still works (it's just less random than it could be), but if the intent is "5 distinct values," the loop should repeat until the set has 5 elements. This is a nitpick — the test is still valid.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 309a9c6 and 6a4e638.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • rest-api-spec/src/main/resources/rest-api-spec/test/search/380_bitmap_filtering.yml
  • server/src/internalClusterTest/java/org/opensearch/search/query/SearchQueryIT.java
  • server/src/main/java/org/opensearch/index/mapper/NumberFieldMapper.java
  • server/src/main/java/org/opensearch/search/query/Bitmap64DocValuesQuery.java
  • server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java
  • server/src/test/java/org/opensearch/index/mapper/NumberFieldTypeTests.java
  • server/src/test/java/org/opensearch/search/query/Bitmap64DocValuesQueryTests.java
  • server/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.

Applied to files:

  • server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java
  • server/src/main/java/org/opensearch/search/query/Bitmap64DocValuesQuery.java
  • server/src/main/java/org/opensearch/index/mapper/NumberFieldMapper.java
📚 Learning: 2025-12-02T22:44:14.799Z
Learnt from: prudhvigodithi
Repo: opensearch-project/OpenSearch PR: 20112
File: server/src/internalClusterTest/java/org/opensearch/search/slice/SearchSliceIT.java:73-81
Timestamp: 2025-12-02T22:44:14.799Z
Learning: In OpenSearch integration tests extending OpenSearchIntegTestCase, using `LuceneTestCase.SuppressCodecs("*")` triggers special handling that selects a random production codec from the CODECS array, while `SuppressCodecs("Asserting")` or other specific codec suppressions still allow Lucene's default codec randomization which may include the asserting codec. Use `SuppressCodecs("*")` when you need to completely avoid asserting codecs (e.g., for cross-thread StoredFieldsReader usage) while maintaining production codec test coverage.

Applied to files:

  • server/src/internalClusterTest/java/org/opensearch/search/query/SearchQueryIT.java
🧬 Code graph analysis (5)
server/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java (1)
server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java (1)
  • Bitmap64IndexQuery (63-298)
server/src/test/java/org/opensearch/index/mapper/NumberFieldTypeTests.java (2)
server/src/main/java/org/opensearch/search/query/Bitmap64DocValuesQuery.java (1)
  • Bitmap64DocValuesQuery (46-161)
server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java (1)
  • Bitmap64IndexQuery (63-298)
server/src/internalClusterTest/java/org/opensearch/search/query/SearchQueryIT.java (1)
server/src/main/java/org/opensearch/index/query/TermsQueryBuilder.java (1)
  • TermsQueryBuilder (89-771)
server/src/main/java/org/opensearch/search/query/Bitmap64DocValuesQuery.java (2)
server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java (1)
  • Bitmap64IndexQuery (63-298)
server/src/main/java/org/opensearch/search/query/BitmapDocValuesQuery.java (1)
  • BitmapDocValuesQuery (41-156)
server/src/main/java/org/opensearch/index/mapper/NumberFieldMapper.java (2)
server/src/main/java/org/opensearch/search/query/Bitmap64DocValuesQuery.java (1)
  • Bitmap64DocValuesQuery (46-161)
server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java (1)
  • Bitmap64IndexQuery (63-298)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: check-files
  • GitHub Check: Mend Security Check
🔇 Additional comments (19)
server/src/internalClusterTest/java/org/opensearch/search/query/SearchQueryIT.java (3)

87-108: LGTM.


1203-1234: Good coverage for long bitmap terms query.


1236-1269: Large-value bitmap scenario covered.

rest-api-spec/src/main/resources/rest-api-spec/test/search/380_bitmap_filtering.yml (1)

185-295: Nice addition to long-field bitmap coverage.

server/src/test/java/org/opensearch/index/mapper/NumberFieldTypeTests.java (3)

80-101: LGTM.


1026-1033: Good adjustment to the randomized type selection.


1035-1060: Long bitmap query coverage looks solid.

server/src/main/java/org/opensearch/index/mapper/NumberFieldMapper.java (4)

77-101: LGTM.


1166-1201: Good cross-format bitmap handling for integer fields.


1466-1510: Long bitmap query override looks good.


1517-1519: Clear UNSIGNED_LONG guidance.

server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java (4)

39-62: Thorough Javadoc — well documented.

The documentation clearly explains the signed-mode requirement, cross-language interoperability considerations, and serialization format. This is valuable context for maintainers.


118-127: advance does not skip to or past targetVal — only past values strictly less than it.

The contract of advance is documented as "advance as long as the next value is smaller than target." After the loop, buffered >= targetVal (or exhausted). This is correct for the merge-join: the subsequent next() in matches() will return the first value ≥ target. Just confirming the signed comparison buffered < targetVal is sound here because both sides are decoded signed longs and the bitmap iterates in signed order (signedLongs=true).


219-232: matches() does not advance past a matched value — intentional for duplicate doc values.

After returning true on an exact match (cmp == 0), nextQueryPoint still points to the matched encoded value. This is correct: multiple documents can share the same BKD point value, and subsequent calls to matches() with that same value will match again. The iterator only advances when the current query point falls behind (cmp < 0).


234-256: compare() may return CELL_INSIDE_QUERY only when the cell is a single point — verify BKD invariant.

CELL_INSIDE_QUERY (Line 249) is returned only when cmpMin == 0 && cmpMax == 0, meaning min == max == queryPoint. This is correct: if the cell's min and max are identical, all documents in that cell hold that single value, and it matches the bitmap. For cells with a range of values, CELL_CROSSES_QUERY is returned, which is conservative but correct.

One thing to be aware of: this doesn't attempt to detect the case where min != max but every value in the cell's range is present in the bitmap — that's a possible optimization but not a correctness issue.

server/src/main/java/org/opensearch/search/query/Bitmap64DocValuesQuery.java (3)

46-64: Clean constructor with min/max optimization — consistent with 32-bit BitmapDocValuesQuery.

The min/max bounds computed from bitmap.first()/bitmap.last() provide an efficient range pre-check before the heavier bitmap.contains() call. This requires signed mode (signedLongs=true) so that first()/last() return correct signed bounds — which is documented in the class Javadoc.


87-110: Multi-value path relies on ascending doc-value order for early exit — correct.

SortedNumericDocValues guarantees values are returned in ascending order within a document, so the value > max → return false short-circuit on Line 96-97 is valid: once a value exceeds the bitmap's maximum, no subsequent value can match.


112-112: DefaultScorerSupplier is a legitimate Apache Lucene utility class.

The class org.apache.lucene.search.Weight.DefaultScorerSupplier is an internal Lucene wrapper that implements ScorerSupplier and provides the expected semantics: get() returns the wrapped scorer, and cost() is delegated appropriately. The usage in this code is correct and follows standard Lucene patterns.

server/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java (1)

66-97: Solid test coverage across value ranges and edge cases.

The test suite covers basic matching, multi-value documents, large values beyond 32-bit range, negative longs, Long.MIN_VALUE/MAX_VALUE boundaries, random fuzz, argument validation, rewrite semantics, and BKD point-visitor integration. This is thorough for a new query type.

Also applies to: 116-147, 186-217, 219-246, 248-275, 298-317

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

CHANGELOG.md Outdated

## [Unreleased 3.x]
### Added
- Add Roaring64NavigableMap support for bitmap filtering on long fields ([#XXXXX](https://github.com/opensearch-project/OpenSearch/pull/XXXXX))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Replace placeholder PR number.

The changelog entry should link to the actual PR.

📝 Suggested update
-- Add Roaring64NavigableMap support for bitmap filtering on long fields ([`#XXXXX`](https://github.com/opensearch-project/OpenSearch/pull/XXXXX))
+- Add Roaring64NavigableMap support for bitmap filtering on long fields ([`#20598`](https://github.com/opensearch-project/OpenSearch/pull/20598))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- Add Roaring64NavigableMap support for bitmap filtering on long fields ([#XXXXX](https://github.com/opensearch-project/OpenSearch/pull/XXXXX))
- Add Roaring64NavigableMap support for bitmap filtering on long fields ([`#20598`](https://github.com/opensearch-project/OpenSearch/pull/20598))
🤖 Prompt for AI Agents
In `@CHANGELOG.md` at line 8, Update the changelog entry that currently contains
the placeholder PR number by replacing
([`#XXXXX`](https://github.com/opensearch-project/OpenSearch/pull/XXXXX)) with the
actual pull request number and URL for the change described in the line "Add
Roaring64NavigableMap support for bitmap filtering on long fields"; ensure the
link text and href both use the real PR number (e.g.,
[`#12345`](https://github.com/opensearch-project/OpenSearch/pull/12345)).

@github-actions
Copy link
Contributor

❌ Gradle check result for 7acd66b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant