[Indexing] Add support of "extra fields" outside _source indexing#20635
[Indexing] Add support of "extra fields" outside _source indexing#20635laminelam wants to merge 3 commits intoopensearch-project:mainfrom
Conversation
Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR introduces comprehensive support for storing extra field values (fields outside Changes
Sequence DiagramsequenceDiagram
participant Client
participant Request as Index/Update/Bulk<br/>Request
participant Transport as Transport<br/>ShardBulkAction
participant Parser as Document<br/>Parser
participant FieldMapper as Field<br/>Mapper
participant Storage as Lucene<br/>Document
Client->>Request: create with<br/>extraFieldValues()
Request->>Transport: process (primary)
Transport->>Parser: call with<br/>SourceToParse
Parser->>Parser: applyExtraFieldValues()
loop for each extra field
Parser->>Parser: validate path &<br/>nested parents
Parser->>FieldMapper: resolve mapper
FieldMapper->>FieldMapper: supportsExtraFieldValues()?
alt supported
FieldMapper->>FieldMapper: parseCreateField()
FieldMapper->>Storage: store field +<br/>metadata (_type, _dim)
else not supported
FieldMapper->>Parser: throw validation error
end
end
Parser->>Storage: finalize document
Transport->>Transport: replicate (replica)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@server/src/main/java/org/opensearch/action/update/UpdateRequest.java`:
- Around line 724-733: The Javadoc for UpdateRequest.docExtraFieldValues has a
typo ("extral") and should be corrected to "extra"; update the method comment
block above public UpdateRequest docExtraFieldValues(ExtraFieldValues values) to
read "Sets extra field values for the partial document update" and adjust any
related wording (e.g., "These values...") while keeping the rest of the
documentation and references to ExtraFieldValues,
safeDoc().extraFieldValues(values), and ExtraFieldValues#EMPTY unchanged.
🧹 Nitpick comments (5)
server/src/test/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArrayTests.java (1)
54-57: Minor: Duplicate assertion.Line 56 duplicates the assertion at line 54 (
read.dimension()is checked twice forvals.length).🔧 Remove duplicate assertion
assertThat(read.dimension(), is(vals.length)); assertThat(read.isPackedLE(), is(false)); - assertThat(read.dimension(), is(vals.length)); assertThat(read.asFloatArray(), equalTo(vals));server/src/main/java/org/opensearch/index/mapper/DocumentParser.java (1)
153-189: Resolve or track the EFV/JSON exclusivity TODO. The TODO hints at a missing validation rule; leaving both JSON and EFV populated could create ambiguous precedence. Consider enforcing it here or opening a follow‑up.If you want, I can draft the validation or open a tracking issue.
server/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.java (2)
60-65: Add braces around the for-loop body for consistency.The loop body at lines 62-63 lacks braces, which is inconsistent with the style in
writePayloadTo(lines 55-57).🔧 Suggested fix
static PrimitiveFloatArray readBodyFrom(StreamInput in, int dim) throws IOException { final float[] v = new float[dim]; - for (int j = 0; j < dim; j++) - v[j] = in.readFloat(); + for (int j = 0; j < dim; j++) { + v[j] = in.readFloat(); + } return new PrimitiveFloatArray(v); }
24-26: Consider null validation in constructor.The constructor accepts the array without null-checking. If a null array is passed, subsequent calls to
dimension()orget()will throw NPE. Since this is a public class, defensive validation may be warranted.🛡️ Suggested fix
public PrimitiveFloatArray(float[] v) { + java.util.Objects.requireNonNull(v, "v must not be null"); this.v = v; }server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java (1)
159-163: Remove unused helper methodsetPrivateField.This helper method is defined but never used in any test. Consider removing it to avoid dead code, or add a test that requires it.
🧹 Suggested fix
- private static void setPrivateField(Object target, String fieldName, Object value) throws Exception { - Field f = target.getClass().getDeclaredField(fieldName); - f.setAccessible(true); - f.set(target, value); - }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
server/src/internalClusterTest/java/org/opensearch/index/mapper/ExtraFieldValuesRequestIT.javaserver/src/internalClusterTest/java/org/opensearch/update/UpdateIT.javaserver/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.javaserver/src/main/java/org/opensearch/action/index/IndexRequest.javaserver/src/main/java/org/opensearch/action/update/UpdateHelper.javaserver/src/main/java/org/opensearch/action/update/UpdateRequest.javaserver/src/main/java/org/opensearch/index/mapper/DocumentParser.javaserver/src/main/java/org/opensearch/index/mapper/FieldMapper.javaserver/src/main/java/org/opensearch/index/mapper/SourceToParse.javaserver/src/main/java/org/opensearch/index/mapper/extrasource/BytesValue.javaserver/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.javaserver/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.javaserver/src/main/java/org/opensearch/index/mapper/extrasource/FloatArrayValue.javaserver/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.javaserver/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.javaserver/src/test/java/org/opensearch/action/update/UpdateRequestTests.javaserver/src/test/java/org/opensearch/index/mapper/extrasource/BytesValueTests.javaserver/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValueTests.javaserver/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesIngestTests.javaserver/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesMapperPlugin.javaserver/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesTests.javaserver/src/test/java/org/opensearch/index/mapper/extrasource/ExtraTestFieldMapper.javaserver/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.javaserver/src/test/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArrayTests.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.
Applied to files:
server/src/main/java/org/opensearch/action/index/IndexRequest.javaserver/src/main/java/org/opensearch/index/mapper/DocumentParser.javaserver/src/main/java/org/opensearch/action/update/UpdateRequest.javaserver/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.javaserver/src/main/java/org/opensearch/index/mapper/extrasource/FloatArrayValue.javaserver/src/main/java/org/opensearch/action/update/UpdateHelper.javaserver/src/main/java/org/opensearch/index/mapper/SourceToParse.javaserver/src/main/java/org/opensearch/index/mapper/extrasource/BytesValue.javaserver/src/main/java/org/opensearch/index/mapper/FieldMapper.javaserver/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.javaserver/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.javaserver/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.javaserver/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java
🧬 Code graph analysis (8)
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValueTests.java (1)
libs/core/src/main/java/org/opensearch/core/common/bytes/BytesArray.java (1)
BytesArray(48-151)
server/src/test/java/org/opensearch/index/mapper/extrasource/BytesValueTests.java (1)
libs/core/src/main/java/org/opensearch/core/common/bytes/BytesArray.java (1)
BytesArray(48-151)
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesTests.java (1)
libs/core/src/main/java/org/opensearch/core/common/bytes/BytesArray.java (1)
BytesArray(48-151)
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraTestFieldMapper.java (1)
server/src/main/java/org/opensearch/index/mapper/FieldMapper.java (3)
Builder(85-214)Builder(740-793)Builder(894-913)
server/src/internalClusterTest/java/org/opensearch/index/mapper/ExtraFieldValuesRequestIT.java (3)
libs/core/src/main/java/org/opensearch/core/common/bytes/BytesArray.java (1)
BytesArray(48-151)server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesMapperPlugin.java (1)
ExtraFieldValuesMapperPlugin(19-28)server/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.java (1)
PrimitiveFloatArray(21-66)
server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java (1)
server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.java (1)
ExperimentalApi(20-53)
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesMapperPlugin.java (1)
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraTestFieldMapper.java (1)
ExtraTestFieldMapper(26-106)
server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java (1)
libs/core/src/main/java/org/opensearch/core/common/bytes/BytesArray.java (1)
BytesArray(48-151)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: gradle-check
- GitHub Check: detect-breaking-change
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: Analyze (java)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: Mend Security Check
🔇 Additional comments (35)
server/src/main/java/org/opensearch/action/update/UpdateHelper.java (1)
254-254: LGTM - Correct propagation of ExtraFieldValues in doc-based update path.This correctly forwards
extraFieldValuesfrom the incoming doc request to the finalIndexRequest, ensuring EFV data flows through the update pipeline for non-script updates.server/src/main/java/org/opensearch/index/mapper/FieldMapper.java (2)
336-341: LGTM - Clean opt-in hook for ExtraFieldValues support.The method follows the established pattern in
FieldMapper(similar toparsesArrayValue()), providing a sensible default offalsethat requires mappers to explicitly opt-in for EFV support.
50-50: TheExtraFieldValuesimport is used in the Javadoc comment at line 337 ({@linkExtraFieldValues}). The import is necessary for proper Javadoc link resolution and should be retained.Likely an incorrect or invalid review comment.
server/src/test/java/org/opensearch/action/update/UpdateRequestTests.java (1)
653-666: LGTM - Good test coverage for script + ExtraFieldValues validation.The test correctly validates that combining a script with ExtraFieldValues on the upsert path triggers the expected validation error. The test structure is consistent with other validation tests in this file.
server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (2)
665-672: LGTM - Correct EFV propagation on primary path.The
SourceToParseconstruction now includesextraFieldValues()from the index request, enabling the parsing layer to consume extra field values during primary shard indexing.
918-919: LGTM - Consistent EFV propagation on replica path.The replica path mirrors the primary path change, ensuring
extraFieldValuesare consistently available during replica shard indexing. This symmetry is important for correct replication behavior.server/src/internalClusterTest/java/org/opensearch/update/UpdateIT.java (1)
550-564: LGTM - Good integration test for script + ExtraFieldValues rejection.This integration test complements the unit test in
UpdateRequestTestsby validating the same behavior through the full client API stack. The test follows the established patterns in this file (usingindexOrAlias(),expectThrows, etc.).server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValueTests.java (1)
21-69: LGTM - Comprehensive serialization tests for ExtraFieldValue types.The tests cover essential scenarios:
BytesValueround-trip with content verificationFloatArrayValueround-trip with dimension and value verification- Error handling for unknown type IDs
Good coverage of the serialization contract.
server/src/main/java/org/opensearch/index/mapper/extrasource/BytesValue.java (1)
17-35: LGTM - Clean record implementation for BytesValue.The record pattern is appropriate for this immutable value type. The implementation is minimal and focused with proper visibility for
readBodyFrom.One consideration: the record accepts a potentially null
BytesReferencewithout validation. If null safety is a concern, consider adding a compact constructor with a null check. However, this may not be necessary if callers are expected to validate inputs.server/src/test/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArrayTests.java (1)
20-75: LGTM - Thorough test coverage for PrimitiveFloatArray.The tests comprehensively cover:
- Basic accessors (
dimension,get,asFloatArray,isPackedLE)- Error handling (
packedBytes()throws on unpacked instance)- Serialization round-trip
- Empty array edge case
Good alignment with the
PrimitiveFloatArrayimplementation.server/src/main/java/org/opensearch/action/update/UpdateRequest.java (3)
63-63: LGTM. ExtraFieldValues import aligns with the new API usage.
239-245: Good guard against scripted EFV usage. Prevents unsupported script paths from accepting extra field values.
821-829: LGTM. Upsert EFV setter mirrors the doc path cleanly.server/src/test/java/org/opensearch/index/mapper/extrasource/BytesValueTests.java (1)
1-82: Solid round‑trip coverage for BytesValue. Exercises BytesArray, composite, and dispatch paths.server/src/main/java/org/opensearch/index/mapper/DocumentParser.java (3)
50-51: LGTM. EFV imports are consistent with the new parsing flow.
146-146: LGTM. Applying EFV before metadata post-parse keeps the pipeline ordering consistent.
191-205: LGTM. Nested‑parent guard is clear and appropriately scoped.server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesTests.java (1)
1-89: Nice coverage of immutability and stream round‑trip.server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesMapperPlugin.java (1)
1-27: LGTM. Clear and minimal mapper plugin registration for tests.server/src/main/java/org/opensearch/action/index/IndexRequest.java (5)
65-65: LGTM. EFV import matches the new request field.
116-116: LGTM. Defaulting toExtraFieldValues.EMPTYavoids null checks.
158-160: LGTM. Version‑gated EFV deserialization looks correct.
504-519: LGTM. Setter null‑resets to EMPTY and the getter keeps the API clean.
702-704: LGTM. Version‑gated EFV serialization is consistent with the read path.server/src/main/java/org/opensearch/index/mapper/SourceToParse.java (3)
40-40: LGTM. EFV import is consistent with new constructor overloads.
62-84: LGTM. Constructor chaining keeps backward compatibility and defaults EFV to EMPTY.
110-112: LGTM. Simple, clear EFV accessor.server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.java (1)
1-51: LGTM. Clean immutable container with streamable support.server/src/main/java/org/opensearch/index/mapper/extrasource/FloatArrayValue.java (1)
17-68: LGTM!The interface design is clean with a well-defined contract:
- Proper sealed interface extending
ExtraFieldValue- Symmetric read/write protocol with
writeBodyTo/readBodyFrom- Clear separation between header metadata (packedLE flag, dimension) and payload serialization via
writePayloadToserver/src/internalClusterTest/java/org/opensearch/index/mapper/ExtraFieldValuesRequestIT.java (1)
38-202: LGTM!Comprehensive integration tests covering:
- Basic index operations with
BytesValue- Update operations with
PrimitiveFloatArray- Upsert operations with
BytesValue- Bulk operations with mixed EFV types
The test structure is clean with appropriate assertions on stored field values.
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesIngestTests.java (1)
29-135: LGTM!Well-structured unit tests for EFV ingestion covering:
- Root field with
BytesValue- Nested object path with
PrimitiveFloatArray- Error handling when no mapper exists for the extra field
The tests properly exercise the
DocumentMapper.parse()path withSourceToParsecontainingExtraFieldValues.server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java (1)
17-90: LGTM!Clean sealed interface design with:
- Appropriate
@ExperimentalApiannotation for the new feature- Efficient type lookup via byte id with O(1) array access
- Well-defined wire encoding protocol:
[typeId][body]- Good error handling for unknown type ids
The TODO for future types (double, int, long) is noted and the design accommodates extensibility.
server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java (1)
35-165: LGTM!Well-implemented packed float array with:
- Efficient lazy decoding with proper volatile fields for thread-safety
- Zero-copy optimization for
BytesArraybacked references- Correct little-endian decoding in
decodeAt()- Robust validation with overflow protection using
Math.multiplyExact- Proper bounds checking in
get()The documented concurrency behavior (multiple threads may transiently materialize but results are equivalent) is correctly implemented with Java's volatile memory model guarantees.
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraTestFieldMapper.java (1)
26-106: LGTM!Appropriate test-only mapper implementation that:
- Properly implements
supportsExtraFieldValues()returningtrue- Consumes
ExtraFieldValueviacontext.parseExternalValue()- Handles both
BytesValueandFloatArrayValuetypes- Stores metadata fields for test verification (type, length/dimension, first float)
This provides the necessary test infrastructure for validating the EFV feature end-to-end.
server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java (1)
28-157: LGTM!Comprehensive test coverage for
PackedFloatArrayincluding:
- Basic construction and value retrieval
- Zero-copy optimization verification for
BytesArrayslices- Composite
BytesReferencematerialization behavior- Cache identity verification for
asFloatArray()- Serialization round-trip
- Validation and bounds checking error cases
The use of reflection to verify internal state is appropriate for testing optimization guarantees.
| /** | ||
| * Sets extral field values for the partial document update ({@code doc}). | ||
| * <p> | ||
| * These values are applied only when the update is executed using {@code doc} (i.e., no script). | ||
| * {@code null} clears the values and resets to {@link ExtraFieldValues#EMPTY}. | ||
| */ | ||
| public UpdateRequest docExtraFieldValues(ExtraFieldValues values) { | ||
| safeDoc().extraFieldValues(values); | ||
| return this; | ||
| } |
There was a problem hiding this comment.
Fix Javadoc typo ("extral").
📝 Proposed fix
- * Sets extral field values for the partial document update ({`@code` doc}).
+ * Sets extra field values for the partial document update ({`@code` doc}).🤖 Prompt for AI Agents
In `@server/src/main/java/org/opensearch/action/update/UpdateRequest.java` around
lines 724 - 733, The Javadoc for UpdateRequest.docExtraFieldValues has a typo
("extral") and should be corrected to "extra"; update the method comment block
above public UpdateRequest docExtraFieldValues(ExtraFieldValues values) to read
"Sets extra field values for the partial document update" and adjust any related
wording (e.g., "These values...") while keeping the rest of the documentation
and references to ExtraFieldValues, safeDoc().extraFieldValues(values), and
ExtraFieldValues#EMPTY unchanged.
|
❌ Gradle check result for 5d4649e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 573da92: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
…rialization (version mismatch) Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
|
❌ Gradle check result for 0f11c94: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Hi @msfroh As per our conversation, this is the non grpc part of the contribution. The contribution is split across multiple PRs to simplify review and maintain separation of concerns. |
This PR introduces ExtraFieldValues, a side-channel mechanism for supplying field values outside of _source during indexing.
This change is part of a broader effort to enhance gRPC indexing, enabling primitive vector values to be indexed as non-_source data for improved ingestion performance.
Key points:
Adds ExtraFieldValues support to IndexRequest and UpdateRequest (doc and upsert paths).
Enables mappers to consume values directly, bypassing JSON parsing.
Currently supports float arrays only (primitive and packed little-endian representations).
Transport serialization is version-gated for backward compatibility.
For now, only doc-based updates and upserts are supported (script not supported).
Benchmark
Benchmarked ingest throughput across increasing vector dimensions using four representations:
Across all tested dimensions:
JSON float arrays show the lowest throughput and degrade rapidly as vector size increases.
JSON (binary/base64) performs significantly better than float arrays.
SMILE and JSON(base64) are generally comparable, with SMILE often providing significant improvements at higher dimensions.
ExtraFieldValues consistently provides the highest throughput and scales best with increasing vector size.
At higher dimensions (e.g., 1024+), ExtraFieldValues shows a clear and consistent advantage over the _source-based approaches, achieving up to 60× higher throughput compared to JSON arrays.
These results indicate that bypassing _source parsing for large vector payloads can substantially improve ingest throughput.
Note: ExtraFieldValues was tested with the PackedFloatArray option (packed little-endian floats)
Ingest Throughput (baseline = JSON float array)
docs/sec (X× faster with json array as base)
Related Issues
#19638
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.