Skip to content

Comments

Fix annotated highlighter wrapping issue#20682

Draft
sjs004 wants to merge 2 commits intoopensearch-project:mainfrom
sjs004:fix-annotated-highlighter-wrapping
Draft

Fix annotated highlighter wrapping issue#20682
sjs004 wants to merge 2 commits intoopensearch-project:mainfrom
sjs004:fix-annotated-highlighter-wrapping

Conversation

@sjs004
Copy link
Contributor

@sjs004 sjs004 commented Feb 19, 2026

Description

This PR fixes a ClassCastException occurring in the mapper-annotated-text plugin when highlighting fields with max_analyzer_offset configured. The fix involves refactoring how UnifiedHighlighter composes analyzers, moving from a hardcoded wrapping logic to an extensible hook-based approach (wrapAnalyzer)

The Problem (Old Flow)
Previously, the UnifiedHighlighter core would fetch the analyzer from the plugin and then forcefully wrap it in a LimitTokenOffsetAnalyzer (an anonymous class UnifiedHighlighter$1) if a maximum offset was set

Old Object Hierarchy:

UnifiedHighlighter$1 (Outer Shell)

└── AnnotatedHighlighterAnalyzer (Inner Layer)

└── BaseAnalyzer (Core Layer)

Because the Outer Shell was UnifiedHighlighter$1, the plugin’s attempt to cast the analyzer back to its own type in loadFieldValues failed

The Solution (New Flow)
Introduced a protected wrapAnalyzer(Analyzer, Integer) hook in the UnifiedHighlighter core. This allows the plugin to control the order of decoration.

By overriding this hook in the plugin and calling super.wrapAnalyzer() first, we ensure that the Core applies the limit wrapper before the plugin applies its own specialized wrapper

New Object Hierarchy:

AnnotatedHighlighterAnalyzer (Outer Shell)

└── UnifiedHighlighter$1 (Inner Layer - Limit Wrapper)

└── BaseAnalyzer (Core Layer)

Key Changes

  1. OpenSearch Core (UnifiedHighlighter.java):
  • Removed hardcoded wrapping in buildHighlighter
  • Added protected Analyzer wrapAnalyzer(Analyzer analyzer, Integer fieldMaxAnalyzedOffset) to provide a standard extension point.
  1. Plugin (AnnotatedTextHighlighter.java):
  • Overrode wrapAnalyzer to ensure AnnotatedHighlighterAnalyzer remains the outermost object
  • Removed the obsolete getAnalyzer override

Related Issues

Resolves #20669

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: sjs004 <simarjeet.singh004@gmail.com>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 19, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The changes refactor analyzer construction in two highlighter classes by introducing a new wrapAnalyzer() helper method. The UnifiedHighlighter now uses this method to conditionally wrap analyzers with LimitedOffsetAnalyzer, while AnnotatedTextHighlighter delegates through it to wrap with AnnotatedHighlighterAnalyzer. This centralizes offset-limiting logic.

Changes

Cohort / File(s) Summary
Analyzer Wrapper Delegation
plugins/mapper-annotated-text/src/main/java/org/opensearch/search/fetch/subphase/highlight/AnnotatedTextHighlighter.java, server/src/main/java/org/opensearch/search/fetch/subphase/highlight/UnifiedHighlighter.java
Both classes now use a wrapAnalyzer() method pattern. UnifiedHighlighter introduces the new protected helper method to conditionally apply LimitedOffsetAnalyzer based on fieldMaxAnalyzedOffset. AnnotatedTextHighlighter overrides wrapAnalyzer() instead of getAnalyzer(), wrapping the core analyzer result with AnnotatedHighlighterAnalyzer. Analyzer initialization changed from direct DocumentMapper access to through mappers().indexAnalyzer().

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description contains only the template structure with a placeholder '[Describe what this change achieves]' and does not include an actual description of the changes, objectives, or reasoning. Replace the placeholder with a concrete description of what the change achieves, explain why the wrapping approach was refactored, and confirm test coverage.
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix annotated highlighter wrapping issue' is directly related to the main change, which refactors the analyzer wrapping logic in the AnnotatedTextHighlighter class.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sjs004 sjs004 marked this pull request as draft February 19, 2026 18:48
@github-actions github-actions bot added bug Something isn't working Search:Query Capabilities labels Feb 19, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
plugins/mapper-annotated-text/src/main/java/org/opensearch/search/fetch/subphase/highlight/AnnotatedTextHighlighter.java (1)

38-38: ⚠️ Potential issue | 🟡 Minor

Unused DocumentMapper import — remove it.

DocumentMapper was referenced by the now-removed getAnalyzer(DocumentMapper) override. No method in the current class uses it.

🧹 Proposed fix
-import org.opensearch.index.mapper.DocumentMapper;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/mapper-annotated-text/src/main/java/org/opensearch/search/fetch/subphase/highlight/AnnotatedTextHighlighter.java`
at line 38, The import org.opensearch.index.mapper.DocumentMapper in
AnnotatedTextHighlighter is unused (the getAnalyzer(DocumentMapper) override was
removed); delete that import line and run organize-imports / recompile to ensure
no other references remain so the class compiles cleanly.
server/src/main/java/org/opensearch/search/fetch/subphase/highlight/UnifiedHighlighter.java (1)

211-213: ⚠️ Potential issue | 🟠 Major

getAnalyzer(DocumentMapper) is dead code and should be removed or deprecated.

buildHighlighter directly accesses the index analyzer at line 164 without calling getAnalyzer. The method is unused and is not overridden by any current subclasses (CustomUnifiedHighlighter or AnnotatedTextHighlighter), but its protected visibility creates a hazard: future subclasses might override it expecting it to be called during highlighting, only to have their override silently ignored. Remove the method or deprecate it clearly to guide future maintainers toward wrapAnalyzer instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@server/src/main/java/org/opensearch/search/fetch/subphase/highlight/UnifiedHighlighter.java`
around lines 211 - 213, The getAnalyzer(DocumentMapper) method is dead code and
should be removed or clearly deprecated to avoid future subclass confusion:
either delete the protected getAnalyzer(DocumentMapper) method from
UnifiedHighlighter (and remove any unused imports/tests) or mark it `@Deprecated`,
change visibility to private or final if keeping, and update its Javadoc to
point callers to wrapAnalyzer; ensure you reference getAnalyzer,
buildHighlighter (which currently reads the index analyzer directly), and the
subclasses CustomUnifiedHighlighter and AnnotatedTextHighlighter when making the
change so you don’t break expected behavior.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c62834 and 9b469e7.

📒 Files selected for processing (2)
  • plugins/mapper-annotated-text/src/main/java/org/opensearch/search/fetch/subphase/highlight/AnnotatedTextHighlighter.java
  • server/src/main/java/org/opensearch/search/fetch/subphase/highlight/UnifiedHighlighter.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.

Applied to files:

  • server/src/main/java/org/opensearch/search/fetch/subphase/highlight/UnifiedHighlighter.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: detect-breaking-change
  • GitHub Check: Analyze (java)
🔇 Additional comments (2)
server/src/main/java/org/opensearch/search/fetch/subphase/highlight/UnifiedHighlighter.java (1)

215-220: wrapAnalyzer helper is well-structured.

The null check on fieldMaxAnalyzedOffset correctly handles the non-offset-limiting path, and delegation to getLimitedOffsetAnalyzer is clean.

plugins/mapper-annotated-text/src/main/java/org/opensearch/search/fetch/subphase/highlight/AnnotatedTextHighlighter.java (1)

78-81: LGTM — correct fix for the annotated highlighter wrapping.

super.wrapAnalyzer is applied first to optionally inject LimitedOffsetAnalyzer, then AnnotatedHighlighterAnalyzer wraps the result. When fieldMaxAnalyzedOffset is null the chain is AnnotatedHighlighterAnalyzer(baseAnalyzer) — identical to the old behaviour — and when it is set the chain becomes AnnotatedHighlighterAnalyzer(LimitedOffsetAnalyzer(baseAnalyzer)), which is the intended fix. The cast in loadFieldValues (line 73) remains safe because this override always produces an AnnotatedHighlighterAnalyzer.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@server/src/main/java/org/opensearch/search/fetch/subphase/highlight/UnifiedHighlighter.java`:
- Around line 164-169: In UnifiedHighlighter, the analyzer returned by
wrapAnalyzer is being overwritten when a derived field type exists; move the
call to wrapAnalyzer so it is applied after resolving derivedFieldType (i.e.,
resolve derivedFieldType via
context.getQueryShardContext().resolveDerivedFieldType(fieldContext.fieldName),
select the analyzer from either the derived field
((DerivedFieldType).getIndexAnalyzer()) or the original indexAnalyzer, and then
call wrapAnalyzer on that chosen analyzer using fieldMaxAnalyzedOffset) so the
offset-limiting wrapper is never discarded; also ensure that when
AnnotatedTextHighlighter will later cast the analyzer (see
AnnotatedTextHighlighter.loadFieldValues) the wrapped analyzer preserves/returns
an AnnotatedHighlighterAnalyzer-compatible instance if required.

---

Outside diff comments:
In
`@plugins/mapper-annotated-text/src/main/java/org/opensearch/search/fetch/subphase/highlight/AnnotatedTextHighlighter.java`:
- Line 38: The import org.opensearch.index.mapper.DocumentMapper in
AnnotatedTextHighlighter is unused (the getAnalyzer(DocumentMapper) override was
removed); delete that import line and run organize-imports / recompile to ensure
no other references remain so the class compiles cleanly.

In
`@server/src/main/java/org/opensearch/search/fetch/subphase/highlight/UnifiedHighlighter.java`:
- Around line 211-213: The getAnalyzer(DocumentMapper) method is dead code and
should be removed or clearly deprecated to avoid future subclass confusion:
either delete the protected getAnalyzer(DocumentMapper) method from
UnifiedHighlighter (and remove any unused imports/tests) or mark it `@Deprecated`,
change visibility to private or final if keeping, and update its Javadoc to
point callers to wrapAnalyzer; ensure you reference getAnalyzer,
buildHighlighter (which currently reads the index analyzer directly), and the
subclasses CustomUnifiedHighlighter and AnnotatedTextHighlighter when making the
change so you don’t break expected behavior.

Comment on lines +164 to 169
Analyzer analyzer = fieldContext.context.mapperService().documentMapper().mappers().indexAnalyzer();
analyzer = wrapAnalyzer(analyzer, fieldMaxAnalyzedOffset);
MappedFieldType derivedFieldType = fieldContext.context.getQueryShardContext().resolveDerivedFieldType(fieldContext.fieldName);
if (derivedFieldType != null) {
analyzer = ((DerivedFieldType) derivedFieldType).getIndexAnalyzer();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

wrapAnalyzer result is discarded when a derived field type is present.

wrapAnalyzer at line 165 applies offset limiting, but the derived-field override at line 168 replaces the analyzer unconditionally, silently ignoring the fieldMaxAnalyzedOffset setting for derived fields. For AnnotatedTextHighlighter, the replaced analyzer is also not an AnnotatedHighlighterAnalyzer, which would cause a ClassCastException in loadFieldValues (line 73 of AnnotatedTextHighlighter) — although that specific issue pre-dates this PR.

Consider applying wrapAnalyzer after the derived-field type resolution:

🔧 Proposed fix
-        Analyzer analyzer = fieldContext.context.mapperService().documentMapper().mappers().indexAnalyzer();
-        analyzer = wrapAnalyzer(analyzer, fieldMaxAnalyzedOffset);
         MappedFieldType derivedFieldType = fieldContext.context.getQueryShardContext().resolveDerivedFieldType(fieldContext.fieldName);
+        Analyzer analyzer = fieldContext.context.mapperService().documentMapper().mappers().indexAnalyzer();
         if (derivedFieldType != null) {
             analyzer = ((DerivedFieldType) derivedFieldType).getIndexAnalyzer();
         }
+        analyzer = wrapAnalyzer(analyzer, fieldMaxAnalyzedOffset);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@server/src/main/java/org/opensearch/search/fetch/subphase/highlight/UnifiedHighlighter.java`
around lines 164 - 169, In UnifiedHighlighter, the analyzer returned by
wrapAnalyzer is being overwritten when a derived field type exists; move the
call to wrapAnalyzer so it is applied after resolving derivedFieldType (i.e.,
resolve derivedFieldType via
context.getQueryShardContext().resolveDerivedFieldType(fieldContext.fieldName),
select the analyzer from either the derived field
((DerivedFieldType).getIndexAnalyzer()) or the original indexAnalyzer, and then
call wrapAnalyzer on that chosen analyzer using fieldMaxAnalyzedOffset) so the
offset-limiting wrapper is never discarded; also ensure that when
AnnotatedTextHighlighter will later cast the analyzer (see
AnnotatedTextHighlighter.loadFieldValues) the wrapped analyzer preserves/returns
an AnnotatedHighlighterAnalyzer-compatible instance if required.

@github-actions
Copy link
Contributor

❌ Gradle check result for 9b469e7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: sjs004 <simarjeet.singh004@gmail.com>
@github-actions
Copy link
Contributor

✅ Gradle check result for 9a7edd7: SUCCESS

@codecov
Copy link

codecov bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 42.85714% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.26%. Comparing base (6b557db) to head (9a7edd7).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...h/subphase/highlight/AnnotatedTextHighlighter.java 0.00% 2 Missing ⚠️
...h/fetch/subphase/highlight/UnifiedHighlighter.java 60.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20682      +/-   ##
============================================
- Coverage     73.32%   73.26%   -0.07%     
+ Complexity    72064    72000      -64     
============================================
  Files          5781     5781              
  Lines        329395   329405      +10     
  Branches      47525    47525              
============================================
- Hits         241536   241333     -203     
- Misses        68507    68787     +280     
+ Partials      19352    19285      -67     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working Search:Query Capabilities

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Highlighting issue with annotated highlighter and max_analyzer_offset option

1 participant