As a user, I want the exists operator to match OpenSearch's native behavior for all fields #712

@tloubrieu-jpl

Description

Checked for duplicates

Yes - I've already checked

πŸ§‘β€πŸ”¬ User Persona(s)

Data User

💪 Motivation

...so that the exists operator behaves consistently with OpenSearch's native functionality, providing predictable and intuitive results when querying for fields regardless of whether they exist in the schema.

📖 Additional Details

This requirement extends the exists operator implementation from #406 to ensure the API behavior matches OpenSearch's native behavior for all field existence queries.

Context:
Following the completion of #406 in PR #700, the PR review discussion raised additional questions about how to handle fields that don't exist in the registry schema.

Problem:
The API should provide consistent, predictable behavior that matches OpenSearch's native functionality. Users familiar with OpenSearch expect an exists query on an unmapped or unpopulated field to simply return no hits, and the API should honor that expectation.

Proposed Behavior:

The API should match OpenSearch's native behavior for all exists queries:

  1. Field exists in schema with matching documents:

    pds:Internal_Reference.reference_type exists
    
    • Returns only documents where this field has a value
    • Standard expected behavior
  2. Field exists in schema with no matching documents:

    pds:Some_Real_Field_With_No_Values exists
    
    • Returns 0 results (no documents have this field populated)
    • Consistent with OpenSearch behavior
  3. Field does NOT exist in schema:

    pds:Nonexistent_Field exists
    
    • Returns 0 results (OpenSearch treats non-existent fields as having no values)
    • Matches OpenSearch native behavior
  4. NOT exists with field in schema:

    not (pds:Internal_Reference.reference_type exists)
    
    • Returns documents where this field does NOT have a value
    • Standard expected behavior
  5. NOT exists with field NOT in schema:

    not (pds:Nonexistent_Field exists)
    
    • Returns all documents (since the field doesn't exist in any document)
    • Matches OpenSearch native behavior
  6. Regex pattern matching fields:

    "pds:Internal_Reference.*" exists
    
    • Returns documents where ANY matching field has a value
    • Standard expected behavior
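One way to get all six behaviors is to translate each expression into OpenSearch query DSL and let OpenSearch decide whether a field exists. The Python sketch below assumes the API compiles its query language into `bool`/`exists` clauses; the function names are hypothetical, and resolving a quoted pattern with `re.fullmatch` against a list of mapped field names is an assumed strategy, not the current implementation. Exact field names are passed through without any mapping check.

```python
import re
from typing import Any, Dict, Iterable

Query = Dict[str, Any]


def exists_query(field: str) -> Query:
    """`<field> exists` (hypothetical helper): passed straight to OpenSearch.

    No mapping pre-validation: OpenSearch itself returns no hits for a field
    that is unmapped or never populated (cases 1-3).
    """
    return {"exists": {"field": field}}


def not_query(inner: Query) -> Query:
    """`not (<expr>)` (hypothetical helper): wrap the clause in bool.must_not.

    A bool query with only must_not matches every document the inner clause
    does not, so negating an unmapped field returns all documents (case 5).
    """
    return {"bool": {"must_not": [inner]}}


def pattern_exists_query(pattern: str, mapped_fields: Iterable[str]) -> Query:
    """`"<pattern>" exists` (hypothetical helper): ANY matching field has a value (case 6).

    `mapped_fields` is an assumed list of field names taken from the index mapping.
    """
    regex = re.compile(pattern)
    matches = sorted(f for f in mapped_fields if regex.fullmatch(f))
    if not matches:
        # An empty bool query would match every document, so a pattern that
        # matches no field must explicitly match nothing, like case 3.
        return {"match_none": {}}
    return {
        "bool": {
            "should": [exists_query(f) for f in matches],
            "minimum_should_match": 1,
        }
    }
```

Negating a pattern that matches no fields yields `must_not: [match_none]`, which returns every document, so it lines up with case 5 without any special handling.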

Acceptance Criteria

Given a user queries for an exact field name that exists in the OpenSearch mapping with documents containing values
When the user uses the exists operator (e.g., pds:Internal_Reference.reference_type exists)
Then the API returns only documents where this field has a value

Given a user queries for an exact field name that exists in the OpenSearch mapping but no documents have values for it
When the user uses the exists operator (e.g., pds:Some_Real_Field_With_No_Values exists)
Then the API returns 0 results

Given a user queries for an exact field name that does NOT exist in the OpenSearch mapping
When the user uses the exists operator (e.g., pds:Nonexistent_Field exists)
Then the API returns 0 results (matching OpenSearch behavior where non-existent fields are treated as having no values)

Given a user queries for an exact field name that exists in the OpenSearch mapping
When the user uses the not exists operator (e.g., not (pds:Internal_Reference.reference_type exists))
Then the API returns only documents where this field does NOT have a value

Given a user queries for an exact field name that does NOT exist in the OpenSearch mapping
When the user uses the not exists operator (e.g., not (pds:Nonexistent_Field exists))
Then the API returns all documents (matching OpenSearch behavior: since the field has a value in no document, every document satisfies "not exists")

Given a user queries with a regex pattern that matches one or more fields
When the user uses the exists operator with a quoted pattern (e.g., "pds:Internal_Reference.*" exists)
Then the API returns documents where ANY of the matching fields have values

Given the API processes exists and not exists queries for any field (existent or non-existent)
When the query is executed
Then the behavior matches OpenSearch's native behavior exactly
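The criteria above can also be expressed as a small in-memory reference model, which may help when reasoning about expected results before running against a live cluster. This is a simplified sketch: it approximates OpenSearch by treating a field as existing only when its value is not null and not an empty (or all-null) list, and it ignores mapping settings such as `index: false` or `ignore_above`. The function names and sample documents are hypothetical.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def has_value(doc: Doc, field: str) -> bool:
    """Approximate OpenSearch `exists` for a single document (simplified model)."""
    value = doc.get(field)  # a missing field behaves like null
    if value is None:
        return False
    if isinstance(value, list):
        # [] and [null, null] index no value; [null, "x"] does.
        return any(v is not None for v in value)
    return True  # includes "" (an empty string still counts as a value)


def exists_hits(docs: Dict[str, Doc], field: str, negate: bool = False) -> List[str]:
    """Sorted IDs matching `field exists`, or `not (field exists)` when negate=True."""
    return sorted(doc_id for doc_id, doc in docs.items() if has_value(doc, field) != negate)


# Hypothetical sample documents keyed by ID.
SAMPLE_DOCS = {
    "a": {"pds:Internal_Reference.reference_type": "example"},
    "b": {"pds:Internal_Reference.reference_type": None},
    "c": {"pds:Some_Real_Field_With_No_Values": []},
}

if __name__ == "__main__":
    print(exists_hits(SAMPLE_DOCS, "pds:Internal_Reference.reference_type"))  # ['a']
    print(exists_hits(SAMPLE_DOCS, "pds:Some_Real_Field_With_No_Values"))  # []
    print(exists_hits(SAMPLE_DOCS, "pds:Nonexistent_Field"))  # []
    print(exists_hits(SAMPLE_DOCS, "pds:Internal_Reference.reference_type", negate=True))  # ['b', 'c']
    print(exists_hits(SAMPLE_DOCS, "pds:Nonexistent_Field", negate=True))  # ['a', 'b', 'c']
```

Each printed line corresponds to one Given/When/Then criterion for exact field names; the regex case would apply `has_value` to every matching field and keep documents where any of them returns True.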

βš™οΈ Engineering Details

To be filled in by the engineering team during implementation

Considerations:

  • Ensure the implementation delegates field existence logic to OpenSearch rather than pre-validating against the mapping
  • Test with fields that exist vs. don't exist in the schema
  • Test with fields that exist in schema but have no values in any documents
  • Verify not exists behavior for both existent and non-existent fields
  • Ensure regex patterns return documents where ANY matching field has a value

🎉 I&T

To be filled in by the engineering team during implementation

Integration tests should verify:

  • Queries for existent fields with values return correct results
  • Queries for existent fields with no values return 0 results
  • Queries for non-existent fields return 0 results (matching OpenSearch)
  • NOT exists queries for non-existent fields return all documents (matching OpenSearch)
  • Regex patterns matching multiple fields return documents with ANY matching field
  • All behaviors match OpenSearch native functionality
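The last check (all behaviors match OpenSearch) lends itself to a parity test: run each field through the API and through an equivalent raw OpenSearch query against the same index, then compare the returned document IDs. In this sketch `api_search` and `opensearch_search` are hypothetical callables the test suite would supply (for example, one calling the registry API and one sending `exists` / `bool.must_not` queries directly to OpenSearch); both are assumed to return the full set of matching IDs rather than a single page of results.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical callable shape: (field, negate) -> IDs of all matching documents.
SearchFn = Callable[[str, bool], Set[str]]
Mismatch = Tuple[str, bool, Set[str], Set[str]]


def find_mismatches(
    fields: Iterable[str],
    api_search: SearchFn,
    opensearch_search: SearchFn,
) -> List[Mismatch]:
    """Compare `exists` and `not exists` results from the API with raw OpenSearch.

    Returns (field, negate, api_ids, opensearch_ids) for every disagreement;
    an empty list means the API matched OpenSearch for every field checked.
    """
    mismatches: List[Mismatch] = []
    for field in fields:
        for negate in (False, True):
            api_ids = api_search(field, negate)
            os_ids = opensearch_search(field, negate)
            if api_ids != os_ids:
                mismatches.append((field, negate, api_ids, os_ids))
    return mismatches


# Fields covering the schema cases listed above.
FIELDS_UNDER_TEST = [
    "pds:Internal_Reference.reference_type",  # in schema, has values
    "pds:Some_Real_Field_With_No_Values",     # in schema, no values
    "pds:Nonexistent_Field",                  # not in schema
]
```

An integration test would then assert that `find_mismatches(FIELDS_UNDER_TEST, api_search, opensearch_search)` returns an empty list. Comparing against OpenSearch directly, rather than hard-coding expected counts, keeps the test valid as the indexed data changes.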

Metadata

Projects

Status

ToDo

Status

Review/QA

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
