Add cache initialization for existing files in DatafusionEngine #20645

Open
cocosz wants to merge 1 commit into opensearch-project:feature/datafusion from
cocosz:feature/cache-initialization-existing-files

@cocosz cocosz commented Feb 17, 2026

Description

This PR adds eager cache initialization for existing files in the DatafusionEngine constructor. Previously, files were only added to the cache during refresh operations, which meant the first query after engine initialization would experience cache misses. This change improves query performance by pre-populating the cache with existing files at initialization time.
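
The difference between refresh-time and constructor-time population can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `SketchFileCache`, `SketchEngine`, `refresh`, `query`, and the `eagerInit` flag are all hypothetical names, and the assumption that a miss loads the file into the cache is made only for the sake of the example.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the engine's file cache; not the real DatafusionEngine API.
class SketchFileCache {
    private final Set<String> cached = new LinkedHashSet<>();
    private int hits = 0;
    private int misses = 0;

    void addFiles(Collection<String> files) {
        cached.addAll(files);
    }

    // Records a hit or a miss. Assumption: a missed file is loaded into the cache.
    boolean lookup(String file) {
        if (cached.contains(file)) {
            hits++;
            return true;
        }
        misses++;
        cached.add(file);
        return false;
    }

    int hits() { return hits; }

    int misses() { return misses; }
}

// Hypothetical engine that contrasts the two population strategies.
class SketchEngine {
    private final SketchFileCache cache = new SketchFileCache();
    private final List<String> files;

    SketchEngine(List<String> existingFiles, boolean eagerInit) {
        this.files = List.copyOf(existingFiles);
        if (eagerInit) {
            // The behaviour described above: populate the cache in the constructor.
            cache.addFiles(this.files);
        }
    }

    // Earlier behaviour: files reach the cache only when a refresh happens.
    void refresh(Collection<String> refreshedFiles) {
        cache.addFiles(refreshedFiles);
    }

    // A simplified "query" that touches every known file once.
    void query() {
        for (String file : files) {
            cache.lookup(file);
        }
    }

    SketchFileCache cache() { return cache; }
}
```

In this sketch, an engine built with `eagerInit` set to `true` records no misses on its first `query()`, while one built with `false` misses once per existing file until a refresh or a lookup fills the cache.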

Benefits

  • Improved First Query Performance: Eliminates cache misses on the first query after engine initialization
  • 100% Cache Hit Rate: Achieves optimal cache hit rate from the first query onwards
  • Better Resource Utilization: Ensures cache is populated and ready before queries are executed

Testing

  • Added testExistingFilesAddedToCacheOnInitialization() test that verifies:

    • Cache memory increases after engine initialization
    • All existing files are present in the cache
    • Test passes successfully with proper file setup
  • Tested with Clickbench dataset showing:

    • Existing files loaded at initialization
    • Queries achieve a 100% cache hit rate
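
The two assertions listed for testExistingFilesAddedToCacheOnInitialization() could look roughly like the sketch below. It is not the test from this PR: `TrackedCache`, `CacheInitCheck`, `initialize`, and `verify` are hypothetical names, and the cache is modelled as a plain map from file name to size in bytes purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache that tracks the bytes it holds; not the real DatafusionEngine cache.
class TrackedCache {
    private final Map<String, Long> entries = new HashMap<>();

    void put(String file, long sizeBytes) {
        entries.put(file, sizeBytes);
    }

    boolean contains(String file) {
        return entries.containsKey(file);
    }

    long memoryBytes() {
        long total = 0;
        for (long size : entries.values()) {
            total += size;
        }
        return total;
    }
}

class CacheInitCheck {
    // Stand-in for engine construction: registers every existing file with the cache.
    static void initialize(TrackedCache cache, Map<String, Long> existingFiles) {
        existingFiles.forEach(cache::put);
    }

    // Mirrors the two checks: cache memory grows, and every existing file is present.
    static boolean verify(Map<String, Long> existingFiles) {
        TrackedCache cache = new TrackedCache();
        long before = cache.memoryBytes();
        initialize(cache, existingFiles);
        boolean memoryIncreased = cache.memoryBytes() > before;
        boolean allPresent = existingFiles.keySet().stream().allMatch(cache::contains);
        return memoryIncreased && allPresent;
    }
}
```

Note that `verify` returns `false` for an empty file set because memory does not grow, which is one reading of why the test needs files to be set up before the engine is created.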

@cocosz cocosz requested a review from a team as a code owner February 17, 2026 13:40

coderabbitai bot commented Feb 17, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Signed-off-by: Tanvir Alam <tanvralm@amazon.com>
@cocosz cocosz force-pushed the feature/cache-initialization-existing-files branch from 70d5573 to c9018ca Compare February 17, 2026 13:43
@github-actions

❌ Gradle check result for c9018ca: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
