fix: fix Scanner file handle leak in HiveIncrementalPuller.executeIncrementalSQL #18457

Merged
danny0405 merged 3 commits into apache:master from mailtoboggavarapu-coder:fix/hive-incremental-puller-scanner-leak-18440
Apr 16, 2026



@mailtoboggavarapu-coder mailtoboggavarapu-coder commented Apr 3, 2026

Describe the issue this Pull Request addresses

In HiveIncrementalPuller.executeIncrementalSQL(), a java.util.Scanner used to read the incremental SQL file is never explicitly closed. On every invocation this leaks a file descriptor, which can exhaust OS file descriptor limits in long-running jobs.

Summary and Changelog

Fixed a Scanner resource leak in HiveIncrementalPuller.executeIncrementalSQL() by wrapping the Scanner in a try-with-resources block so it is always closed after use.

  • HiveIncrementalPuller.java: Converted plain new Scanner(...) to a try-with-resources statement, ensuring the Scanner is closed regardless of whether an exception is thrown during file reading.
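The change described above can be sketched roughly as follows. This is a minimal illustration of the try-with-resources pattern, not the actual Hudi code: the class `IncrementalSqlReader`, the method `readSql`, the UTF-8 charset, and the `\\A` delimiter are all assumptions made for the example.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical helper for illustration; not part of HiveIncrementalPuller.
final class IncrementalSqlReader {

  /** Reads the whole SQL file into a string and always closes the Scanner. */
  static String readSql(File sqlFile) throws FileNotFoundException {
    // The Scanner is declared as a resource, so close() runs on normal exit
    // and when an exception is thrown inside the block. Closing the Scanner
    // also closes the underlying file stream.
    try (Scanner scanner = new Scanner(sqlFile, "UTF-8")) {
      // "\\A" matches only at the start of input, so the first token is the
      // entire file content (assumed delimiter, chosen for the example).
      scanner.useDelimiter("\\A");
      return scanner.hasNext() ? scanner.next() : "";
    }
  }

  public static void main(String[] args) throws FileNotFoundException {
    // Usage: pass the path of a SQL file as the first argument.
    System.out.println(readSql(new File(args[0])));
  }
}
```

Before a change of this kind, the equivalent code would create the Scanner with a plain `new Scanner(...)` and never call `close()`, leaving the file handle open until the object is reclaimed.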

Impact

No public API or user-facing change. Prevents file descriptor exhaustion in long-running incremental pull pipelines.

Risk Level

low — Only affects resource cleanup; no functional query or SQL parsing logic changed.

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable


@yihua yihua left a comment


🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

LGTM — clean, minimal fix that correctly addresses the file handle leak using try-with-resources. The Scanner is now guaranteed to close on both normal exit and exception paths, with no change to observable behavior.
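One way to check the claim that the Scanner closes on both the normal and the exception path is a small standalone test. The sketch below is not taken from the PR; `ScannerCloseCheck` and `TrackingReader` are hypothetical names, and an in-memory reader stands in for the SQL file.

```java
import java.io.StringReader;
import java.util.NoSuchElementException;
import java.util.Scanner;

// Hypothetical check for illustration; not part of the Hudi test suite.
final class ScannerCloseCheck {

  /** StringReader that records whether close() was called on it. */
  static final class TrackingReader extends StringReader {
    boolean closed = false;

    TrackingReader(String content) {
      super(content);
    }

    @Override
    public void close() {
      closed = true;
      super.close();
    }
  }

  /** Exception path: reading from empty input throws, the reader is still closed. */
  static boolean closedAfterFailure() {
    TrackingReader reader = new TrackingReader("");
    try (Scanner scanner = new Scanner(reader)) {
      scanner.next(); // throws NoSuchElementException on empty input
    } catch (NoSuchElementException expected) {
      // The resource has already been closed by the time this block runs.
    }
    return reader.closed;
  }

  /** Normal path: the token is read and the reader is closed on exit. */
  static boolean closedAfterSuccess() {
    TrackingReader reader = new TrackingReader("select 1");
    try (Scanner scanner = new Scanner(reader)) {
      scanner.next();
    }
    return reader.closed;
  }

  public static void main(String[] args) {
    System.out.println("closed after failure: " + closedAfterFailure());
    System.out.println("closed after success: " + closedAfterSuccess());
  }
}
```

The check relies on the documented behavior of `Scanner.close()`, which closes the underlying source when that source implements `Closeable`.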

@github-actions github-actions bot added the size:XS PR with lines of changes in <= 10 label Apr 4, 2026
@mailtoboggavarapu-coder
Contributor Author

Pinging for committer approval. This PR fixes a Scanner file handle leak in HiveIncrementalPuller.executeIncrementalSQL (HUDI-18440). @yihua has already reviewed and LGTM'd this. Would appreciate a review and merge from a committer with write access to apache/hudi (@vinothchandrasekar / @alexeykudinkin / @danny0405 / @nsivabalan).

@mailtoboggavarapu-coder
Contributor Author

CI Build Failures — Master Branch Issue (not this PR)

The Java CI build failures on this PR are not caused by our code changes. Investigation shows this is a master branch build issue introduced by commit d3e0201 ("fix(common): FutureUtils:allOf should always throw root cause exception", merged 2026-04-15T22:30Z).

Evidence:

  • The master branch CI run for d3e0201 (run 24481746190) shows the same "Build Project" step failures in the test-common-and-other-modules and test-spark-java-tests-part2 jobs
  • Our code changes touch completely unrelated files (DFSPropertiesConfiguration.java / HiveIncrementalPuller.java / FileSystemBasedLockProvider.java / SqlFileBasedSource.java) — none of which interact with FutureUtils
  • The last successful PR CI run before d3e0201 (nsivabalan's run at 21:53 UTC same day) passed all 53/53 jobs with identical matrix configuration
  • Build failures occur in ~83–111 seconds (a full Maven build normally takes 340s+), consistent with an early Maven resolution/plugin failure rather than a compilation error in our code

The CI situation is being tracked. Once the master build stabilises, a re-run of CI on this PR should pass.

cc @vinothchandrasekar @alexeykudinkin @danny0405 @nsivabalan — would appreciate a re-run once the master build issue is resolved.

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.85%. Comparing base (d3e0201) to head (2d21b95).

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18457      +/-   ##
============================================
+ Coverage     66.50%   68.85%   +2.34%     
- Complexity    22910    28222    +5312     
============================================
  Files          2004     2460     +456     
  Lines        112236   135260   +23024     
  Branches      14250    16395    +2145     
============================================
+ Hits          74645    93129   +18484     
- Misses        31087    34758    +3671     
- Partials       6504     7373     +869     
Flag Coverage Δ
common-and-other-modules 44.59% <ø> (?)
hadoop-mr-java-client 44.88% <ø> (-0.01%) ⬇️
spark-client-hadoop-common 48.45% <ø> (+<0.01%) ⬆️
spark-java-tests 48.93% <ø> (+0.06%) ⬆️
spark-scala-tests 45.51% <ø> (-0.01%) ⬇️
utilities 38.23% <ø> (ø)

Flags with carried forward coverage won't be shown.
see 802 files with indirect coverage changes


@hudi-bot
Collaborator

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@danny0405 danny0405 changed the title from "[HUDI-18440] Fix Scanner file handle leak in HiveIncrementalPuller.executeIncrementalSQL" to "fix: fix Scanner file handle leak in HiveIncrementalPuller.executeIncrementalSQL" Apr 16, 2026
@danny0405 danny0405 merged commit 5b68607 into apache:master Apr 16, 2026
58 checks passed