Conversation

@suxiaogang223
Contributor
…revent OOM in file scan (apache#58759)

### What problem does this PR solve?

- Related PR: apache#58858

## Problem Summary

When querying tables in an external catalog (Hive, Iceberg, Paimon, etc.),
Doris splits files into multiple splits for parallel processing. In some
cases, especially with numerous small files, this can generate an
excessive number of splits, potentially causing:

1. **Memory pressure**: Too many splits consume significant memory in FE
2. **OOM issues**: Excessive split generation can lead to
OutOfMemoryError
3. **Performance degradation**: Managing too many splits impacts query
planning overhead

Previously, there was no upper limit on the number of splits in
non-batch mode, which could lead to problems when querying tables with
many small files.

## Solution

This PR introduces a new session variable `max_file_split_num` to limit
the maximum number of splits allowed per table scan in non-batch mode.

### Changes

1. **New Session Variable**: `max_file_split_num`
   - Type: `int`
   - Default: `100000`
   - Description: "The maximum number of splits allowed per table scan in non-batch mode, to prevent OOM caused by generating too many splits."
   - Forward to BE: `true`

2. **Implementation in FileQueryScanNode**:
   - Added method `applyMaxFileSplitNumLimit(long targetSplitSize, long totalFileSize)`
   - Dynamically calculates a minimum split size so that the split count does not exceed the limit
   - Formula: `minSplitSizeForMaxNum = (totalFileSize + maxFileSplitNum - 1) / maxFileSplitNum`
   - Returns: `Math.max(targetSplitSize, minSplitSizeForMaxNum)`

3. **Applied to multiple scan nodes**:
   - `HiveScanNode`
   - `IcebergScanNode`
   - `PaimonScanNode`
   - `TVFScanNode`

4. **Unit Tests**:
   - `FileQueryScanNodeTest`: Test base logic
   - `HiveScanNodeTest`: Test Hive-specific implementation
   - `IcebergScanNodeTest`: Test Iceberg-specific implementation
   - `PaimonScanNodeTest`: Test Paimon-specific implementation
   - `TVFScanNodeTest`: Test TVF-specific implementation
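
The limit logic in item 2 can be sketched as follows. This is a minimal, standalone illustration of the ceiling-division formula described above, not the actual Doris FE code; the class name and the explicit `maxFileSplitNum` parameter are illustrative (in Doris the cap comes from the session variable).

```java
// Sketch of the split-size limit described in the PR. Illustrative only;
// in Doris the cap is read from the session variable max_file_split_num.
public class SplitLimitSketch {
    // Returns an effective split size such that totalFileSize divided by
    // the result yields at most maxFileSplitNum splits.
    static long applyMaxFileSplitNumLimit(long targetSplitSize,
                                          long totalFileSize,
                                          long maxFileSplitNum) {
        if (maxFileSplitNum <= 0) {
            return targetSplitSize; // limit disabled, keep the target size
        }
        // Ceiling division: the smallest split size that keeps the
        // split count at or under the cap.
        long minSplitSizeForMaxNum =
                (totalFileSize + maxFileSplitNum - 1) / maxFileSplitNum;
        return Math.max(targetSplitSize, minSplitSizeForMaxNum);
    }

    public static void main(String[] args) {
        // 1 GiB of data, 64 MiB target splits, cap of 8 splits:
        // the split size is raised to 128 MiB so only 8 splits are produced.
        long size = applyMaxFileSplitNumLimit(64L << 20, 1L << 30, 8);
        System.out.println(size); // 134217728 (128 MiB)
    }
}
```

Because the method only ever raises the split size (`Math.max`), tables whose natural split count is already under the cap are unaffected.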

## Usage

Users can now control the maximum number of splits per table scan by
setting the session variable:

```sql
-- Set to 50000 splits maximum
SET max_file_split_num = 50000;

-- Disable the limit (set to 0 or negative)
SET max_file_split_num = 0;
```

(cherry picked from commit 3e5a70f)
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Contributor Author

run buildall

@suxiaogang223
Contributor Author

run buildall

@suxiaogang223 force-pushed the pick-58759-branch-4.0-v2 branch from 4f944e4 to 75641e0 (February 13, 2026 04:21)
@suxiaogang223
Contributor Author

run buildall
