[feat](inverted index) skip .nrm generation for non-tokenized indexes #60722
base: master
run buildall
9b11037 to 1910234
run buildall
BE UT Coverage Report: Increment line coverage, Increment coverage report
BE Regression && UT Coverage Report: Increment line coverage, Increment coverage report
zzzxl1993
left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
TPC-H: Total hot run time: 28665 ms
TPC-DS: Total hot run time: 184505 ms
airborne12
left a comment
LGTM
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
While inspecting the composition of inverted index files, we observed that non-tokenized indexes still generate a .nrm file of about 2 MB each. When there are many inverted indexes but only a few of them are tokenized, these files consume a significant amount of storage unnecessarily.
Improvement:
Change the .nrm generation logic so that a .nrm file is created only for indexes that require tokenization. Non-tokenized indexes no longer generate .nrm files, which reduces storage overhead without affecting functionality.
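As a rough illustration of the rule described above, the following Python sketch models the decision and the resulting savings. It is not the Doris implementation: `IndexSpec`, `needs_nrm`, and `estimated_nrm_savings_mb` are hypothetical names, and the 2 MB figure is only the approximate size quoted in the problem summary.

```python
from dataclasses import dataclass
from typing import List

# Assumption: approximate .nrm size quoted in the PR description ("around 2MB each").
APPROX_NRM_SIZE_MB = 2


@dataclass
class IndexSpec:
    """Hypothetical description of one inverted index."""
    name: str
    tokenized: bool  # True if the index requires tokenization


def needs_nrm(index: IndexSpec) -> bool:
    """Under the new behavior, only tokenized indexes get a .nrm file."""
    return index.tokenized


def estimated_nrm_savings_mb(indexes: List[IndexSpec]) -> int:
    """Approximate storage no longer spent on .nrm files for non-tokenized indexes."""
    skipped = [ix for ix in indexes if not needs_nrm(ix)]
    return len(skipped) * APPROX_NRM_SIZE_MB
```

For example, with ten inverted indexes of which two are tokenized, eight .nrm files are skipped, so the estimate under the 2 MB assumption is about 16 MB for that set of index files.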
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)