@hoshinojyunn (Contributor) commented Feb 12, 2026

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
When inspecting the composition of inverted index files, we observed that non-tokenized indexes still generate .nrm (norms) files of around 2MB each. If there are many inverted indexes but only a few of them are tokenized, this leads to significant unnecessary storage consumption.
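To put the overhead in perspective, here is a back-of-the-envelope sketch. It assumes classic Lucene-style norms, stored as one byte per document for each field that keeps norms, so a segment with about 2,000,000 rows would carry roughly 2MB of norms per index, in line with the size observed above. Whether Doris's CLucene-based format uses exactly this layout is an assumption, and `wasted_norm_bytes` is a hypothetical helper, not a Doris function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Doris): estimates the bytes of .nrm data
// written for non-tokenized inverted indexes in a single segment.
// Assumption: Lucene-style norms take one byte per document per field.
constexpr std::uint64_t wasted_norm_bytes(std::uint64_t total_indexes,
                                          std::uint64_t tokenized_indexes,
                                          std::uint64_t rows_in_segment) {
    // Only the non-tokenized indexes are pure overhead; if the counts are
    // inconsistent (more tokenized than total), report no waste.
    return tokenized_indexes >= total_indexes
               ? 0
               : (total_indexes - tokenized_indexes) * rows_in_segment;
}

// Example: 10 inverted indexes, 2 of them tokenized, 2,000,000 rows.
// 8 unnecessary norms files * 2,000,000 bytes = 16,000,000 bytes (~16MB).
static_assert(wasted_norm_bytes(10, 2, 2000000) == 16000000, "8 x 2MB");
```

Under the same assumption the estimate is per segment, so the cost grows with the number of non-tokenized indexes and with the amount of data, independent of how many indexes are tokenized.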

Improvement:
Modify the .nrm file generation logic so that .nrm files are created only for indexes that require tokenization. Non-tokenized indexes no longer generate .nrm files, which reduces storage overhead without affecting functionality.
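The rule is a per-index decision. The sketch below models it under stated assumptions: `IndexSpec`, `needs_norms`, and `columns_with_norms` are hypothetical names, and "tokenized" is approximated as "the index was created with a `parser` property", mirroring Doris's inverted index `parser` property. It is not the code changed in this PR. In Lucene-derived libraries the same idea is usually expressed by omitting norms on untokenized fields; Java Lucene's `StringField`, for example, omits norms by default.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of an inverted index definition.
// `parser` mirrors the index's "parser" property; an empty string means the
// index was created without a parser, i.e. values are indexed as whole terms.
struct IndexSpec {
    std::string column;
    std::string parser;
};

// Norms hold per-document length normalization used when scoring tokenized
// text. A non-tokenized index indexes each value as a single term, so this
// sketch treats its norms as unnecessary.
bool needs_norms(const IndexSpec& spec) {
    return !spec.parser.empty();
}

// Returns the columns whose index would still produce a .nrm file under the
// new rule; every other index skips it.
std::vector<std::string> columns_with_norms(const std::vector<IndexSpec>& specs) {
    std::vector<std::string> result;
    for (const IndexSpec& spec : specs) {
        if (needs_norms(spec)) {
            result.push_back(spec.column);
        }
    }
    return result;
}
```

For example, with indexes on `title` (parser `english`), `status` (no parser), and `body` (parser `unicode`), only `title` and `body` would keep a .nrm file.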

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally with the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was modified, and what impacts it might have.
  3. What features were added, and why they were added.
  4. Which code was refactored, and why it was refactored.
  5. Which functions were optimized, and what the difference is before and after the optimization.

@hoshinojyunn (Contributor, Author)

run buildall

@hoshinojyunn force-pushed the norm_file_generate_behavior_fixed branch from 9b11037 to 1910234 on February 12, 2026 13:06
@hoshinojyunn (Contributor, Author)

run buildall

@hello-stephen (Contributor)

BE UT Coverage Report

Increment line coverage 100.00% (5/5) 🎉

Category	Coverage
Function Coverage	52.69% (19486/36984)
Line Coverage	36.22% (181551/501268)
Region Coverage	32.57% (140763/432139)
Branch Coverage	33.61% (61049/181619)

@hello-stephen (Contributor)

BE Regression && UT Coverage Report

Increment line coverage 100.00% (5/5) 🎉

Category	Coverage
Function Coverage	71.75% (26004/36240)
Line Coverage	54.39% (271974/500031)
Region Coverage	51.83% (226261/436519)
Branch Coverage	53.32% (97209/182323)

@zzzxl1993 (Contributor) left a comment

LGTM

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Feb 13, 2026
@github-actions (Contributor)

PR approved by at least one committer and no changes requested.

@github-actions (Contributor)

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 28665 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1910234fb3db458909a003df9e82cf1de2418734, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17621	4460	4333	4333
q2
q3	10656	778	511	511
q4	4677	357	253	253
q5	7531	1200	1025	1025
q6	182	180	149	149
q7	778	831	667	667
q8	9311	1468	1316	1316
q9	4911	4715	4668	4668
q10	6804	1878	1625	1625
q11	470	257	243	243
q12	704	570	469	469
q13	17782	4203	3411	3411
q14	222	227	221	221
q15	975	798	790	790
q16	750	715	668	668
q17	724	900	402	402
q18	5873	5338	5258	5258
q19	1232	976	638	638
q20	515	494	391	391
q21	4710	1840	1381	1381
q22	343	296	246	246
Total cold run time: 96771 ms
Total hot run time: 28665 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4458	4360	4342	4342
q2
q3	1767	2158	1728	1728
q4	862	1172	743	743
q5	4003	4353	4301	4301
q6	178	175	143	143
q7	1715	1625	1483	1483
q8	2401	2683	2546	2546
q9	7375	7405	7400	7400
q10	2672	2914	2428	2428
q11	533	478	425	425
q12	523	596	470	470
q13	4051	4432	3702	3702
q14	308	295	278	278
q15	859	833	805	805
q16	714	793	715	715
q17	1190	1546	1304	1304
q18	6962	6688	6609	6609
q19	916	883	892	883
q20	2125	2232	2059	2059
q21	4026	3660	3414	3414
q22	519	480	425	425
Total cold run time: 48157 ms
Total hot run time: 46203 ms

@doris-robot

TPC-DS: Total hot run time: 184505 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1910234fb3db458909a003df9e82cf1de2418734, data reload: false

query5	4755	636	525	525
query6	334	226	217	217
query7	4230	481	269	269
query8	339	240	235	235
query9	8739	2731	2734	2731
query10	524	375	351	351
query11	17130	17095	16770	16770
query12	193	124	125	124
query13	1261	460	352	352
query14	6439	3231	2964	2964
query14_1	2837	2835	2775	2775
query15	201	194	182	182
query16	969	473	446	446
query17	1073	712	611	611
query18	2606	444	342	342
query19	212	207	184	184
query20	142	128	132	128
query21	226	147	125	125
query22	5455	5546	5542	5542
query23	17647	17264	17044	17044
query23_1	17214	17087	16961	16961
query24	7170	1619	1225	1225
query24_1	1229	1236	1223	1223
query25	536	459	405	405
query26	1231	260	154	154
query27	2774	482	294	294
query28	4463	1858	1866	1858
query29	806	558	465	465
query30	334	243	210	210
query31	882	708	633	633
query32	80	68	65	65
query33	513	337	278	278
query34	901	913	570	570
query35	621	679	589	589
query36	1048	1134	933	933
query37	136	96	81	81
query38	2938	2959	2868	2868
query39	845	834	837	834
query39_1	932	794	816	794
query40	227	151	135	135
query41	64	61	57	57
query42	101	99	97	97
query43	369	374	346	346
query44	
query45	201	186	176	176
query46	888	986	607	607
query47	2125	2126	2017	2017
query48	340	321	239	239
query49	623	463	374	374
query50	692	292	214	214
query51	4165	4257	4103	4103
query52	105	106	101	101
query53	296	343	292	292
query54	298	264	256	256
query55	88	81	78	78
query56	318	315	294	294
query57	1361	1361	1253	1253
query58	284	268	292	268
query59	2575	2609	2516	2516
query60	324	317	322	317
query61	145	140	140	140
query62	625	619	553	553
query63	306	276	276	276
query64	4845	1261	970	970
query65	
query66	1400	456	347	347
query67	16464	16534	16331	16331
query68	
query69	424	316	276	276
query70	1009	982	918	918
query71	340	309	302	302
query72	2779	2564	2416	2416
query73	555	561	321	321
query74	9615	9531	9413	9413
query75	2821	2726	2452	2452
query76	2309	1026	701	701
query77	354	370	303	303
query78	11529	11667	11075	11075
query79	2802	837	616	616
query80	1796	641	561	561
query81	563	277	250	250
query82	1007	152	116	116
query83	345	271	244	244
query84	259	121	107	107
query85	950	552	418	418
query86	411	314	328	314
query87	3131	3083	2995	2995
query88	3568	2683	2624	2624
query89	423	370	339	339
query90	1999	175	165	165
query91	161	161	128	128
query92	75	74	66	66
query93	1158	847	516	516
query94	642	294	286	286
query95	585	398	315	315
query96	641	527	227	227
query97	2466	2494	2430	2430
query98	236	217	211	211
query99	1010	987	890	890
Total cold run time: 257046 ms
Total hot run time: 184505 ms

@airborne12 (Member) left a comment

LGTM

@hello-stephen (Contributor)

BE Regression && UT Coverage Report

Increment line coverage 100.00% (5/5) 🎉

Category	Coverage
Function Coverage	71.75% (26004/36240)
Line Coverage	54.40% (272009/500031)
Region Coverage	51.86% (226382/436519)
Branch Coverage	53.33% (97231/182323)

@hello-stephen (Contributor)

BE Regression && UT Coverage Report

Increment line coverage 100.00% (5/5) 🎉

Category	Coverage
Function Coverage	71.75% (26002/36240)
Line Coverage	54.40% (271999/500031)
Region Coverage	51.84% (226276/436519)
Branch Coverage	53.32% (97214/182323)

@hello-stephen (Contributor)

BE Regression && UT Coverage Report

Increment line coverage 100.00% (5/5) 🎉

Category	Coverage
Function Coverage	71.75% (26001/36240)
Line Coverage	54.40% (272000/500031)
Region Coverage	51.85% (226317/436519)
Branch Coverage	53.32% (97212/182323)

Labels: approved (Indicates a PR has been approved by one committer.), reviewed

5 participants