@morningman
Contributor

@morningman morningman commented Feb 12, 2026

What problem does this PR solve?

Why do we still need this feature when OUTFILE already exists?

OUTFILE is MySQL-specific syntax.
We should standardize all data access patterns: use SELECT for reading and INSERT for writing.

Since a TVF is treated as a table, it should support being written to via INSERT.

From a functionality perspective, INSERT INTO tvf is currently similar to OUTFILE.
However, from the standpoint of conceptual consistency, we need to support INSERT INTO tvf.

Key changes:

Add support for INSERT INTO TVF (Table-Valued Function) syntax, allowing users
to directly export query results into external file systems (local, HDFS, S3)
in CSV, Parquet, and ORC formats.

  • FE: Add ANTLR grammar rule for INSERT INTO TVF syntax, implement
    UnboundTVFTableSink, LogicalTVFTableSink, PhysicalTVFTableSink plan nodes,
    and InsertIntoTVFCommand for query planning and execution.
  • BE: Add TVFTableSinkOperator for pipeline execution, VTVFTableWriter for
    async file writing with auto-split support, and VFileFormatTransformerFactory
    for creating CSV/Parquet/ORC format transformers.
  • Support CSV options: column_separator, line_delimiter, compression (gz/zstd/lz4/snappy).
  • Support append mode (default) with file-prefix naming ({prefix}{query_id}_{idx}.{ext}).
  • Add error handling for missing required params, unsupported formats, wildcards
    in file_path, and delete_existing_files on local TVF.
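
The file-prefix naming pattern listed above, {prefix}{query_id}_{idx}.{ext}, can be illustrated with a minimal sketch. The helper name build_export_file_name, its signature, and the sample query id are hypothetical and are not taken from this PR; only the pattern itself comes from the description.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the PR): assembles an output file name
// following the pattern {prefix}{query_id}_{idx}.{ext}.
std::string build_export_file_name(const std::string& prefix,
                                   const std::string& query_id,
                                   std::size_t idx,
                                   const std::string& ext) {
    // The prefix is the user-supplied "file_path" value; idx grows each time
    // the writer rolls over to a new file.
    return prefix + query_id + "_" + std::to_string(idx) + "." + ext;
}
```

Under these assumptions, a prefix of "/tmp/export/basic_csv_" with a made-up query id "abc123" would give "/tmp/export/basic_csv_abc123_0.csv" for the first file and "/tmp/export/basic_csv_abc123_1.csv" after one split.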

Example SQL:

-- Export query results to local BE node as CSV
INSERT INTO local(
    "file_path" = "/tmp/export/basic_csv_",
    "backend_id" = "10001",
    "format" = "csv"
) SELECT * FROM my_table ORDER BY id;

-- Export as Parquet to HDFS
INSERT INTO hdfs(
    "file_path" = "/tmp/test_insert_into_hdfs_tvf/complex_parquet/data_",
    "format" = "parquet",
    "hadoop.username" = "doris",
    "fs.defaultFS" = "hdfs://127.0.0.1:8020",
    "delete_existing_files" = "true"
) SELECT * FROM insert_tvf_complex_src ORDER BY c_int;

-- Export ORC to s3
INSERT INTO s3(
    "uri" = "https://bucket/insert_tvf_test/basic_orc/*",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk",
    "format" = "orc",
    "region" = "region"
) SELECT c_int, c_varchar, c_string FROM my_table WHERE c_int IS NOT NULL ORDER BY c_int;

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Feb 12, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman
Contributor Author

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/220) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.70% (19500/37002)
Line Coverage 36.24% (181735/501484)
Region Coverage 32.53% (140721/432526)
Branch Coverage 33.62% (61109/181739)

@morningman changed the title from "[feat](tvf) support insert into tvf" to "[feat](tvf) Support INSERT INTO TVF to export query results to local/HDFS/S3 files" Feb 13, 2026
@morningman
Contributor Author

run buildall

@morningman morningman marked this pull request as ready for review February 13, 2026 03:01
@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.32% (1795/2263)
Line Coverage 64.80% (31955/49311)
Region Coverage 65.52% (15947/24339)
Branch Coverage 56.01% (8476/15132)

Contributor

@zhangstar333 zhangstar333 left a comment

LGTM, just a few minor nits.
Also, the Example SQL needs a local file mode case.

_current_written_bytes = _vfile_writer->written_len();

// Auto-split if max file size is set
if (_max_file_size_bytes > 0) {

This if check seems unnecessary; the same check is already done inside _create_new_file_if_exceed_size().
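
To make the point concrete, here is a minimal sketch of how the guard could live only inside the helper. SplitSketch is a hypothetical stand-in, not the PR's VTVFTableWriter: it accumulates byte counts passed in by the caller instead of reading _vfile_writer->written_len(), and it only counts file rollovers rather than opening files.

```cpp
#include <cstdint>

// Hypothetical stand-in for the writer; only the size bookkeeping is modeled.
class SplitSketch {
public:
    explicit SplitSketch(std::int64_t max_file_size_bytes)
            : _max_file_size_bytes(max_file_size_bytes) {}

    // Record the bytes just written, then let the helper decide on a rollover.
    // No outer "_max_file_size_bytes > 0" check is needed here.
    void on_written(std::int64_t bytes) {
        _current_written_bytes += bytes;
        _create_new_file_if_exceed_size();
    }

    int file_index() const { return _file_idx; }

private:
    void _create_new_file_if_exceed_size() {
        // Assumption: a non-positive limit means auto-split is disabled.
        if (_max_file_size_bytes <= 0) return;
        if (_current_written_bytes < _max_file_size_bytes) return;
        ++_file_idx;                 // a real writer would close and reopen here
        _current_written_bytes = 0;  // the new file starts empty
    }

    std::int64_t _max_file_size_bytes;
    std::int64_t _current_written_bytes = 0;
    int _file_idx = 0;
};
```

With a limit of 100 bytes, two writes of 60 bytes trigger one rollover; with a limit of 0, no rollover ever happens, which is the behavior the outer check was guarding.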


// Set hadoop config for hdfs/s3 (BE uses this for file writer creation)
if (!tvfName.equals("local")) {
    tSink.setHadoopConfig(backendConnectProps);

tSink.setProperties(backendConnectProps);
It seems properties also uses backendConnectProps?

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Feb 13, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/198) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.66% (19487/37002)
Line Coverage 36.22% (181611/501459)
Region Coverage 32.53% (140685/432482)
Branch Coverage 33.60% (61049/181717)

@doris-robot

TPC-H: Total hot run time: 29189 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 884ce04912a6c377656267a64b2c5b54ff9e08f8, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17619	4499	4322	4322
q2
q3	10659	820	541	541
q4	4678	374	263	263
q5	7555	1203	1012	1012
q6	172	176	148	148
q7	775	840	666	666
q8	9295	1489	1358	1358
q9	4831	4834	4757	4757
q10	6835	1869	1652	1652
q11	456	277	242	242
q12	714	565	467	467
q13	17783	4204	3433	3433
q14	232	236	210	210
q15	939	803	793	793
q16	771	740	691	691
q17	733	846	428	428
q18	5989	5386	5328	5328
q19	1232	985	639	639
q20	510	500	399	399
q21	4685	1995	1561	1561
q22	371	328	279	279
Total cold run time: 96834 ms
Total hot run time: 29189 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4701	4537	4528	4528
q2
q3	1812	2248	1795	1795
q4	1048	1204	761	761
q5	4067	4399	4329	4329
q6	217	175	145	145
q7	1792	1615	1548	1548
q8	2525	2810	2573	2573
q9	7523	7387	7425	7387
q10	2615	2806	2399	2399
q11	515	455	416	416
q12	486	594	458	458
q13	3962	4510	3676	3676
q14	290	299	278	278
q15	920	824	806	806
q16	712	786	704	704
q17	1217	1594	1294	1294
q18	7055	6821	6641	6641
q19	969	978	880	880
q20	2095	2148	2013	2013
q21	3911	3496	3484	3484
q22	527	483	423	423
Total cold run time: 48959 ms
Total hot run time: 46538 ms

@doris-robot

TPC-DS: Total hot run time: 185185 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 884ce04912a6c377656267a64b2c5b54ff9e08f8, data reload: false

query5	5212	640	512	512
query6	342	225	202	202
query7	4208	470	275	275
query8	339	247	231	231
query9	8670	2748	2736	2736
query10	561	394	337	337
query11	17252	17786	17512	17512
query12	207	144	132	132
query13	1330	475	374	374
query14	8016	3354	3190	3190
query14_1	2985	2920	2823	2823
query15	213	199	181	181
query16	1078	494	485	485
query17	1208	726	595	595
query18	2721	438	333	333
query19	208	201	175	175
query20	134	123	130	123
query21	212	133	116	116
query22	5035	5275	4954	4954
query23	17311	16882	16776	16776
query23_1	16766	16859	16682	16682
query24	6836	1601	1216	1216
query24_1	1245	1274	1227	1227
query25	557	442	413	413
query26	1244	256	146	146
query27	2789	484	317	317
query28	4416	1876	1830	1830
query29	790	543	467	467
query30	304	248	217	217
query31	880	726	640	640
query32	78	74	67	67
query33	507	332	282	282
query34	895	910	560	560
query35	634	676	585	585
query36	1092	1110	1021	1021
query37	131	91	84	84
query38	2913	2916	2842	2842
query39	848	840	806	806
query39_1	805	828	799	799
query40	229	152	137	137
query41	64	60	58	58
query42	103	101	103	101
query43	397	381	365	365
query44	
query45	200	187	181	181
query46	886	992	614	614
query47	2104	2153	2045	2045
query48	307	320	230	230
query49	617	450	370	370
query50	692	277	216	216
query51	4177	4051	4261	4051
query52	110	108	96	96
query53	285	341	285	285
query54	314	256	266	256
query55	86	82	79	79
query56	300	301	294	294
query57	1373	1339	1302	1302
query58	283	291	278	278
query59	2664	2707	2653	2653
query60	349	335	309	309
query61	144	143	147	143
query62	613	590	536	536
query63	317	275	270	270
query64	4823	1257	1010	1010
query65	
query66	1375	450	343	343
query67	16471	16562	16427	16427
query68	
query69	390	300	299	299
query70	951	972	937	937
query71	323	305	295	295
query72	2755	2817	2574	2574
query73	548	547	328	328
query74	9600	9608	9400	9400
query75	2852	2777	2492	2492
query76	2286	1036	695	695
query77	373	387	319	319
query78	11617	11774	11146	11146
query79	1201	789	615	615
query80	727	641	573	573
query81	513	289	255	255
query82	1312	157	116	116
query83	331	264	247	247
query84	263	126	108	108
query85	969	467	423	423
query86	367	303	303	303
query87	3104	3107	3007	3007
query88	3582	2670	2669	2669
query89	423	359	344	344
query90	1785	172	171	171
query91	163	161	133	133
query92	82	73	70	70
query93	931	840	507	507
query94	470	316	299	299
query95	572	397	308	308
query96	638	527	226	226
query97	2506	2497	2446	2446
query98	234	221	205	205
query99	1015	1003	915	915
Total cold run time: 255098 ms
Total hot run time: 185185 ms

Labels

approved, dev/4.1.x, kind/need-document, reviewed

5 participants