[feat](tvf) Support INSERT INTO TVF to export query results to local/HDFS/S3 files #60719
base: master
Force-pushed from 93ff2fc to 297e20f.
LGTM, just a few minor nits.
Also, the Example SQL needs a local file mode example.
    _current_written_bytes = _vfile_writer->written_len();

    // Auto-split if max file size is set
    if (_max_file_size_bytes > 0) {
This if check seems unnecessary: _create_new_file_if_exceed_size() already performs it.
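To illustrate the point of the review comment above, here is a minimal sketch of a writer whose size-check helper owns the "is auto-split enabled" guard, so the call site needs no outer if. Everything in it is an assumption made for illustration: SplittingWriter, close_file_if_exceed_size() and the in-memory "files" are hypothetical stand-ins, not the Doris writer or its _create_new_file_if_exceed_size(), and the choice to roll over once the byte count reaches the limit is also assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for the sink's file writer; not the Doris class.
class SplittingWriter {
public:
    // max_file_size_bytes <= 0 means auto-split is disabled (assumed convention).
    explicit SplittingWriter(int64_t max_file_size_bytes)
            : _max_file_size_bytes(max_file_size_bytes) {}

    void write(const std::string& block) {
        assert(!block.empty());
        if (!_file_open) {  // open the next output file lazily
            _files.emplace_back();
            _file_open = true;
        }
        _files.back() += block;
        _current_written_bytes = static_cast<int64_t>(_files.back().size());
        // No outer `if (_max_file_size_bytes > 0)`: the helper owns that guard.
        close_file_if_exceed_size();
    }

    const std::vector<std::string>& files() const { return _files; }

private:
    void close_file_if_exceed_size() {
        if (_max_file_size_bytes <= 0) return;                      // splitting disabled
        if (_current_written_bytes < _max_file_size_bytes) return;  // still under the limit
        _file_open = false;  // the next write() starts a new file
        _current_written_bytes = 0;
    }

    int64_t _max_file_size_bytes;
    int64_t _current_written_bytes = 0;
    bool _file_open = false;
    std::vector<std::string> _files;
};
```

With a limit of 10 bytes, two 5-byte writes fill the first file and a third write lands in a second one; with a limit of 0, every write goes to the same file. Keeping the guard in one place means a caller cannot forget it, which is the simplification the reviewer is asking for.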
    // Set hadoop config for hdfs/s3 (BE uses this for file writer creation)
    if (!tvfName.equals("local")) {
        tSink.setHadoopConfig(backendConnectProps);
tSink.setProperties(backendConnectProps);
It seems properties is also set from backendConnectProps?
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
TPC-H: Total hot run time: 29189 ms
TPC-DS: Total hot run time: 185185 ms
What problem does this PR solve?
Why do we still need this feature when OUTFILE already exists?
OUTFILE itself is a MySQL-specific syntax. We should standardize all data access patterns: use SELECT for reading and INSERT for writing. Since a TVF is treated as a table, it should support being written to via INSERT.
From a functionality perspective, INSERT INTO tvf is currently similar to OUTFILE. However, from the standpoint of conceptual consistency, we need to support INSERT INTO tvf.
Key changes:
Add support for INSERT INTO TVF (Table-Valued Function) syntax, allowing users
to directly export query results into external file systems (local, HDFS, S3)
in CSV, Parquet, and ORC formats.
UnboundTVFTableSink, LogicalTVFTableSink, PhysicalTVFTableSink plan nodes,
and InsertIntoTVFCommand for query planning and execution.
async file writing with auto-split support, and VFileFormatTransformerFactory
for creating CSV/Parquet/ORC format transformers.
in file_path, and delete_existing_files on local TVF.
Example SQL:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)