ORC: Add _row_id and _last_updated_sequence_number reader in ORC to support lineage by Guosmilesmile · Pull Request #15776 · apache/iceberg

Guosmilesmile · 2026-03-26T15:27:39Z

While working on improving the TCK for File Format, we found that in V3 tables, we support lineage in Parquet and Avro, but we haven't implemented this feature in ORC.

This PR adds _row_id and _last_updated_sequence_number readers in ORC to support lineage.

This PR does not implement lineage for the Spark vectorized read path in ORC; that will be supported in a follow-up PR.

pvary · 2026-03-27T13:59:54Z

orc/src/main/java/org/apache/iceberg/orc/ORC.java

+                  MetadataColumns.ROW_ID.fieldId(),
+                  MetadataColumns.LAST_UPDATED_SEQUENCE_NUMBER.fieldId()));


These should already be in META_IDS.

Yes, META_IDS contains the row ID and the last updated sequence number. The original code removed all metadata-related fields, but in the lineage scenario _row_id exists in the data file and should not be removed. Therefore, we use difference here to exclude ROW_ID and LAST_UPDATED_SEQUENCE_NUMBER from the set of IDs that get removed.
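To illustrate the reply above, here is a minimal sketch of the set-difference idea. It is not the code from this PR: the class name, the method name, and the numeric field IDs are hypothetical placeholders, and plain java.util sets are used in place of whatever set helper ORC.java actually calls.

```java
import java.util.Set;
import java.util.TreeSet;

public class MetaIdFilter {
  // Hypothetical placeholder IDs; these are NOT the real Iceberg metadata field IDs.
  static final int FILE_PATH_ID = 9001;
  static final int ROW_POSITION_ID = 9002;
  static final int ROW_ID_ID = 9003;
  static final int LAST_UPDATED_SEQ_ID = 9004;

  // Stand-in for META_IDS: every metadata column ID, lineage columns included.
  static final Set<Integer> META_IDS =
      Set.of(FILE_PATH_ID, ROW_POSITION_ID, ROW_ID_ID, LAST_UPDATED_SEQ_ID);

  // Lineage columns may be physically present in the data file, so they must be kept.
  static final Set<Integer> LINEAGE_IDS = Set.of(ROW_ID_ID, LAST_UPDATED_SEQ_ID);

  /** Returns the metadata IDs to drop from the file projection: META_IDS minus the lineage IDs. */
  public static Set<Integer> idsToStrip() {
    Set<Integer> result = new TreeSet<>(META_IDS);
    result.removeAll(LINEAGE_IDS);
    return result;
  }

  public static void main(String[] args) {
    System.out.println(idsToStrip()); // prints [9001, 9002]
  }
}
```

With this shape, columns that are never stored in the file are still stripped from the file schema, while the two lineage columns stay in the projection so their stored values can be read.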

pvary · 2026-03-27T14:10:41Z

orc/src/main/java/org/apache/iceberg/orc/OrcValueReaders.java

+        OrcValueReader<Long> fileIdReader =
+            readerIndex < readerList.size()
+                ? (OrcValueReader<Long>) readerList.get(readerIndex)
+                : null;


Please help me understand why we do this

My understanding is that readerList represents the physical columns, while ROW_ID/LAST_UPDATED_SEQUENCE_NUMBER may exist only in the logical projection. In my testing the counts are consistent, but I cannot guarantee that there is no other scenario where the projection and the physical fields are inconsistent. So I added the fileIdReader == null case to take the fallback path, as a bit of defensive programming.
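The defensive pattern described above could look roughly like the sketch below. It is an illustration only: ValueReader is a hypothetical stand-in rather than the Iceberg OrcValueReader interface, and the fallback rule (first row ID of the file plus the row position) is an assumption about what the fallback path computes, not something this thread confirms.

```java
import java.util.List;

public class RowIdFallback {
  /** Hypothetical stand-in for a per-column reader; not the Iceberg OrcValueReader API. */
  public interface ValueReader<T> {
    T read(int row);
  }

  /**
   * Builds a row ID reader. If the file has no physical reader at readerIndex, or the
   * stored value is null, the ID is derived as firstRowId + row (assumed fallback rule).
   */
  public static ValueReader<Long> rowIdReader(
      List<ValueReader<?>> readerList, int readerIndex, long firstRowId) {
    // Bounds check mirrors the diff: the projection may be wider than the physical columns.
    @SuppressWarnings("unchecked")
    ValueReader<Long> fileIdReader =
        readerIndex < readerList.size()
            ? (ValueReader<Long>) readerList.get(readerIndex)
            : null;

    return row -> {
      Long stored = fileIdReader != null ? fileIdReader.read(row) : null;
      return stored != null ? stored : firstRowId + row;
    };
  }

  public static void main(String[] args) {
    // No physical column for the row ID: every value comes from the fallback.
    ValueReader<Long> reader = rowIdReader(List.of(), 0, 100L);
    System.out.println(reader.read(5)); // prints 105
  }
}
```

The bounds check costs little, and it keeps the reader from throwing an IndexOutOfBoundsException if a projection ever turns out to be wider than the file's physical column list.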

Guosmilesmile · 2026-03-27T15:34:02Z

Added a UT for Spark. This PR does not implement lineage for the Spark vectorized read path in ORC; that will be supported in a follow-up PR.

github-actions bot added data flink ORC labels Mar 26, 2026

ORC: Add _row_id and _last_updated_sequence_number reader in ORC to support lineage

d3ad6a3

Guosmilesmile force-pushed the orc_rowid branch from f3a4c40 to d3ad6a3 Compare March 27, 2026 01:42

Guosmilesmile mentioned this pull request Mar 27, 2026

Data: Add TCK tests for Metadata Columns in BaseFormatModelTests #15675

Open

pvary reviewed Mar 27, 2026

orc/src/main/java/org/apache/iceberg/orc/OrcValueReaders.java (resolved)

pvary reviewed Mar 27, 2026

Guosmilesmile marked this pull request as draft March 27, 2026 15:01

add ut for spark

2582bb5

github-actions bot added the spark label Mar 27, 2026

Guosmilesmile marked this pull request as ready for review March 27, 2026 16:30

Guosmilesmile marked this pull request as draft March 27, 2026 16:54

Guosmilesmile wants to merge 2 commits into apache:main from Guosmilesmile:orc_rowid