
[SPARK-55855][SQL] DSv2 Transaction API foundations #54642

Open
andreaschat-db wants to merge 1 commit into apache:master from andreaschat-db:dsv2TransactionAPIFoundations

@andreaschat-db

What changes were proposed in this pull request?

Currently, DSv2 lacks the abstractions required to support transactions in DML operations. This PR introduces the public Java interfaces that connectors need to implement:

  • TransactionInfo — carries the transaction metadata.
  • Transaction — represents a transaction. Exposes catalog(), commit(), abort(), and close().
  • TransactionalCatalogPlugin — extends CatalogPlugin with a beginTransaction(TransactionInfo) method.
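
To make the shape of these three interfaces concrete, here is a minimal, self-contained sketch. It is not the code from this PR: `CatalogPlugin` is a reduced stand-in for the Spark interface, the `id()` accessor on `TransactionInfo` is invented for illustration, making `Transaction` extend `AutoCloseable` is an assumption, and `InMemoryTxnCatalog` is a hypothetical toy connector that only records lifecycle calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Spark's CatalogPlugin, reduced to a single method.
interface CatalogPlugin {
  String name();
}

// Assumption: the description only says this "carries the transaction metadata";
// the id() accessor is invented for this sketch.
interface TransactionInfo {
  String id();
}

// Method names follow the description: catalog(), commit(), abort(), close().
// Extending AutoCloseable is an assumption of this sketch.
interface Transaction extends AutoCloseable {
  CatalogPlugin catalog();

  void commit();

  void abort();

  @Override
  void close(); // narrowed so that close() declares no checked exception
}

// Extends the catalog stand-in with beginTransaction(TransactionInfo).
interface TransactionalCatalogPlugin extends CatalogPlugin {
  Transaction beginTransaction(TransactionInfo info);
}

// Hypothetical toy connector: records each lifecycle call in order.
final class InMemoryTxnCatalog implements TransactionalCatalogPlugin {
  final List<String> events = new ArrayList<>();

  @Override
  public String name() {
    return "in_memory";
  }

  @Override
  public Transaction beginTransaction(TransactionInfo info) {
    events.add("begin:" + info.id());
    CatalogPlugin self = this;
    return new Transaction() {
      @Override
      public CatalogPlugin catalog() {
        return self;
      }

      @Override
      public void commit() {
        events.add("commit:" + info.id());
      }

      @Override
      public void abort() {
        events.add("abort:" + info.id());
      }

      @Override
      public void close() {
        events.add("close:" + info.id());
      }
    };
  }
}
```

With this toy connector, calling `beginTransaction`, then `commit()`, then `close()` leaves `events` holding the begin, commit, and close entries in that order. A real connector would do its conflict detection and resource handling in those methods instead of recording strings.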

This is the first in a series of PRs adding transaction support to DSv2. This PR is based on @aokolnychyi's prototype.

Why are the changes needed?

We are currently lacking the required abstractions for DSv2 connectors to implement transactions in write operations.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added a new suite to test TransactionUtils.

Was this patch authored or co-authored using generative AI tooling?

Claude Sonnet 4.6.

@HyukjinKwon changed the title from "[SPARK-55855] DSv2 Transaction API foundations" to "[SPARK-55855][SQL] DSv2 Transaction API foundations" on Mar 6, 2026
* The connector is responsible for detecting and resolving conflicting commits or throwing
* an exception if resolution is not possible.
* <p>
* This method must be called exactly once. Spark calls {@link #close()} immediately after

"Must be called exactly once" sounds like it is controlled by the connector. I think a better way to say it is "will be called exactly once".

I am also not sure about the last sentence on releasing resources. Instead of saying "should not release resources", I think we should describe the sequencing of calls so that connectors can make an informed decision about when to do what.
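
One way to picture the sequencing under discussion is the caller-side sketch below. It is an illustration under stated assumptions, not Spark's implementation: `SketchTransaction`, `TransactionSequencing`, and `RecordingTransaction` are hypothetical names. The only part taken from the Javadoc excerpt above is that `close()` follows `commit()`; calling `abort()` when the work fails, and then still calling `close()`, is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal transaction handle with the three lifecycle calls.
interface SketchTransaction {
  void commit();

  void abort();

  void close();
}

final class TransactionSequencing {
  private TransactionSequencing() {}

  // Runs the body, then commits; if the body fails, aborts and rethrows.
  // close() runs exactly once on every path, after commit() or abort().
  // What should happen when commit() itself throws is left out of this sketch.
  static void runInTransaction(SketchTransaction txn, Runnable body) {
    try {
      try {
        body.run();
      } catch (RuntimeException e) {
        txn.abort();
        throw e;
      }
      txn.commit();
    } finally {
      txn.close();
    }
  }
}

// Test double that records the order of lifecycle calls.
final class RecordingTransaction implements SketchTransaction {
  final List<String> calls = new ArrayList<>();

  @Override
  public void commit() {
    calls.add("commit");
  }

  @Override
  public void abort() {
    calls.add("abort");
  }

  @Override
  public void close() {
    calls.add("close");
  }
}
```

On the success path the recorded calls are `commit` then `close`; when the body throws, they are `abort` then `close`. Documenting an ordering like this would let a connector decide for itself which resources to release in `commit()` or `abort()` and which to hold until `close()`.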
