spec: drop table under ns #53
Conversation
tags: [ Table ]
summary: Drop a table from the catalog
operationId: DropTable
parameters:
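For context, a minimal sketch of how the full operation could look in the spec (the path shape, parameter names, and response codes are assumptions for illustration, not the exact text under review):

```yaml
# Illustrative sketch only: path shape, parameter names, and response
# codes are assumptions, not the exact spec text in this PR.
/v1/namespaces/{ns}/tables/{table}:
  delete:
    tags: [ Table ]
    summary: Drop a table from the catalog
    operationId: DropTable
    parameters:
      - name: ns
        in: path
        required: true
        schema:
          type: string
      - name: table
        in: path
        required: true
        schema:
          type: string
    responses:
      "204":
        description: Table dropped successfully
      "404":
        description: Table does not exist
```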
I was browsing through the history of the Iceberg REST Catalog (IRC) about this, because I remember there was quite some controversy around this feature. In summary, I think there are two big reasons:

(1) There is basically no way for the server to actually purge the data and signal back to the client that the purge succeeded. A purge can take a long time, but this API has to return right away. Even if a background job is purging the data, that process can fail halfway (which happens a lot, due to issues like permissions), and there is no way to report back to the client that the purge failed and that some action needs to be taken.

There is a way to solve this, which is to introduce a transaction API for progress tracking, so the client can detect any issue with purging through that API. I think this would also be helpful for DropNamespace, to allow cascading drops. If you agree with this direction, I can open a PR to add that API.

(2) In Iceberg, there is a table property called gc.enabled that indicates the same thing. Users expect that property to dictate whether the table should be purged during a drop, so it can become inconsistent with the API request (e.g. the table property says the data must be purged, but the user does not request a purge). I don't know if there is a similar situation in Lance that we need to consider here.
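To make that inconsistency concrete (gc.enabled is Iceberg's actual property name; the request shape below is illustrative, not taken from either spec):

```yaml
# Iceberg table metadata says data must be purged when the table is dropped:
properties:
  gc.enabled: "true"
# ...but the client's drop request does not ask for a purge
# (hypothetical request shape):
# DELETE /v1/namespaces/{ns}/tables/{table}?purgeRequested=false
```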
> I don't know if there is a similar situation in Lance that we need to consider here.
AFAIK, Lance only follows the physical delete semantics when dropping a dataset.
> There is a way to solve this, which is to introduce a transaction API for progress tracking, so the client can detect any issue with purging through that API.
You mean an async API?
I mean, there may be two requirements to make the async approach work:
- in the catalog server (adapter service), we need to support async delete logic (e.g. drop namespace, as you said)
- for deleting a table, the Lance SDK must support async delete logic

WDYT?
> only follows the physical delete semantics when dropping a dataset.
What do you mean by physical delete? You mean deleting the entire dataset, including the data?
> You mean an async API?
Yes, in general an async API for the server, so operations like DropTable and DropNamespace could return a 202 Accepted with an ID to check progress.
For a catalog, this ID is basically a transaction ID, which means introducing APIs like the following (sketched below):
- GET /transactions/{tid} to check transaction progress
- DELETE /transactions/{tid} to cancel an ongoing transaction
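A minimal OpenAPI sketch of what those endpoints could look like (operation names, schema fields, and status values are assumptions on my side, not an agreed design):

```yaml
# Illustrative sketch only: operation names, schema fields, and status
# values are assumptions, not an agreed design.
/transactions/{tid}:
  parameters:
    - name: tid
      in: path
      required: true
      schema:
        type: string
  get:
    operationId: GetTransaction
    summary: Check the progress of an ongoing transaction
    responses:
      "200":
        description: Current transaction state
        content:
          application/json:
            schema:
              type: object
              properties:
                id:
                  type: string
                status:
                  type: string
                  enum: [ running, succeeded, failed, cancelled ]
                error:
                  type: string
  delete:
    operationId: CancelTransaction
    summary: Cancel an ongoing transaction
    responses:
      "204":
        description: Transaction cancelled
```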
> for deleting a table, the Lance SDK must support async delete logic
What do you mean by that? You mean the Lance SDK deletes the dataset in storage (e.g. from Spark), and calls the catalog to just drop the table reference?
> What do you mean by physical delete? You mean deleting the entire dataset, including the data?
Oh... sorry, that was the wrong description; I meant to say sync delete.
> What do you mean by that? You mean the Lance SDK deletes the dataset in storage (e.g. from Spark), and calls the catalog to just drop the table reference?
I mean that currently Lance only has a sync delete API, which triggers deletion directly via the object store. For deleting a Lance table, is that not enough?
Sorry, somehow missed responding to this thread...
Yeah, but the issue here is that when deletion is moved to the server side, the server is responsible for running this sync delete, and that will likely trigger a timeout on the client side if the API is sync.
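For illustration, the async variant could instead have DropTable return a 202 Accepted with a transaction ID, so the client polls rather than blocking (the response field name here is an assumption):

```yaml
# Hypothetical async response for DropTable: return immediately and let the
# client poll GET /transactions/{tid} for the purge status.
responses:
  "202":
    description: Drop accepted; data purge continues in the background
    content:
      application/json:
        schema:
          type: object
          properties:
            transaction_id:
              type: string
```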
Also added a prototype of the transactions API, in the context of DropNamespace first, to see how that would look: #59
Accidentally deleted the original fork, which closed the PR... I reopened a new one in #63.
close #49