Skip to content

Conversation

@jackye1995
Copy link
Collaborator

Based on discussion on #5

@github-actions github-actions bot added enhancement New feature or request python Python features java Java features spec Restful openapi spec rust Rust features labels Apr 22, 2025
location:
type: string

Schema:
Copy link
Collaborator Author

@jackye1995 jackye1995 Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might be able to use this to be able to use the same library and achieve all we have discussed.

This is basically aligned with the model described in the Lance format.

To use it properly, at client side (e.g. Spark, a conversion is needed to convert Spark StructType schema to a schema of this shape), and then at server side, a conversion is needed to convert this schema to Arrow schema shape. (this can be provided by this repository as e.g. lance-catalog-apache-client-utils)

Then there is enough information to create the dataset using something like

import com.lancedb.lance.catalog.client.apache.utils.ArrowUtils
Dataset.create(allocator, location,
            ArrowUtils.toArrowSchema(restSchema), params)
        .close();

Thoughts? @yanghua

Copy link
Collaborator

@yanghua yanghua Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean using a string to represent the left type, right? Then the adapter parses and maps it to the arrow type?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

additionalProperties:
type: string

WriterVersion:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically the guardrail that ensures that the server knows if there is any issue in creating a table with the user's environment. For example if the user library is too out of date, the server should reject the table creation request rather than creating a table that the client cannot consume.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the SDK version?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more of the format version https://lancedb.github.io/lance/format.html#file-version

@jackye1995 jackye1995 changed the title feat: create table spec: create table Apr 22, 2025
@jackye1995 jackye1995 requested a review from yanghua April 22, 2025 23:31
$ref: '#/components/schemas/Schema'
writerVersion:
$ref: '#/components/schemas/WriterVersion'
config:
Copy link
Collaborator Author

@jackye1995 jackye1995 Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might not be able to pass anything here yet with the current shape of the Dataset API. But I remember we still had discussion about the storage options last time. This is different from that as these will be persisted in the Lance table metadata.

Do you still think it is necessary to pass storage options in as a part of the request? @yanghua My understanding is that the server should decide the storage options used based on the location. Could you provide any specific use case for the storage option that should be passed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that the server should decide the storage options used based on the location.

Our commercial version also uses mode.

My original thought is that maybe we can open an entry point for users, they can pass ak/sk or not, that depends themself.

OK, now let's ignore it now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passing sk is definitely a no go I think, because the server can easily just keep the sk to use outside the duration of the request and it is a huge security red flag. We probably need to solve that when implementing AuthN for the client and server.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passing sk is definitely a no go I think, because the server can easily just keep the sk to use outside the duration of the request and it is a huge security red flag

In our cases, that depends on how we define the client. A client may always be on a cloud vendor's intranet.
Or, in some cases, the customer may give us these informations to handle emergency situations.

Keeping it doesn't necessarily mean using it or encouraging it.

Anyway, let's ignore it. Currently, close this entry point to pass storage options.

@jackye1995 jackye1995 closed this by deleting the head repository Apr 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request java Java features python Python features rust Rust features spec Restful openapi spec

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants