spec: create table #55
Conversation
    location:
      type: string
    Schema:
I think we might be able to use this to share the same library and achieve everything we have discussed.
This is basically aligned with the model described in the Lance format.
To use it properly, a conversion is needed at the client side (e.g. in Spark, converting a Spark StructType schema into a schema of this shape), and another at the server side to convert this schema into the Arrow schema shape (the latter could be provided by this repository, e.g. as lance-catalog-apache-client-utils).
Then there is enough information to create the dataset with something like:

import com.lancedb.lance.catalog.client.apache.utils.ArrowUtils;

Dataset.create(allocator, location, ArrowUtils.toArrowSchema(restSchema), params)
    .close();
Thoughts? @yanghua
You mean using a string to represent the type, right? Then the adapter parses it and maps it to the Arrow type?
yes
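To make the "string type plus adapter" idea concrete, here is a minimal, dependency-free sketch of the parsing step. The class name, the supported type strings, and the returned type names are all assumptions for illustration; a real adapter in this repository would return org.apache.arrow.vector.types.pojo.ArrowType instances instead of strings.

```java
import java.util.Map;

/**
 * Hypothetical sketch of the client/server type adapter discussed above:
 * parse a REST-schema type string and map it to an Arrow type.
 * Names and mappings here are illustrative, not the actual spec.
 */
public class TypeAdapter {
    // Assumed set of simple type strings; a real adapter would also handle
    // parameterized types such as decimal(p, s) and nested struct/list types.
    private static final Map<String, String> SIMPLE_TYPES = Map.of(
            "boolean", "Bool",
            "int", "Int(32, signed)",
            "long", "Int(64, signed)",
            "float", "FloatingPoint(SINGLE)",
            "double", "FloatingPoint(DOUBLE)",
            "string", "Utf8",
            "binary", "Binary");

    /** Maps a REST type string to an Arrow type name, rejecting unknown types. */
    public static String toArrowTypeName(String restType) {
        String mapped = SIMPLE_TYPES.get(restType.trim().toLowerCase());
        if (mapped == null) {
            throw new IllegalArgumentException("Unsupported type string: " + restType);
        }
        return mapped;
    }

    public static void main(String[] args) {
        System.out.println(toArrowTypeName("string")); // Utf8
    }
}
```

The same table-driven mapping can be run in reverse on the client side (Arrow or Spark type to string), which keeps the two conversions symmetric.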
    additionalProperties:
      type: string
    WriterVersion:
This is basically the guardrail that lets the server know whether there is any issue with creating a table for the user's environment. For example, if the user's library is too far out of date, the server should reject the table creation request rather than create a table that the client cannot consume.
You mean the SDK version?
I think it's more of the format version https://lancedb.github.io/lance/format.html#file-version
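The guardrail described above can be sketched as a simple version comparison on the server side. This is an assumption about how the check might work: the class name, the major.minor version-string format, and the comparison rule are illustrative, not taken from the Lance spec.

```java
/**
 * Hypothetical sketch of the server-side writer-version guardrail:
 * reject a create-table request if the client's writer (format) version
 * is older than the minimum version the server supports.
 * The "major.minor" scheme and the comparison rule are assumptions.
 */
public class WriterVersionGuard {
    /** Returns true if clientVersion >= minVersion under major.minor ordering. */
    public static boolean isSupported(String clientVersion, String minVersion) {
        int[] c = parse(clientVersion);
        int[] m = parse(minVersion);
        return c[0] > m[0] || (c[0] == m[0] && c[1] >= m[1]);
    }

    private static int[] parse(String version) {
        String[] parts = version.split("\\.");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    public static void main(String[] args) {
        // An up-to-date writer passes; an outdated one is rejected.
        System.out.println(isSupported("2.1", "2.0")); // true
        System.out.println(isSupported("0.3", "2.0")); // false
    }
}
```

On rejection, the server would return an error to the client instead of creating a table the client cannot read back.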
      $ref: '#/components/schemas/Schema'
    writerVersion:
      $ref: '#/components/schemas/WriterVersion'
    config:
We might not be able to pass anything here yet with the current shape of the Dataset API. But I remember we still had a discussion about storage options last time. This is different from that, since these values will be persisted in the Lance table metadata.
Do you still think it is necessary to pass storage options as part of the request? @yanghua My understanding is that the server should decide the storage options based on the location. Could you provide a specific use case where a storage option needs to be passed in?
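For reference, a create-table request body under the spec fragments quoted in this thread might look like the YAML below. Only `location`, `schema`, `writerVersion`, and `config` appear in the fragments; the inner shape of `schema` (a `fields` list with string `type` values) and all concrete values are assumptions for illustration.

```yaml
# Illustrative request body; field nesting and values are assumed, not from the spec.
location: s3://bucket/path/to/table
schema:
  fields:
    - name: id
      type: long
    - name: name
      type: string
writerVersion: "2.0"
config:            # free-form string map (additionalProperties: type: string)
  some.key: some-value
```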
> My understanding is that the server should decide the storage options used based on the location.

Our commercial version also uses this mode.
My original thought was that maybe we could open an entry point for users: they can pass their AK/SK or not, and that is up to them.
OK, let's ignore it for now.
Passing the SK is definitely a no-go, I think, because the server could easily keep the SK and use it outside the duration of the request, which is a huge security red flag. We probably need to solve that when implementing AuthN for the client and server.
> passing sk is definitely a no go I think, because the server can easily just keep the sk to use outside the duration of the request and it is a huge security red flag

In our case, that depends on how we define the client. A client may always be on a cloud vendor's intranet.
Or, in some cases, the customer may give us this information so we can handle emergency situations.
Keeping it doesn't necessarily mean using it or encouraging it.
Anyway, let's ignore it. For now, we close this entry point for passing storage options.
Based on the discussion on #5.