
Conversation

@wojiaodoubao
Collaborator

This PR tries to add partitioning as an experimental spec. Discussion can be found at: #272

This PR remains a draft, and we still need to hold a vote to decide whether to introduce the partition spec.

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@wojiaodoubao wojiaodoubao marked this pull request as draft December 11, 2025 07:15
@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@wojiaodoubao wojiaodoubao changed the title doc: add partitioin spec (experimental) doc: add partitioin spec - experimental Dec 12, 2025
@wojiaodoubao wojiaodoubao changed the title doc: add partitioin spec - experimental doc: add partitioin spec Dec 12, 2025
@wojiaodoubao wojiaodoubao marked this pull request as ready for review December 13, 2025 03:22

@wojiaodoubao wojiaodoubao changed the title doc: add partitioin spec docs: add partitioin spec Dec 13, 2025
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 13, 2025
@wojiaodoubao
Collaborator Author

Hi @jackye1995 , I've updated this PR, could you help review it when you have time? Thanks very much~

Collaborator

@jackye1995 jackye1995 left a comment


Did a pass on the spec. I think we can merge this first after one more round of iteration, and then start the implementation in https://github.com/lance-format/lance/tree/main/rust/lance-namespace-impls/src/dir and iterate on that.

@wojiaodoubao
Collaborator Author

Hi @jackye1995 , thanks for your nice suggestions! I have updated this PR and addressed all the comments. There's one thing I'm not quite sure about.

we should expand this in an independent section to talk about partition transform and partition pruning. This part we should in general just reference how iceberg does it, with field ID, source ID, all those things.

I understand that the current spec is mostly consistent with Iceberg in terms of partition transforms, except for the JSON serialization: for example, Iceberg uses bucket[N] while Lance uses bucket with bucket_size=N. Additionally, the current solution is relatively simple and does not define partition field IDs. Is this OK?
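
Just to make the difference concrete, a rough sketch (the key names on the Lance side are only my reading of the current draft, not final):

```rust
use serde_json::json;

fn main() {
    // Iceberg encodes the bucket width inside the transform name itself.
    let iceberg_style = json!({ "transform": "bucket[16]" });
    // The current draft, as I read it, keeps the transform name as "bucket"
    // and carries the width as a separate parameter (key names illustrative).
    let lance_style = json!({ "transform": "bucket", "bucket_size": 16 });
    println!("{iceberg_style}\n{lance_style}");
}
```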

@jackye1995
Collaborator

@wojiaodoubao I edited the doc with some additional designs, let me know if this looks good to you.

@jackye1995 jackye1995 changed the title docs: add partitioin spec feat: add partitioned namespace spec Jan 3, 2026
@github-actions github-actions bot added the enhancement New feature or request label Jan 3, 2026
@jackye1995
Collaborator

Let me try to explain my thinking here:

  1. versioned partition spec: partition evolution is a requirement to be comparable with similar projects, so we do need to version the partition spec

  2. source field ID: Lance table schema fields do have field IDs, recorded in field metadata, so we should use those instead of field names to be schema-evolution-friendly

  3. partition field ID: the partition field ID serves as the priority of the partition keys, so we can still know the order of the fields even if the spec is converted to a map or something similar

  4. schema: the current approach is not fully correct yet; I still want to think a bit more about it. The schema field ID is used in the spec as the source ID, so we need to know all fields that are still referenced in the partition specs, even if a field has already been dropped. Currently I just say we keep the unioned schema, but that might not be enough because the field ID could differ between the namespace schema and the underlying table schema. Maybe the easiest way is to just say in the spec that the namespace- and table-level schemas should remain consistent.

  5. partition expression: I switched to using a DataFusion expression to describe the partition transform. I feel this is more generic and also more consistent with the overall Lance philosophy of using Arrow & DataFusion semantics, rather than inventing our own semantics for query-related features.

  6. partition namespace name: if we use the col=val style and the value contains a $ character, it would not work. So we just use a random name, and the partition key and value can be found in the namespace properties (generated dynamically from __manifest column data). This also allows us to avoid leaking data in the namespace name, for users that want stricter security.

  7. partition pruning: to make pruning more efficient, we store the partition info directly as columns, and pruning is just a SQL call that selects the tables matching the predicate (see the sketch after this list).

  8. transaction support: I added an additional read_version column in __manifest so that, if users desire multi-partition transactions, __manifest can be used as the main medium to coordinate commits and enforce read isolation. This section probably needs more detail describing the read and write paths.
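
To make items 6-8 a bit more concrete, here is a rough, non-normative sketch of the kind of columns a single __manifest row could carry (all names are illustrative):

```rust
/// Illustrative shape of one `__manifest` row; this is not the normative
/// schema. In practice `__manifest` is a Lance table; a plain struct is used
/// here only to show the kind of columns being discussed.
struct ManifestRow {
    /// Randomly generated partition namespace name (item 6); the actual
    /// partition key/value pairs are surfaced through namespace properties
    /// generated from this row.
    namespace_name: String,
    /// One column per partition field holding the computed partition value
    /// (item 7); a single example column is shown.
    partition_field_1: String,
    /// Location of the Lance dataset backing this partition.
    table_location: String,
    /// Read version used to coordinate multi-partition commits (item 8).
    read_version: u64,
}

fn main() {
    let row = ManifestRow {
        namespace_name: "ns_2f9c1a".to_string(),
        partition_field_1: "Asia".to_string(),
        table_location: "s3://bucket/warehouse/ns_2f9c1a/data.lance".to_string(),
        read_version: 42,
    };
    println!("{} -> {} (read_version {})", row.partition_field_1, row.table_location, row.read_version);
}
```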

The partitioning information is stored in `partition_spec_v<N>` (e.g., `partition_spec_v1`), which is a JSON array of partition field objects. Each partition field contains:

* A **field id** uniquely identifying this partition field
* A **name** for the partition field (used as the column name in `__manifest`)
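
(For illustration only: one possible shape of a partition_spec_v1 entry, combining the field id and name above with the source ID and DataFusion expression discussed elsewhere in this thread; the key names are assumptions, not part of the excerpt.)

```rust
use serde_json::json;

fn main() {
    // Hypothetical serialization of a partition spec version; key names and
    // values are illustrative, not normative.
    let partition_spec_v1 = json!([
        {
            "field_id": 1000,                     // unique partition field id
            "name": "event_date",                 // column name in __manifest
            "source_id": 3,                       // Lance field id of the source column
            "expression": "date_trunc('day', ts)" // DataFusion partition transform
        }
    ]);
    println!("{}", serde_json::to_string_pretty(&partition_spec_v1).unwrap());
}
```
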
Collaborator Author


I'm not sure if I understand correctly.

I think the name needs to be unique because it is a column in __manifest. Since the spec supports partition evolution, we need to prevent users from adding a partition field with an existing name when 'adding a partition column'.

partition field ID: partition field ID serves as the priority of partition keys, so we can still know the order of the fields even if it is converted to a map or something.

The field ID represents the sequence. So if I have partition fields country, city and I want to update them to continent, country, city, what should I do?

  • First, I need to resolve the field ID issue: the field ID of country must be less than that of continent, and so must city's. I can do that by adding 3 new partition fields.
  • Second, I'll face name conflicts. Can I reuse country and city as partition field names?

Shall we use {partition_field_id}_{partition_field_name} as the column name in the __manifest table?

Collaborator

@jackye1995 jackye1995 Jan 4, 2026


The Field ID represents the sequence. So if I have partition fields country, city, and I want to update it to continent , country, city, what should I do?

My thinking is that the sequence only matters because we need to know how to organize the namespace levels (as described in #279 (comment))

So if you add continent, it will be country -> city -> continent. Technically, for use cases like query pruning and partitioned joins, I don't think this matters, because you can still find "all partitions belonging to continent Asia" without problem even though it is the last level rather than the first.

First I need to resolve the field id issue: the filed id of country must be less than continent, so is city. I can do it by adding 3 new partition fields.

The partition specs are not independent; there should be some relationship between versions. In the example you give, we are adding a new partition field, so the country and city field IDs should remain unchanged, and continent gets the next field ID.

Second I'll face name conflicts. Can I reuse country and city as partition field name?

That's a good point. I think the safest approach is likely to also use only the ID to reference the partition fields, so the column names would be partition_field_{i} in the __manifest table.


1. Query engine analyzes the query predicate to identify filters on partition columns
2. For each partition expression, the engine evaluates the expression with the query values to compute the expected partition value(s)
3. Engine queries `__manifest` with filters on the partition columns
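
(A rough illustration of steps 1-3 above; the partition_field_1 and table_location column names are assumptions following the naming discussed earlier in this thread.)

```rust
fn main() {
    // Step 1: suppose the query predicate is `ts >= TIMESTAMP '2026-01-04'`
    // and the partition expression is `date_trunc('day', ts)`.
    // Step 2: evaluating the expression over the predicate bound yields the
    // candidate partition value.
    let partition_value = "2026-01-04";
    // Step 3: filter __manifest on the partition column to find the datasets
    // that need to be scanned.
    let pruning_sql = format!(
        "SELECT table_location FROM __manifest WHERE partition_field_1 >= DATE '{partition_value}'"
    );
    println!("{pruning_sql}");
}
```
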
Collaborator Author


Will DirectoryNamespace provide a method to query the __manifest table? Or a method to push down filters and get table identifiers?

@wojiaodoubao
Collaborator Author

versioned partition spec: partition evolution is a requirement to be comparable against similar projects, so we do need to version the partition spec

+1

source field ID: lance table schema fields do have field ID, it's recorded in field metadata, so we should use that instead of field name to be schema evolution-friendly

Does it mean all tables must be consistent in their field IDs?

partition field ID: partition field ID serves as the priority of partition keys, so we can still know the order of the fields even if it is converted to a map or something.

+1

schema: the current approach is not fully correct yet, I still want to think a bit more on it. the schema id is used in spec as source id, so we need to know all fields that are still referenced in the partition specs, even if it is already dropped. Currently I just say we keep the unioned schema, but that might be not enough because the field ID could differ in namespace schema and underlying table schema. Maybe the easiest way is to just say in the spec that the namespace and table level schema should remain consistent.

+1 for saying in the spec that the namespace- and table-level schemas should remain consistent.

partition expression: I switched to use just DataFusion expression to describe the partition transform, I feel this is more generic and also makes it more consistent with over Lance philosophy that we try to use Arrow & DataFusion semantics, rather than invent our own semantics for query related stuffs.

+1. For engines that support DataFusion, we can directly use DataFusion to compute partition values; for those that do not, we need to implement functions with the same semantics ourselves, right?

partition namespace name: if we use col=val style, if value has $ character it would not work. So we just use a random name and the partition key and value can be found in namespace properties (generated dynamically from __manifest column data). This also allow us to avoid leaking data information in the namespace name for users that want high security measures.

+1

partition pruning: to make pruning work more efficiently, we directly store partition info as columns, and pruning is just a SQL call to select the table matching the predicate.

+1. I was thinking, how should we expose this to engines? It seems like a bad idea to expose the __manifest table directly. So maybe add a new function to DirectoryNamespace (not Namespace) to perform partition pruning?

transaction support: I added the additional read_version column in __manifest so that if user desire multi-partition transaction, __manifest can be used as the main medium to coordinate commits and enforce read isolation. More details are probably needed in this section to describe the read and write path in more details.

The current spec supports schema evolution, partition evolution, and ACID. Does this mean it adds an additional table format layer on top of the Lance table format? I wonder if we need to clarify which table format capabilities can be implemented at the partition layer and which will never be implemented there.

@jackye1995
Collaborator

Does this mean it adds an additional table format layer on top of the Lance Table Format? I wonder if we need to clarify which table format capabilities can be implemented at the partition layer and which will never be implemented there.

What specific capabilities are you thinking about? In my mind the cut is clear: the Lance table format will never offer partitioning and will only do clustering. This has been clear in previous conversations. Any feature related to hard-partitioning a logical dataset into physical parts goes in this layer.

Note that we are only talking about the table data. We could still do partitioning for a specific index, if that makes sense for that index type.

@wojiaodoubao
Collaborator Author

wojiaodoubao commented Jan 4, 2026

What specific capabilities are you thinking about?

I was thinking of capabilities like branch, tag, time travel, deleting a physical partition (table), compaction, cleanup, etc. for the partitioned namespace.

I like the current capability set: schema evolution, partition evolution, and read_version. I think it might be too much if we add the capabilities above to the partitioned namespace.

Any feature related to hard-partitioning a logical dataset into physical parts go in this layer.

+1, agree.


**Field ID Stability**: Field IDs (`lance:field_id`) are never reused. Once a field ID is assigned, it permanently identifies that logical column even if the column is later deprecated. This ensures partition specs using `source_id` references remain valid.

**Partition Field Validity**: If a source column is deprecated, existing partition fields referencing it via `source_id` remain valid for reading existing data. However, new partition spec versions should not reference deprecated columns. To remove a partition field, create a new partition spec version without that field.
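
(For instance, removing a partition field could be represented as a new partition spec version that simply omits it; the key names below are illustrative and follow the earlier sketch, not the spec text.)

```rust
use serde_json::json;

fn main() {
    // partition_spec_v1 partitions by both country and city.
    let v1 = json!([
        { "field_id": 1000, "name": "country", "source_id": 3, "expression": "country" },
        { "field_id": 1001, "name": "city",    "source_id": 4, "expression": "city" }
    ]);
    // partition_spec_v2 drops the city partition field. Field id 1001 is never
    // reused, and source_id 3 remains valid because field IDs are stable.
    let v2 = json!([
        { "field_id": 1000, "name": "country", "source_id": 3, "expression": "country" }
    ]);
    println!("{v1}\n{v2}");
}
```
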
Collaborator Author

@wojiaodoubao wojiaodoubao Jan 4, 2026


existing partition fields referencing it via source_id remain valid for reading existing data

In the partitioned namespace we won't support time travel. After a field is removed from the schema, it seems we don't need the removed field when reading existing data.

  1. The engine analyzes the query predicate and gets the partition expressions. The removed field doesn't exist in any predicate (otherwise it would fail the semantic check, since the removed field is not in the schema), so it doesn't exist in any partition expression.
  2. The engine evaluates the expressions, computes the expected partition value(s), queries __manifest with filters, retrieves the paths of the matching dataset tables, and finally scans them.

So I think we don't need to mark the source column as deprecated; instead maybe just remove it. We can also drop the partition_field column, since there is no way to reference the partition field.

Collaborator


yes I think you are right

@jackye1995
Collaborator

jackye1995 commented Jan 4, 2026

For engines that support DataFusion, we can directly use DataFusion to compute partition values; for those that do not, we need to implement functions with the same semantics ourselves, right?

Yes. I'm not saying we are forcing engines to use DataFusion. But by stating the expression in it, we do not need to explain the behavior of the expressions anymore; the behavior for all data types, nulls, and computations is already defined. Any engine can make sure its behavior matches the DataFusion behavior exactly when evaluating the expression.
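
As a minimal sketch of what that looks like in practice (assuming the datafusion and tokio crates; date_trunc('day', ts) is just a hypothetical partition expression):

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();
    // Evaluate a hypothetical partition expression; the semantics for types,
    // nulls, etc. are whatever DataFusion defines for it, so any engine can
    // check its own implementation against this result.
    let df = ctx
        .sql("SELECT date_trunc('day', TIMESTAMP '2026-01-04 12:34:56') AS partition_value")
        .await?;
    df.show().await?;
    Ok(())
}
```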

With that being said, I think most likely we will add a Rust implementation that returns the list of datasets to scan for a query, and then bind that to other languages, so most engines will eventually call into DataFusion to do the pruning.

how should we expose it to engine? It seems a bad idea to expose __manifest table directly. So maybe adding a new function to DirectoryNamespace(not Namespace) to perform partition pruning?

This goes back to what I was talking about in the last paragraph. I think we will have a PartitionedNamespace that exposes a plan_scan step to return the qualifying datasets and any residual filter. That API can be used by engines to do first-level planning. For fully DataFusion-based execution, we can form the whole execution plan to scan all the datasets and do additional reranking if necessary.
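
Very roughly, and purely as a sketch (none of these names or signatures are decided):

```rust
/// Hypothetical shape of the planning API described above; nothing here is
/// part of the spec or of the existing lance-namespace crates.
pub struct ScanPlan {
    /// Datasets whose partitions may satisfy the query predicate.
    pub datasets: Vec<String>,
    /// Part of the predicate that still has to be applied while scanning.
    pub residual_filter: Option<String>,
}

pub trait PartitionedNamespace {
    /// Prune against `__manifest` and return the qualifying datasets plus any
    /// residual filter, for an engine's first-level planning.
    fn plan_scan(&self, predicate: &str) -> Result<ScanPlan, Box<dyn std::error::Error>>;
}
```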

@jackye1995
Collaborator

jackye1995 commented Jan 4, 2026

I was thinking capabilities like: branch, tag, time travel, deletion a physical partition(table), compaction, cleanup etc for partitioned namespace

I think there are 2 categories:

  1. features in the table format: by the nature of using a __manifest Lance table, implementations of the partitioned namespace spec are able to do branching, time travel, deletion vectors, etc. against the contents of __manifest. I don't think we need to strictly enumerate or ban those features, since the spec mainly describes the storage layout, not every feature it can or cannot support.
  2. features offered in the Rust SDK and engine implementations: for example compaction, cleanup, etc. I think we will just start from an initial set of features like basic read and write support, and see what new features we need step by step. We might want to do some planning together. Although I put things like partition evolution in the spec, that's mainly to ensure the spec is future-proof and we don't need to immediately do a v2 in the short term. We don't have to implement it right away if features like compaction across partitions are more important.
