Skip to content

Conversation

@bountx
Copy link
Collaborator

@bountx bountx commented Jan 8, 2026

Description:
When ENABLE_DATETIME_INDEX_FILTERING=true, items with different
datetime values are stored in different datetime-partitioned indexes. The bulk insert methods (bulk_async_prep_create_item, bulk_sync_prep_create_item) were using client.exists(index=alias) directly, which could fail to detect existing items stored in other datetime indexes.

Scenario that caused duplicates:

  • Item product-123 with datetime: 2024-01-15 is inserted into stac_items_collection_2024-01
  • POST same product-123 with datetime: 2024-06-20 - alias check misses it
  • Item is inserted into stac_items_collection_2024-06 - duplicate created

Solution
Extracted the duplicate check logic from async_prep_create_item into reusable helper methods that properly iterate through all indexes in the collection alias:

  • _check_item_exists_in_collection (async)
  • _check_item_exists_in_collection_sync (sync)

Updated all three prep methods to use these helpers:

  • async_prep_create_item
  • bulk_async_prep_create_item
  • bulk_sync_prep_create_item

PR Checklist:

  • Code is formatted and linted (run pre-commit run --all-files)
  • Tests pass (run make test)
  • Documentation has been updated to reflect changes, if applicable
  • Changes are added to the changelog

@bountx bountx marked this pull request as ready for review January 8, 2026 15:12
@bountx bountx requested review from Gomez324, YuriZmytrakov and jonhealy1 and removed request for Gomez324 January 8, 2026 15:12
@bountx bountx self-assigned this Jan 8, 2026
Copy link
Collaborator

@jonhealy1 jonhealy1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a couple of things

@bountx bountx requested a review from jonhealy1 January 12, 2026 11:17
jonhealy1
jonhealy1 previously approved these changes Jan 14, 2026
Copy link
Collaborator

@jonhealy1 jonhealy1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bountx Thanks looks great

alias_exists = await self.client.indices.exists_alias(name=alias)

if alias_exists:
return await check_item_exists_in_alias(self.client, alias, doc_id)
Copy link
Collaborator

@YuriZmytrakov YuriZmytrakov Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional, but would be great. I can see that its raising ConflictError, please consider adding a custom error that would have a relevant message error that this item already exists to improve the error quality.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 1051 to 1053
raise ConflictError(
f"Item {item['id']} in collection {item['collection']} already exists"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider improving the error to return not only message, but more information such as which item ids are duplicated and in which conllection. Then you can assert easier the error in tests to check if the conflicted item id == the expected item id that is a duplicate:

raise ConflictError(
f"Item {item['id']} in collection {item['collection']} already exists",
item_id=item["id"],
collection=item["collection"],
)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 1055 to 1057
raise ConflictError(
f"Item {item['id']} in collection {item['collection']} already exists"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the custom error could be applied here as well as in other ConflictError cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created extension of ConflicError class - ItemAlreadyExistsError

super().__init__(message)


async def check_item_exists_in_alias(client: Any, alias: str, doc_id: str) -> bool:
Copy link
Collaborator

@YuriZmytrakov YuriZmytrakov Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional, please, consider using stricter type hints:
client: Union[AsyncElasticsearch, AsyncOpenSearch]

return bool(resp["hits"]["total"]["value"])


def check_item_exists_in_alias_sync(client: Any, alias: str, doc_id: str) -> bool:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here.

YuriZmytrakov
YuriZmytrakov previously approved these changes Jan 15, 2026
jonhealy1
jonhealy1 previously approved these changes Jan 15, 2026
@bountx bountx dismissed stale reviews from jonhealy1 and YuriZmytrakov via 360aa42 January 27, 2026 08:58
@bountx
Copy link
Collaborator Author

bountx commented Jan 27, 2026

@jonhealy1 I added small fixes that popped up during dev tests and they pass now

@bountx bountx requested a review from jonhealy1 January 27, 2026 12:12
Copy link
Collaborator

@jonhealy1 jonhealy1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good nice work

@bountx
Copy link
Collaborator Author

bountx commented Jan 29, 2026

@jonhealy1 ready to be merged!

@jonhealy1 jonhealy1 merged commit c81fa6b into main Jan 29, 2026
8 checks passed
@jonhealy1 jonhealy1 deleted the CAT-1643 branch January 29, 2026 10:59
@bountx bountx restored the CAT-1643 branch January 29, 2026 12:07
@bountx bountx deleted the CAT-1643 branch January 29, 2026 12:10
@bountx bountx restored the CAT-1643 branch January 29, 2026 13:28
@bountx bountx deleted the CAT-1643 branch January 29, 2026 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants