Skip to content

[CDF-26783] Reintroduce and improve dependency-ordered sorting for views#2614

Merged
Magssch merged 24 commits intomainfrom
reintroduce-topo-sort-for-views
Mar 4, 2026
Merged

[CDF-26783] Reintroduce and improve dependency-ordered sorting for views#2614
Magssch merged 24 commits intomainfrom
reintroduce-topo-sort-for-views

Conversation

@Magssch
Copy link
Contributor

@Magssch Magssch commented Mar 2, 2026

Description

Add dependency-ordered view deployment behind the dependency-ordered-deploy alpha flag. When enabled, views are sorted by their dependency graph (implements, reverse direct relations through and source, edge connection source and edgeSource, and direct relation sources) using Tarjan's SCC algorithm and deployed in topological batch order, avoiding failures caused by interdependent views and reduced API batch limits after the change in #2597. Also extracts the view/container upsert batch limit (50) into a shared constant.

This PR is based on a revert of #2019 with changes to accomodate recent changes in DMS API and fixes a latent bug where TopologicalSorter.static_order() wasn't consumed, preventing cycle detection on Python 3.11+.

If a strongly connected component of size > 50 (dependent on a constant, the limit we set in #2597) then the warning below will be printed now. A deploy will still be attempted, but will likely fail. We believe this is unlikely to occur in practice though, and the assumption is that this indicates a bigger issue with the data model.
image

Bump

  • Patch
  • Skip

Changelog

Added

  • [alpha] Add alpha flag dependency-ordered-deploy to automatically resolve view creation order based on view dependencies. This mitigates typical dependency errors when deploying large and complex schemas.

@gemini-code-assist
Copy link
Contributor

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

@github-actions
Copy link

github-actions bot commented Mar 2, 2026

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
35591 30535 86% 80% 🟢

New Files

File Coverage Status
cognite_toolkit/_cdf_tk/utils/tarjan.py 100% 🟢
TOTAL 100% 🟢

Modified Files

File Coverage Status
cognite_toolkit/_cdf_tk/client/api/containers.py 100% 🟢
cognite_toolkit/_cdf_tk/client/api/views.py 100% 🟢
cognite_toolkit/_cdf_tk/constants.py 100% 🟢
cognite_toolkit/_cdf_tk/cruds/_resource_cruds/datamodel.py 87% 🟢
cognite_toolkit/_cdf_tk/feature_flags.py 100% 🟢
TOTAL 97% 🟢

updated for commit: dc1acde by action🐍

@Magssch
Copy link
Contributor Author

Magssch commented Mar 2, 2026

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request reintroduces topological sorting for view deployment under a feature flag. The implementation uses Tarjan's algorithm to handle dependency cycles and batches views for deployment, which should improve reliability. The code is well-structured, and the addition of a shared constant for batch limits is a good refactoring. I've found a couple of minor issues related to missing type hints in the new tarjan.py utility function, which violates the repository's strong typing principle.

@codecov
Copy link

codecov bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 92.70833% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.79%. Comparing base (42b64fe) to head (dc1acde).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...toolkit/_cdf_tk/cruds/_resource_cruds/datamodel.py 88.33% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2614      +/-   ##
==========================================
+ Coverage   85.78%   85.79%   +0.01%     
==========================================
  Files         426      427       +1     
  Lines       35502    35591      +89     
==========================================
+ Hits        30455    30535      +80     
- Misses       5047     5056       +9     
Files with missing lines Coverage Δ
cognite_toolkit/_cdf_tk/client/api/containers.py 100.00% <100.00%> (ø)
cognite_toolkit/_cdf_tk/client/api/views.py 100.00% <100.00%> (ø)
cognite_toolkit/_cdf_tk/constants.py 100.00% <100.00%> (ø)
cognite_toolkit/_cdf_tk/feature_flags.py 100.00% <100.00%> (ø)
cognite_toolkit/_cdf_tk/utils/tarjan.py 100.00% <100.00%> (ø)
...toolkit/_cdf_tk/cruds/_resource_cruds/datamodel.py 86.69% <88.33%> (-0.22%) ⬇️

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Magssch Magssch marked this pull request as ready for review March 2, 2026 20:52
@Magssch Magssch requested review from a team as code owners March 2, 2026 20:52
@Magssch Magssch changed the title Reintroduce topo sort for views [CDF-26783] Reintroduce and improve dependency-ordered sorting for views Mar 2, 2026

@staticmethod
def _build_view_implements_dependencies(
self, view_by_ids: dict[ViewReference, ViewResponse], include: set[ViewReference] | None = None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this a static method here is strictly speaking a separate change, but it's non-breaking and required for being able to use it in _sort_implements_or_raise_cycle

parents_by_child = ViewCRUD._build_view_implements_dependencies(view_by_ids)
try:
sorted_views = list(TopologicalSorter(parents_by_child).static_order())
# static_order() returns a lazy generator in Python 3.11+; must be consumed to trigger CycleError
Copy link
Contributor Author

@Magssch Magssch Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list() was always used here, however it was not used in the previous code used for the deployment topological sorting that this PR reverts the removal of, and which this code now supersedes. In the earlier code, the call to static_order() was not wrapped in list() and not assigned to any variable or iterated over, which meant that the topological sorting (which was really only used to check for cycles) never executed due to the lazy-loading which was (relatively) recently introduced in Python. So I think adding a comment about this gotcha could be a good idea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dependencies_by_id[view_id].update([parent for parent in view.implements or [] if parent in views_by_id])
for view_property in (view.properties or {}).values():
if isinstance(view_property, ReverseDirectRelationProperty):
if view_property.source in views_by_id:
Copy link
Contributor Author

@Magssch Magssch Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is new, not currently validated in DMS but per Starbase they will add a validation on this now

through_source = view_property.through.source
if isinstance(through_source, ViewReference) and through_source in views_by_id:
dependencies_by_id[view_id].add(through_source)
elif isinstance(view_property, EdgeProperty):
Copy link
Contributor Author

@Magssch Magssch Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, check is new, not currently validated in DMS, but per Starbase they will / might add validations on this in the future

dependencies_by_id[view_id].add(view_property.source)
if view_property.edge_source is not None and view_property.edge_source in views_by_id:
dependencies_by_id[view_id].add(view_property.edge_source)
elif isinstance(view_property, ViewCorePropertyRequest) and view_property.source is not None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is also new, it is currently validated in most (but not all) CDF projects in DMS, newly rolled out as of 2 weeks ago.

return super().diff_list(local, cdf, json_path)

def create(self, items: Sequence[ViewRequest]) -> list[ViewResponse]:
if Flags.DEPENDENCY_ORDERED_DEPLOY.is_enabled():
Copy link
Contributor Author

@Magssch Magssch Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider not feature flagging this? An alternative way to mitigate the risk could be to only activate this path if the request is above size 50. it seems to be basically mandatory for us to do a topological sort now given the amount of validations done in DMS, and the current logic is likely to break for anyone deploying more than 50 views now.

Copy link
Contributor

@doctrino doctrino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just split the constant into two constants, otherwise it is good.

@Magssch Magssch requested a review from doctrino March 3, 2026 18:44
@Magssch Magssch enabled auto-merge (squash) March 4, 2026 07:35
@Magssch Magssch merged commit e689bb0 into main Mar 4, 2026
14 checks passed
@Magssch Magssch deleted the reintroduce-topo-sort-for-views branch March 4, 2026 07:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants