[CDF-26783] Reintroduce and improve dependency-ordered sorting for views#2614
[CDF-26783] Reintroduce and improve dependency-ordered sorting for views#2614
Conversation
This reverts commit 7248159.
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. |
☂️ Python Coverage
Overall Coverage
New Files
Modified Files
|
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request reintroduces topological sorting for view deployment under a feature flag. The implementation uses Tarjan's algorithm to handle dependency cycles and batches views for deployment, which should improve reliability. The code is well-structured, and the addition of a shared constant for batch limits is a good refactoring. I've found a couple of minor issues related to missing type hints in the new tarjan.py utility function, which violates the repository's strong typing principle.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2614 +/- ##
==========================================
+ Coverage 85.78% 85.79% +0.01%
==========================================
Files 426 427 +1
Lines 35502 35591 +89
==========================================
+ Hits 30455 30535 +80
- Misses 5047 5056 +9
🚀 New features to boost your workflow:
|
|
|
||
| @staticmethod | ||
| def _build_view_implements_dependencies( | ||
| self, view_by_ids: dict[ViewReference, ViewResponse], include: set[ViewReference] | None = None |
There was a problem hiding this comment.
Making this a static method here is strictly speaking a separate change, but it's non-breaking and required for being able to use it in _sort_implements_or_raise_cycle
| parents_by_child = ViewCRUD._build_view_implements_dependencies(view_by_ids) | ||
| try: | ||
| sorted_views = list(TopologicalSorter(parents_by_child).static_order()) | ||
| # static_order() returns a lazy generator in Python 3.11+; must be consumed to trigger CycleError |
There was a problem hiding this comment.
list() was always used here, however it was not used in the previous code used for the deployment topological sorting that this PR reverts the removal of, and which this code now supersedes. In the earlier code, the call to static_order() was not wrapped in list() and not assigned to any variable or iterated over, which meant that the topological sorting (which was really only used to check for cycles) never executed due to the lazy-loading which was (relatively) recently introduced in Python. So I think adding a comment about this gotcha could be a good idea
There was a problem hiding this comment.
| dependencies_by_id[view_id].update([parent for parent in view.implements or [] if parent in views_by_id]) | ||
| for view_property in (view.properties or {}).values(): | ||
| if isinstance(view_property, ReverseDirectRelationProperty): | ||
| if view_property.source in views_by_id: |
There was a problem hiding this comment.
This check is new, not currently validated in DMS but per Starbase they will add a validation on this now
| through_source = view_property.through.source | ||
| if isinstance(through_source, ViewReference) and through_source in views_by_id: | ||
| dependencies_by_id[view_id].add(through_source) | ||
| elif isinstance(view_property, EdgeProperty): |
There was a problem hiding this comment.
Same as above, check is new, not currently validated in DMS, but per Starbase they will / might add validations on this in the future
| dependencies_by_id[view_id].add(view_property.source) | ||
| if view_property.edge_source is not None and view_property.edge_source in views_by_id: | ||
| dependencies_by_id[view_id].add(view_property.edge_source) | ||
| elif isinstance(view_property, ViewCorePropertyRequest) and view_property.source is not None: |
There was a problem hiding this comment.
This check is also new, it is currently validated in most (but not all) CDF projects in DMS, newly rolled out as of 2 weeks ago.
| return super().diff_list(local, cdf, json_path) | ||
|
|
||
| def create(self, items: Sequence[ViewRequest]) -> list[ViewResponse]: | ||
| if Flags.DEPENDENCY_ORDERED_DEPLOY.is_enabled(): |
There was a problem hiding this comment.
Should we consider not feature flagging this? An alternative way to mitigate the risk could be to only activate this path if the request is above size 50. it seems to be basically mandatory for us to do a topological sort now given the amount of validations done in DMS, and the current logic is likely to break for anyone deploying more than 50 views now.
doctrino
left a comment
There was a problem hiding this comment.
Just split the constant into two constants, otherwise it is good.
Description
Add dependency-ordered view deployment behind the
dependency-ordered-deployalpha flag. When enabled, views are sorted by their dependency graph (implements, reverse direct relations through and source, edge connection source and edgeSource, and direct relation sources) using Tarjan's SCC algorithm and deployed in topological batch order, avoiding failures caused by interdependent views and reduced API batch limits after the change in #2597. Also extracts the view/container upsert batch limit (50) into a shared constant.This PR is based on a revert of #2019 with changes to accomodate recent changes in DMS API and fixes a latent bug where
TopologicalSorter.static_order()wasn't consumed, preventing cycle detection on Python 3.11+.If a strongly connected component of size > 50 (dependent on a constant, the limit we set in #2597) then the warning below will be printed now. A deploy will still be attempted, but will likely fail. We believe this is unlikely to occur in practice though, and the assumption is that this indicates a bigger issue with the data model.

Bump
Changelog
Added
dependency-ordered-deployto automatically resolve view creation order based on view dependencies. This mitigates typical dependency errors when deploying large and complex schemas.