Fix N+1 query pattern in task instance states and count endpoints#60352
Fix N+1 query pattern in task instance states and count endpoints#60352jscheffl merged 2 commits intoapache:mainfrom
Conversation
|
This looks good at first glance. But as the filter based on map index is a new addition (and likely not to be covered by existing tests), I think it would be worth adding a small test to lock in the expected behavior and guard against future regressions. In particular, it would be a good idea to cover:
For (1), you could test both code paths by passing and omitting |
|
Thanks for the review, the scenarios you mentioned are already covered in airflow-core/tests/unit/api_fastapi/execution_api/versions/head/test_task_instances.py:
The existing parametrized cases cover mapped tasks with multiple map indices, default behavior without map_index, and filtering with task_group_id. These tests will now exercise the new db-level filtering path. I did end up adding one missed test combination, filtering by map_index without providing task_group_id |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
|
@SameerMesiah97 it has been some time but tagging for an approval, thanks! |
jscheffl
left a comment
There was a problem hiding this comment.
Looks good to me! LGTM.
As fixing performance I'd suggest to add this to 3.2.1 scope
0b88dd3 to
7338f8e
Compare
|
Yandex-errors are also on main, is unrelated, merging therefore. |
|
Hi maintainer, this PR was merged without a milestone set.
|
Backport successfully created: v3-2-testNote: As of Merging PRs targeted for Airflow 3.X In matter of doubt please ask in #release-management Slack channel.
|
…ndpoints (apache#60352) * fix inefficient fetch all and filter * add unittest case: map-index but no task-group (cherry picked from commit f5ccebc) Co-authored-by: Steve Ahn <steveahnahn@g.ucla.edu>
Problem
The previous implementation fetched all task instances from the database and then filtered by
map_indexin Python. For DAGs with mapped tasks containing large map indices, this caused unnecessary database load and memory usage.Solution
Push the map_index filter to the SQL query, allowing the database to handle filtering efficiently:
map_indexfiltering from Python to SQL inget_task_instance_statesandget_task_instance_countendpointsmap_indexparameter to_get_group_taskshelper function to filter at the database level