-
Notifications
You must be signed in to change notification settings - Fork 17
Feature/2532 aggregations demormalizations #2601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Feature/2532 aggregations demormalizations #2601
Conversation
…s experiment_session, and add new one using the message send with the experiment version to define experiment_versions
# Conflicts: # apps/participants/views.py
📝 WalkthroughWalkthroughThis PR consolidates experiment session activity tracking by removing the Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks❌ Failed checks (1 warning, 2 inconclusive)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
apps/experiments/models.py (1)
1699-1701: Backward-compatible stub forwith_last_message_created_at.The method now returns the plain queryset since
last_activity_atis a denormalized model field. This maintains API compatibility for existing callers while the annotation is no longer needed.Consider adding a deprecation notice or renaming the method in a future cleanup, as the name no longer accurately describes its behavior.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
apps/analysis/views.py(1 hunks)apps/channels/tests/test_base_channel_behavior.py(1 hunks)apps/chatbots/tables.py(1 hunks)apps/chatbots/views.py(2 hunks)apps/dashboard/services.py(4 hunks)apps/dashboard/tests/test_distinct_usage.py(1 hunks)apps/evaluations/tables.py(1 hunks)apps/experiments/filters.py(4 hunks)apps/experiments/models.py(3 hunks)apps/experiments/tables.py(1 hunks)apps/experiments/tests/test_filters.py(3 hunks)apps/experiments/tests/test_models.py(0 hunks)apps/experiments/views/experiment.py(0 hunks)apps/participants/views.py(1 hunks)
💤 Files with no reviewable changes (2)
- apps/experiments/tests/test_models.py
- apps/experiments/views/experiment.py
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use Ruff for Python linting and formatting with line length of 120 characters
Files:
apps/experiments/tests/test_filters.pyapps/evaluations/tables.pyapps/dashboard/tests/test_distinct_usage.pyapps/participants/views.pyapps/experiments/tables.pyapps/channels/tests/test_base_channel_behavior.pyapps/dashboard/services.pyapps/chatbots/views.pyapps/chatbots/tables.pyapps/analysis/views.pyapps/experiments/models.pyapps/experiments/filters.py
⚙️ CodeRabbit configuration file
**/*.py: Do not review for Python compatibility below 3.13
Flag heavy AI/ML imports at module level (langchain_*, openai, anthropic, google, boto3, pandas, numpy). These should be lazy-loaded inside methods to keep Django startup fast.
Files:
apps/experiments/tests/test_filters.pyapps/evaluations/tables.pyapps/dashboard/tests/test_distinct_usage.pyapps/participants/views.pyapps/experiments/tables.pyapps/channels/tests/test_base_channel_behavior.pyapps/dashboard/services.pyapps/chatbots/views.pyapps/chatbots/tables.pyapps/analysis/views.pyapps/experiments/models.pyapps/experiments/filters.py
**/{apps,config}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/{apps,config}/**/*.py: Use lazy imports for heavy AI/ML libraries (langchain_google_vertexai, langchain_google_genai, etc.) to maintain fast Django startup time
UseTYPE_CHECKINGimports for type hints when lazy-loading heavy libraries
Files:
apps/experiments/tests/test_filters.pyapps/evaluations/tables.pyapps/dashboard/tests/test_distinct_usage.pyapps/participants/views.pyapps/experiments/tables.pyapps/channels/tests/test_base_channel_behavior.pyapps/dashboard/services.pyapps/chatbots/views.pyapps/chatbots/tables.pyapps/analysis/views.pyapps/experiments/models.pyapps/experiments/filters.py
**/tests/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/tests/*.py: Organize tests intests/directories with separate files fortest_models.py,test_views.py, andtest_api.py
Check for existing fixtures inconftest.pyfiles and use existing factories inapps/utils/factories/before creating new test data generation
Mock external services (LLM providers, etc.) in tests rather than making live API calls
Files:
apps/experiments/tests/test_filters.pyapps/dashboard/tests/test_distinct_usage.pyapps/channels/tests/test_base_channel_behavior.py
**/{apps,config}/**/views.py
📄 CodeRabbit inference engine (AGENTS.md)
**/{apps,config}/**/views.py: Use@login_and_team_requireddecorator for function-based views andLoginAndTeamRequiredMixinfor class-based views
Filter API queryset responses by team usingMyModel.objects.filter(team=self.request.team)
Use Django class-based views with mixins instead of function-based views
Files:
apps/participants/views.pyapps/chatbots/views.pyapps/analysis/views.py
**/{apps,config}/**/models.py
📄 CodeRabbit inference engine (AGENTS.md)
**/{apps,config}/**/models.py: Inherit team-scoped models fromBaseTeamModelwhich automatically includes team FK and audit timestamps
UseVersionsMixinandVersionsObjectManagerMixinfor models requiring version tracking
Use@audit_fieldsdecorator withAuditingManagerfor tracking field changes on important models
Usepublic_id = models.UUIDField(default=uuid.uuid4, unique=True)for external API identifiers
UseBaseModelfor non-team-scoped models when team-scoping is not required
All data must be scoped to teams viaBaseTeamModelwith no cross-team data access without explicit permission
Files:
apps/experiments/models.py
🧠 Learnings (8)
📚 Learning: 2024-10-09T07:32:03.113Z
Learnt from: SmittieC
Repo: dimagi/open-chat-studio PR: 698
File: apps/experiments/views/experiment.py:0-0
Timestamp: 2024-10-09T07:32:03.113Z
Learning: The functions `experiment_chat_session` and `experiment_session_message` intentionally use `version` as the parameter name, and it's acceptable in this codebase.
Applied to files:
apps/experiments/tests/test_filters.pyapps/channels/tests/test_base_channel_behavior.pyapps/experiments/filters.py
📚 Learning: 2024-12-04T07:25:35.489Z
Learnt from: SmittieC
Repo: dimagi/open-chat-studio PR: 950
File: apps/service_providers/tests/test_runnables_cancellation.py:91-98
Timestamp: 2024-12-04T07:25:35.489Z
Learning: In `apps/service_providers/llm_service/history_managers.py`, the `ExperimentHistoryManager.for_llm_chat` method intentionally accepts an `experiment` parameter separate from `session.experiment` because the experiment passed to the history manager may not always be the same as `session.experiment`.
Applied to files:
apps/experiments/tests/test_filters.pyapps/participants/views.py
📚 Learning: 2024-11-18T14:52:42.041Z
Learnt from: SmittieC
Repo: dimagi/open-chat-studio PR: 876
File: apps/service_providers/llm_service/state.py:181-187
Timestamp: 2024-11-18T14:52:42.041Z
Learning: In `apps/service_providers/llm_service/state.py`, the `ChatExperimentState` class is intentionally different from `ExperimentAssistantState`, and their method signatures for `save_message_to_history` may differ.
Applied to files:
apps/experiments/tests/test_filters.py
📚 Learning: 2025-01-13T09:59:19.577Z
Learnt from: SmittieC
Repo: dimagi/open-chat-studio PR: 1035
File: apps/channels/models.py:97-98
Timestamp: 2025-01-13T09:59:19.577Z
Learning: The test `test_all_channels_can_be_instantiated_from_a_session` in `apps/channels/tests/test_base_channel_behavior.py` provides basic coverage for all channel platforms, including `CONNECT_MESSAGING`, through parameterization with `ChannelPlatform.choices`.
Applied to files:
apps/dashboard/tests/test_distinct_usage.pyapps/channels/tests/test_base_channel_behavior.py
📚 Learning: 2024-10-18T08:41:26.808Z
Learnt from: snopoke
Repo: dimagi/open-chat-studio PR: 740
File: apps/pipelines/models.py:252-252
Timestamp: 2024-10-18T08:41:26.808Z
Learning: In the `as_langchain_messages` method of `PipelineChatMessages`, the messages are returned in the reverse order `[AIMessage, HumanMessage, SystemMessage]` due to where this method is called. The `SystemMessage` represents the conversation summary and is only included if it exists.
Applied to files:
apps/dashboard/services.py
📚 Learning: 2025-12-19T12:29:01.450Z
Learnt from: CR
Repo: dimagi/open-chat-studio PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-19T12:29:01.450Z
Learning: Applies to **/{apps,config}/**/models.py : Use `audit_fields` decorator with `AuditingManager` for tracking field changes on important models
Applied to files:
apps/experiments/models.py
📚 Learning: 2024-10-08T09:45:11.463Z
Learnt from: snopoke
Repo: dimagi/open-chat-studio PR: 698
File: apps/experiments/models.py:598-603
Timestamp: 2024-10-08T09:45:11.463Z
Learning: In the `get_version` method of the `Experiment` class in `apps/experiments/models.py`, minor improvements to error handling that only enhance error messages without adding significant value may not be necessary.
Applied to files:
apps/experiments/filters.py
📚 Learning: 2024-10-09T06:08:24.352Z
Learnt from: SmittieC
Repo: dimagi/open-chat-studio PR: 698
File: templates/experiments/single_experiment_home.html:255-255
Timestamp: 2024-10-09T06:08:24.352Z
Learning: In the `get_experiment_version_number` method, if a chat doesn't have a version stored in its metadata, it defaults to "default", and "Default Version" is displayed.
Applied to files:
apps/experiments/filters.py
🧬 Code graph analysis (6)
apps/dashboard/tests/test_distinct_usage.py (1)
apps/events/tests/test_timeout_trigger.py (1)
channel(33-34)
apps/channels/tests/test_base_channel_behavior.py (2)
apps/utils/factories/experiment.py (1)
ExperimentFactory(55-75)apps/chat/channels.py (5)
message(185-186)message(193-198)participant_id(147-150)experiment_session(176-177)experiment_session(180-182)
apps/chatbots/views.py (1)
apps/experiments/models.py (2)
ExperimentSession(1704-1970)annotate_with_message_count(1670-1677)
apps/chatbots/tables.py (1)
apps/generics/tables.py (1)
TimeAgoColumn(37-49)
apps/experiments/models.py (2)
apps/pipelines/models.py (1)
get_queryset(78-89)apps/experiments/versioning.py (1)
get_queryset(406-424)
apps/experiments/filters.py (3)
apps/web/dynamic_filters/base.py (4)
apply_any_of(159-160)apply_any_of(214-224)apply_excludes(167-168)apply_all_of(162-165)apps/web/dynamic_filters/column_filters.py (3)
apply_any_of(43-44)apply_excludes(51-52)apply_all_of(46-49)apps/experiments/models.py (1)
annotate_with_first_message_created_at(1663-1668)
🪛 Ruff (0.14.8)
apps/experiments/filters.py
114-114: Unused method argument: timezone
(ARG002)
119-119: Unused method argument: timezone
(ARG002)
123-123: Unused method argument: timezone
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Seer Code Review
- GitHub Check: cypress-tests
- GitHub Check: python-tests
🔇 Additional comments (18)
apps/analysis/views.py (1)
102-105: LGTM!The removal of
.annotate_with_last_message_created_at()is correct sincelast_activity_atis now a denormalized field onExperimentSession. TheExperimentSessionsTableaccesses this field directly via thelast_activity_ataccessor.apps/dashboard/tests/test_distinct_usage.py (1)
286-292: LGTM!The addition of
platform=channel.platformcorrectly aligns test data creation with the new schema whereExperimentSessionhas a top-levelplatformfield for filtering, rather than relying on the relatedexperiment_channel__platformlookup.apps/experiments/filters.py (3)
146-146: Efficient direct field access for platform filtering.Changing from
experiment_channel__platformtoplatformeliminates a JOIN, improving query performance. This aligns with the denormalization of theplatformfield ontoExperimentSession.
164-179: Filter configuration updated for denormalized fields.The changes correctly:
- Update the "Last Message" timestamp column from
last_message_created_attolast_activity_at- Simplify
prepare_querysetto only annotatefirst_message_created_atsincelast_activity_atis now a model field
110-125: Version number parsing lacks input validation.The
_get_version_numbersmethod assumes all version names follow thev<number>format. If malformed input is passed (e.g.,"version1"or"v"), this will raise aValueErroronint()conversion. WhileExperiment.objects.get_version_names()always producesv<n>format and filter values typically come from UI-controlled selections of these options, there is no server-side validation to enforce this constraint. Defensive parsing could improve robustness against unexpected input.apps/chatbots/tables.py (1)
94-94: LGTM!The accessor correctly updated to use the denormalized
last_activity_atfield. TheTimeAgoColumnwill properly render thisDateTimeFieldvalue.Note: The
verbose_namechanged from "Last Message" to "Last activity" - this is a minor terminology shift that aligns with the semantic change (the field now tracks any user interaction, not just the last message timestamp).apps/experiments/models.py (1)
1679-1692: PostgreSQL-specific RawSQL for versions list annotation is correctly implemented.The RawSQL expression using
array_to_stringandunnestare PostgreSQL functions that handle array transformations. This is appropriate for Django projects exclusively using PostgreSQL.The implementation correctly:
- Handles NULL
experiment_versionsviaCoalesce- Uses proper parameterization with an empty params list
[](no SQL injection risk)- Orders version numbers numerically within the array transformation
- Produces a display-friendly format like "v1, v2, v3"
apps/channels/tests/test_base_channel_behavior.py (1)
91-110: LGTM - Good test coverage for version tracking.This test effectively verifies that the
experiment_versionsfield onExperimentSessionis correctly updated as messages are sent across different experiment versions. The test clearly demonstrates:
- Initial version (v1) results in
[1]- Subsequent version (v2) results in accumulated
[1, 2]apps/experiments/tables.py (1)
117-117: LGTM - Column accessor updated for denormalized field.The accessor change from
last_message_created_attolast_activity_atcorrectly reflects the new denormalized field. The column remains orderable, confirming thatlast_activity_atis a database field.apps/experiments/tests/test_filters.py (2)
86-87: LGTM - Explicit test data setup improves clarity.Setting
experiment_versionsexplicitly in test fixtures is good practice. It makes the tests more deterministic and less dependent on implicit side effects from message creation or tagging logic.Also applies to: 93-94
123-125: LGTM - Explicit timestamp setup for test determinism.Explicitly setting
last_activity_atin the test setup ensures the tests are deterministic and don't rely on automatic timestamp updates that might occur during message creation. This aligns well with the denormalization changes.Also applies to: 137-139
apps/evaluations/tables.py (1)
261-261: LGTM - Consistent accessor update.The accessor change mirrors the update in
apps/experiments/tables.py, maintaining consistency across the codebase for displaying session last activity.apps/chatbots/views.py (2)
494-494: LGTM - Annotation removal aligns with denormalization.Removing
annotate_with_last_message_created_at()is correct since the table now uses the denormalizedlast_activity_atfield directly.
224-229: Verify the behavioral change from Trace-based to session-based activity tracking.The subquery now uses
ExperimentSession.last_activity_atwhich is updated in real-time during chat interactions (seeapps/chat/channels.py:882) rather than deriving activity fromTracetimestamps. This simplifies the query, avoids expensive trace joins, and relies on a dedicated timestamp field. Confirm that this provides the correct semantics for "last activity" in the chatbot experiment table.last_activity_subquery = ( ExperimentSession.objects.filter(experiment_id=OuterRef("pk")) .exclude(experiment_channel__platform=ChannelPlatform.EVALUATIONS) .order_by("-last_activity_at") .values("last_activity_at")[:1] )The call to
annotate_with_message_count()at line 494 is appropriate for message counting and is unrelated to this activity tracking change.apps/dashboard/services.py (3)
429-429: LGTM - Last activity now uses denormalized field.The change from computing last activity from message timestamps to using the denormalized
experimentsession__last_activity_atfield should improve query performance by avoiding aggregations over message tables.
477-481: LGTM - Channel breakdown aggregation updated.The aggregation now uses the top-level
platformfield directly, which simplifies the query. The map construction at line 481 correctly usesplatformas the key.
80-80: Code still uses mixed query patterns for platform filtering - refactor incomplete.Line 80 and surrounding queries show inconsistent use of platform fields. While
ExperimentSession.platformfield was added via migration, queries still navigate through relationships in multiple places:
- Line 75:
.exclude(experiment_channel__platform=...)uses old pattern- Line 80:
chat__experiment_session__platform=...uses old pattern- Line 106:
sessions.filter(platform__in=...)correctly uses new field- Line 107:
messages.filter(chat__experiment_session__platform__in=...)still uses old patternUpdate all platform filtering to use the direct
platformfield onExperimentSessionfor consistency. Also verify that no signal handler is needed to sync platform ifexperiment_channelchanges after session creation, since the backfill command only populates initial values.Likely an incorrect or invalid review comment.
apps/participants/views.py (1)
48-48: LGTM - Query simplification with denormalized field.The removal of
annotate_with_last_message_created_at()aligns with the shift to using the denormalizedlast_activity_atfield onExperimentSession. This simplifies the query construction.However, please verify that
last_activity_atis properly updated whenever:
- Messages are created/modified in a session
- Sessions are updated through other means that affect activity timestamps
#!/bin/bash # Verify that last_activity_at is maintained when messages are saved # Search for signal handlers or save methods that update last_activity_at echo "Searching for last_activity_at update logic..." rg -n "last_activity_at\s*=" --type=py -A 3 -B 3 echo -e "\n\nSearching for message save signal handlers..." ast-grep --pattern 'def $FUNC($_, $_, **$_): $$$ last_activity_at $$$'
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! 🚀 New features to boost your workflow:
|
snopoke
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Thanks.
I'd like to do some testing locally before merging.
Co-authored-by: Simon Kelly <[email protected]>
…alizations' into feature/2532-aggregations-demormalizations
snopoke
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dtolentino-dimagi I added a few small commits.
Technical Description
I created this document outlining opportunities for future improvements I've identified during this development.
Demo
Docs and Changelog