-
Notifications
You must be signed in to change notification settings - Fork 6
Add Comprehensive Course Engagement Detail Report Model #1806
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…del (successfully built with 14M+ rows) - ✅ int__mitxonline__users.sql - Added address fields with proper varchar casting - ✅ int__mitxresidential__users.sql - Added address fields with proper varchar casting - ✅ int__combined__users.sql - Updated to use new address fields - ✅ _reporting__models.yml - Added documentation for all 78 columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR creates a unified course engagement reporting model that consolidates enrollment, user demographics, and multiple engagement metrics into a single comprehensive view. The changes eliminate the need for multiple Superset virtual datasets with duplicated logic by providing a single source of truth for course engagement analytics.
Key Changes:
- Added new
course_engagement_detail_reportmodel that joins enrollment data with video, problem, and daily activity engagement metrics - Enhanced user models to include address fields (street, city, state, postal code) for mitxonline and mitxresidential platforms
- Provided complete documentation for all 78 columns in the new reporting model
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
course_engagement_detail_report.sql |
New reporting model that consolidates enrollment, user demographics, and engagement metrics from video, problem, and daily activity sources |
_reporting__models.yml |
Added comprehensive documentation for all 78 columns in the new course engagement detail report |
int__mitxresidential__users.sql |
Added user address fields (street_address, state_or_territory, postal_code) as null casts to maintain schema consistency |
int__mitxonline__users.sql |
Added user address fields (street_address, city, postal_code) as null casts to maintain schema consistency |
int__combined__users.sql |
Updated to use actual address field columns instead of null values for both mitxonline and residential users |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| , cast(null as varchar) as user_street_address | ||
| , cast(null as varchar) as user_address_state_or_territory | ||
| , cast(null as varchar) as user_address_postal_code |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mitxresidential model is missing user_address_city field while mitxonline includes it. This inconsistency means the combined model will have null values for mitxresidential users' city information. Consider adding cast(null as varchar) as user_address_city to maintain consistent schema across both platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious - is there any benefit to casting null here instead of handling null in int__combined__users, as in the original code?
| , coalesce(openedx_users.user_username, users.user_username) as user_username | ||
| , cast(null as varchar) as user_street_address | ||
| , cast(null as varchar) as user_address_city | ||
| , cast(null as varchar) as user_address_postal_code |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mitxonline model is missing user_address_state_or_territory field while mitxresidential includes it. This creates an inconsistent schema where mitxonline users will have null state information in the combined model. Consider adding cast(null as varchar) as user_address_state_or_territory to ensure all address fields are consistently available.
| , cast(null as varchar) as user_address_postal_code | |
| , cast(null as varchar) as user_address_postal_code | |
| , cast(null as varchar) as user_address_state_or_territory |
rachellougee
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dbt run fine, but I left some comments about the engagement tables
| left join mitx_courses | ||
| on enrollment_detail.course_readable_id = mitx_courses.course_readable_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be joining with int__combined__course_runs instead of int__mitx__courses
| , mitxonline_video_engagements as ( | ||
| select * from {{ ref('marts__mitxonline_video_engagements') }} | ||
| ) | ||
|
|
||
| , mitxonline_course_engagements_daily as ( | ||
| select * from {{ ref('marts__mitxonline_course_engagements_daily') }} | ||
| ) | ||
|
|
||
| , mitxonline_problem_submissions as ( | ||
| select * from {{ ref('marts__mitxonline_problem_submissions') }} | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure the reason why we're only pulling mitxonline engagement tables here; we should be using the combined tables -marts__combined_video_engagements, marts__combined_problem_submissions and marts__combined_course_engagements.
What are the relevant tickets?
https://github.com/mitodl/hq/issues/8184
Description (What does it do?)
The current setup requires multiple Superset virtual datasets with complex SQL queries to access course enrollment and engagement data. Each virtual dataset duplicates logic and makes it difficult to:
- Maintain consistency across dashboard queries
- Add new engagement metrics without updating multiple virtual datasets
- Get a complete view of learner activity combining enrollment, demographics, and engagement in one place
Users need to manually join enrollment data with user profiles and various engagement metrics (videos, problems, daily activities) across multiple queries.
These changes create a unified course_engagement_detail_report dbt model that consolidates:
- Base enrollment data from marts__combined_course_enrollment_detail (enrollment details, certificates, grades, orders)
- Enhanced user demographics from marts__combined__users (including address fields)
- Video engagement metrics from marts__mitxonline_video_engagements (videos watched, played, completed, sections accessed)
- Daily activity metrics from marts__mitxonline_course_engagements_daily (days active, total events, problems/videos/discussions)
- Problem performance data from marts__mitxonline_problem_submissions (attempts, correct/incorrect counts, average scores)
The model provides 78 columns total with proper type casting for aggregations and handles NULL values appropriately.
Changes made:
- Created course_engagement_detail_report.sql reporting model with comprehensive engagement data
- Added missing user address fields (user_street_address, user_address_city, user_address_state_or_territory, user_address_postal_code) to intermediate user
models:
- int__mitxonline__users.sql
- int__mitxresidential__users.sql
- int__combined__users.sql
- Added complete documentation for all 78 columns in _reporting__models.yml
Successfully built with 14.2M+ rows combining enrollment and engagement data across all platforms.
How can this be tested?
Expected results:
- int__mitxonline__users: ~567K rows
- int__mitxresidential__users: ~43K rows
- marts__combined__users: ~7.6M rows
- course_engagement_detail_report: ~14.2M rows
Checklist