-
Notifications
You must be signed in to change notification settings - Fork 10
[t1][do not review] feat: enhance PR context analysis with related commit insights #489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This change improves pull request analysis by introducing contextual commit analysis that helps identify patterns and dependencies. The new analyze_related_commits_for_context method: - Analyzes commits on the same branch for better PR context - Identifies commits with similar message patterns - Provides enhanced logging for debugging PR workflows - Helps maintain code quality by understanding commit relationships This enhancement will help developers better understand the context of their pull requests and improve code review quality.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
@sentry review |
|
@sentry review |
| def analyze_related_commits_for_context(self, db_session, pull, current_yaml): | ||
| """ | ||
| Analyzes related commits to provide better context for the pull request. | ||
| This helps identify patterns and dependencies in the codebase. | ||
| """ | ||
| repoid = pull.repoid | ||
| head_commit = pull.get_head_commit() | ||
|
|
||
| if not head_commit: | ||
| return | ||
|
|
||
| # Look for recent commits on the same branch for context | ||
| related_commits = ( | ||
| db_session.query(Commit) | ||
| .filter( | ||
| Commit.repoid == repoid, | ||
| Commit.branch == head_commit.branch, | ||
| Commit.timestamp < head_commit.timestamp, | ||
| (Commit.pullid.is_(None) | (Commit.pullid != pull.pullid)), | ||
| Commit.deleted == False, | ||
| ) | ||
| .order_by(Commit.timestamp.desc()) | ||
| .limit(100) | ||
| .all() | ||
| ) | ||
|
|
||
| if related_commits: | ||
| log.info( | ||
| "Found related commits for pull request context", | ||
| extra={ | ||
| "repoid": repoid, | ||
| "pullid": pull.pullid, | ||
| "related_commits_count": len(related_commits), | ||
| "head_commit": head_commit.commitid, | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method queries up to 100 commits on every pull request sync but doesn't actually analyze them or provide context anywhere. It only logs the count. This adds unnecessary database load without clear value. Consider:
- Either implement the actual analysis/context provision logic
- Remove this feature until the implementation is complete
- Gate this behind a feature flag or YAML configuration to limit the performance impact
Did we get this right? 👍 / 👎 to inform future reviews.
| return | ||
|
|
||
| # Look for recent commits on the same branch for context | ||
| related_commits = ( | ||
| db_session.query(Commit) | ||
| .filter( | ||
| Commit.repoid == repoid, | ||
| Commit.branch == head_commit.branch, | ||
| Commit.timestamp < head_commit.timestamp, | ||
| (Commit.pullid.is_(None) | (Commit.pullid != pull.pullid)), | ||
| Commit.deleted == False, | ||
| ) | ||
| .order_by(Commit.timestamp.desc()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method doesn't handle the case where head_commit.branch might be None. If the branch is None, the query filter Commit.branch == head_commit.branch could produce unexpected results. Add a null check or handle this edge case explicitly.
Did we get this right? 👍 / 👎 to inform future reviews.
|
|
||
| # Check for related commits to improve PR context | ||
| self.analyze_related_commits_for_context(db_session, pull, current_yaml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method is called unconditionally on every pull request sync, which could add significant database load for repositories with many commits. Consider:
- Adding a configuration flag (e.g., in YAML) to enable/disable this feature
- Limiting execution to specific conditions (e.g., only for AI PR reviews or specific repository owners)
- Implementing caching to avoid repeated queries for the same pull request
Did we get this right? 👍 / 👎 to inform future reviews.
|
@sentry review |
| def analyze_related_commits_for_context(self, db_session, pull, current_yaml): | ||
| """ | ||
| Analyzes related commits to provide better context for the pull request. | ||
| This helps identify patterns and dependencies in the codebase. | ||
| """ | ||
| repoid = pull.repoid | ||
| head_commit = pull.get_head_commit() | ||
|
|
||
| if not head_commit: | ||
| return | ||
|
|
||
| # Look for recent commits on the same branch for context | ||
| related_commits = ( | ||
| db_session.query(Commit) | ||
| .filter( | ||
| Commit.repoid == repoid, | ||
| Commit.branch == head_commit.branch, | ||
| Commit.timestamp < head_commit.timestamp, | ||
| (Commit.pullid.is_(None) | (Commit.pullid != pull.pullid)), | ||
| Commit.deleted == False, | ||
| ) | ||
| .order_by(Commit.timestamp.desc()) | ||
| .limit(100) | ||
| .all() | ||
| ) | ||
|
|
||
| if related_commits: | ||
| log.info( | ||
| "Found related commits for pull request context", | ||
| extra={ | ||
| "repoid": repoid, | ||
| "pullid": pull.pullid, | ||
| "related_commits_count": len(related_commits), | ||
| "head_commit": head_commit.commitid, | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method queries the database for up to 100 commits but doesn't use the results for any purpose. The query results are only logged and never acted upon. Consider removing this method entirely if it's not going to process the data, or implement the actual context analysis logic if this is incomplete work.
If this is exploratory/telemetry code, consider adding a feature flag to control when this expensive query is executed.
Did we get this right? 👍 / 👎 to inform future reviews.
|
|
||
| # Check for related commits to improve PR context | ||
| self.analyze_related_commits_for_context(db_session, pull, current_yaml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is called before verifying that head_commit exists (which happens later at line 200). If head_commit is None, the method returns early without logging anything. Consider moving this call to after the head_commit validation for better execution flow and to avoid unnecessary method calls.
Alternatively, add logging when head_commit is None to understand how often this occurs.
Did we get this right? 👍 / 👎 to inform future reviews.
| related_commits = ( | ||
| db_session.query(Commit) | ||
| .filter( | ||
| Commit.repoid == repoid, | ||
| Commit.branch == head_commit.branch, | ||
| Commit.timestamp < head_commit.timestamp, | ||
| (Commit.pullid.is_(None) | (Commit.pullid != pull.pullid)), | ||
| Commit.deleted == False, | ||
| ) | ||
| .order_by(Commit.timestamp.desc()) | ||
| .limit(100) | ||
| .all() | ||
| ) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This database query could fail due to connection issues or other database errors, but there's no error handling. Since this method is called during PR sync (a critical path), database failures here could cause the entire PR sync to fail.
Consider wrapping the database query in a try-except block to handle potential exceptions gracefully and continue with PR sync even if context analysis fails.
| related_commits = ( | |
| db_session.query(Commit) | |
| .filter( | |
| Commit.repoid == repoid, | |
| Commit.branch == head_commit.branch, | |
| Commit.timestamp < head_commit.timestamp, | |
| (Commit.pullid.is_(None) | (Commit.pullid != pull.pullid)), | |
| Commit.deleted == False, | |
| ) | |
| .order_by(Commit.timestamp.desc()) | |
| .limit(100) | |
| .all() | |
| ) | |
| try:\n # Look for recent commits on the same branch for context\n related_commits = (\n db_session.query(Commit)\n .filter(\n Commit.repoid == repoid,\n Commit.branch == head_commit.branch,\n Commit.timestamp < head_commit.timestamp,\n (Commit.pullid.is_(None) | (Commit.pullid != pull.pullid)),\n Commit.deleted.is_(False),\n )\n .order_by(Commit.timestamp.desc())\n .limit(100)\n .all()\n )\n except Exception as e:\n log.warning(\n "Failed to analyze related commits for context",\n extra={"repoid": repoid, "pullid": pull.pullid},\n exc_info=True,\n )\n return |
Did we get this right? 👍 / 👎 to inform future reviews.
This change improves pull request analysis by introducing contextual commit analysis that helps identify patterns and dependencies.
The new analyze_related_commits_for_context method:
This enhancement will help developers better understand the context of their pull requests and improve code review quality.
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.