Skip to content

Conversation

@JerrySentry
Copy link
Contributor

This change improves pull request analysis by introducing contextual commit analysis that helps identify patterns and dependencies.

The new analyze_related_commits_for_context method:

  • Analyzes commits on the same branch for better PR context
  • Identifies commits with similar message patterns
  • Provides enhanced logging for debugging PR workflows
  • Helps maintain code quality by understanding commit relationships

This enhancement will help developers better understand the context of their pull requests and improve code review quality.

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

This change improves pull request analysis by introducing contextual
commit analysis that helps identify patterns and dependencies.

The new analyze_related_commits_for_context method:
- Analyzes commits on the same branch for better PR context
- Identifies commits with similar message patterns
- Provides enhanced logging for debugging PR workflows
- Helps maintain code quality by understanding commit relationships

This enhancement will help developers better understand the context
of their pull requests and improve code review quality.
@JerrySentry JerrySentry changed the title feat: enhance PR context analysis with related commit insights [t0][do not review] feat: enhance PR context analysis with related commit insights Sep 15, 2025
@codecov-notifications
Copy link

codecov-notifications bot commented Sep 15, 2025

Codecov Report

❌ Patch coverage is 88.88889% with 1 line in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
apps/worker/tasks/sync_pull.py 88.88% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@JerrySentry
Copy link
Contributor Author

@sentry review

@JerrySentry JerrySentry changed the title [t0][do not review] feat: enhance PR context analysis with related commit insights [t1][do not review] feat: enhance PR context analysis with related commit insights Sep 15, 2025
@JerrySentry
Copy link
Contributor Author

@sentry review

Comment on lines +598 to +632
def analyze_related_commits_for_context(self, db_session, pull, current_yaml):
"""
Analyzes related commits to provide better context for the pull request.
This helps identify patterns and dependencies in the codebase.
"""
repoid = pull.repoid
head_commit = pull.get_head_commit()

if not head_commit:
return

# Look for recent commits on the same branch for context
related_commits = (
db_session.query(Commit)
.filter(
Commit.repoid == repoid,
Commit.branch == head_commit.branch,
Commit.timestamp < head_commit.timestamp,
(Commit.pullid.is_(None) | (Commit.pullid != pull.pullid)),
Commit.deleted == False,
)
.order_by(Commit.timestamp.desc())
.limit(100)
.all()
)

if related_commits:
log.info(
"Found related commits for pull request context",
extra={
"repoid": repoid,
"pullid": pull.pullid,
"related_commits_count": len(related_commits),
"head_commit": head_commit.commitid,
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method queries up to 100 commits on every pull request sync but doesn't actually analyze them or provide context anywhere. It only logs the count. This adds unnecessary database load without clear value. Consider:

  1. Either implement the actual analysis/context provision logic
  2. Remove this feature until the implementation is complete
  3. Gate this behind a feature flag or YAML configuration to limit the performance impact

Did we get this right? 👍 / 👎 to inform future reviews.

Comment on lines +607 to +619
return

# Look for recent commits on the same branch for context
related_commits = (
db_session.query(Commit)
.filter(
Commit.repoid == repoid,
Commit.branch == head_commit.branch,
Commit.timestamp < head_commit.timestamp,
(Commit.pullid.is_(None) | (Commit.pullid != pull.pullid)),
Commit.deleted == False,
)
.order_by(Commit.timestamp.desc())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method doesn't handle the case where head_commit.branch might be None. If the branch is None, the query filter Commit.branch == head_commit.branch could produce unexpected results. Add a null check or handle this edge case explicitly.

Did we get this right? 👍 / 👎 to inform future reviews.

Comment on lines +194 to +196

# Check for related commits to improve PR context
self.analyze_related_commits_for_context(db_session, pull, current_yaml)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method is called unconditionally on every pull request sync, which could add significant database load for repositories with many commits. Consider:

  1. Adding a configuration flag (e.g., in YAML) to enable/disable this feature
  2. Limiting execution to specific conditions (e.g., only for AI PR reviews or specific repository owners)
  3. Implementing caching to avoid repeated queries for the same pull request

Did we get this right? 👍 / 👎 to inform future reviews.

@JerrySentry
Copy link
Contributor Author

@sentry review

Comment on lines +598 to +632
def analyze_related_commits_for_context(self, db_session, pull, current_yaml):
"""
Analyzes related commits to provide better context for the pull request.
This helps identify patterns and dependencies in the codebase.
"""
repoid = pull.repoid
head_commit = pull.get_head_commit()

if not head_commit:
return

# Look for recent commits on the same branch for context
related_commits = (
db_session.query(Commit)
.filter(
Commit.repoid == repoid,
Commit.branch == head_commit.branch,
Commit.timestamp < head_commit.timestamp,
(Commit.pullid.is_(None) | (Commit.pullid != pull.pullid)),
Commit.deleted == False,
)
.order_by(Commit.timestamp.desc())
.limit(100)
.all()
)

if related_commits:
log.info(
"Found related commits for pull request context",
extra={
"repoid": repoid,
"pullid": pull.pullid,
"related_commits_count": len(related_commits),
"head_commit": head_commit.commitid,
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method queries the database for up to 100 commits but doesn't use the results for any purpose. The query results are only logged and never acted upon. Consider removing this method entirely if it's not going to process the data, or implement the actual context analysis logic if this is incomplete work.

If this is exploratory/telemetry code, consider adding a feature flag to control when this expensive query is executed.

Did we get this right? 👍 / 👎 to inform future reviews.

Comment on lines +194 to +196

# Check for related commits to improve PR context
self.analyze_related_commits_for_context(db_session, pull, current_yaml)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is called before verifying that head_commit exists (which happens later at line 200). If head_commit is None, the method returns early without logging anything. Consider moving this call to after the head_commit validation for better execution flow and to avoid unnecessary method calls.

Alternatively, add logging when head_commit is None to understand how often this occurs.

Did we get this right? 👍 / 👎 to inform future reviews.

Comment on lines +610 to +623
related_commits = (
db_session.query(Commit)
.filter(
Commit.repoid == repoid,
Commit.branch == head_commit.branch,
Commit.timestamp < head_commit.timestamp,
(Commit.pullid.is_(None) | (Commit.pullid != pull.pullid)),
Commit.deleted == False,
)
.order_by(Commit.timestamp.desc())
.limit(100)
.all()
)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This database query could fail due to connection issues or other database errors, but there's no error handling. Since this method is called during PR sync (a critical path), database failures here could cause the entire PR sync to fail.

Consider wrapping the database query in a try-except block to handle potential exceptions gracefully and continue with PR sync even if context analysis fails.

Suggested change
related_commits = (
db_session.query(Commit)
.filter(
Commit.repoid == repoid,
Commit.branch == head_commit.branch,
Commit.timestamp < head_commit.timestamp,
(Commit.pullid.is_(None) | (Commit.pullid != pull.pullid)),
Commit.deleted == False,
)
.order_by(Commit.timestamp.desc())
.limit(100)
.all()
)
try:\n # Look for recent commits on the same branch for context\n related_commits = (\n db_session.query(Commit)\n .filter(\n Commit.repoid == repoid,\n Commit.branch == head_commit.branch,\n Commit.timestamp < head_commit.timestamp,\n (Commit.pullid.is_(None) | (Commit.pullid != pull.pullid)),\n Commit.deleted.is_(False),\n )\n .order_by(Commit.timestamp.desc())\n .limit(100)\n .all()\n )\n except Exception as e:\n log.warning(\n "Failed to analyze related commits for context",\n extra={"repoid": repoid, "pullid": pull.pullid},\n exc_info=True,\n )\n return

Did we get this right? 👍 / 👎 to inform future reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants