Skip to content

Conversation

@ericholscher
Copy link
Member

@ericholscher ericholscher commented Nov 20, 2025

Problem

This PR fixes database lock contention on the adserver_flight table that was causing queries like this to block for extended periods:

UPDATE "adserver_flight" SET "total_views" = ("adserver_flight"."total_views" + 1) WHERE "adserver_flight"."id" = 639

On high-traffic flights, multiple concurrent ad impressions were trying to update the same total_views and total_clicks fields, causing row-level locks that queued up and blocked ad serving.

Solution

Remove real-time updates to the denormalized Flight.total_views and Flight.total_clicks fields. These fields are now:

  1. Calculated on-demand from AdImpression when needed
  2. Refreshed periodically via a background Celery task (every 5-10m)

Since AdImpression is already the source of truth for this data, the denormalized fields on Flight don't need to be updated in real-time.

Changes

  • ✅ Remove F() expression updates in Advertisement.incr() that were causing row-level locks
  • ✅ Add Flight.refresh_denormalized_totals() method to recalculate fields from AdImpression
  • ✅ Add refresh_flight_denormalized_totals() Celery task for periodic background updates
  • ✅ Update Offer.refund() to skip Flight field updates
  • ✅ Update tests to manually refresh denormalized fields where needed

Deployment Notes

Schedule the refresh_flight_denormalized_totals task to run periodically (every 5-10m)

Testing

  • Updated existing test that relied on real-time updates
  • All other tests should pass as the fields are still readable and calculated correctly

Remove real-time updates to Flight.total_views and Flight.total_clicks
to eliminate database lock contention on high-traffic flights.

Changes:
- Remove F() expression updates in Advertisement.incr() that were
  causing row-level locks on every impression
- Add Flight.refresh_denormalized_totals() method to calculate
  these fields from AdImpression on-demand
- Add refresh_flight_denormalized_totals() Celery task for
  periodic background updates
- Add management command to manually refresh flight totals
- Update Offer.refund() to skip Flight field updates
- Update test to manually refresh denormalized fields

The denormalized fields are now calculated from the source of truth
(AdImpression) either on-demand or via scheduled tasks, avoiding
the lock contention that was blocking concurrent ad impressions.
@davidfischer
Copy link
Collaborator

This is a good starting point. I'd say we should get rid of the the management task (we'll never run it) and rely on the task. I think we should run the task every 5-10m instead of hourly.

@davidfischer
Copy link
Collaborator

The test failures are pretty straight-forward issues because we're assuming this field is updated.

ericholscher and others added 5 commits November 20, 2025 10:39
- Remove refresh_flight_totals management command (won't be used)
- Update task documentation to run every 5-10 minutes
- Fix all test failures by adding refresh_denormalized_totals() calls
  after incr() operations that check total_views/total_clicks
- Update tests that manually set total_clicks/total_views to use
  save(update_fields=[...]) to avoid triggering other updates
- Celery task sets cache timestamp on completion
- Health endpoint checks cache instead of querying DB
- Returns 503 if task hasn't run in 20+ minutes
- Refunding flights triggers recalc of totals
- Handle case where a flight is slightly overfilled
- Run denormalized calcs every 5m
- Don't bother with "error counts" in denormalized calcs.
  It will either succeed or error and alert us.
@davidfischer davidfischer marked this pull request as ready for review November 20, 2025 17:10
@davidfischer davidfischer requested a review from a team as a code owner November 20, 2025 17:10
@davidfischer davidfischer self-requested a review November 20, 2025 17:10
Copy link
Collaborator

@davidfischer davidfischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine to roll out as long as we also add monitoring to the health check. I like this pattern for health checks and think we should have more of them.

@ericholscher
Copy link
Member Author

Yea, I think it's a solid pattern.

@davidfischer davidfischer merged commit f795937 into main Nov 20, 2025
1 check passed
@davidfischer davidfischer deleted the fix/flight-denormalized-field-locking branch November 20, 2025 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants