perf: skip DISTINCT ON for enrolledAt order on onlyEnrollOnce programs DHIS2-20863 #22953

Merged: teleivo merged 1 commit into master from DHIS2-20863-skip-distinct-on on Feb 13, 2026

@teleivo teleivo commented Feb 12, 2026

When ordering tracked entities by enrolledAt, the query uses DISTINCT ON (trackedentityid) to deduplicate rows from the enrollment JOIN (a TE can have multiple enrollments in the same program). This forces PostgreSQL to sort all matching enrollment rows, which can number in the millions, before it can apply LIMIT.

Programs with onlyEnrollOnce = true guarantee at most one enrollment per TE, making DISTINCT ON unnecessary. Skipping it lets PostgreSQL apply LIMIT early via a simple JOIN instead of sorting everything first.
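
To illustrate why the deduplication step is redundant under that guarantee, here is a minimal Python sketch (not the DHIS2 code, which is Java and SQL). The `Row` type and both function names are hypothetical. It emulates the DISTINCT ON path and the plain path in memory and shows that they return the same page when every tracked entity has at most one enrollment.

```python
import heapq
from collections import namedtuple

# Hypothetical stand-in for one row of the enrollment JOIN.
Row = namedtuple("Row", ["te_id", "enrolled_at"])


def page_with_distinct_on(rows, limit):
    """Emulate DISTINCT ON (te_id): keep the latest enrollment per TE, then order and limit."""
    latest = {}
    for row in rows:  # every row must be visited before LIMIT can apply
        kept = latest.get(row.te_id)
        if kept is None or row.enrolled_at > kept.enrolled_at:
            latest[row.te_id] = row
    ordered = sorted(latest.values(), key=lambda r: (r.enrolled_at, r.te_id), reverse=True)
    return ordered[:limit]


def page_with_plain_join(rows, limit):
    """Emulate ORDER BY enrolled_at DESC, te_id DESC LIMIT n without deduplication."""
    return heapq.nlargest(limit, rows, key=lambda r: (r.enrolled_at, r.te_id))


# One enrollment per TE (the onlyEnrollOnce guarantee): both paths agree.
once = [Row(1, "2024-01-05"), Row(2, "2024-03-01"), Row(3, "2024-02-10")]
print(page_with_distinct_on(once, 2) == page_with_plain_join(once, 2))

# Several enrollments per TE: the plain path can return the same TE twice.
multi = [Row(1, "2024-01-05"), Row(1, "2024-04-01"), Row(2, "2024-03-01")]
print(page_with_distinct_on(multi, 3) == page_with_plain_join(multi, 3))
```

The second case is why programs without the guarantee must keep DISTINCT ON.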

Also consolidates duplicate event filter logic: addEventExistsForEnrollmentJoin and addEventFilter shared the same conditions (event status, program stage, assigned user, deleted). Extracted into a single addEventFilterConditions method used by both the EXISTS subquery path and the enrollment JOIN path.
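
The shape of that consolidation can be sketched in Python as follows. This is an illustration only, not the actual Java method: the function names, the parameter keys, and the column names (`ev.status`, `ev.programstageid`, `ev.assigneduserid`, `ev.enrollmentid`) are assumptions made for the example.

```python
def event_filter_conditions(params, alias="ev"):
    """Build the shared event conditions once (hypothetical column names)."""
    conditions = [f"{alias}.deleted is false"]
    if params.get("event_status"):
        conditions.append(f"{alias}.status = :eventStatus")
    if params.get("program_stage"):
        conditions.append(f"{alias}.programstageid = :programStageId")
    if params.get("assigned_users"):
        conditions.append(f"{alias}.assigneduserid in (:assignedUserIds)")
    return " and ".join(conditions)


def event_exists_subquery(params):
    """EXISTS subquery path: reuses the shared conditions."""
    return (
        "exists (select 1 from event ev"
        " where ev.enrollmentid = en.enrollmentid"
        f" and {event_filter_conditions(params)})"
    )


def event_join(params):
    """Enrollment JOIN path: reuses the same shared conditions."""
    return (
        "inner join event ev on ev.enrollmentid = en.enrollmentid"
        f" and {event_filter_conditions(params)}"
    )


params = {"event_status": "ACTIVE", "program_stage": "stageUid"}
print(event_exists_subquery(params))
print(event_join(params))
```

Because both paths call the same builder, a new filter condition only has to be added in one place.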

When does this apply?

Only when all three conditions are met:

  1. order=enrolledAt is requested
  2. A program is specified (already required for order=enrolledAt)
  3. The program has onlyEnrollOnce = true
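
The three conditions above amount to a single predicate. A minimal Python sketch, assuming a hypothetical `Program` type and function name (the real check lives in the Java query builder):

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Program:
    """Hypothetical, reduced view of a program for this example."""
    uid: str
    only_enroll_once: bool


def can_skip_distinct_on(order_fields: Sequence[str], program: Optional[Program]) -> bool:
    """True only when all three conditions from the list above hold."""
    return (
        "enrolledAt" in order_fields        # 1. order=enrolledAt is requested
        and program is not None             # 2. a program is specified
        and program.only_enroll_once        # 3. the program has onlyEnrollOnce = true
    )


print(can_skip_distinct_on(["enrolledAt"], Program("prog1", True)))   # True
print(can_skip_distinct_on(["enrolledAt"], Program("prog2", False)))  # False
print(can_skip_distinct_on(["createdAt"], Program("prog1", True)))    # False
```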

onlyEnrollOnce programs are fairly common in practice, for example child programs, immunization programs, and similar programs where a person is enrolled once but can have many events in repeatable stages. Programs with onlyEnrollOnce = false are unaffected and still use DISTINCT ON.

SQL

Before (DISTINCT ON, two sorts):

select te.trackedentityid, te.uid, en_enrollmentdate
from (
    select distinct on (te.trackedentityid)
        te.trackedentityid, te.uid, en.enrollmentdate as en_enrollmentdate
    from trackedentity te
    inner join trackedentityprogramowner po on ...
    inner join organisationunit ou on ...
    inner join enrollment en on en.trackedentityid = te.trackedentityid
        and en.programid = :programId and en.deleted is false
    where te.trackedentitytypeid = :tetId and te.deleted is false
    order by te.trackedentityid, en.enrollmentdate desc
) te
order by en_enrollmentdate desc, te.trackedentityid desc
limit 4

After (plain JOIN, single sort, LIMIT pushdown):

select te.trackedentityid, te.uid, en.enrollmentdate as en_enrollmentdate
from trackedentity te
inner join trackedentityprogramowner po on ...
inner join organisationunit ou on ...
inner join enrollment en on en.trackedentityid = te.trackedentityid
    and en.programid = :programId and en.deleted is false
where te.trackedentitytypeid = :tetId and te.deleted is false
order by en.enrollmentdate desc, te.trackedentityid desc
limit 4

Database

Sierra Leone DB with 10M tracked entities (10.9M enrollments). Measured with EXPLAIN ANALYZE, with 4 warmup runs.

| Metric | Before (DISTINCT ON) | After (plain JOIN) | Change |
| --- | --- | --- | --- |
| Execution time | 11,977 ms | 2,291 ms | -81% |
| Temp I/O (read+write) | 511,100 blocks | 39,420 blocks | -92% |
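
The Change column can be reproduced from the raw figures with a few lines of Python. The conversion of temp blocks to bytes assumes PostgreSQL's default block size of 8 kB, which the PR does not state, so treat those two values as an estimate.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


BLOCK_SIZE_BYTES = 8192  # assumption: default PostgreSQL block size

print(round(percent_change(11_977, 2_291)))      # execution time: -81
print(round(percent_change(511_100, 39_420)))    # temp I/O: -92

# Approximate temp I/O volume under the block size assumption.
before_gib = 511_100 * BLOCK_SIZE_BYTES / 2**30
after_mib = 39_420 * BLOCK_SIZE_BYTES / 2**20
print(round(before_gib, 1), "GiB before")        # 3.9 GiB
print(round(after_mib, 1), "MiB after")          # 308.0 MiB
```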

DISTINCT ON forces two full sorts that spill to disk: PostgreSQL first sorts the 10.9M enrollment rows by trackedentityid for deduplication, then re-sorts the deduplicated result by enrollmentdate. Without it, PostgreSQL uses nested loop index lookups with a single sort and can stop early via LIMIT.

@teleivo teleivo marked this pull request as ready for review February 13, 2026 08:56
@teleivo teleivo enabled auto-merge (squash) February 13, 2026 09:44
@teleivo teleivo merged commit 6207187 into master Feb 13, 2026
16 checks passed
@teleivo teleivo deleted the DHIS2-20863-skip-distinct-on branch February 13, 2026 14:22