
Null-partitioned grouped generators' query results overlap #73

@tim-band


Null-partitioned grouped generators make one overall query plus one query per partition, and each partition corresponds to exactly one null pattern. Within a partition, every result row should carry a k<n> value (for category columns) or an m<n> value (for numeric columns) for each non-null column, and none of these values should be NULL; a row with a NULL there has a different null pattern from the one expected and belongs to a different partition. Currently src_stats.yaml can look like this:

...
auto__cov__measurement__measurement_concept_id__alt_10:
  comments:
  - Number of rows for which value_as_number IS NULL and unit_concept_id IS NULL (for
    each possible value of measurement_type_concept_id, measurement_concept_id, unit_source_value
    and value_as_concept_id)
  queries:
    date: '2025-11-21 11:26:02'
    query: SELECT 0 AS rank, _q.measurement_concept_id AS k0, _q.measurement_type_concept_id
      AS k1, _q.unit_source_value AS k2, _q.value_as_concept_id AS k3, _q.count AS
      count FROM (SELECT COUNT(*) AS count, measurement_concept_id, measurement_type_concept_id,
      unit_source_value, value_as_concept_id FROM (SELECT * FROM measurement WHERE
      unit_concept_id IS NULL AND value_as_number IS NULL ORDER BY RANDOM() LIMIT
      500) AS _sampled GROUP BY measurement_concept_id, measurement_type_concept_id,
      unit_source_value, value_as_concept_id) AS _q
  results:
  - count: 19
    k0: 0
    k1: 32856
    k2: null
    k3: null
    rank: 0
  - count: 2
    k0: 3000348
    k1: 32856
    k2: null
    k3: null
    rank: 0
  - count: 1
    k0: 3000764
    k1: 32856
    k2: null
    k3: null
    rank: 0
...
  - count: 4
    k0: 3022318
    k1: 32817
    k2: null
    k3: 21499419
    rank: 0
  - count: 73
    k0: 3022318
    k1: 32817
    k2: null
    k3: 45877096
    rank: 0
...

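Note the k2: null and k3: null lines. These should not be possible: k2 (unit_source_value) and k3 (value_as_concept_id) are grouped columns in this partition, so they should never be NULL here. The query above filters only on unit_concept_id IS NULL and value_as_number IS NULL, and has no IS NOT NULL condition on the grouped columns.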
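
To illustrate the expected behaviour, here is a minimal Python sketch of a partition query that pins down a single null pattern. It is not the project's code: the helper names `partition_where` and `grouped_counts` are hypothetical, the table is a cut-down stand-in for measurement, SQLite is used only so the example runs on its own, and the `ORDER BY RANDOM() LIMIT 500` sampling step is left out for brevity.

```python
import sqlite3


def partition_where(null_cols, non_null_cols):
    """Build a WHERE clause matching exactly one null pattern (hypothetical helper)."""
    conditions = [f"{col} IS NULL" for col in null_cols]
    # The IS NOT NULL half is what keeps partitions from overlapping.
    conditions += [f"{col} IS NOT NULL" for col in non_null_cols]
    return " AND ".join(conditions)


def grouped_counts(conn, table, null_cols, non_null_cols):
    """Count rows per combination of the non-null columns (hypothetical helper)."""
    where = partition_where(null_cols, non_null_cols)
    group = ", ".join(non_null_cols)
    sql = (
        f"SELECT COUNT(*) AS count, {group} FROM {table} "
        f"WHERE {where} GROUP BY {group}"
    )
    return conn.execute(sql).fetchall()


# Cut-down stand-in for the measurement table (assumed columns only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement ("
    "measurement_concept_id INTEGER, unit_source_value TEXT, value_as_number REAL)"
)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [
        (3000348, "mg", None),  # matches the partition
        (3000348, None, None),  # different null pattern: must be excluded
        (3022318, "mg", None),  # matches the partition
        (3022318, "mg", 1.5),   # value_as_number is not NULL: excluded
    ],
)

rows = grouped_counts(
    conn,
    "measurement",
    null_cols=["value_as_number"],
    non_null_cols=["measurement_concept_id", "unit_source_value"],
)
```

With the IS NOT NULL conditions in place, the row whose unit_source_value is NULL no longer appears in this partition's results, so no result row carries a null grouped value.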
