forked from alan-turing-institute/sqlsynthgen
Open
Description
Null-partitioned grouped generators make one overall query plus one query per partition, and each partition corresponds to exactly one null pattern. Within a partition, every non-null column should appear in each result row as k<n> (for category columns) or m<n> (for numeric columns), and none of these values should be NULL; a row with a NULL there has a different null pattern from the one expected and so belongs to a different partition. Currently src_stats.yaml can look like this:
...
auto__cov__measurement__measurement_concept_id__alt_10:
comments:
- Number of rows for which value_as_number IS NULL and unit_concept_id IS NULL (for
each possible value of measurement_type_concept_id, measurement_concept_id, unit_source_value
and value_as_concept_id)
queries:
date: '2025-11-21 11:26:02'
query: SELECT 0 AS rank, _q.measurement_concept_id AS k0, _q.measurement_type_concept_id
AS k1, _q.unit_source_value AS k2, _q.value_as_concept_id AS k3, _q.count AS
count FROM (SELECT COUNT(*) AS count, measurement_concept_id, measurement_type_concept_id,
unit_source_value, value_as_concept_id FROM (SELECT * FROM measurement WHERE
unit_concept_id IS NULL AND value_as_number IS NULL ORDER BY RANDOM() LIMIT
500) AS _sampled GROUP BY measurement_concept_id, measurement_type_concept_id,
unit_source_value, value_as_concept_id) AS _q
results:
- count: 19
k0: 0
k1: 32856
k2: null
k3: null
rank: 0
- count: 2
k0: 3000348
k1: 32856
k2: null
k3: null
rank: 0
- count: 1
k0: 3000764
k1: 32856
k2: null
k3: null
rank: 0
...
- count: 4
k0: 3022318
k1: 32817
k2: null
k3: 21499419
rank: 0
- count: 73
k0: 3022318
k1: 32817
k2: null
k3: 45877096
rank: 0
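...
Note the k2: null and k3: null lines. These should not be possible. The query's WHERE clause only requires unit_concept_id and value_as_number to be NULL; it does not exclude rows where unit_source_value or value_as_concept_id is also NULL.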
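As a rough illustration of the expected invariant, here is a minimal Python sketch. The function names rows_with_unexpected_nulls and partition_where_clause are hypothetical and not part of the project's code; the sketch assumes that each result row is a plain dict, as it would be after loading src_stats.yaml, and that a partition's query could pin down its full null pattern by adding IS NOT NULL conditions for the remaining columns. Whether that is the right fix for the generator is an assumption.

```python
import re

# Hypothetical helpers for illustration only; not the project's actual API.
# Matches the k<n> (category) and m<n> (numeric) keys of a result row.
_VALUE_KEY = re.compile(r"^[km]\d+$")


def rows_with_unexpected_nulls(results):
    """Return the result rows in which any k<n> or m<n> value is None."""
    return [
        row
        for row in results
        if any(_VALUE_KEY.match(key) and value is None for key, value in row.items())
    ]


def partition_where_clause(null_columns, non_null_columns):
    """Build a WHERE condition that fixes one complete null pattern.

    Column names are interpolated directly, so they are assumed to be
    trusted identifiers taken from the schema, not user input.
    """
    conditions = [f"{col} IS NULL" for col in sorted(null_columns)]
    conditions += [f"{col} IS NOT NULL" for col in sorted(non_null_columns)]
    return " AND ".join(conditions)


# Two rows copied from the results shown above.
results = [
    {"count": 19, "k0": 0, "k1": 32856, "k2": None, "k3": None, "rank": 0},
    {"count": 4, "k0": 3022318, "k1": 32817, "k2": None, "k3": 21499419, "rank": 0},
]
bad = rows_with_unexpected_nulls(results)
print(f"{len(bad)} of {len(results)} rows break the partition's null pattern")

where = partition_where_clause(
    null_columns=["unit_concept_id", "value_as_number"],
    non_null_columns=[
        "measurement_concept_id",
        "measurement_type_concept_id",
        "unit_source_value",
        "value_as_concept_id",
    ],
)
print(where)
```

A check like the first function could serve as a regression test on the generated src_stats.yaml, while the second shows the kind of condition that would keep rows with extra NULLs out of this partition.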