Summary:
While validating a public health dataset commonly consumed in analytical workflows, multiple columns were found to contain unexpected NULL values. This behavior can negatively affect BigQuery queries and downstream analytics that assume numeric completeness.
Dataset:
Public dataset consumed via BigQuery / public data pipeline
(Observed via JHU full COVID data snapshot)
Observed Issue:
The following fields contain NULL values in the dataset:
- reproduction_rate
- icu_patients
- icu_patients_per_million
These values are present as empty fields in the raw data and propagate into analytical workflows.
Reproduction:
- Access the dataset from the source URL:
https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/jhu/full_data.csv
- Query or inspect the listed fields.
- Observe NULL values in rows where numeric values are expected.
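The inspection step can be sketched in Python as a read-only check that counts empty values per column. This is a minimal sketch, not the exact method used for this report: the function name `count_empty_fields` and the inline sample rows are hypothetical, and in practice the CSV text would come from the source URL above. A column that is absent from the header is reported as `None` rather than a count, since a missing column would also surface as NULLs after loading.

```python
import csv
import io


def count_empty_fields(csv_text, columns):
    """Return (row_count, {column: empty_count}).

    A count of None means the column is not present in the CSV header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    counts = {col: (0 if col in header else None) for col in columns}
    total_rows = 0
    for row in reader:
        total_rows += 1
        for col in columns:
            if counts[col] is None:
                continue  # column missing from the header entirely
            value = row.get(col)
            # Short rows yield None; empty CSV fields yield "".
            if value is None or value.strip() == "":
                counts[col] += 1
    return total_rows, counts


# Hypothetical sample rows for illustration only (not real data).
SAMPLE = """date,location,reproduction_rate,icu_patients,icu_patients_per_million
2020-03-01,Exampleland,1.2,,
2020-03-02,Exampleland,,10,0.5
"""

if __name__ == "__main__":
    fields = ["reproduction_rate", "icu_patients", "icu_patients_per_million"]
    rows, empties = count_empty_fields(SAMPLE, fields)
    print(rows, empties)
```

Only the standard library is used, so the check does not modify or re-upload any data.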
Impact:
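Queries that aggregate or compute over these fields may fail, return incomplete results, or require defensive NULL handling. This increases query complexity and the risk of silent analytical errors, for example when an average is computed over only the non-NULL rows without the gap being noticed.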
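To illustrate the silent-error risk, the sketch below mimics the standard SQL behavior where `AVG` skips NULLs, and reports how many rows actually contributed. The function name `null_aware_mean` and the sample values are hypothetical and chosen only for illustration.

```python
def null_aware_mean(values):
    """Return (mean_of_present_values, coverage).

    None entries are skipped, as SQL AVG does with NULLs. Coverage is the
    fraction of rows that actually contributed to the mean.
    """
    present = [v for v in values if v is not None]
    coverage = len(present) / len(values) if values else 0.0
    mean = sum(present) / len(present) if present else None
    return mean, coverage


# Hypothetical icu_patients values with gaps (not real data).
icu_patients = [10.0, None, 30.0, None]

if __name__ == "__main__":
    mean, coverage = null_aware_mean(icu_patients)
    # Treating the gaps as zero gives a different answer from skipping them.
    zero_filled = sum(v or 0.0 for v in icu_patients) / len(icu_patients)
    print(mean, coverage, zero_filled)
```

Reporting coverage alongside the aggregate makes the gap visible instead of leaving it implicit in the result.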
Notes:
- No data was modified
- No private datasets were accessed
- Observation is based on read-only analysis of publicly available data
This issue is reported to improve reliability of public data usage in BigQuery-based analytics.