This case study is contained within the Serious SQL by Danny Ma
- 🛠️ Overview
- 🚀 Solutions
- 💻 Key Highlights
With the Health Analytics Mini Case Study, I queried data to bring insights to the following questions:
- How many
unique usersexist in the logs dataset? - How many total
measurementsdo we haveper user on average? - What about the
mediannumber of measurements per user? - How many users have
3 or moremeasurements? - How many users have
1,000 or moremeasurements? - Have logged
blood glucosemeasurements? - Have
at least 2 typesof measurements? - Have all 3 measures -
blood glucose, weight and blood pressure? - What is the
median systolic/diastolicblood pressure values?
SELECT COUNT (DISTINCT id)
FROM health.user_logs;| count |
|---|
| 554 |
Note: For question 2-8, I created a temporary table:
Step 1: Firstly, I ran a code DROP TABLE IF EXISTS statement to clear out any previously created tables:
DROP TABLE IF EXISTS user_measure_count;Step 2: Next, I created a new temporary table using the results of the query below:
CREATE TEMP TABLE user_measure_count AS
SELECT
id,
COUNT(*) AS measure_count,
COUNT (DISTINCT measure) AS unique_measures
FROM health.user_logs
GROUP BY 1;SELECT
ROUND (AVG(measure_count), 2) AS mean_value
FROM user_measure_count;| mean_value |
|---|
| 79.23 |
SELECT
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY measure_count) AS median_value
FROM user_measure_count;| median_value |
|---|
| 2 |
SELECT COUNT(*)
FROM user_measure_count
WHERE measure_count >= 3;| count |
|---|
| 209 |
SELECT COUNT(*)
FROM user_measure_count
WHERE measure_count >= 1000;| count |
|---|
| 5 |
SELECT
COUNT(DISTINCT id)
FROM health.user_logs
WHERE measure = 'blood_glucose';| count |
|---|
| 325 |
SELECT
COUNT(*)
FROM user_measure_count
WHERE unique_measures >= 2;| count |
|---|
| 204 |
SELECT
COUNT(*)
FROM user_measure_count
WHERE unique_measures = 3;| count |
|---|
| 50 |
SELECT
PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY systolic) AS median_systolic,
PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY diastolic) AS median_diastolic
FROM health.user_logs
WHERE measure = 'blood_pressure';| median_systolic | median_diastolic |
|---|---|
| 126 | 79 |
Even though this is a short assignment which cover basic SQL syntax, I did run into problems several time during the solving process. However, it helped me to have a better understanding about data exploration using SQL from theories to real life application.