Description
PyPI download stats are heavily skewed by downloads from CI pipelines. PyPI's public BigQuery table does distinguish between these download sources. E.g., a query that filters out CI downloads:
SELECT
  COUNT(*) AS num_downloads,
  DATE_TRUNC(DATE(timestamp), MONTH) AS `month`,
  file.project AS `project`
FROM `bigquery-public-data.pypi.file_downloads`
WHERE
  -- Only user downloads, not downloads as part of CI pipelines
  details.ci IS NULL
  -- Only query roughly the last year of history
  AND DATE(timestamp)
    BETWEEN DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR), MONTH)
    AND CURRENT_DATE()
GROUP BY `month`, `project`
ORDER BY `month` DESC

Sadly, querying this table gets expensive quite quickly.
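One way to keep an eye on that cost is a BigQuery dry run, which reports how many bytes a query would process without actually executing it. The following is only a minimal sketch: it assumes the `google-cloud-bigquery` client library is installed and that default credentials and a project are configured. The helper names `format_bytes` and `estimate_scanned_bytes` are made up for illustration.

```python
def format_bytes(num_bytes: int) -> str:
    """Render a byte count in binary units, e.g. 1536 -> '1.50 KiB'."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that fits, or at the largest unit we know.
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


def estimate_scanned_bytes(sql: str) -> int:
    """Ask BigQuery how many bytes `sql` would process, without running it."""
    # Imported lazily so format_bytes works without the client library.
    from google.cloud import bigquery

    # Assumption: default credentials and a default project are available.
    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


if __name__ == "__main__":
    print(format_bytes(1536))
```

Calling `format_bytes(estimate_scanned_bytes(sql))` before the real query gives a rough idea of whether the date range should be narrowed first.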
Filtering out CI downloads is also possible with the Julia package stats table, using the "client_type" columns.
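As a rough illustration of that kind of filter, here is a small sketch over in-memory CSV data. The column names other than `client_type`, the value `"user"`, and the sample rows are all assumptions made up for the example, so they should be checked against the real Julia stats files before relying on this.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real files may use different columns and values.
SAMPLE_CSV = """package_uuid,client_type,date,request_count
aaaa,user,2024-01-01,10
aaaa,ci,2024-01-01,250
aaaa,,2024-01-01,5
bbbb,user,2024-01-02,7
bbbb,ci,2024-01-02,90
"""


def user_requests_by_package(csv_text: str, user_value: str = "user") -> dict:
    """Sum request counts per package, keeping only rows whose client_type
    equals `user_value` (CI rows and rows with an empty type are skipped)."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["client_type"] == user_value:
            totals[row["package_uuid"]] += int(row["request_count"])
    return dict(totals)


if __name__ == "__main__":
    print(user_requests_by_package(SAMPLE_CSV))
```

Whether rows with an empty `client_type` should count as user downloads is a judgment call; this sketch leaves them out.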