Dataset view of questions
This repository is available online at https://github.com/mysociety/uk-parl-written-questions
If GitHub Pages is enabled, the site URL is: https://mysociety.github.io/uk-parl-written-questions/
To avoid GitHub's repository size limits, this project uses a two-stage data storage approach:
- Raw data: Monthly JSON files are stored in `data/raw/commons/` and committed to git for efficient caching
- Processed datasets: Large parquet files (`written_questions.parquet` and `written_questions_interests.parquet`) are generated during the build process and not committed to git
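As a rough illustration of the first stage, the sketch below lists the raw monthly files in a local checkout. It assumes the files carry a `.json` extension and sit directly under `data/raw/commons/`; the helper name `list_raw_files` is hypothetical and not part of this repository.

```python
from __future__ import annotations

from pathlib import Path


def list_raw_files(repo_root: str | Path = ".") -> list[Path]:
    """Return the raw JSON files under data/raw/commons/, sorted by path.

    Assumption: files end in ".json" and sit directly in that directory.
    """
    raw_dir = Path(repo_root) / "data" / "raw" / "commons"
    # glob() yields nothing if the directory does not exist yet
    return sorted(raw_dir.glob("*.json"))


if __name__ == "__main__":
    for path in list_raw_files():
        print(path.name, path.stat().st_size, "bytes")
```

Running it from the repository root prints one line per raw file with its size, which gives a quick view of how much data is cached in git.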
To build the datasets locally:

    # Fetch latest data (updates recent months)
    python -m src.uk_parl_written_questions fetch

    # Generate parquet files from raw data
    python -m src.uk_parl_written_questions create

The generated parquet files will be created in:

- `data/packages/commons_written_questions/written_questions.parquet`
- `data/packages/commons_written_questions_interests/written_questions_interests.parquet`
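Once the build has run, the output can be checked and loaded along these lines. This is a minimal sketch, not code from the repository: `missing_outputs` and `load_outputs` are hypothetical helper names, and loading assumes pandas plus a parquet engine such as pyarrow are installed.

```python
from __future__ import annotations

from pathlib import Path

# Output paths as given in this README, relative to the repository root.
PARQUET_OUTPUTS = (
    "data/packages/commons_written_questions/written_questions.parquet",
    "data/packages/commons_written_questions_interests/written_questions_interests.parquet",
)


def missing_outputs(repo_root: str | Path = ".") -> list[str]:
    """Return the expected parquet paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in PARQUET_OUTPUTS if not (root / rel).is_file()]


def load_outputs(repo_root: str | Path = "."):
    """Load each generated file into a DataFrame, keyed by file stem.

    Assumption: pandas and a parquet engine such as pyarrow are installed.
    """
    import pandas as pd  # imported lazily so the existence check needs no extras

    root = Path(repo_root)
    return {Path(rel).stem: pd.read_parquet(root / rel) for rel in PARQUET_OUTPUTS}


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Not built yet:", *missing, sep="\n  ")
    else:
        for name, frame in load_outputs().items():
            print(name, frame.shape)
```

Because the parquet files are not committed, a fresh clone will report both paths as missing until the `create` step has been run.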
The GitHub Actions workflow automatically:
- Fetches the latest data
- Generates the parquet files
- Builds and publishes the website
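The first two workflow steps correspond to the local build commands, so they can be chained in a small script. The sketch below is an illustration only and is not the actual workflow definition; `build_commands` and `run_steps` are hypothetical names, and the website build step is left out because its command is not given here.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Callable, Sequence

# The two data subcommands from the local build instructions, in order.
STEPS = ("fetch", "create")


def build_commands(python: str = sys.executable) -> list[list[str]]:
    """Build the argument list for each step."""
    return [[python, "-m", "src.uk_parl_written_questions", step] for step in STEPS]


def run_steps(runner: Callable[[Sequence[str]], object] | None = None) -> None:
    """Run each command in order; the default runner raises on a non-zero exit."""
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd, check=True)
    for cmd in build_commands():
        runner(cmd)


if __name__ == "__main__":
    run_steps()
```

Passing a custom `runner` (for example one that only prints the command) makes it possible to dry-run the sequence without fetching any data.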
Instructions on using the features of this notebook (data publishing, notebook rendering, GitHub Pages) are available in the [Data Common readme file](https://github.com/mysociety/data_common/blob/main/data-repo-readme.md).