Welcome to the Housing Bubble Detection & Forecasting project, built entirely on Google Cloud Platform (GCP) with full support for time series forecasting, macroeconomic risk scoring, and CI/CD automation. This project blends real-world economic theory with modern cloud-first data science to tackle a critical question:
Can we detect early signs of housing market bubbles before they burst?
Over the course of several weeks, I developed this project from scratch with careful attention to reproducibility, modularity, and economic realism. The goal was not just to build another ML model, but to simulate what a real analytics pipeline for real estate or financial policy might look like inside a large organization.
🌐 Live Streamlit App
Experience the working dashboard in real time! No setup needed — just click and explore bubble risk trends, macro indicators, and model forecasts.
🔁 The app auto-refreshes predictions and risk flags using GitHub Actions and BigQuery-backed pipelines.
The 2008 housing crisis showed us how delayed signals and over-optimistic valuations can wreak havoc. Today, with more data than ever, we can do better. This project aims to:
- Quantify bubble-like behavior using publicly available macroeconomic indicators
- Forecast future values of the Home Price Index (HPI) using interpretable models
- Deliver results via a dashboard and refreshable cloud pipeline for decision-makers
Unlike most toy projects that stop at modeling, this repo shows how to:
- Ingest economic data programmatically
- Clean, store, and version it securely in the cloud
- Create and monitor predictive models
- Automate everything with GitHub Actions
- Visualize signals for strategic storytelling
Pythonfor all modeling, cleaning, and pipeline scriptsGoogle Cloud Storage (GCS)for cleaned CSV uploadsBigQueryfor warehousing, modeling, and joining staging tablesStreamlitfor dashboarding (deployed via Streamlit Community Cloud)GitHub Actionsto orchestrate periodic refreshes and model retrainingFRED APIas the data source for 23 economic indicators (CPI, HPI, FEDFUNDS, etc.)
This is an end-to-end ML operations workflow, not just a static notebook.
Each stage of the project is implemented as an independent and reproducible component:
- Fetches macroeconomic indicators from FRED API into
data/raw/ - Cleans and standardizes files using
clean_fred_data.py - Uploads cleaned files to GCS bucket:
housing-bubble-predictor-data/cleaned_data
- All 23 cleaned CSVs are ingested as
stg_*tables in BigQuery - Final OBT (
table_obt_housing) is created with proper date formatting (e.g.,1974Q1), joins, and deduplication - View (
view_obt_housing) created for real-time dashboarding
market_predictor_bq.pyuses walk-forward validation withXGBoost,Random Forest, andDecision Tree- Feature engineering includes rolling stats, interaction terms, and policy event flags (rate hikes, inflation spikes)
- Outputs predictions + metrics (
RMSE,SMAPE,Adjusted R²) to BigQuery tables
bubble_detection_bq.pycomputes:- HPI acceleration, Affordability proxy
- Macro correlation breakdowns, Price z-scores, Sentiment flags
- Scores each quarter on speculative risk and flags realistic bubbles (e.g., 2005–2008 period)
streamlit_dashboard.pyshows:- Predicted vs Actual HPI (toggle between models)
- Forecast drift vs macro indicators
- Bubble score timeline and signal breakdown
- Includes explanatory tooltips, KPIs, and strategic commentary
FRED API ──► GitHub Actions (ETL) ──► GCS (Cleaned CSVs)
│
▼
BigQuery Staging Tables
▼
One Big Table (Quarterly HPI)
▼
┌────────────────────┬────────────────────┐
│ │ │
Model Training Bubble Scoring Streamlit App
(XGBoost, RF, DT) (Rule-based Logic) (3-tab Dashboard)
│
▼
BigQuery Tables: Predictions, Flags, KPIs
- ✅ End-to-end reproducibility: From ingestion to risk flag, everything is version-controlled and refreshable
- 📈 Economic realism: All modeling logic is benchmarked against actual 2006–2008 crash behavior
- ☁️ Cloud-native: No local dependencies — works entirely on GCP with secure credentials
- 🔁 Modular & scalable: Can easily extend to city-level HPI, Zillow/Redfin, or LSTM models
- ⚙️ CI/CD included: Every script is connected to a YAML, refreshable via cron
If you're reviewing this project from a hiring or academic lens, here’s what I’d like to highlight:
- This project blends domain understanding (macro econ) with ML & infra maturity
- Built to mirror real-world deployment — not just a Kaggle notebook
- Clean structure:
requirements.txt,.gitignore, secrets separation, auto-refresh YAMLs - Demonstrates awareness of model interpretability, policy events, economic indicators
- Easily extendable to fintech, proptech, policy analytics, or investment planning use cases
- Add SHAP value visualizations to dashboard for interpretability
- Extend bubble detection to MSA/ZIP-level granularity (Zillow/Redfin if possible)
- Trigger email/SMS alerts for high-risk quarters using GCP Pub/Sub
- Add Looker Studio connectors for enterprise reporting layer
- Try transformer-based models or
Prophetfor multi-horizon forecasting
Kapil Tare
M.S. Applied Data Science, Syracuse University
LinkedIn • GitHub
Always happy to chat about forecasting, cloud infra, or macroeconomic modeling.
Feel free to connect or collaborate!
This project is built for personal portfolio demonstration.
All data used is publicly available. Any insights or risk predictions should not be construed as financial advice. Always consult professional analysts or institutions before making economic decisions based on such models.
- FRED API for access to rich macroeconomic data
- Streamlit for simplifying interactive dashboards
- Google Cloud for robust infra and generous free tier
Thank you for visiting!
If you found this project insightful or want to collaborate — let’s connect 🚀
— Kapil