A fully automated, production-structured, local-first data engineering system powering real-time retail insights, monitoring, and alerting.
The Retail Sales Analytics & Monitoring Pipeline is an end-to-end data engineering solution that mimics how modern retail organizations collect, validate, transform, store, analyze, and monitor daily sales data.
Built entirely in Python and SQLite, it delivers:
- Automated ETL
- KPI-rich analytics
- FastAPI endpoints
- Streamlit dashboards
- Continuous monitoring
- Slack alerting on failures & anomalies
This project demonstrates real-world data engineering principles: modularity, observability, automation, idempotency, and business alignment — all locally, without cloud dependencies.
| Business Problem | Implemented Solution | Value Delivered |
|---|---|---|
| Scattered CSVs & messy data | Schema-validated ingestion | Reliable single source of truth |
| Manual daily reporting | APScheduler-based automation | Zero manual effort, daily refresh |
| Lack of visibility | Streamlit dashboards | Real-time business insights |
| Undetected data breaks | Monitoring + Slack alerts | Immediate detection of issues |
| No trend tracking | Rolling revenue averages | Consistent visibility into momentum |
┌──────────────────────────────────────────┐
│ Retail Sales Analytics Pipeline (ETL) │
└──────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────┐
│ Synthetic Data Generator (Faker CSVs) │
└──────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────┐
│ Ingestion Layer (schema validation) │
└──────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────┐
│ Transformation Layer (KPIs, enrichment) │
└──────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────┐
│ SQLite Warehouse (indexed tables) │
└──────────────────────────────────────────────┘
│ │
▼ ▼
┌───────────────────────────┐ ┌──────────────────────────────┐
│ FastAPI Service │ │ Streamlit Dashboard │
└───────────────────────────┘ └──────────────────────────────┘
│
▼
┌──────────────────────────────────────────────┐
│ APScheduler Automation (Daily ETL) │
└──────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────┐
│ Monitoring, Drift Detection & Slack Alerts │
└──────────────────────────────────────────────┘
retail_sales_pipeline/
│ README.md
│ requirements.txt
│ schema_config.json
│ run_all.py
│
├── dashboard/
│ streamlit_app.py
│
├── data/
│ ├── raw/
│ ├── ingested/
│ ├── processed/
│ └── product_catalog.csv
│
├── db/
│ retail_sales.db
│
├── images/
│ dashboard_kpis.png
│ dashboard_region_chart.png
│ dashboard_top_products.png
│ dashboard_summary.png
│ revenue_trends.png
│ monitoring_log.png
│ scheduler_run.png
│
├── logs/
│ extract_sales.log
│ generate_fake_sales.log
│ load_to_db.log
│ monitoring.log
│ pipeline_health.json
│ scheduler.log
│ transform_sales.log
│
├── scripts/
│ alerts.py
│ api_server.py
│ extract_sales.py
│ generate_fake_sales.py
│ transform_sales.py
│ load_to_db.py
│ monitoring.py
│ scheduler.py
│ __init__.py
│
└── tests/
Synth → Ingest → Transform → Load → Serve → Visualize → Monitor.
Strict ingestion validation via schema_config.json.
Revenue, cost, profit, margin%, grouped aggregations.
Indexed analytical tables for fast API and dashboard queries.
Exposes:
/health/kpi/revenue/kpi/top-products
Revenue trends, KPIs, regions, top products, rolling averages.
- Schema drift detection
- Row count anomalies
- Revenue deviation alerts
- Runtime monitoring
- Persistent monitoring table
Failure alerts + anomaly detection notifications.
pip install -r requirements.txtpython scripts/generate_fake_sales.py
python scripts/extract_sales.py
python scripts/transform_sales.py
python scripts/load_to_db.pyuvicorn scripts.api_server:app --reloadstreamlit run dashboard/streamlit_app.pypython scripts/scheduler.pyThis project demonstrates a complete, production-style retail analytics data platform built entirely with local, lightweight technologies.






