Skip to content

Enterprise Data Warehouse with Star Schema, SCD Type 2, ETL pipelines, and multi-currency analytics (SQL Server + Python)

License

Notifications You must be signed in to change notification settings

rAmIro-89/retail-sales-data-warehouse-sql-refactored

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

12 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ“Š Retail Sales Data Warehouse β€” SQL & Python

SQL Server Python Jupyter License

Enterprise-grade Data Warehouse built with SQL Server, Python, and dimensional modeling (Kimball). Includes OLTP modeling, ETL pipelines, SCD Type 2, multi-currency support, and analytical SQL/Python workflows.


πŸ”· 1. Business Context

This project simulates the analytical backbone of a retail cellphone company.

βœ” Strategic Objectives

  • Sales performance analysis across stores, brands, models, channels
  • RFM segmentation & customer lifecycle insights
  • ABC/Pareto analysis for product profitability
  • Seller performance tracking (Top / Medium / Low)
  • Multi-currency reporting: ARS, USD, EUR, BRL, CNY
  • BI-ready datasets for Power BI & Python

This architecture is representative of a real Canadian retail BI system.


πŸ”· 2. Architecture Overview

OLTP (Normalized 3NF)
      ↓ ETL
Data Warehouse (Star Schema)
      ↓
SQL Analytics + Python + Power BI

πŸ”· 3. OLTP Model β€” Transactional Layer

Operational data model used as the source system for the DW.

Features:

  • βœ” 9 normalized tables
  • βœ” Referential integrity enforced
  • βœ” Real-time operations
  • βœ” Source for ETL extraction

πŸ–Ό ERD

OLTP Model - Transactional Layer

Figure 1: Normalized OLTP schema with referential integrity constraints.


πŸ”· 4. Star Schema β€” Analytical Data Warehouse

Dimensional model optimized for BI, KPIs, aggregations and advanced analysis.

Dimensions

  • DimDate β€” Date dimension with temporal attributes
  • DimCustomer β€” Customer profiles
  • DimProduct β€” Products (brand, model, specs)
  • DimStore β€” Store locations
  • DimChannel β€” Sales channels (Web / Store)
  • DimCurrency β€” Multi-currency support
  • DimPaymentMethod β€” Payment methods
  • DimSeller β€” Sellers with SCD Type 2 versioning

Fact Table

  • FactSales β€” Quantities, revenue, margins, multi-currency metrics

πŸ–Ό ERD

Star Schema - Data Warehouse

Figure 2: Star schema with fact table and dimension tables optimized for analytical queries.


πŸ”· 5. ETL Pipeline & Dimensions

Daily automated ETL ensuring fresh, consistent analytical data.

Pipeline

  1. Extract from OLTP
  2. Transform (business rules, currency conversion)
  3. SCD Type 2 for sellers (versioning)
  4. Load DW tables (facts & dimensions)

Key Dimensions

  • DimDate: 10-year temporal attributes (year, month, quarter, weekday)
  • DimChannel: Web / Store / Online
  • DimCurrency: ARS, USD, EUR, BRL, CNY

πŸ–Ό ETL Flow Diagram

Dimensions and ETL Flow

Figure 3: ETL pipeline with dimension population and incremental load strategy.


πŸ”· 6. Advanced Analytics

Implemented in SQL and Python.

βœ” Temporal Analytics

  • YoY / MoM growth rates
  • Moving averages
  • Running totals
  • Trend analysis

βœ” Segmentation

  • ABC / Pareto analysis
  • RFM segmentation (Champions, Loyal, At Risk, Lost)

βœ” Performance

  • Best seller by revenue
  • Best store by profit margin
  • Top brand/model

βœ” Multi-Currency

  • Correct conversion with historical exchange rates
  • Aggregation-safe metrics
  • Multi-currency reporting (ARS, USD, EUR, BRL, CNY)

πŸ”· 7. Project Structure

retail-sales-data-warehouse-sql-refactored/
β”œβ”€β”€ 01_base_datos/          # Database creation scripts
β”œβ”€β”€ 02_oltp/                # OLTP DDL and data load
β”œβ”€β”€ 03_datawarehouse/       # DW schema definitions
β”œβ”€β”€ 04_etl/                 # ETL initial and incremental
β”œβ”€β”€ 05_consultas/           # Analytical SQL queries
β”œβ”€β”€ 06_analisis/            # Jupyter notebooks
β”œβ”€β”€ 07_validacion/          # Data quality validation
β”œβ”€β”€ 08_scripts_auxiliares/  # Helper scripts
β”œβ”€β”€ 09_documentacion/       # Documentation
β”œβ”€β”€ sql/                    # Organized SQL scripts
β”‚   β”œβ”€β”€ ddl/
β”‚   β”œβ”€β”€ dml/
β”‚   └── views/
β”œβ”€β”€ src/                    # Python modules
β”‚   β”œβ”€β”€ etl/
β”‚   └── utils/
β”œβ”€β”€ notebooks/              # Analysis notebooks
β”œβ”€β”€ data/                   # Raw and processed data
β”œβ”€β”€ img/                    # Architecture diagrams
└── docs/                   # Technical documentation

πŸ”· 8. How to Run

πŸ“‹ Prerequisites

  • SQL Server 2017+ (Developer or Express Edition)
  • Python 3.9+
  • SSMS or Azure Data Studio (optional)

πŸš€ SQL Server Setup

-- 1. Create databases
:r 01_base_datos/00_creacion_bases.sql

-- 2. OLTP schema
:r 02_oltp/01_ddl_oltp.sql

-- 3. Load sample data
:r 02_oltp/02_carga_oltp.sql

-- 4. DW schema
:r 03_datawarehouse/03_ddl_dw.sql

-- 5. Initial ETL
:r 04_etl/04_etl_dw_inicial.sql

πŸ”„ Incremental ETL

-- Daily refresh
:r 04_etl/05_reproceso_diario.sql

🐍 Python Setup

# Install dependencies
pip install -r requirements.txt

# Launch Jupyter
jupyter notebook notebooks/Notebook_Estadistica_Ventas.ipynb

πŸ“Š Run Analytical Queries

-- Execute any query from 05_consultas/
:r 05_consultas/09_analisis_abc_pareto.sql
:r 05_consultas/10_analisis_rfm.sql

πŸ”· 9. Technologies

Category Tools
Database SQL Server 2017+
Languages T-SQL, Python 3.9+
Python Stack Pandas, SQLAlchemy, Matplotlib, Seaborn, Plotly
BI Tools Power BI, Jupyter Notebooks
IDEs VS Code, SSMS, Azure Data Studio
Version Control Git, GitHub

πŸ”· 10. Key Features Summary

  • βœ… OLTP β†’ DW full pipeline with initial and incremental ETL
  • βœ… SCD Type 2 for seller dimension versioning
  • βœ… Multi-currency support with historical exchange rates
  • βœ… Star schema optimized for analytical queries
  • βœ… Advanced analytics: ABC/Pareto, RFM, temporal analysis
  • βœ… Python integration: Jupyter notebooks with visualization
  • βœ… Production-ready: Data quality validation, helper scripts

πŸ“ License

MIT License β€” see LICENSE for details.

πŸ‘€ Author

Ramiro Ottone Villar

GitHub LinkedIn


Built with ❀️ for Canadian tech market β€” showcasing enterprise-grade data engineering and analytics skills.

About

Enterprise Data Warehouse with Star Schema, SCD Type 2, ETL pipelines, and multi-currency analytics (SQL Server + Python)

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published