Complete Data Analyst Portfolio Project with end-to-end SQL Data Analysis of Zepto E-commerce Inventory data using PostgreSQL.
This is a complete, real-world data analyst portfolio project based on an e-commerce inventory dataset scraped from Zepto — one of India’s fastest-growing quick-commerce startups. This project simulates real analyst workflows, from raw data exploration to business-focused data analysis.
This project is perfect for:
📊 Data Analyst aspirants who want to build a strong Portfolio Project for interviews and LinkedIn 📚 Anyone learning SQL hands-on 💼 Preparing for interviews in retail, e-commerce, or product analytics
The goal is to simulate how actual data analysts in the e-commerce or retail industries work behind the scenes to use SQL to:
✅ Set up a messy, real-world e-commerce inventory database
✅ Perform Exploratory Data Analysis (EDA) to explore product categories, availability, and pricing inconsistencies
✅ Implement Data Cleaning to handle null values, remove invalid entries, and convert pricing from paise to rupees
✅ Write business-driven SQL queries to derive insights around pricing, inventory, stock availability, revenue and more
The dataset was sourced from Kaggle and was originally scraped from Zepto’s official product listings. It mimics what you’d typically encounter in a real-world e-commerce inventory system.
Each row represents a unique SKU (Stock Keeping Unit) for a product. Duplicate product names exist because the same product may appear multiple times in different package sizes, weights, discounts, or categories to improve visibility – exactly how real catalog data looks.
🧾 Columns:
sku_id: Unique identifier for each product entry (Synthetic Primary Key)
name: Product name as it appears on the app
category: Product category like Fruits, Snacks, Beverages, etc.
mrp: Maximum Retail Price (originally in paise, converted to ₹)
discountPercent: Discount applied on MRP
discountedSellingPrice: Final price after discount (also converted to ₹)
availableQuantity: Units available in inventory
weightInGms: Product weight in grams
outOfStock: Boolean flag indicating stock availability
quantity: Number of units per package (mixed with grams for loose produce)
Here’s a step-by-step breakdown of what we do in this project:
We start by creating a SQL table with appropriate data types:
CREATE TABLE zepto ( sku_id SERIAL PRIMARY KEY, category VARCHAR(120), name VARCHAR(150) NOT NULL, mrp NUMERIC(8,2), discountPercent NUMERIC(5,2), availableQuantity INTEGER, discountedSellingPrice NUMERIC(8,2), weightInGms INTEGER, outOfStock BOOLEAN, quantity INTEGER );
Loaded CSV using pgAdmin's import feature.
If you're not able to use the import feature, write this code instead:
\copy zepto(category,name,mrp,discountPercent,availableQuantity, discountedSellingPrice,weightInGms,outOfStock,quantity) FROM 'data/zepto_v2.csv' WITH (FORMAT csv, HEADER true, DELIMITER ',', QUOTE '"', ENCODING 'UTF8'); Faced encoding issues (UTF-8 error), which were fixed by saving the CSV file using CSV UTF-8 format.
Counted the total number of records in the dataset
Viewed a sample of the dataset to understand structure and content
Checked for null values across all columns
Identified distinct product categories available in the dataset
Compared in-stock vs out-of-stock product counts
Detected products present multiple times, representing different SKUs
Identified and removed rows where MRP or discounted selling price was zero
Converted mrp and discountedSellingPrice from paise to rupees for consistency and readability
Found top 10 best-value products based on discount percentage
Identified high-MRP products that are currently out of stock
Estimated potential revenue for each product category
Filtered expensive products (MRP > ₹500) with minimal discount
Ranked top 5 categories offering highest average discounts
Calculated price per gram to identify value-for-money products
Grouped products based on weight into Low, Medium, and Bulk categories
Measured total inventory weight per product category
Clone the repository
git clone https://github.com/amlanmohanty/zepto-SQL-data-analysis-project.git cd zepto-SQL-data-analysis-project Open zepto_SQL_data_analysis.sql
Table creation
Data exploration
Data cleaning
Load the dataset into pgAdmin or any other PostgreSQL client
Create a database and run the SQL file
Import the dataset (convert to UTF-8 if necessary)
👤 Buragapalli Kamachari
📧 Email: [email protected]
🌐 LinkedIn: https://www.linkedin.com/in/buragapalli-kamachari-931624269/
📂 GitHub: github.com/kamachari