Skip to content

Data-driven SQL project analyzing Zepto’s order, customer, and revenue data to uncover key business insights and performance trends.

Notifications You must be signed in to change notification settings

Kamachari/Zepto_SQL_Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Zepto-SQL-Data-Analysis-Project

Complete Data Analyst Portfolio Project with end-to-end SQL Data Analysis of Zepto E-commerce Inventory data using PostgreSQL.

This is a complete, real-world data analyst portfolio project based on an e-commerce inventory dataset scraped from Zepto — one of India’s fastest-growing quick-commerce startups. This project simulates real analyst workflows, from raw data exploration to business-focused data analysis.

This project is perfect for:

📊 Data Analyst aspirants who want to build a strong Portfolio Project for interviews and LinkedIn 📚 Anyone learning SQL hands-on 💼 Preparing for interviews in retail, e-commerce, or product analytics

📌 Project Overview

The goal is to simulate how actual data analysts in the e-commerce or retail industries work behind the scenes to use SQL to:

✅ Set up a messy, real-world e-commerce inventory database

✅ Perform Exploratory Data Analysis (EDA) to explore product categories, availability, and pricing inconsistencies

✅ Implement Data Cleaning to handle null values, remove invalid entries, and convert pricing from paise to rupees

✅ Write business-driven SQL queries to derive insights around pricing, inventory, stock availability, revenue and more

📁 Dataset Overview

The dataset was sourced from Kaggle and was originally scraped from Zepto’s official product listings. It mimics what you’d typically encounter in a real-world e-commerce inventory system.

Each row represents a unique SKU (Stock Keeping Unit) for a product. Duplicate product names exist because the same product may appear multiple times in different package sizes, weights, discounts, or categories to improve visibility – exactly how real catalog data looks.

🧾 Columns:

sku_id: Unique identifier for each product entry (Synthetic Primary Key)

name: Product name as it appears on the app

category: Product category like Fruits, Snacks, Beverages, etc.

mrp: Maximum Retail Price (originally in paise, converted to ₹)

discountPercent: Discount applied on MRP

discountedSellingPrice: Final price after discount (also converted to ₹)

availableQuantity: Units available in inventory

weightInGms: Product weight in grams

outOfStock: Boolean flag indicating stock availability

quantity: Number of units per package (mixed with grams for loose produce)

🔧 Project Workflow

Here’s a step-by-step breakdown of what we do in this project:

1. Database & Table Creation

We start by creating a SQL table with appropriate data types:

CREATE TABLE zepto ( sku_id SERIAL PRIMARY KEY, category VARCHAR(120), name VARCHAR(150) NOT NULL, mrp NUMERIC(8,2), discountPercent NUMERIC(5,2), availableQuantity INTEGER, discountedSellingPrice NUMERIC(8,2), weightInGms INTEGER, outOfStock BOOLEAN, quantity INTEGER );

2. Data Import

Loaded CSV using pgAdmin's import feature.

If you're not able to use the import feature, write this code instead:

\copy zepto(category,name,mrp,discountPercent,availableQuantity, discountedSellingPrice,weightInGms,outOfStock,quantity) FROM 'data/zepto_v2.csv' WITH (FORMAT csv, HEADER true, DELIMITER ',', QUOTE '"', ENCODING 'UTF8'); Faced encoding issues (UTF-8 error), which were fixed by saving the CSV file using CSV UTF-8 format.

3. 🔍 Data Exploration

Counted the total number of records in the dataset

Viewed a sample of the dataset to understand structure and content

Checked for null values across all columns

Identified distinct product categories available in the dataset

Compared in-stock vs out-of-stock product counts

Detected products present multiple times, representing different SKUs

4. 🧹 Data Cleaning

Identified and removed rows where MRP or discounted selling price was zero

Converted mrp and discountedSellingPrice from paise to rupees for consistency and readability

5. 📊 Business Insights

Found top 10 best-value products based on discount percentage

Identified high-MRP products that are currently out of stock

Estimated potential revenue for each product category

Filtered expensive products (MRP > ₹500) with minimal discount

Ranked top 5 categories offering highest average discounts

Calculated price per gram to identify value-for-money products

Grouped products based on weight into Low, Medium, and Bulk categories

Measured total inventory weight per product category

🛠️ How to Use This Project

Clone the repository

git clone https://github.com/amlanmohanty/zepto-SQL-data-analysis-project.git cd zepto-SQL-data-analysis-project Open zepto_SQL_data_analysis.sql

This file contains:

Table creation

Data exploration

Data cleaning

SQL Business analysis

Load the dataset into pgAdmin or any other PostgreSQL client

Create a database and run the SQL file

Import the dataset (convert to UTF-8 if necessary)

🔗 Connect

👤 Buragapalli Kamachari
📧 Email: [email protected] 🌐 LinkedIn: https://www.linkedin.com/in/buragapalli-kamachari-931624269/
📂 GitHub: github.com/kamachari

About

Data-driven SQL project analyzing Zepto’s order, customer, and revenue data to uncover key business insights and performance trends.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published