This project is an end-to-end SQL-based data analysis using real-world e-commerce inventory data from Zepto.
The objective is to simulate real business scenarios by performing data cleaning, exploration, and business insight generation using SQL.
This project is ideal for:
- SQL Portfolio
- Data Analyst / Business Analyst roles
- Retail & E-commerce analytics case studies
Click to expand file structure
Zepto Inventory Analysis using SQL
βββ screenshots/
β βββ columns_section.png
β βββ covert_csv.png
β βββ general_section.png
β βββ importing-data-path.png
β βββ options_section.png
β βββ Q1.png
β βββ Q2.png
β βββ Q3.png
β βββ Q4.png
β βββ Q5.png
β βββ Q6.png
β βββ Q7.png
β βββ Q8.png
β
βββ zepto-inventory-dataset/
β βββ zepto_v1.xlsx
β βββ zepto_v2.csv
β
βββ queries-for-business-insights.sql
βββ README.md
- Source: Kaggle
https://www.kaggle.com/datasets/palvinder2006/zepto-inventory-dataset - Rows: 3,732
- Columns: 9
- Granularity: SKU-level (multiple entries per product with different weights/prices)
| Column Name | Description |
|---|---|
| sku_id | Stock Keeping unit ID |
| category | Product category |
| name | Product name |
| mrp | Maximum Retail Price |
| discount_percent | Discount percentage |
| discounted_selling_price | Final selling price |
| weight_in_gms | Product weight |
| available_quantity | Quantity available |
| out_of_stock | Stock status (Boolean) |
- Database: PostgreSQL
- Tool: pgAdmin
- Language: SQL
- Data Format: CSV (UTF-8)
drop table if exists zepto;
create table zepto(
sku_id serial primary key,
category varchar(120),
name varchar(150) not null,
mrp numeric(8,2),
discount_percent numeric(5,2),
avaliable_quantity integer,
discount_selling_price numeric (8,2),
weight_in_gram integer,
out_of_stock boolean,
quantity integer
);Then we need to import data from zepto-inventory-dataset\zepto_v2.csv
zepto-inventory-insight --> Schema (1) --> Table (1) --> zepto -- > right click --> import/Export data-
Converting CSV file
-
open the
zepto-inventory-dataset\zepto_v2.csvin excel and Save AsCSV UTF-8(comma delimited) (*csv)
-
-
General Section
-
Select the file from your divice
-
set
Encoding --> UTFP = 8
-
-
Options Section
-
Enable the Header option
-
-
Columns Section
-
Delete the sku_id column. Because we sku_id doesn't present in
zepto_v2.csv
-
-- Step 1 -> data exploration
-- count of rows
select count(*) from zepto;
-- sample data
select * from zepto
limit 10;
-- null values
select * from zepto
where
category is null
or
name is null
or
mrp is null
or
discount_percent is null
or
avaliable_quantity is null
or
discount_selling_price is null
or
weight_in_gram is null
or
out_of_stock is null
or
quantity is null; -- no null found
-- different producto category
select distinct category
from zepto
order by category; -- total 14
-- product in stock vs out of stock
select out_of_stock, count(sku_id)
from zepto
group by out_of_stock; -- instock = 453 , outofstoc = 3279
-- product name present multiple times
select name, count(sku_id) as "Number of SKU's"
from zepto
group by name
having count(sku_id) > 1
order by count(sku_id) desc;-Removed products with MRP = 0 -Converted prices from paise to rupees -Verified no NULL values -Standardized numeric columns -Checked stock inconsistencies
-- products with price = 0
select * from zepto
where mrp = 0 or discount_selling_price = 0; -- sku_id = 3607 which have price 0
delete from zepto
where mrp = 0; -- 1 deleted
-- convert paise to rupees from mrp column
update zepto
set mrp = mrp/100.0,
discount_selling_price = discount_selling_price/100.0;
select mrp, discount_selling_price from zepto;SELECT name, category, discount_percent
FROM zepto
ORDER BY discount_percent DESC
LIMIT 10;πΈ Output Screenshot:
SELECT name, mrp
FROM zepto
WHERE mrp > 300 AND out_of_stock = TRUE;πΈ Output Screenshot:
SELECT category,
SUM(discounted_selling_price * available_quantity) AS total_revenue
FROM zepto
GROUP BY category
ORDER BY total_revenue DESC;πΈ Output Screenshot:
SELECT name, mrp, discount_percent
FROM zepto
WHERE mrp > 500 AND discount_percent < 10
ORDER BY mrp DESC;πΈ Output Screenshot:
SELECT category,
ROUND(AVG(discount_percent), 2) AS avg_discount
FROM zepto
GROUP BY category
ORDER BY avg_discount DESC;πΈ Output Screenshot:
SELECT name,
discounted_selling_price / weight_in_gms AS price_per_gram
FROM zepto
WHERE weight_in_gms > 100
ORDER BY price_per_gram;πΈ Output Screenshot:
SELECT name,
CASE
WHEN weight_in_gms < 250 THEN 'Low'
WHEN weight_in_gms BETWEEN 250 AND 1000 THEN 'Medium'
ELSE 'Bulk'
END AS weight_category
FROM zepto;πΈ Output Screenshot:
SELECT category,
SUM(weight_in_gms * available_quantity) AS total_weight
FROM zepto
GROUP BY category
ORDER BY total_weight DESC;πΈ Output Screenshot:
This project demonstrates:
-
Real-world SQL usage
-
Business-oriented thinking
-
Data cleaning & transformation
-
Insight-driven decision making












