Customer Behavior Analysis project using Python and SQL to analyze customer demographics, engagement, product reviews, and purchase journeys for valuable business insights. This repository contains a mini project for analyzing customer behavior using Python and SQL. The project involves loading customer data, performing sentiment analysis, visualizing trends, and querying a database to extract insights about customer actions, product reviews, and campaign effectiveness.
Table of Contents Overview
Features
Technologies Used
Setup Instructions
Python Code
SQL Queries
Database Schema
Visualization
License
Overview The Customer Behavior Analysis project aims to understand customer behavior patterns by analyzing datasets such as customer demographics, engagement data, product reviews, and purchase journeys. The project uses Python for data processing and visualization, and SQL for querying structured data stored in a MySQL database.
Features Python Features: Load and preprocess customer data.
Identify missing values.
Perform sentiment analysis on customer reviews using TextBlob.
Visualize age distribution of customers who made purchases.
SQL Features: Identify stages where customers drop off during their journey.
Analyze the performance of ad campaigns.
Detect products with poor ratings.
Find customers with the highest number of purchases.
Technologies Used Python: Data processing, sentiment analysis, and visualization.
Libraries: pandas, numpy, matplotlib, TextBlob
MySQL: Database for storing and querying structured data.
SQL: Writing queries to analyze customer behavior.
Setup Instructions Prerequisites Install Python (version >= 3.8).
Install MySQL (version >= 8.0).
Install required Python libraries:
bash pip install pandas numpy matplotlib textblob mysql-connector-python Steps Clone the repository:
bash git clone https://github.com/yourusername/customer-behavior-analysis.git cd customer-behavior-analysis Set up the MySQL database using the provided schema in database_schema.sql.
Place the CSV files (Customers.csv, Customer_journey.csv, engagement_data.csv, Customer_reviews.csv) in the project directory.
Run the Python script (analysis.py) to analyze data and visualize trends.
Python Code Data Loading and Preprocessing python import pandas as pd import numpy as np
try: customers = pd.read_csv('Customers.csv') journey = pd.read_csv('Customer_journey.csv') engagement = pd.read_csv('engagement_data.csv') reviews = pd.read_csv('Customer_reviews.csv')
# Preview data structure
print(customers.head())
print(journey.head())
print(engagement.head())
print(reviews.head())
except FileNotFoundError as e: print(f"Error: {e}. Please check if the file exists in the specified path.") Sentiment Analysis on Reviews python from textblob import TextBlob
reviews['Sentiment'] = reviews['ReviewText'].apply(lambda x: TextBlob(x).sentiment.polarity)
negative_reviews = reviews[reviews['Sentiment'] < 0] print(negative_reviews[['ProductID', 'ReviewText']].head()) Visualization python import matplotlib.pyplot as plt
purchases = journey[journey['Action'] == 'Purchase'] merged_data = pd.merge(purchases, customers, on='CustomerID')
plt.hist(merged_data['Age'], bins=10) plt.xlabel('Age') plt.ylabel('Number of Purchases') plt.title('Age Distribution of Purchasers') plt.show() SQL Queries
- Where are customers leaving? sql SELECT Stage, COUNT(*) AS Visitors, SUM(CASE WHEN Action = 'Purchase' THEN 1 ELSE 0 END) AS Purchases FROM CustomerJourney GROUP BY Stage;
- Which ads are making money? sql SELECT e.CampaignID, SUM(e.ViewsClicksCombined) AS Engagement, COUNT(j.JourneyID) AS Purchases FROM EngagementData e LEFT JOIN CustomerJourney j ON e.UserID = j.CustomerID GROUP BY e.CampaignID;
- What products do people hate? sql SELECT p.ProductID, p.ProductName, AVG(r.Rating) AS AvgRating, COUNT(r.ReviewID) AS ReviewCount FROM SportsProducts p JOIN ReviewData r ON p.ProductID = r.ProductID GROUP BY p.ProductID HAVING AvgRating < 3.5;
- Find customers with the most purchases: sql SELECT CustomerID, COUNT(*) AS TotalPurchases FROM CustomerJourney WHERE Action = 'Purchase' GROUP BY CustomerID ORDER BY TotalPurchases DESC; Database Schema The database consists of the following tables:
SportsProducts: Stores product details like ID, name, category, and price.
GeographyData: Contains location data (country and city).
EngagementData: Tracks user engagement with campaigns.
ReviewData: Stores product reviews with ratings and text.
CustomerJourney: Tracks customer actions across different stages.
Refer to the database_schema.sql file for detailed table structures.
Visualization The project includes visualizations such as:
Histogram showing the age distribution of customers who made purchases.
Insights from sentiment analysis on customer reviews.
License This project is licensed under the MIT License - see the LICENSE file for details.
Feel free to contribute or raise issues if you encounter any problems! 😊