This project is a Lead Scoring Case Study, built as part of the UpGrad Data Science course, to help businesses identify high-converting leads using a Logistic Regression machine learning model.

yashj1301/lead-scoring


Lead Scoring Case Study (Capstone Project)

📌 Project Overview

This project is a Lead Scoring Case Study, built as part of the UpGrad Data Science course, to help businesses identify high-converting leads using Machine Learning models.

🎯 Objectives

  1. Understand lead conversion behavior based on the given features
  2. Clean and preprocess the data to handle missing values and outliers
  3. Engineer features to extract useful insights
  4. Build ML models to predict lead conversion probability
  5. Evaluate the models and deploy the best-performing one
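
The objectives above map fairly naturally onto a scikit-learn pipeline. The following is a minimal sketch under stated assumptions, not the project's actual notebook: the column names (`total_time_on_site`, `lead_source`, `converted`) and the synthetic data are hypothetical stand-ins, and scaling a predicted probability to a 0-100 lead score is one common convention rather than something this README specifies.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic leads; the real dataset and its columns may differ.
rng = np.random.default_rng(0)
n = 400
time_on_site = rng.exponential(scale=300.0, size=n)
lead_source = rng.choice(["Website", "Reference", "Google"], size=n)
# Conversion is made more likely for longer visits, purely for illustration.
p = 1.0 / (1.0 + np.exp(-(time_on_site - 300.0) / 100.0))
converted = (rng.random(n) < p).astype(int)

leads = pd.DataFrame({
    "total_time_on_site": time_on_site,
    "lead_source": lead_source,
    "converted": converted,
})
# Inject a few missing values so the imputation step has work to do.
leads.loc[leads.index[:10], "total_time_on_site"] = np.nan

numeric = ["total_time_on_site"]
categorical = ["lead_source"]

# Preprocessing: impute and scale numeric columns, one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X = leads[numeric + categorical]
y = leads["converted"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)

# Lead score: predicted conversion probability scaled to 0-100.
lead_scores = (model.predict_proba(X_test)[:, 1] * 100).round().astype(int)
```

Keeping the preprocessing inside the pipeline means the imputer and scaler are fitted on the training split only, which avoids leaking test-set statistics into the model.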

🛠️ Tech Stack & Tools

  1. Programming Language: Python
  2. Data Handling: Pandas, NumPy
  3. Machine Learning: scikit-learn (Logistic Regression)
  4. Visualization: Matplotlib, Seaborn

Exploratory Analysis Outcomes

  1. The overall lead conversion rate so far is 47.1%. The analysis suggests that leads tagged with high-intent indicators (like 'Closed by Horizzon' or 'Interested in Next batch') are more likely to convert.
  2. Channels such as websites and references show the highest conversion (>90%), while social media and search engines such as Google come a distant second (60-70%).
  3. Developing countries with lower technological adaptability, such as Bahrain and Bangladesh, dominate the demographics of successful conversions. Working professionals in the field of management are similarly prominent.
  4. Leads that converted spent, on average, 1.5x as much time on the website as leads that did not. Moreover, a low bounce rate (6.42%) indicates that visitors who enter the website tend to explore it beyond the first few pages.
  5. A lead is more likely to convert when contacted by email or by phone. The data also indicates that prospects prefer email over calls.
  6. Conversion correlates positively with a prospect's activity index, but negatively with the profile index.
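
Figures like the conversion rates and time-on-site ratio above are typically computed with grouped aggregations in pandas. The sketch below uses a hypothetical toy table (the column names `lead_source`, `time_on_site`, and `converted` are assumptions, and the numbers are not the project's data) to show the shape of such an analysis.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
leads = pd.DataFrame({
    "lead_source": ["Website", "Website", "Reference", "Google", "Google", "Google"],
    "time_on_site": [600, 450, 500, 200, 300, 100],
    "converted": [1, 1, 1, 0, 1, 0],
})

# Overall conversion rate, as a percentage.
overall_rate = leads["converted"].mean() * 100

# Conversion rate per channel, highest first.
rate_by_source = (
    leads.groupby("lead_source")["converted"]
    .mean()
    .mul(100)
    .sort_values(ascending=False)
)

# Ratio of average time on site: converted leads vs. non-converted leads.
mean_time = leads.groupby("converted")["time_on_site"].mean()
time_ratio = mean_time.loc[1] / mean_time.loc[0]
```

Because `converted` is coded as 0/1, its mean within each group is the group's conversion rate, so no separate counting step is needed.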

Model Outcomes

  1. This classification model can be deployed to help the business predict outcomes such as lead conversion with 96% accuracy.
  2. High precision and recall for both classes mean the model keeps both false positives (predicting that a lead will convert when it does not) and false negatives (missing leads that actually convert) low.
  3. A high ROC AUC score indicates that the model separates the two classes well across classification thresholds.
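
To make the metrics above concrete, here is a small dependency-free sketch of how accuracy, precision, recall, and ROC AUC relate to false positives and false negatives. The labels and scores are hypothetical, the 0.5 threshold is an assumption, and the helper names `confusion_counts` and `roc_auc` are illustrative; in practice `sklearn.metrics` provides equivalents.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means 'converted'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # of leads predicted to convert, the share that did
recall = tp / (tp + fn)     # of leads that converted, the share that was caught
auc = roc_auc(y_true, scores)
```

Precision and recall depend on the chosen cutoff, whereas ROC AUC is computed from the raw scores and therefore summarizes ranking quality independently of any single threshold.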
