This project is a Lead Scoring Case Study, built as part of the UpGrad Data Science course, that uses machine learning models to help businesses identify the leads most likely to convert.
- Understand lead conversion behavior from the available features
- Clean and preprocess the data to handle missing values and outliers
- Engineer features to extract useful insights
- Build ML models to predict lead conversion probability
- Evaluate models and deploy the best-performing one
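The workflow outlined above (cleaning, preprocessing, model building, scoring) maps onto a standard scikit-learn pipeline. The following is a minimal sketch under stated assumptions, not the project's actual code. The data is synthetic. The column names (`lead_source`, `total_visits`, `time_on_website`, `converted`) are hypothetical. Outliers are capped at the 99th percentile and missing numeric values are median-imputed. The 0-100 lead score (predicted probability × 100) is also an assumed convention.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic leads; column names are illustrative assumptions.
rng = np.random.default_rng(42)
n = 1_000
numeric_cols = ["total_visits", "time_on_website"]
categorical_cols = ["lead_source"]

leads = pd.DataFrame({
    "lead_source": rng.choice(["Website", "Reference", "Google", "Social Media"], size=n),
    "total_visits": rng.poisson(3, size=n).astype(float),
    "time_on_website": rng.exponential(500, size=n),
})
# Inject ~5% missing values so the imputer has work to do.
leads.loc[rng.random(n) < 0.05, "total_visits"] = np.nan
# Synthetic target: more time on the site raises the odds of converting.
logit = 0.004 * leads["time_on_website"] - 2.0
leads["converted"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Outlier handling (assumed rule): cap numeric columns at their 99th percentile.
for col in numeric_cols:
    leads[col] = leads[col].clip(upper=leads[col].quantile(0.99))

# Preprocessing: median-impute and scale numerics, one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])

X = leads[numeric_cols + categorical_cols]
y = leads["converted"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
model.fit(X_train, y_train)

# Lead score: predicted conversion probability rescaled to 0-100.
lead_scores = (model.predict_proba(X_test)[:, 1] * 100).round().astype(int)
print(lead_scores[:10])
```

Wrapping imputation, scaling, and encoding in one `Pipeline` means the same transformations fitted on the training split are reapplied unchanged to new leads at scoring time.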
- Programming Language: Python
- Data Handling: Pandas, NumPy
- Machine Learning: scikit-learn (Logistic Regression)
- Visualization: Matplotlib, Seaborn
- The overall lead conversion rate so far has been 47.1%. The analysis suggests that leads tagged with high-intent indicators (such as 'Closed by Horizzon' or 'Interested in Next batch') are more likely to convert.
- Channels such as the website and references show the highest conversion rates (>90%), while social media and search engines such as Google come a distant second (60-70%).
- Leads from developing countries with lower technology adaptability, such as Bahrain and Bangladesh, dominate the successful-conversion demographics, while working professionals with a Management specialization carry similar weight.
- Converted leads spent, on average, 1.5x as much time on the website as non-converted leads. Moreover, the low bounce rate (6.42%) suggests that visitors who land on the website tend to explore beyond the first few pages.
- Leads are more likely to convert when contacted by email or phone, and the data indicates that prospects prefer email over calls.
- Conversion likelihood rises with a prospect's activity index, whereas it falls as the profile index increases.
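Insights such as the overall conversion rate, per-channel conversion rates, and the time-on-site ratio come from simple pandas aggregations. The following is a minimal sketch on a tiny hypothetical sample. The column names and values are invented, so the numbers will not reproduce the 47.1% or 1.5x figures reported in the insights.

```python
import pandas as pd

# Tiny hypothetical sample; the real analysis would load the course dataset.
leads = pd.DataFrame({
    "lead_source": ["Website", "Website", "Reference", "Google", "Google", "Social Media"],
    "time_on_website": [900, 1200, 600, 300, 800, 200],
    "converted": [1, 1, 1, 0, 1, 0],
})

# Overall conversion rate: the mean of a 0/1 target column.
overall_rate = leads["converted"].mean()

# Conversion rate per channel, highest first.
rate_by_source = (
    leads.groupby("lead_source")["converted"].mean().sort_values(ascending=False)
)

# Average time on site for converted (1) vs. non-converted (0) leads, and their ratio.
avg_time = leads.groupby("converted")["time_on_website"].mean()
time_ratio = avg_time.loc[1] / avg_time.loc[0]

print(f"Overall conversion rate: {overall_rate:.1%}")
print(rate_by_source)
print(f"Time-on-site ratio (converted / not converted): {time_ratio:.2f}x")
```

On this toy sample, 4 of 6 leads convert, and converted leads average 875 seconds on the site versus 250 for non-converted leads, a 3.5x ratio.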
- The final classification model can be deployed to help the business predict lead conversion with 96% accuracy.
- High precision and recall for both classes mean that the model keeps both false positives (incorrectly predicting that a lead will convert) and false negatives (missing leads that actually convert) low.
- A high ROC AUC score indicates that the model separates converting from non-converting leads well across classification thresholds.
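The metrics cited in the conclusions can be computed with `sklearn.metrics`. The following sketch uses hypothetical labels and predicted probabilities and an assumed 0.5 threshold. It is not the project's evaluation output. ROC AUC is computed from the predicted probabilities, while accuracy, precision, and recall depend on the chosen threshold.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)

# Hypothetical held-out labels (1 = converted) and predicted conversion probabilities.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.10, 0.30, 0.20, 0.60, 0.80, 0.70, 0.40, 0.90]

threshold = 0.5  # assumed cut-off; tune it to the business's FP/FN trade-off
y_pred = [int(p >= threshold) for p in y_prob]

acc = accuracy_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_prob)  # uses probabilities, not hard labels
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(f"Accuracy: {acc:.3f}")
print(f"ROC AUC: {auc:.4f}")
print(f"False positives: {fp}, false negatives: {fn}")
print(classification_report(y_true, y_pred, target_names=["Not converted", "Converted"]))
```

On this toy data, accuracy is 0.75 (one false positive and one false negative out of eight leads), and ROC AUC is 0.9375 because 15 of the 16 converted/non-converted pairs are ranked correctly.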