# HR Attrition Analysis: Predicting & Preventing Employee Turnover
## Business Problem
"Why are employees leaving? Can we predict attrition early and target retention efforts?"
- Convert to binary classification: Attrition = Yes/No
- Three models for robustness & insight:
  - Logistic Regression: Interpretable odds ratios (HR-friendly)
  - Random Forest: Captures non-linear patterns (e.g., JobSatisfaction × Overtime)
  - XGBoost + SHAP: State-of-the-art prediction + explainable AI
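
As a rough illustration of the first two bullets, the sketch below encodes the `Attrition` label as 0/1 and converts a logistic-regression coefficient into an odds ratio. The helper names and the coefficient value are hypothetical and are not taken from this project's fitted model.

```python
import math


def encode_attrition(labels):
    """Map 'Yes'/'No' strings to 1/0 for binary classification."""
    mapping = {"Yes": 1, "No": 0}
    return [mapping[label] for label in labels]


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a feature: exp(beta)."""
    return math.exp(coefficient)


y = encode_attrition(["Yes", "No", "No", "Yes"])
print(y)  # [1, 0, 0, 1]

# Hypothetical coefficient for an OverTime indicator (illustrative only).
beta_overtime = 0.7
print(round(odds_ratio(beta_overtime), 2))  # 2.01
```

An odds ratio near 2 would mean the odds of attrition roughly double when the feature moves from 0 to 1, with the other features held fixed.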
- Model performance:
  - Random Forest AUC: 0.811 → Strong predictive power
  - Logistic Regression AUC: 0.805 → Strong predictive power
  - XGBoost AUC: 0.779 → Slightly lower but more interpretable via SHAP
- Top 3 global drivers of attrition (SHAP):
  - `OverTime`: Highest impact (mean |SHAP| = 0.62)
  - `MonthlyIncome`: Second highest (0.44)
  - `StockOptionLevel`: Third (0.38)
- Actionable insight: Employees who work overtime and have a low monthly income or a low stock option level are at highest risk.
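
The two metrics quoted above can be sketched in plain Python: AUC as the fraction of (leaver, stayer) pairs that the model ranks correctly, and global feature importance as the mean absolute SHAP value per feature. This is a minimal sketch on toy numbers, not the project's code or results. The function names, labels, scores, and SHAP rows below are all made up for illustration.

```python
def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP| across all rows, largest first."""
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    ranking = [(name, total / len(shap_rows)) for name, total in zip(feature_names, totals)]
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Toy labels (1 = left the company) and predicted probabilities.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1]
auc = roc_auc(y_true, scores)
print(auc)  # 0.75

# Toy SHAP values, one row per employee (illustrative only).
features = ["OverTime", "MonthlyIncome", "StockOptionLevel"]
shap_rows = [
    [0.5, -0.25, 0.25],
    [-0.75, 0.5, -0.25],
]
ranking = mean_abs_shap(shap_rows, features)
print(ranking)
# [('OverTime', 0.625), ('MonthlyIncome', 0.375), ('StockOptionLevel', 0.25)]
```

The pairwise loop is quadratic in the number of employees, which is fine for a sketch. With the real model outputs, library routines such as scikit-learn's `roc_auc_score` would be the usual choice.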
- Windows 10/11
- Anaconda 2023.09 or later (includes Python 3.11.5)
→ Download Anaconda (64-bit)
→ During install: Check “Add to PATH” and “Register Anaconda”
- Clone this repo
  Open Anaconda Prompt (search in Start menu) and run `git clone https://github.com/albertogoga/hr-attrition-analysis.git`, then `cd hr-attrition-analysis`.