Course: DAMO-611-2 Data Analytics Case Study 3 (Spring 2025)
Instructor: Omid Isfahanialamdari
Submitted by:
- Heta Chavda (NF1014555)
- Yash Patel (NF1009944)
- Devarsh Oza (NF1003776)
- Joy Ajayi (NF1002698)
Banks face increasing challenges with loan defaults (non-performing loans), which affect profitability, compliance, and investor confidence.
This project leverages Logistic Regression to predict loan repayment outcomes (“Fully Paid” vs. “Charged Off”) using borrower and loan-specific features.
Key Goals:
- Reduce default risk via predictive analytics.
- Provide actionable recommendations for financial institutions.
- Build a transparent, interpretable model suitable for deployment.
- Classify loans as “Fully Paid” or “Charged Off” using borrower attributes.
- Quantify the impact of key variables:
  - Interest Rate
  - Annual Income
  - Debt-to-Income (DTI) ratio
  - Employment Length
- Support business decision-making with risk segmentation.
- Provide a framework for deployment & monitoring.
- Source: Internal lending dataset (`financial_loan.csv`)
- Rows: ~37,138 loan applications
- Target Variable: `loan_status` → Binarized into Paid (0) and Charged Off (1)
- Removed missing records in critical fields.
- Encoded `emp_length` → Numeric scale (0–10).
- Scaled continuous variables (`int_rate`, `dti`, `income`).
- Created binary classification target.
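The preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: the `emp_length` label formats ("< 1 year", "10+ years"), the choice of z-score scaling, the decision to drop statuses other than "Fully Paid" and "Charged Off", and the sample rows are all assumptions made for the example.

```python
import re
from statistics import mean, pstdev

# Target encoding as described in the report: Paid = 0, Charged Off = 1.
STATUS_TO_TARGET = {"Fully Paid": 0, "Charged Off": 1}


def encode_emp_length(label):
    """Map an employment-length label to an integer in 0..10.

    Assumed label formats: "< 1 year", "1 year", "2 years", ..., "10+ years".
    """
    text = label.strip().lower()
    if text.startswith("<"):
        return 0
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError(f"unrecognized emp_length label: {label!r}")
    return min(int(match.group()), 10)


def standardize(values):
    """Z-score scaling (assumed method): subtract the mean, divide by the std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical rows standing in for financial_loan.csv.
rows = [
    {"loan_status": "Fully Paid", "emp_length": "10+ years", "int_rate": 0.08, "dti": 0.10},
    {"loan_status": "Charged Off", "emp_length": "< 1 year", "int_rate": 0.20, "dti": 0.22},
    {"loan_status": "Current", "emp_length": "3 years", "int_rate": 0.12, "dti": 0.15},
    {"loan_status": "Fully Paid", "emp_length": None, "int_rate": 0.11, "dti": 0.13},
]

# Keep rows with a usable target and no missing values in critical fields.
clean = [
    r for r in rows
    if r["loan_status"] in STATUS_TO_TARGET and all(v is not None for v in r.values())
]

targets = [STATUS_TO_TARGET[r["loan_status"]] for r in clean]
tenure = [encode_emp_length(r["emp_length"]) for r in clean]
int_rate_scaled = standardize([r["int_rate"] for r in clean])
```

In a real pipeline the scaling statistics would be computed on the training split only and then reused on the test split, so that no information leaks from held-out data.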
- 84% fully repaid
- 14% charged off
- 3% current loans
- Longer tenure (10+ years) = highest repayment likelihood.
- Defaults remain consistent (~12–15%) across groups.
- Median DTI:
  - Charged Off: ~14.5%
  - Fully Paid: ~13%
- Higher DTI correlates with repayment challenges.
- Higher interest rates → higher default probability.
- Fully repaid loans cluster at lower interest brackets.
- Logistic Regression (chosen for interpretability & statistical rigor).
- Train/Test Split: 80/20
- Class Imbalance: `class_weight='balanced'`
- Metrics: ROC-AUC, Precision, Recall, F1, Confusion Matrix
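The modeling setup listed above (80/20 split, balanced class weights, ROC-AUC and confusion matrix) might look roughly like the following scikit-learn sketch. The data here is synthetic and only mimics an imbalanced binary target; the stratified split, `random_state`, and `max_iter` values are assumptions that the report does not specify.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for four borrower features (not the real dataset).
rng = np.random.default_rng(0)
n_samples = 2000
X = rng.normal(size=(n_samples, 4))

# Synthetic imbalanced target: class 1 is the minority class.
logits = -2.0 + 1.5 * X[:, 0] + 0.5 * X[:, 1]
y = (rng.random(n_samples) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# 80/20 train/test split; stratification is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)

# class_weight="balanced" reweights each class inversely to its frequency.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

# ROC-AUC uses predicted probabilities; the confusion matrix uses hard labels.
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
cm = confusion_matrix(y_test, model.predict(X_test))

print(f"ROC-AUC: {auc:.3f}")
print(cm)
```

With `class_weight="balanced"`, scikit-learn weights each class by `n_samples / (n_classes * class_count)`, so errors on the rarer class cost more during fitting.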
- ROC-AUC Score: 0.9154 → Strong predictive performance
- Confusion Matrix Results:
  - TP = 3922
  - TN = 768
  - FP = 477
  - FN = 2261
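The threshold-based metrics follow directly from the four confusion-matrix counts above. The sketch below applies the standard definitions; it takes the counts at face value and makes no assumption about which class the report treats as positive. The counts sum to 7,428, which matches a 20% holdout of roughly 37,138 rows.

```python
# Confusion-matrix counts as reported.
tp, tn, fp, fn = 3922, 768, 477, 2261

total = tp + tn + fp + fn

accuracy = (tp + tn) / total        # share of all predictions that are correct
precision = tp / (tp + fp)          # share of predicted positives that are correct
recall = tp / (tp + fn)             # share of actual positives that are found
specificity = tn / (tn + fp)        # share of actual negatives that are found
f1 = 2 * precision * recall / (precision + recall)

for name, value in [
    ("accuracy", accuracy),
    ("precision", precision),
    ("recall", recall),
    ("specificity", specificity),
    ("f1", f1),
]:
    print(f"{name}: {value:.4f}")
```

Unlike ROC-AUC, these figures depend on the decision threshold, so a different cut-off on the same model would trade recall against precision.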
- Interest Rate (–16.45): Strong negative driver (higher rates → higher defaults).
- DTI (–1.06): Higher ratios → more defaults.
- Employment Length (–0.03): Longer tenure slightly reduces default risk.
- Annual Income (+0.000003): Minimal positive effect.
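Logistic regression coefficients are easier to read as odds ratios, since exponentiating a coefficient gives the multiplicative change in the odds of the model's positive class per one-unit increase in the feature. The sketch below applies that conversion to the reported values. Which class is positive and the units of each feature are not stated in the report, so the `delta=0.01` example assumes `int_rate` is stored as a fraction (0.01 being one percentage point).

```python
import math

# Coefficients as reported; the keys are illustrative labels.
coefficients = {
    "int_rate": -16.45,
    "dti": -1.06,
    "emp_length": -0.03,
    "annual_income": 0.000003,
}


def odds_ratio(coef, delta=1.0):
    """Multiplicative change in the odds for a `delta` increase in the feature."""
    return math.exp(coef * delta)


for name, coef in coefficients.items():
    print(f"{name}: odds ratio per unit = {odds_ratio(coef):.6g}")

# Assumption: int_rate is a fraction, so delta=0.01 is one percentage point.
per_point = odds_ratio(coefficients["int_rate"], delta=0.01)
print(f"int_rate: odds ratio per percentage point = {per_point:.4f}")
```

An odds ratio below 1 means the feature lowers the odds of the positive class, and a ratio very close to 1 (as for income per unit of currency) reflects the feature's scale as much as its importance.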
- Risk Mitigation: Early identification of high-risk borrowers.
- Portfolio Optimization: Shift focus toward safer borrowers.
- Automation: Integration with digital workflows enables real-time credit scoring.
- Customer Targeting: Personalized lending terms by borrower risk segment.
- Deploy Model in Pre-Loan Screening
  - Integrate logistic regression scoring into loan origination platforms.
  - Flag risky applicants with >70% default probability.
- Retrain Quarterly
  - Capture market shifts and borrower behavior changes.
- Expand Feature Set
  - Add credit bureau data, loan purpose, payment history.
- Test Ensemble Models
  - Random Forest, Gradient Boosting (XGBoost) for higher accuracy.
- Governance & Compliance
  - Ensure transparency, fairness checks, explainability (e.g., SHAP values).
  - Align with Basel III/IV & Fair Lending regulations.
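The pre-loan screening rule in the recommendations above (flag applicants whose predicted default probability exceeds 70%) can be sketched as a small helper. The function name, the applicant IDs, and the input format (a mapping from applicant ID to predicted default probability) are hypothetical; in practice the probabilities would come from the fitted model.

```python
def flag_high_risk(default_probabilities, threshold=0.70):
    """Return IDs of applicants whose default probability is strictly above `threshold`."""
    flagged = []
    for applicant_id, p_default in default_probabilities.items():
        if not 0.0 <= p_default <= 1.0:
            raise ValueError(f"probability out of range for {applicant_id}: {p_default}")
        if p_default > threshold:
            flagged.append(applicant_id)
    return flagged


# Hypothetical scores for three applicants.
scores = {"A-001": 0.12, "A-002": 0.70, "A-003": 0.83}
flagged = flag_high_risk(scores)
print(flagged)
```

Keeping the threshold as a parameter makes it straightforward to revisit the cut-off after each retraining cycle, when the score distribution may have shifted.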
The Logistic Regression model demonstrated strong predictive power, with a ROC-AUC of about 0.92 (0.9154), making it a reliable solution for loan risk prediction.
With ongoing retraining, feature expansion, and compliance oversight, the model can:
- Reduce defaults
- Improve profitability
- Strengthen trust with regulators and investors