A machine learning project that compares Logistic Regression, Discriminant Analysis, and Neural Networks for predicting loan approvals from financial data. The best model reached 95.75% validation accuracy, illustrating how data-driven modeling can support fair and efficient lending decisions.
This project explores how machine learning can improve the loan approval decision-making process by replacing subjective, time-consuming evaluations with data-driven predictive modeling.
By analyzing financial datasets and comparing Logistic Regression, Discriminant Analysis, and Neural Networks, this project identifies the most effective model for accurate loan approval predictions.
The final results indicate that neural networks can improve predictive accuracy, and potentially consistency and fairness, in lending decisions.
Sheetal Patangay
School of Business Administration, The Pennsylvania State University
Course: BUS 510 - Business Analytics and Decision Modeling
Instructor: Dr. Dinesh R. Pai
- Source: Kaggle (L&T Financial Services Loan Dataset)
- Records: 4,269 financial entries
- Features: 13 columns including `income_annum`, `loan_amount`, `loan_term`, `cibil_score`, and asset values.
- Target: `loan_status` (Approved = 1, Not Approved = 0)
- Problem Type: Binary classification
- Dropped irrelevant column: `loan_id`
- Encoded categorical variables:
  - `education`: Graduated → 1 | Not Graduated → 0
  - `self_employed`: Yes → 1 | No → 0
  - `loan_status`: Approved → 1 | Not Approved → 0
- Performed a stratified split of the data (70% training / 30% testing)
- Verified dataset completeness: no missing or NaN values
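The preprocessing steps above can be sketched roughly as follows. This is a minimal, standard-library illustration rather than the project's actual code: the helper names `preprocess` and `stratified_split`, the random seed, and the sample rows are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Encoding rules as listed above.
EDUCATION = {"Graduated": 1, "Not Graduated": 0}
SELF_EMPLOYED = {"Yes": 1, "No": 0}
LOAN_STATUS = {"Approved": 1, "Not Approved": 0}


def preprocess(row):
    """Drop loan_id and encode the three categorical columns as 0/1."""
    out = {k: v for k, v in row.items() if k != "loan_id"}
    out["education"] = EDUCATION[row["education"]]
    out["self_employed"] = SELF_EMPLOYED[row["self_employed"]]
    out["loan_status"] = LOAN_STATUS[row["loan_status"]]
    return out


def stratified_split(rows, label_key="loan_status", train_frac=0.7, seed=42):
    """Split rows so each class keeps roughly the same train/test proportion."""
    by_class = defaultdict(list)
    for r in rows:
        by_class[r[label_key]].append(r)
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    train, test = [], []
    for group in by_class.values():
        rng.shuffle(group)
        cut = round(len(group) * train_frac)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Made-up rows for illustration only; not taken from the Kaggle dataset.
raw = [
    {
        "loan_id": i,
        "education": "Graduated" if i % 2 else "Not Graduated",
        "self_employed": "Yes" if i % 3 == 0 else "No",
        "cibil_score": 300 + 6 * i,
        "loan_status": "Approved" if i % 5 < 3 else "Not Approved",
    }
    for i in range(100)
]

rows = [preprocess(r) for r in raw]
# Completeness check: True if any field is missing.
has_missing = any(v is None for r in rows for v in r.values())
train, test = stratified_split(rows)
print(len(train), len(test), has_missing)
```

Splitting within each class, rather than over the whole table at once, keeps the share of approved loans about the same in the training and testing sets.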
- Logistic Regression
  - Captures the probability of loan approval.
  - Achieved 91.6% training accuracy and 91.2% validation accuracy.
  - Model effectively predicts approval likelihood based on financial variables.
- Discriminant Analysis
  - Classifies observations into "Approved" or "Rejected" based on linear discriminants.
  - Achieved 91.7% training accuracy and 91.8% validation accuracy.
  - Strong interpretability for categorical decision-making.
- Neural Network Analysis
  - Multi-layer architecture designed to recognize complex relationships.
  - Achieved 99.5% training accuracy and 95.75% validation accuracy.
  - Outperformed other models in predictive precision and recall.
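To make "captures the probability of loan approval" concrete, here is a small from-scratch sketch of logistic regression on a single feature. It is not the model fitted in this project: the toy data, the rescaling of the credit score to roughly [-1, 1], the learning rate, and the function names are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. the score
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Made-up data: a rescaled credit score and whether the loan was approved.
scores = [-1.0, -0.8, -0.5, -0.2, 0.1, 0.3, 0.6, 0.9]
approved = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(scores, approved)
p_low = sigmoid(w * -0.9 + b)   # estimated approval probability, low score
p_high = sigmoid(w * 0.9 + b)   # estimated approval probability, high score
print(f"w={w:.2f}, b={b:.2f}, p_low={p_low:.2f}, p_high={p_high:.2f}")
```

The output of the model is a probability, so an approve/reject decision still requires a threshold (0.5 is a common default, but a lender could choose a stricter one).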
| Model | Training Accuracy | Validation Accuracy |
|---|---|---|
| Logistic Regression | 91.6% | 91.2% |
| Discriminant Analysis | 91.7% | 91.8% |
| Neural Network | 99.5% | 95.75% |
The Neural Network emerged as the most accurate and robust model for loan approval prediction.
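One way to read the comparison table is to look at the gap between training and validation accuracy as well as the validation score itself. The short sketch below does that arithmetic on the reported figures; the variable names are illustrative, and treating a larger gap as a possible sign of overfitting is a general rule of thumb, not a finding of this project.

```python
# (training accuracy %, validation accuracy %) as reported in the table above.
results = {
    "Logistic Regression": (91.6, 91.2),
    "Discriminant Analysis": (91.7, 91.8),
    "Neural Network": (99.5, 95.75),
}

# Generalization gap in percentage points: training minus validation.
gaps = {model: round(train - val, 2) for model, (train, val) in results.items()}

# Model with the highest validation accuracy.
best = max(results, key=lambda model: results[model][1])

for model, gap in gaps.items():
    print(f"{model}: gap = {gap} points")
print("Best on validation:", best)
```

The Neural Network has both the highest validation accuracy and the largest gap, so it may be worth confirming its advantage with cross-validation or a held-out test set.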
- CIBIL Score Dominance: The credit score proved to be the most critical predictor of loan approval.
- Income-Loan Balance: The ratio of annual income to loan amount significantly influences approval likelihood.
- Luxury Assets: Presence of luxury assets contributed more positively than expected.
- Self-Employment Nuances: Stability of self-employment mattered more than status alone.
- Education Factor: Graduates showed consistently higher approval rates.
- Collateral Importance: Higher residential and commercial asset values strongly correlated with approval.
- Null (H₀): No difference in mean values of numeric features between approved and rejected loans.
- Alternative (H₁): There is a significant difference.

Result: Rejected the null hypothesis; there is strong statistical evidence that feature means differ between the two loan status classes.
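The write-up does not state which statistical test was used. As one plausible option, the sketch below computes Welch's two-sample t statistic for a single numeric feature; the choice of test, the function name `welch_t`, and the sample values are assumptions for illustration only.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of the mean, group a
    vb = variance(b) / len(b)  # squared standard error of the mean, group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up credit scores for the two loan status classes.
approved_scores = [720, 750, 690, 810, 770, 740]
rejected_scores = [480, 520, 450, 560, 500, 530]

t_stat, dof = welch_t(approved_scores, rejected_scores)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

A large absolute t value relative to the t distribution with `dof` degrees of freedom would lead to rejecting H₀ for that feature; repeating the test across many features would normally call for a multiple-comparison correction.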
- Neural networks can transform financial decision-making by learning from patterns traditional models might miss.
- Data preprocessing and feature encoding have a direct impact on model performance.
- Model interpretability remains essential for trust and transparency in lending.
- Combining technical models with domain understanding ensures more equitable and efficient loan approvals.
- Integrate real-time financial data for live prediction dashboards.
- Explore Deep Learning architectures (e.g., LSTM, CNN).
- Add Explainable AI (XAI) components for interpretability.
- Deploy a Flask or Streamlit web app for user interaction.
| Category | Tools Used |
|---|---|
| Data Processing | Excel, Python, IBM SPSS |
| Modeling | Neural Networks, Logistic Regression, Discriminant Analysis |
| Visualization | Tableau, Matplotlib |
| Source | Kaggle |
| Platform | Jupyter Notebook / Excel Neural Network Add-In |
This project demonstrates how machine learning can optimize loan approval workflows, reduce manual bias, and improve financial inclusion.
By combining analytical precision with ethical responsibility, predictive models like these pave the way for data-driven, transparent, and scalable decision-making in finance.