Credit_Risk_Analysis

Overview of the analysis

The purpose of the analysis is to use supervised machine learning techniques to solve a real-world challenge : credit card risk assessment.

Credit risk is an inherently unbalanced classification problem, as good loans easily outnumber risky loans. Using the credit card dataset from LendingClub, a peer-to-peer lending services company, different techniques will be used to train and evaluate the model with unbalanced classes.

First, the data will be oversampled using 'Random Over Sampler' and 'SMOTE' algorithms, then undersample the data using 'Cluster Centroids' algorithm. Then a combination of over-and-under sampling will be used with the 'SMOTEENN' algorithm. Finally, 'Balanced Random Forest Classifier' and 'Easy Ensemble Classifier' is used to reduce bias and predict the credit risk. Then the performance of these 6 models will be compared to determine which model is better suited for credit risk analysis.

Results

1. Random Over Sampler

Balanced accuracy score : 0.64
Precision : 0.01 for high_risk and 1.00 for low_risk
Recall score : 0.69 for high_risk and 0.59 for low_risk
F1 score : 0.02 for high_risk and 0.74 for low_risk

From the low precision and F1 scores for high_risk category, it can be concluded that this model is not good to predict high_risk credit assessment.

2. SMOTE (Synthetic Minority Oversampling Technique)

Balanced accuracy score : 0.66
Precision : 0.01 for high_risk and 1.00 for low_risk
Recall score : 0.63 for high_risk and 0.69 for low_risk
F1 score : 0.02 for high_risk and 0.82 for low_risk

From the low precision and F1 scores for high_risk category, it can be concluded that this model is not good to predict high_risk credit assessment.

3. Cluster Centroids

Balanced accuracy score : 0.54
Precision : 0.01 for high_risk and 1.00 for low_risk
Recall score : 0.69 for high_risk and 0.40 for low_risk
F1 score : 0.01 for high_risk and 0.57 for low_risk

From the low precision and F1 scores for high_risk category, it can be concluded that this model is not good to predict high_risk credit assessment.

4. SMOTEENN

Balanced accuracy score : 0.64
Precision : 0.01 for high_risk and 1.00 for low_risk
Recall score : 0.71 for high_risk and 0.57 for low_risk
F1 score : 0.02 for high_risk and 0.72 for low_risk

Although the accuracy score and F1 score for high_risk are slightly higher in this model, yet the precision and F1 scores for high_risk category is quite low. Hence it can be concluded that this model is not good to predict high_risk credit assessment.

5. Balanced Random Forest Classifier

Balanced accuracy score : 0.79
Precision : 0.03 for high_risk and 1.00 for low_risk
Recall score : 0.70 for high_risk and 0.87 for low_risk
F1 score : 0.06 for high_risk and 0.93 for low_risk

The accuracy score is higher in this model, as well as the precision and F1 scores for high_risk category are higher than the other models. Even so, as the F1 score is so low, it can be concluded that this model is not good to predict high_risk credit assessment.

6. Easy Ensemble AdaBoost Classifier

Balanced accuracy score : 0.93
Precision : 0.09 for high_risk and 1.00 for low_risk
Recall score : 0.92 for high_risk and 0.94 for low_risk
F1 score : 0.16 for high_risk and 0.97 for low_risk

The accuracy score is much higher in this model (0.93). The precision for high_risk category is 0.16 and the recall is 0.92. These numbers combined with the F1 score of 0.16, it can be concluded that this model is a good fit amongst all others to predict high_risk credit assessment.

Summary

Looking at the F1 score, the precision and recall score as well as the accurancy for all 6 models, it can be concluded that the 'Easy Ensemble AdaBoost Classifier' is the best fit to predict high_risk credit assessments. Although the F1 score is not too high, the accuracy score is 0.93 which indicates that this model can correctly categorize the risk at 93%. The precision for high_risk is quite low (0.03) but the recall is 0.92 for high_risk which is highest amongst all the 6 models. This indicates that the model can correctly categorize high_risk at 92%. Hence the recommendation is to use the 'Easy Ensemble AdaBoost Classifier model.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
Images		Images
.gitignore		.gitignore
README.md		README.md
credit_risk_ensemble.ipynb		credit_risk_ensemble.ipynb
credit_risk_resampling.ipynb		credit_risk_resampling.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Credit_Risk_Analysis

Overview of the analysis

Results

Summary

About

Uh oh!

Releases

Packages

Languages

ParnaKundu/Credit_Risk_Analysis

Folders and files

Latest commit

History

Repository files navigation

Credit_Risk_Analysis

Overview of the analysis

Results

Summary

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages