Loan-Status-prediction

Problem Statement

Goal: Predict whether a bank customer is eligible for loan sanction. Loan status is predicted using machine learning classification algorithms. After analyzing different models, a DecisionTreeClassifier was fitted to solve this problem. A loan status prediction app was built with Streamlit (Python) and deployed.

Approach

Cleaning Credit score column

The Credit score column also contains four-digit numbers. From domain knowledge and a web search, we learned that a credit score should be a three-digit number.
We observed that all four-digit numbers end with 0, so we simply drop the trailing zero from each four-digit number.
Credit score before cleaning

Credit score after cleaning
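
The cleaning rule described above can be sketched as follows. This is only a minimal illustration, assuming that every value of 1000 or more is a three-digit score recorded with one extra trailing zero. The function name `fix_credit_score` and the sample values are hypothetical and not taken from the project code.

```python
import math

def fix_credit_score(score):
    """Drop the extra trailing zero from a four-digit credit score.

    Assumption: any value of 1000 or more is a three-digit score
    that was recorded with one extra trailing zero.
    """
    # Leave missing values untouched so they can be imputed later
    if score is None or (isinstance(score, float) and math.isnan(score)):
        return score
    if score >= 1000:
        return score / 10
    return score

# Made-up values for illustration only
raw_scores = [709, 7410, 7250, None, 680]
cleaned = [fix_credit_score(s) for s in raw_scores]
print(cleaned)
```

With pandas, the same function could be applied column-wise, for example `df['Credit Score'] = df['Credit Score'].apply(fix_credit_score)`.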

Missing value treatment using statistical tests

For numerical features, we checked whether the mean of each feature is the same for both categories of the target (Loan status). If there is a significant difference, we replace the missing values with a central value of that feature for the corresponding category (the credit score imputation below uses the per-category median).
Here we used a two-sample z-test.

H0: mean credit score for fully paid <= mean credit score for charged off
Ha: mean credit score for fully paid > mean credit score for charged off
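
To make the test concrete, here is a hand-rolled sketch of a one-sided two-sample z-test for the hypotheses above. It is not the project's code: the function name `ztest_greater` is hypothetical, the scores are made up, and it uses unpooled sample variances, which may differ slightly from what a library routine (such as the `ztest` function in statsmodels) computes by default.

```python
import math
from statistics import NormalDist, mean, variance

def ztest_greater(sample_a, sample_b):
    """One-sided two-sample z-test for Ha: mean(sample_a) > mean(sample_b)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Standard error of the difference in means (unpooled sample variances)
    std_error = math.sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    z_stat = (mean(sample_a) - mean(sample_b)) / std_error
    # Upper-tail probability under the standard normal distribution
    p_value = 1.0 - NormalDist().cdf(z_stat)
    return z_stat, p_value

# Made-up scores for illustration only; a real z-test needs large samples
fully_paid = [720, 730, 725, 740, 735, 728, 732, 738]
charged_off = [690, 700, 695, 705, 698, 702, 693, 699]

z_stat, p_value = ztest_greater(fully_paid, charged_off)
alpha = 0.05  # assumed significance level
reject_h0 = p_value < alpha
print(f"z = {z_stat:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

If H0 is rejected, the two categories differ in mean credit score, which is the condition under which the per-category imputation is applied.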

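import pandas as pd

# Assumed definitions (not shown in the original): credit scores per target category
ful_paid_cred_score = df.loc[df['Loan Status'] == 'Fully Paid', 'Credit Score']
charged_off_cred_score = df.loc[df['Loan Status'] == 'Charged Off', 'Credit Score']

def cred_score_imputation(cols):
    """Return the credit score, or the median of the row's loan status category if it is missing."""
    cred_score = cols['Credit Score']
    loan_status = cols['Loan Status']
    if pd.isnull(cred_score):
        if loan_status == 'Fully Paid':
            return ful_paid_cred_score.median()
        if loan_status == 'Charged Off':
            return charged_off_cred_score.median()
    # Score is present, or the loan status is neither category: keep the value as-is
    return cred_score


# Apply row by row to fill the missing credit scores
df['Credit Score'] = df[['Credit Score', 'Loan Status']].apply(cred_score_imputation, axis=1)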

Balancing the data

The target column (Loan status) was not balanced, so we used the SMOTE technique to balance the frequency of the categories in the target column.
Before SMOTE
After SMOTE
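
For readers unfamiliar with SMOTE, the core idea can be sketched in plain Python: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbours. This is a simplified illustration, not the project's code; the function name `smote_oversample`, the value of `k`, and the data are assumptions. In practice a library implementation such as the `SMOTE` class in imbalanced-learn (with its `fit_resample` method) would normally be used, and only on the training split.

```python
import math
import random

def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Made-up two-feature data: 4 minority rows versus 10 majority rows
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority_count = 10

synthetic = smote_oversample(minority, majority_count - len(minority))
balanced_minority = minority + synthetic
print(len(balanced_minority), "minority rows after oversampling")
```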

Model selection and workflow

We fitted several classification models to the data, compared their accuracy, and selected the best performer.
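
The comparison workflow can be sketched as a loop that fits every candidate on the training split and scores it on a held-out split. The README does not list the candidate models, so the two toy classifiers below are stand-ins, and all names (`select_best_model`, `MajorityClassifier`, `NearestNeighbourClassifier`) and data are hypothetical. Any estimator exposing `fit` and `predict` could be plugged in the same way.

```python
import math
from collections import Counter

class MajorityClassifier:
    """Toy baseline: always predicts the most frequent training label."""
    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

class NearestNeighbourClassifier:
    """Toy 1-nearest-neighbour classifier using Euclidean distance."""
    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        return [
            self.y_[min(range(len(self.X_)), key=lambda i: math.dist(row, self.X_[i]))]
            for row in X
        ]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def select_best_model(candidates, X_train, y_train, X_test, y_test):
    """Fit each candidate and return the name with the highest test accuracy."""
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy(y_test, model.predict(X_test))
    return max(scores, key=scores.get), scores

# Made-up data: [credit score, years in job]
X_train = [[700, 1], [710, 2], [720, 1], [600, 5], [610, 6], [705, 2]]
y_train = ['Fully Paid', 'Fully Paid', 'Fully Paid', 'Charged Off', 'Charged Off', 'Fully Paid']
X_test = [[715, 1], [605, 5], [702, 2], [612, 6]]
y_test = ['Fully Paid', 'Charged Off', 'Fully Paid', 'Charged Off']

candidates = {
    'majority': MajorityClassifier(),
    'nearest_neighbour': NearestNeighbourClassifier(),
}
best_name, scores = select_best_model(candidates, X_train, y_train, X_test, y_test)
print(best_name, scores)
```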

Feature selection

The PermutationImportance technique was used to select the best features. As a result, 21 features were selected, and the model was rebuilt using only these features.
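
The idea behind permutation importance is to shuffle one feature column at a time and measure how much the model's score drops; a feature whose shuffling does not hurt the score carries little information for the model. The sketch below is a plain-Python illustration of that idea, not the library implementation used in the project. The toy `ThresholdModel`, the data, and the "keep features with positive importance" rule are all assumptions.

```python
import random

def accuracy(model, X, y):
    preds = model.predict(X)
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this column permuted
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

class ThresholdModel:
    """Toy fitted model: predicts 1 when the first feature is at least 650."""
    def predict(self, X):
        return [1 if row[0] >= 650 else 0 for row in X]

# Made-up data: the second feature is ignored by the model
X = [[700, 3], [720, 8], [610, 5], [590, 1], [680, 9], [630, 2], [710, 4], [600, 7]]
y = [1, 1, 0, 0, 1, 0, 1, 0]

importances = permutation_importance(ThresholdModel(), X, y)
selected = [i for i, imp in enumerate(importances) if imp > 0]
print(importances, selected)
```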

Final model

The XGBoost classifier is the final model. It yields 85% accuracy using only 21 of the 41 features.

Model performance

Confusion matrix

Classification report
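
As a reminder of how the two summaries above relate, the following sketch derives precision, recall, F1, and accuracy from the confusion matrix counts of a binary problem. The labels and predictions are made up for illustration and are not the project's results; the function names and the choice of 'Charged Off' as the positive class are assumptions.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one positive class."""
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if t == positive:
            counts['tp' if p == positive else 'fn'] += 1
        else:
            counts['fp' if p == positive else 'tn'] += 1
    return counts

def classification_summary(counts):
    tp, fp, fn, tn = (counts[k] for k in ('tp', 'fp', 'fn', 'tn'))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {'precision': precision, 'recall': recall, 'f1': f1, 'accuracy': accuracy}

# Made-up labels and predictions for illustration only
y_true = ['Fully Paid', 'Fully Paid', 'Charged Off', 'Fully Paid',
          'Charged Off', 'Charged Off', 'Fully Paid', 'Charged Off']
y_pred = ['Fully Paid', 'Charged Off', 'Charged Off', 'Fully Paid',
          'Fully Paid', 'Charged Off', 'Fully Paid', 'Charged Off']

counts = confusion_counts(y_true, y_pred, positive='Charged Off')
summary = classification_summary(counts)
print(dict(counts), summary)
```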

Model deployment

The model is deployed on Google Cloud Platform using Streamlit.
The supporting files are:

a) Dockerfile

FROM python:3.9
WORKDIR /app
COPY requirements.txt ./requirements.txt

RUN pip3 install -r requirements.txt
EXPOSE 8080
COPY . /app

CMD streamlit run --server.port 8080 --server.enableCORS false app.py

b) app.yaml

runtime: custom
env: flex

Streamlit app

This app contains the following sections:

1) Home: This section gives the description of the problem statement and the source of the dataset.

2) EDA: This section is divided into:
A) Descriptive: Descriptive information such as value counts and the shape of the data.
B) Plots: The important plots used in the EDA.

3) ML: In this section you can predict the loan status by entering the required features in the given fields.

App link: https://loan-status-prediction-326315.el.r.appspot.com/
