
Home Credit Score Card Model

HOME CREDIT SCORECARD MODEL
By Fitria Dwi Wulandari
TABLE OF CONTENTS

01 PROBLEM RESEARCH
02 DATA PREPROCESSING
03 BUSINESS INSIGHTS
04 MACHINE LEARNING MODEL
05 BUSINESS RECOMMENDATION

01 PROBLEM RESEARCH

PROJECT BACKGROUND

Many people struggle to get loans due to insufficient or non-existent credit histories. Home Credit strives to broaden financial inclusion for the unbanked population by providing a positive and safe borrowing experience. To make sure this underserved population has a positive loan experience, Home Credit makes use of a variety of alternative data to predict their clients' repayment abilities. Doing so will ensure that clients capable of repayment are not rejected and that loans are given with a principal, maturity, and repayment calendar that will empower their clients to be successful.

DATA SOURCE

The data used are application train and application test. These are the main table, broken into two files: train (with TARGET) and test (without TARGET).

OBJECTIVE

1. Identify the characteristics of potential clients who will have difficulty repaying loans and of those who will not.
2. Predict clients' repayment abilities.

ACTIONS

1. Perform data cleaning and visualization for business insights.
2. Build models with machine learning algorithms.
3. Provide recommendations for the company to help more of their clients succeed in applying for loans.

02 DATA PREPROCESSING

Raw Data: Application Train
Number of Columns: 122
Number of Rows: 307,511

EDA
Discover patterns and the structure of the dataset.
● Bivariate Visualization: Visualization of the relationship between 2 features
● Multivariate Visualization: Visualization of the relationship of more than 2 features

DATA CLEANING
● Detecting Duplication: No duplicate rows
● Handling Missing Values: Some columns are dropped and the rest are imputed
● Detecting Outliers: Some columns have outliers, but it was decided that the outliers will not be removed

MODEL BUILDING
● Label Encoding: Transform non-numerical to numerical labels
● Feature Selection: Identify the top 20 best features to include in the model
● Handling Imbalanced Data: Re-sampling so that the data is balanced
● Model Building: Build models with multiple machine learning algorithms and compare which one is the best
● Model Evaluation: Compare which of the models is the best
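
The data cleaning and label encoding steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the 50% missing-value threshold, the median/mode imputation, the helper name `clean_application_data`, and the toy rows are all assumptions, and the column names are taken from the public Home Credit data rather than from these slides.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def clean_application_data(df: pd.DataFrame, max_missing_ratio: float = 0.5) -> pd.DataFrame:
    """Drop duplicates, drop sparse columns, impute the rest, label-encode text columns."""
    df = df.drop_duplicates().copy()

    # Drop columns whose share of missing values exceeds the (assumed) threshold.
    missing_ratio = df.isna().mean()
    df = df.loc[:, missing_ratio <= max_missing_ratio].copy()

    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            # Numeric columns: impute with the median (assumed strategy).
            df[col] = df[col].fillna(df[col].median())
        else:
            # Non-numeric columns: impute with the mode, then label-encode.
            df[col] = df[col].fillna(df[col].mode().iloc[0])
            df[col] = LabelEncoder().fit_transform(df[col].astype(str))
    return df


# Tiny made-up frame; the first two rows are exact duplicates.
raw = pd.DataFrame({
    "AMT_INCOME_TOTAL": [100.0, 100.0, np.nan, 250.0, 300.0],
    "NAME_CONTRACT_TYPE": ["Cash loans", "Cash loans", None, "Revolving loans", "Cash loans"],
    "MOSTLY_MISSING": [np.nan, np.nan, np.nan, np.nan, 1.0],
    "TARGET": [0, 0, 1, 0, 1],
})
cleaned = clean_application_data(raw)
print(cleaned)
```

Outliers are deliberately left untouched here, matching the decision stated above.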
Raw Data: Application Test
Number of Columns: 121
Number of Rows: 48,744

DATA CLEANING
● Detecting Duplication: No duplicate rows
● Handling Missing Values: Some columns are dropped and the rest are imputed
● Label Encoding: Transform non-numerical to numerical labels

PREDICTION
Predict clients' repayment abilities with the best machine learning model obtained before.
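
One way to keep the test file consistent with the training file is to learn the imputation values and category codes on the training data only and then reuse them on the test data. The sketch below illustrates that idea under assumptions: the helper names `fit_preprocessor` and `apply_preprocessor` are hypothetical, the slides do not say how the two files were aligned, and the rows are made up.

```python
import numpy as np
import pandas as pd


def fit_preprocessor(train: pd.DataFrame, target: str = "TARGET") -> dict:
    """Learn imputation values and category lists from the training features."""
    features = train.drop(columns=[target])
    fill_values, categories = {}, {}
    for col in features.columns:
        if pd.api.types.is_numeric_dtype(features[col]):
            fill_values[col] = features[col].median()
        else:
            fill_values[col] = features[col].mode().iloc[0]
            categories[col] = sorted(features[col].dropna().unique())
    return {"fill_values": fill_values, "categories": categories}


def apply_preprocessor(df: pd.DataFrame, prep: dict) -> pd.DataFrame:
    """Impute and encode a frame using only what was learned from the training data."""
    out = df[list(prep["fill_values"])].copy()
    for col, value in prep["fill_values"].items():
        out[col] = out[col].fillna(value)
    for col, cats in prep["categories"].items():
        # Categories that never appeared in training are encoded as -1.
        out[col] = pd.Categorical(out[col], categories=cats).codes
    return out


train = pd.DataFrame({
    "AMT_CREDIT": [1000.0, 2000.0, 3000.0],
    "NAME_INCOME_TYPE": ["Working", "Student", "Working"],
    "TARGET": [0, 0, 1],
})
test = pd.DataFrame({
    "AMT_CREDIT": [np.nan, 1500.0],
    "NAME_INCOME_TYPE": ["Student", "Businessman"],
})

prep = fit_preprocessor(train)
processed_train = apply_preprocessor(train, prep)
processed_test = apply_preprocessor(test, prep)
print(processed_test)
```

Encoding unseen categories as -1 is a design choice for this sketch; mapping them to the most frequent category would be another reasonable option.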

03 BUSINESS INSIGHTS

● Most clients who apply for loans are in the range of 35-40 years.
● Meanwhile, the number of applicants aged <25 or >65 is very low.
● Clients who have no payment difficulties are in the range of 35-45 years. You can target these clients as your priority.
● Clients who have payment difficulties are in the range of 25-35 years.
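
An age breakdown like the one above could be produced along these lines. This is a sketch under assumptions: `DAYS_BIRTH` (age stored as negative days) and `TARGET` (1 = payment difficulties) are column names from the public Home Credit data, the bin edges are illustrative, and the rows are made up.

```python
import pandas as pd

# Made-up rows; DAYS_BIRTH counts days before the application, so it is negative.
apps = pd.DataFrame({
    "DAYS_BIRTH": [-9000, -10500, -13500, -14000, -16000, -24500],
    "TARGET": [1, 1, 0, 0, 0, 0],
})

# Convert age in days to age in years, then bucket into age groups.
apps["AGE_YEARS"] = -apps["DAYS_BIRTH"] / 365
bins = [20, 25, 35, 45, 55, 65, 70]  # illustrative bin edges
apps["AGE_GROUP"] = pd.cut(apps["AGE_YEARS"], bins=bins)

# Number of applications and share of clients with payment difficulties per group.
summary = apps.groupby("AGE_GROUP", observed=True)["TARGET"].agg(
    applications="count",
    difficulty_rate="mean",
)
print(summary)
```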

All student clients have no difficulty repaying their loans, whether cash loans or revolving loans, for low to medium credit amounts.

For the maternity leave income type with cash loans, all clients have problems repaying loans with a medium credit amount, while all clients on maternity leave with revolving loans have no difficulty repaying their loans.

For unemployed clients with cash loans, more than 50% have problems repaying loans with medium credit amounts, while all unemployed clients with revolving loans have no difficulty repaying their loans.
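
A breakdown by income type and contract type like this one might be computed with a pivot table. The sketch below assumes the public Home Credit column names `NAME_INCOME_TYPE`, `NAME_CONTRACT_TYPE` and `TARGET`, uses made-up rows, and leaves out the credit-amount dimension for brevity.

```python
import pandas as pd

# Made-up rows; TARGET = 1 marks a client with payment difficulties.
apps = pd.DataFrame({
    "NAME_INCOME_TYPE": [
        "Student", "Student",
        "Unemployed", "Unemployed", "Unemployed", "Unemployed",
        "Working",
    ],
    "NAME_CONTRACT_TYPE": [
        "Cash loans", "Revolving loans",
        "Cash loans", "Cash loans", "Cash loans", "Revolving loans",
        "Cash loans",
    ],
    "TARGET": [0, 0, 1, 1, 0, 0, 0],
})

# Share of clients with payment difficulties per income type and contract type.
difficulty_rate = apps.pivot_table(
    index="NAME_INCOME_TYPE",
    columns="NAME_CONTRACT_TYPE",
    values="TARGET",
    aggfunc="mean",
)
print(difficulty_rate)
```

Combinations with no applications show up as NaN, which is worth keeping visible because a rate based on very few clients is easy to over-read.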
04 MACHINE LEARNING MODEL

MODEL COMPARISON

Algorithm               Training Accuracy Score   Testing Accuracy Score   Error Margin   ROC Score
Logistic Regression     67.16%                    67.29%                   0.13%          0.6728
Gaussian Naive Bayes    60.24%                    60.39%                   0.15%          0.604
Decision Tree           100%                      83.9%                    11.74%         0.8826
Random Forest           100%                      99.65%                   0.35%          0.9965
K-Nearest Neighbor      91.56%                    88.07%                   3.79%          0.8806
Neural Network          70.01%                    69.48%                   0.58%          0.6948

The training and testing accuracy of the Random Forest model differ only slightly (100% vs. 99.65%), which suggests that the model is neither underfitting nor overfitting. The Random Forest model was therefore chosen as the best model to predict clients' repayment abilities.
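
A comparison of this kind could be set up roughly as below. It is a sketch on synthetic data, not the project's code: the ANOVA F-score used to pick 20 features, the random oversampling, the hyperparameters, and the "Gap" column (the absolute difference between training and testing accuracy) are all assumptions, so the numbers it prints will not match the table above.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

# Synthetic, imbalanced stand-in for the cleaned application data.
X, y = make_classification(n_samples=1000, n_features=40, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Keep the 20 features with the highest ANOVA F-score (assumed scorer).
selector = SelectKBest(f_classif, k=20).fit(X_train, y_train)
X_train, X_test = selector.transform(X_train), selector.transform(X_test)

# Randomly oversample the minority class, in the training split only.
n_majority = int((y_train == 0).sum())
minority_up = resample(X_train[y_train == 1], replace=True,
                       n_samples=n_majority, random_state=0)
X_bal = np.vstack([X_train[y_train == 0], minority_up])
y_bal = np.array([0] * n_majority + [1] * n_majority)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "K-Nearest Neighbor": KNeighborsClassifier(),
    "Neural Network": MLPClassifier(max_iter=500, random_state=0),
}

rows = []
for name, model in models.items():
    model.fit(X_bal, y_bal)
    train_acc = accuracy_score(y_bal, model.predict(X_bal))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    rows.append({"Algorithm": name, "Train Accuracy": train_acc,
                 "Test Accuracy": test_acc, "Gap": abs(train_acc - test_acc),
                 "ROC AUC": auc})

comparison = pd.DataFrame(rows).set_index("Algorithm")
print(comparison.round(4))
```

Re-sampling after the train/test split is a deliberate choice in this sketch: it keeps duplicated minority rows out of the test set, so the testing accuracy is measured on rows the models have not seen.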

BEST MODEL

Algorithm: Random Forest Classifier

Performance
● The Random Forest model gives 100% correct results on the training data
● There is a 0.35% error margin

The 5 most important features
● Score from external data source 2
● Score from external data source 3
● Client's age in days
● Days ID publish
● Days registration
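
A ranking like this can be read from a fitted Random Forest through its `feature_importances_` attribute. The sketch below uses synthetic data, so the column names (taken from the public Home Credit data, not from these slides) are only labels and the resulting order is not expected to match the list above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Names are labels only; the values below are synthetic.
feature_names = [
    "EXT_SOURCE_2", "EXT_SOURCE_3", "DAYS_BIRTH", "DAYS_ID_PUBLISH",
    "DAYS_REGISTRATION", "AMT_CREDIT", "AMT_ANNUITY", "AMT_INCOME_TOTAL",
]
X, y = make_classification(n_samples=500, n_features=len(feature_names),
                           n_informative=4, random_state=0)
X = pd.DataFrame(X, columns=feature_names)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances, one value per feature, summing to 1.
importances = pd.Series(forest.feature_importances_, index=feature_names)
top5 = importances.sort_values(ascending=False).head(5)
print(top5)
```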
05 BUSINESS RECOMMENDATION

RECOMMENDATION
1. A client with an income type of student can be said to be capable of repaying loans, whether a cash loan or a revolving loan (100% of applications approved). However, only 0.005% of applications come from students.

2. A client who works as an accountant can be said to be capable of repaying loans (95% of applications approved). However, only 3.19% of applications come from accountants. The same holds for clients who work as high skill tech staff and managers: they are capable of repaying loans, but only a few applications come from them.

Create a campaign so that more students, accountants, high skill tech staff, and managers are interested in applying for a loan.

RECOMMENDATION
1. Clients on maternity leave who take cash loans can be said to be incapable of repaying the loan (100% of applications rejected). In contrast, all clients on maternity leave who take revolving loans have their applications approved.

2. For unemployed clients, more than 50% have problems repaying their loans if they take cash loan contracts. Meanwhile, all unemployed clients who take revolving loans are capable of repaying the loan.

Further analysis is needed: a survey could find out whether there is a problem when clients on maternity leave or unemployed clients take a cash loan contract. Then, in the future, clients with those income types can be recommended the right contract type so that their applications will be approved.

You can see the entire project documentation here!

https://github.com/fitria-dwi/Home-Credit-Score-Card-Model

THANK YOU
