Loan-Prediction Using Machine Learning
Machine Learning
By
K. Vikramaditya Reddy
Mtech ACS
194609
Content
Introduction
The classification problem
Steps involved in machine learning
Features
Labels
Visualizing data using Google Colab
Explanation of the Code using Google Colab
Models of training and testing the dataset
1. Loan prediction using logistic regression
2. Loan prediction using random forest classification
3. Loan prediction using decision tree classification
Loan Prediction models Comparison
INTRODUCTION
Loan-Prediction
Understanding the problem statement is the first and foremost step.
It gives you an intuition of what you will face ahead.
Let us look at the problem statement.
Dream Housing Finance company deals in all kinds of home loans. They
have a presence across all urban, semi-urban and rural areas. A customer
first applies for a home loan, after which the company validates the
customer's eligibility for the loan. The company wants to automate the
loan eligibility process (in real time) based on the customer details
provided while filling in the online application form. These details are
Gender, Marital Status, Education, Number of Dependents, Income, Loan
Amount, Credit History and others. To automate this process, they have
posed the problem of identifying the customer segments that are eligible
for a loan amount, so that they can specifically target these customers.
The Classification problem
It is a classification problem where we have to predict whether a
loan will be approved or not. In a classification problem, we
have to predict discrete values based on a given set of
independent variable(s). Classification can be of two types:
Binary Classification: Here we have to predict one of two given
classes. For example: classifying the gender as male or female,
predicting the result as win or loss, etc.
Multiclass Classification: Here we have to classify the data into
three or more classes. For example: classifying a movie's genre as
comedy, action or romance, classifying fruits as oranges, apples or
pears, etc.
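As a small illustration of the binary case, the sketch below encodes a loan-status label as 0/1. The sample values are hypothetical; only the Y/N coding is taken from the dataset rows shown in this deck.

```python
# Hypothetical sample of loan-status labels, using the Y/N coding from the dataset.
loan_status = ["Y", "N", "Y", "Y", "Y"]

# Map the two classes to integers: 1 = approved, 0 = not approved.
label_map = {"Y": 1, "N": 0}
y = [label_map[status] for status in loan_status]

# Two distinct classes means binary classification; three or more means multiclass.
n_classes = len(set(loan_status))
problem_type = "binary" if n_classes == 2 else "multiclass"

print(y, problem_type)
```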
Loan prediction is a very common real-life problem that every retail
bank faces at least once in its lifetime. If done correctly, it can save
the bank a lot of man-hours.
Steps involved in machine learning
1 - Data Collection
The quantity and quality of your data dictate how accurate your model is
The outcome of this step is generally a representation of data (Guo
simplifies this to specifying a table) which we will use for training
Using pre-collected data, by way of datasets from Kaggle, UCI, etc.,
still fits into this step
2 - Data Preparation
Wrangle data and prepare it for training
Clean the data where required (remove duplicates, correct errors,
deal with missing values, normalization, data type conversions, etc.)
Randomize data, which erases the effects of the particular order in
which we collected and/or otherwise prepared our data.
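The preparation steps above can be sketched with pandas. This is a minimal illustration on a hypothetical toy frame; the column names, the median/mode imputation choices and the fixed seed are assumptions, not the preprocessing used for the results in this deck.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; values and column names are for illustration only.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male", None],
    "LoanAmount": [128.0, 128.0, np.nan, 66.0, 141.0],
    "Credit_History": ["1", "1", "0", "1", "1"],
})

# Remove exact duplicate rows (row 1 repeats row 0).
df = df.drop_duplicates()

# Deal with missing values: median for the numeric column,
# most frequent value for the categorical column.
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].median())
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])

# Data type conversion: strings to integers.
df["Credit_History"] = df["Credit_History"].astype(int)

# Randomize the row order; the fixed seed makes the shuffle reproducible.
df = df.sample(frac=1, random_state=0).reset_index(drop=True)

print(df)
```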
3 - Choose a Model
Different algorithms suit different tasks; choose the right one
4 - Train the Model
The goal of training is to answer a question or make a
prediction correctly as often as possible
Linear regression example: for y = mx + b, the algorithm needs to learn
values for m (or W) and b (x is the input, y is the output)
Each iteration of the process is a training step
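To make the linear regression example concrete, here is a minimal sketch of training by gradient descent on mean squared error. The toy data (generated from y = 2x + 1), the learning rate and the number of steps are assumptions chosen for illustration.

```python
# Toy data generated from y = 2x + 1 (an assumed example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

m, b = 0.0, 0.0          # initial values of the parameters to learn
learning_rate = 0.05     # assumed step size
n = len(xs)

for step in range(2000):  # each loop iteration is one training step
    # Gradients of the mean squared error with respect to m and b.
    grad_m = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (m * x + b - y) for x, y in zip(xs, ys)) / n
    # Move the parameters a small step against the gradient.
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

print(round(m, 4), round(b, 4))
```

With this data the learned m and b approach the slope and intercept used to generate it.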
6 - Parameter Tuning
This step refers to hyper-parameter tuning, which is an "art form" as
opposed to a science
Tune model parameters for improved performance
Simple model hyper-parameters may include: number of training
steps, learning rate, initialization values and distribution, etc.
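One common way to tune a hyper-parameter is a cross-validated grid search. The sketch below uses scikit-learn's GridSearchCV on synthetic stand-in data; the choice of logistic regression, the candidate values for C and the 5 folds are all assumptions for illustration, not settings taken from this deck.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption; this is not the loan dataset).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Candidate values for one hyper-parameter: C, the inverse regularization strength.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Fit one model per candidate value and score each with 5-fold cross-validation.
search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid=param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

best_C = search.best_params_["C"]
best_score = search.best_score_
print(best_C, round(best_score, 3))
```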
7 - Make Predictions
Further (test set) data, which have been withheld from the model until
this point (and for which class labels are known), are used to test the
model; this gives a better approximation of how the model will
perform in the real world.
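Holding back a test set can be sketched as follows. The synthetic data, the 80/20 split and the logistic regression model are assumptions for illustration; the variable names mirror those used in the code later in this deck.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; this is not the loan dataset).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Withhold 20% of the rows; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)

# Predict on the withheld rows and compare with their known labels.
y_pred = classifier.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(round(test_accuracy, 3))
```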
DATASETS
Loan_ID   Gender  Married  Dependents  Education     Self_Employed  ApplicantIncome  CoapplicantIncome  LoanAmount  Loan_Amount_Term  Credit_History  Property_Area  Loan_Status
LP001002  Male    No       0           Graduate      No             5849             0.0                NaN         360.0             1.0             Urban          Y
LP001003  Male    Yes      1           Graduate      No             4583             1508.0             128.0       360.0             1.0             Rural          N
LP001005  Male    Yes      0           Graduate      Yes            3000             0.0                66.0        360.0             1.0             Urban          Y
LP001006  Male    Yes      0           Not Graduate  No             2583             2358.0             120.0       360.0             1.0             Urban          Y
LP001008  Male    No       0           Graduate      No             6000             0.0                141.0       360.0             1.0             Urban          Y
Loan prediction using Logistic Regression
# Take a look at the top 5 rows of the test set; notice the absence of "Loan_Status", which we will predict
test.head()
Loan_ID   Gender  Married  Dependents  Education     Self_Employed  ApplicantIncome  CoapplicantIncome  LoanAmount  Loan_Amount_Term  Credit_History  Property_Area
LP001015  Male    Yes      0           Graduate      No             5720             0                  110.0       360.0             1.0             Urban
LP001022  Male    Yes      1           Graduate      No             3076             1500               126.0       360.0             1.0             Urban
LP001031  Male    Yes      2           Graduate      No             5000             1800               208.0       360.0             1.0             Urban
LP001035  Male    Yes      2           Graduate      No             2340             2546               100.0       360.0             NaN             Urban
LP001051  Male    No       0           Not Graduate  No             3276             0                  78.0        360.0             1.0             Urban
# Printing values of whether loan is accepted or rejected
y_pred[:100]
Confusion Matrix
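A confusion matrix counts how many predictions fall into each combination of true and predicted class. The sketch below uses scikit-learn's confusion_matrix on small hypothetical label vectors (1 = approved, 0 = rejected); the values are made up for illustration and are not the predictions from this deck.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels (1 = approved, 0 = rejected).
y_test = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]

# Rows are true classes, columns are predicted classes:
# [[true negatives, false positives],
#  [false negatives, true positives]]
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Accuracy is the share of predictions on the diagonal.
accuracy_from_cm = cm.trace() / cm.sum()
print(accuracy_from_cm, accuracy_score(y_test, y_pred))
```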
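# Check Accuracy
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
0.8373983739837398
# Applying k-Fold Cross Validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
accuracies.mean()
0.8024081632653062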
Loan prediction using random forest classification
Confusion matrix
# Check Accuracy
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
0.6910569105691057
# Applying k-Fold Cross Validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
accuracies.mean()
# accuracies.std()
Loan Prediction using Decision Tree Classification
# Printing values of whether loan is accepted or rejected
y_pred[:100]
Confusion Matrix
# Check Accuracy
from sklearn.metrics import accuracy_score
accuracy_score(y_test,y_pred)
0.8292682926829268
# Applying k-Fold Cross Validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
accuracies.mean()
# accuracies.std()
0.7922448979591836
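To show what the cross_val_score call does, the sketch below reproduces it with an explicit loop over folds. It assumes synthetic stand-in data and a decision tree with a fixed random_state; for a classifier with an integer cv, scikit-learn uses stratified folds without shuffling, which is what the manual loop mirrors.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training split (an assumption; not the loan data).
X_train, y_train = make_classification(n_samples=300, n_features=6, random_state=0)
classifier = DecisionTreeClassifier(random_state=0)

# One-line version, as used in the slides.
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)

# Manual version: train on 9 folds, score on the remaining fold, 10 times.
manual = []
for train_idx, val_idx in StratifiedKFold(n_splits=10).split(X_train, y_train):
    fold_model = clone(classifier)  # fresh, unfitted copy for each fold
    fold_model.fit(X_train[train_idx], y_train[train_idx])
    manual.append(fold_model.score(X_train[val_idx], y_train[val_idx]))
manual = np.array(manual)

print(accuracies.mean(), accuracies.std())
```

The mean summarizes expected accuracy, and the standard deviation indicates how much it varies from fold to fold.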
Loan prediction models comparison
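Loan Prediction Model   Accuracy   Accuracy using K-fold Cross Validation
Logistic regression     0.8374     0.8024
Random forest           0.6911     (not reported)
Decision tree           0.8293     0.7922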
From the accuracy table above, we can conclude that logistic regression
is the best model for the loan prediction problem.
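A comparison of this kind can be sketched as a loop over the three model types. The data here is synthetic and the model settings are assumptions, so the printed numbers will not match the accuracies reported in this deck.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; this is not the loan dataset).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Model settings below are illustrative assumptions.
models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Accuracy on the withheld test set.
    test_acc = accuracy_score(y_test, model.predict(X_test))
    # Mean accuracy over 10 cross-validation folds of the training set.
    cv_acc = cross_val_score(model, X_train, y_train, cv=10).mean()
    results[name] = (test_acc, cv_acc)

for name, (test_acc, cv_acc) in results.items():
    print(f"{name:<20} {test_acc:.4f}  {cv_acc:.4f}")
```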
THANK YOU