Project Stage I Report
WAGH INSTITUTE OF ENGINEERING EDUCATION & RESEARCH
Group Id 02
Internal Guide: Prof. Smita Patil
TEAM MEMBERS
Division / Roll No. | Name of the student | Email ID | Contact Number
05 | Rajshree Thakare | [email protected] | 9763914017
To develop a model for loan defaulter prediction. Some customers behave negatively after their
applications are approved. To prevent this situation, banks need methods to predict customers'
behaviour using machine learning algorithms.
Objectives
To minimize the risk of borrowers defaulting on loans using the model created.
Create a predictive model to classify each borrower as a defaulter or not, using the
data collected when the loan was given. Determine the probability of user
liability.
Create an interactive UI that takes user input and returns an output.
Scope
The goal of this project is to build a machine learning model that can predict if a
person will default on the loan based on the loan and personal information
provided.
The model is intended to be used as a reference tool for the client and his financial
institution to help make decisions on issuing loans, so that the risk can be lowered,
and the profit can be maximized.
Literature Review
Title | Year | Authors | Findings
Implementation of decision tree using C4.5 algorithm in decision making of loan application by debtor | 2019 | Amin R K, Indwiarti and Sibaroni Zhou | The maximum precision value achieved was 78.08% with a data partition of 90:10, and the biggest recall value was 86% with a data partition of 80:20.
An exploratory data analysis for loan prediction based on nature of the clients | 2018 | Sumathi V P and Sri J S | They classify and examine the nature of loan applicants and concluded that most loan applicants preferred short-term loans.
Credit risk analysis and prediction modelling of bank loans | 2017 | G. Sudhamathy | Banks hold huge volumes of customer behaviour related data from which they are unable to arrive at a judgement if an applicant can be a defaulter or not.
Requirement Specification
Functional requirements
The system should be able to build user profiles and maintain their records.
The system will predict a user's performance on the basis of the previous record.
On the basis of the previous record, the system should be able to report whether the user
is good or bad for that particular loan facility.
Non-functional requirements
Availability
The system advises or alerts the user immediately
The system gives accurate results
Interactive, minimal delays, safe information transmission
Reliability
Predictability
Accuracy
Usability
Interoperability
Efficiency
Methodology
KNN
The KNN (k-nearest neighbours) algorithm can be used for both classification and regression problems.
However, KNN is more widely used for classification problems in industry, and it will therefore be used
for classification and predictive analysis in this project. KNN is a simple algorithm that stores all
available cases and classifies a new case by a majority vote of its k nearest neighbours.
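The majority-vote idea can be illustrated with a minimal sketch in plain Python. The function name `knn_predict`, the toy feature values, and the choice of Euclidean distance are assumptions made for illustration; they are not taken from the project code.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest stored cases."""
    # Euclidean distance from the query to every stored case (assumed metric)
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest cases and count their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (income, loan amount) in arbitrary units
train_points = [(2.0, 9.0), (3.0, 8.0), (2.5, 10.0), (9.0, 2.0), (8.0, 3.0), (10.0, 1.0)]
train_labels = ["default", "default", "default", "repaid", "repaid", "repaid"]

print(knn_predict(train_points, train_labels, query=(8.5, 2.5), k=3))
```

In practice the features would usually be scaled first, since distance-based voting is sensitive to attributes measured on different ranges.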
Random Forest
This is a tree-based ensemble model which helps improve the accuracy of the model. It
combines a large number of decision trees to build a powerful predictive model. For each individual
tree, it takes a random sample of rows and features to fit a decision tree model. The final prediction
is the mode of the trees' predictions (classification) or the mean of their predictions (regression).
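The two ingredients described above, random sampling for each tree and aggregation of the trees' outputs, can be sketched as follows. This is only an illustration: the function names are hypothetical, the tree fitting itself is omitted, and features are drawn once per tree to match the description here, whereas many implementations draw a feature subset at every split.

```python
import random
from collections import Counter
from statistics import mean

def sample_rows_and_features(n_rows, n_features, max_features, rng):
    """Choose the data one tree would see: bootstrap rows plus a feature subset."""
    row_idx = [rng.randrange(n_rows) for _ in range(n_rows)]        # with replacement
    feat_idx = sorted(rng.sample(range(n_features), max_features))  # without replacement
    return row_idx, feat_idx

def aggregate_classification(tree_predictions):
    """Final class = mode of the per-tree class predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]

def aggregate_regression(tree_predictions):
    """Final value = mean of the per-tree numeric predictions."""
    return mean(tree_predictions)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows, feats = sample_rows_and_features(n_rows=8, n_features=5, max_features=2, rng=rng)
print(len(rows), len(feats))

# Hypothetical outputs from three already-fitted trees
print(aggregate_classification(["default", "repaid", "default"]))
print(aggregate_regression([0.2, 0.4, 0.6]))
```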
XGBoost
This algorithm works only with quantitative (numeric) variables, so categorical attributes need to be
encoded as numbers first. It is a gradient boosting algorithm which forms strong rules for the model by
combining many weak learners into a strong learner. It is a fast and efficient algorithm which has
recently dominated applied machine learning because of its high performance and speed.
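The idea of boosting weak learners into a strong one can be sketched with plain gradient boosting on squared error, using one-split "stumps" as the weak learners. This is not XGBoost itself, which adds regularization and other optimizations; the function names, toy data, and hyperparameters are assumptions for illustration, and the feature is assumed to take at least two distinct values.

```python
def fit_stump(xs, residuals):
    """Weak learner: the single threshold split that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave points on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)

def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Each round fits a stump to the current residuals and adds a damped correction."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps

def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

# Hypothetical toy data: one numeric feature, target 1 = default
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model = boost(xs, ys)
print(round(boosted_predict(model, 2), 3), round(boosted_predict(model, 5), 3))
```

A single stump scaled by the learning rate moves the prediction only part of the way; the accumulated corrections over many rounds bring it close to the targets.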
Detailed Design
Experimental Setup / Simulation
Expectations
To achieve an F1 score of around 90% on the training set and around 85-90% on the test set.
Datasets Used
The dataset we used was obtained from Kaggle.
It contains more than 115,000 loan records of users with 102 attributes.
Training: 50%, Testing: 25%, Validation: 25%.
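A 50/25/25 split of this kind could be produced as in the sketch below. The function name `split_dataset`, the shuffling step, and the seed are assumptions for illustration; the report does not say how the split was actually made.

```python
import random

def split_dataset(records, train_frac=0.50, test_frac=0.25, seed=42):
    """Shuffle the records and cut them into training, testing and validation parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains (about 25%)
    return train, test, validation

# Hypothetical stand-in for the loan records
records = list(range(1000))
train, test, validation = split_dataset(records)
print(len(train), len(test), len(validation))  # 500 250 250
```

With an imbalanced target, a stratified split that preserves the class ratio in each part is often preferred over a purely random one.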
Confusion Matrix
• Accuracy: the ratio of the number of samples correctly classified by the classifier to the total
number of samples for a given test data set.
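As a small worked illustration, the confusion-matrix counts and the accuracy defined above can be computed as follows. The label names and the toy predictions are hypothetical, and "default" is assumed to be the positive class.

```python
def confusion_matrix(y_true, y_pred, positive="default"):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts

def accuracy(cm):
    """Correctly classified samples divided by all samples."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())

# Hypothetical test-set labels and model predictions
y_true = ["default", "default", "repaid", "repaid", "repaid", "default", "repaid", "repaid"]
y_pred = ["default", "repaid", "repaid", "repaid", "default", "default", "repaid", "repaid"]

cm = confusion_matrix(y_true, y_pred)
print(cm)            # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 4}
print(accuracy(cm))  # 0.75
```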
Performance Parameters
F1-score
F1-score, also called the balanced F-score, is defined as the harmonic mean of precision and
recall.
Recall
Precision
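Using the standard definitions, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall), the three parameters can be computed from confusion-matrix counts as in this sketch. The function name and the example counts are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    # Precision: share of predicted defaulters that really defaulted
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual defaulters that the model caught
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(precision, 4), round(recall, 4), round(f1, 4))  # 0.8 0.6667 0.7273
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot reach a high F1-score by maximizing precision or recall alone.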
Efficiency Issues
The target variable ‘loans status’ has a large difference between the number of normal and
default cases (class imbalance), which makes it harder for the model to learn the minority class.
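One common way to quantify and counter such an imbalance is to weight each class inversely to its frequency. The sketch below is an assumption for illustration only: the report does not state which mitigation the project uses, and the 90/10 label counts are invented. The formula n_samples / (n_classes · class_count) is the same heuristic scikit-learn applies for `class_weight="balanced"`.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Hypothetical imbalanced target: 90 normal loans, 10 defaults
labels = ["normal"] * 90 + ["default"] * 10
weights = balanced_class_weights(labels)
print(weights["default"], round(weights["normal"], 4))  # 5.0 0.5556
```

With these weights, an error on a default case costs the model about nine times as much as an error on a normal case, which offsets the 9:1 class ratio.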
Project Planning (5)
Timeline (chart axis): 7/22/2021 to 3/29/2022, with tasks:
Problem Definition
Literature Review
Collecting Datasets
Implementation
Testing Dataset