
Project Stage I Report

The document describes a project to develop a machine learning model for predicting loan defaults. The objectives are to minimize risk of default and classify borrowers. The methodology involves data cleaning, exploratory analysis, and building models like KNN, Random Forest and XGBoost. Key performance metrics are accuracy, recall, precision and F1 score. Feature selection and handling imbalanced data are techniques to improve efficiency. The project timeline outlines phases from topic selection to deployment over 7 months.

Uploaded by

ravenharley1863
Copyright © All Rights Reserved

K. K. WAGH INSTITUTE OF ENGINEERING EDUCATION & RESEARCH

LOAN DEFAULTER PREDICTION USING SUPERVISED MACHINE LEARNING ALGORITHMS

Group ID: 02
Internal Guide: Prof. Smita Patil

TEAM MEMBERS

Division / Roll No. | Name of the Student | Email ID | Contact Number
05 | Rajshree Thakare | [email protected] | 9763914017
06 | Manasi Barge | [email protected] | 8805539779
07 | Sanket Padwal | [email protected] | 9404688630
08 | Pratik Desale | [email protected] | 7588801537


Problem Definition

• To develop a model for loan defaulter prediction. Some customers behave negatively (for example, by failing to repay) after their loan applications are approved. To prevent this, banks need methods to predict customers' behaviour, and this project uses machine learning algorithms for that purpose.
Objectives

• Objective
  - To minimize the risk of borrowers defaulting on their loans by using the created model.
  - To create a predictive model that classifies each borrower as a defaulter or non-defaulter using the data collected when the loan was granted, and to determine the probability of user liability (that is, the probability of default).
  - To create an interactive UI that takes the user's input and returns the prediction as output.

• Scope
  - The goal of this project is to build a machine learning model that predicts whether a person will default on a loan, based on the loan and personal information provided.
  - The model is intended as a reference tool for the client and their financial institution when deciding whether to issue loans, so that risk is lowered and profit is maximized.
Literature Review

Research Paper | Year | Author | Content
Loan prediction by using machine learning models | 2020 | Supriya P, Pavani M, Saisushma N | They started their analysis with data cleaning, pre-processing and missing-value imputation, then performed exploratory data analysis, and finally built and evaluated the models.
Implementation of decision tree using C4.5 algorithm in decision making of loan application by debtor | 2019 | Amin R K, Indwiarti and Sibaroni Zhou | The maximum precision achieved was 78.08% with a 90:10 data partition, and the highest recall was 86% with an 80:20 data partition.
An exploratory data analysis for loan prediction based on nature of the clients | 2018 | Sumathi V P and Sri J S | They classify and examine the nature of loan applicants and conclude that most applicants preferred short-term loans.
Credit risk analysis and prediction modelling of bank loans | 2017 | G. Sudhamathy | Banks hold huge volumes of data on customer behaviour, yet from this data alone they are unable to judge whether an applicant is likely to default.
Requirement Specification

• Functional requirements
  - The system should be able to build user profiles and maintain their records.
  - The system will predict a user's performance on the basis of previous records.
  - On the basis of previous records, the system should be able to report whether a user is good or bad for that particular loan facility.
  - The system should determine the probability of user liability.


• Non-functional requirements
  - Availability
    - The system gives advice or alerts the user immediately.
    - The system gives accurate results.
    - The system is interactive, with minimal delays and safe information transmission.
  - Reliability
  - Predictability
  - Accuracy
  - Usability
  - Interoperability
  - Efficiency
Methodology

• Data Cleaning and Pre-processing
  - Take the dataset as input.
  - Split it into a training dataset and a testing dataset.
  - The pre-processing step includes data cleaning.

• Exploratory Data Analysis
  - Finding meaningful patterns
  - Computing statistical measures

• Model building with the following algorithms
  - KNN
  - Random Forest
  - XGBoost
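The slides do not specify how data cleaning is done. As one illustration, the missing-value imputation step mentioned in the literature review can be sketched by filling missing numeric values with the column median. This is a minimal sketch, assuming each loan record is a Python dict with None marking a missing value; the function name and column names are hypothetical.

```python
from statistics import median


def impute_missing_with_median(rows, numeric_columns):
    """Return copies of the rows with None in each numeric column replaced by that column's median."""
    medians = {}
    for col in numeric_columns:
        observed = [row[col] for row in rows if row[col] is not None]
        medians[col] = median(observed)  # assumes at least one observed value per column

    cleaned = []
    for row in rows:
        fixed = dict(row)  # copy so the input records are left untouched
        for col in numeric_columns:
            if fixed[col] is None:
                fixed[col] = medians[col]
        cleaned.append(fixed)
    return cleaned


# Hypothetical toy records; the real Kaggle dataset has 102 attributes.
loans = [
    {"annual_income": 40000, "loan_amount": 5000},
    {"annual_income": None, "loan_amount": 12000},
    {"annual_income": 60000, "loan_amount": None},
]
cleaned = impute_missing_with_median(loans, ["annual_income", "loan_amount"])
print(cleaned[1]["annual_income"], cleaned[2]["loan_amount"])  # prints: 50000.0 8500.0
```

The median is used rather than the mean because income and loan-amount columns are often skewed, and the median is less affected by a few very large values.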
Algorithms

• KNN
  The KNN (k-nearest neighbors) algorithm can be used for both classification and regression problems. However, KNN is more widely used for classification in industry, so it is used here for classification and predictive analysis. KNN is a simple algorithm that stores all available cases and classifies a new case by a majority vote of its k nearest neighbors.
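The majority-vote rule above can be sketched in plain Python. This is an illustrative sketch, not the project's implementation: it assumes numeric, already-scaled features and Euclidean distance, and the feature names and values in the toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by a majority vote among its k nearest training samples."""
    # Sort training samples by distance to the query and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical scaled features: [debt_to_income, credit_score]; label 1 = defaulter.
train_X = [[0.9, 0.2], [0.8, 0.3], [0.85, 0.1], [0.1, 0.9], [0.2, 0.8], [0.15, 0.95]]
train_y = [1, 1, 1, 0, 0, 0]
print(knn_predict(train_X, train_y, [0.88, 0.15]))  # prints: 1
print(knn_predict(train_X, train_y, [0.12, 0.85]))  # prints: 0
```

Because KNN relies on distances, a feature with a large range (such as loan amount) would dominate unscaled features, which is why scaling is assumed. An odd k avoids ties between two classes.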

• Random Forest
  This is a tree-based ensemble model that improves accuracy over a single decision tree. It combines a large number of decision trees into a powerful predictive model, with each individual tree built from a random sample of the rows and features. The final predicted class is the mode of the individual trees' predictions; for regression, the mean of their predictions is used instead.
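A heavily simplified sketch of the same idea, assuming numeric features: each "tree" is a depth-1 decision tree (a stump) fit with Gini impurity on a bootstrap sample of rows and a random subset of features, and the forest predicts the mode of the trees' votes. Real random forests grow much deeper trees and usually re-sample features at every split; all function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels (0 means the list is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature_ids):
    """Depth-1 tree: choose the (feature, threshold) split with the lowest weighted Gini impurity."""
    best_impurity, best = gini(y), (None, None, majority(y), majority(y))
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if impurity < best_impurity:
                best_impurity, best = impurity, (f, t, majority(left), majority(right))
    return best


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    if feature is None:  # no split improved purity: predict the sample's majority class
        return left_label
    return left_label if row[feature] <= threshold else right_label


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap: sample rows with replacement
        features = rng.sample(range(len(X[0])), n_features)  # random subset of features
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], features))
    return forest


def predict_forest(forest, row):
    """Final class = mode of the individual trees' predictions."""
    return majority([predict_stump(stump, row) for stump in forest])


# Hypothetical toy data: [debt_to_income, credit_score]; label 1 = defaulter.
train_X = [[0.9, 0.2], [0.8, 0.3], [0.85, 0.1], [0.1, 0.9], [0.2, 0.8], [0.15, 0.95]]
train_y = [1, 1, 1, 0, 0, 0]
forest = fit_forest(train_X, train_y)
print(predict_forest(forest, [0.95, 0.05]), predict_forest(forest, [0.05, 0.99]))
```

Training each tree on a different bootstrap sample and feature subset makes the trees disagree in different places, so their majority vote is more stable than any single tree.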

• XGBoost
  This algorithm works only with quantitative (numeric) variables, so categorical variables must be encoded before training. It is a gradient boosting algorithm that builds a strong learner by combining many weak learners. It is a fast and efficient algorithm that has become very popular in recent years because of its high performance and speed.
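XGBoost itself is a full library; the sketch below only illustrates the underlying gradient boosting idea for binary classification, under simplifying assumptions: each weak learner is a regression stump fit to the log-loss pseudo-residuals (label minus predicted probability), and its output, scaled by a learning rate, is added to a running log-odds score. XGBoost additionally uses second-order gradients, regularization and many speed optimizations, all omitted here; the function names and toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_regression_stump(X, targets):
    """Least-squares stump: one split, with the mean target on each side as the leaf value."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, targets) if row[f] <= t]
            right = [r for row, r in zip(X, targets) if row[f] > t]
            if not left or not right:
                continue
            left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, left_mean, right_mean)
    return best[1:]  # assumes at least one valid split exists


def stump_output(stump, row):
    feature, threshold, left_value, right_value = stump
    return left_value if row[feature] <= threshold else right_value


def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.3):
    """Binary classifier; y holds 0/1 labels and must contain both classes."""
    positives = sum(y)
    base_score = math.log(positives / (len(y) - positives))  # initial log-odds of the positive class
    scores = [base_score] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss w.r.t. the score: label minus predicted probability.
        residuals = [label - sigmoid(s) for label, s in zip(y, scores)]
        stump = fit_regression_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_output(stump, row) for s, row in zip(scores, X)]
    return base_score, learning_rate, stumps


def predict_proba(model, row):
    """Probability of the positive class (label 1) for one row."""
    base_score, learning_rate, stumps = model
    score = base_score + sum(learning_rate * stump_output(stump, row) for stump in stumps)
    return sigmoid(score)


# Hypothetical toy data: [debt_to_income, credit_score]; label 1 = defaulter.
train_X = [[0.9, 0.2], [0.8, 0.3], [0.85, 0.1], [0.1, 0.9], [0.2, 0.8], [0.15, 0.95]]
train_y = [1, 1, 1, 0, 0, 0]
model = fit_gradient_boosting(train_X, train_y)
print(round(predict_proba(model, [0.95, 0.05]), 3))
```

Each round corrects the errors left by the previous rounds, which is how many weak stumps add up to a strong learner; the learning rate keeps any single stump from dominating.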
Detailed Design
Experimental Setup / Simulation

• Expectations
  - To achieve an F1 score of around 90% on the training set and around 85–90% on the testing set.

• Datasets Used
  - The dataset we used is derived from Kaggle.
  - It contains more than 115,000 original loan records of users, with 102 attributes.
  - Training: 50%, Testing: 25%, Validation: 25%.
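A minimal sketch of the 50/25/25 split described above, assuming the records fit in memory as a list; the function name and seed are hypothetical. Because the target is imbalanced (see Efficiency Issues), a stratified split that preserves the default rate in each part would usually be preferable; this sketch uses a plain random split for brevity.

```python
import random


def split_train_test_validation(rows, seed=42):
    """Shuffle the rows, then split them 50% / 25% / 25% into training, testing and validation sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = len(shuffled) // 2
    n_test = len(shuffled) // 4
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # the remainder, about 25%
    return train, test, validation


records = list(range(100))  # stand-in for 100 loan records
train, test, validation = split_train_test_validation(records)
print(len(train), len(test), len(validation))  # prints: 50 25 25
```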
• Operating System: Windows 7/8/10
• Application Server: Apache Tomcat 7/8/9
• Front End: HTML, CSS
• Database: MySQL
• Programming Language: Python
• Processor: Intel i3/i5/i7
• Hard Disk: 5 GB
• Memory: 1 GB RAM
Performance Parameters

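• Confusion Matrix
  A table that counts, for the test data, the true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) produced by the classifier.

• Accuracy
  Accuracy is the ratio of the number of samples correctly classified by the classifier to the total number of samples in a given test dataset.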
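In terms of the confusion-matrix counts, and treating "default" as the positive class (an assumption; the slides do not name the positive class), the matrix layout and accuracy can be written as:

```latex
\begin{array}{c|cc}
 & \text{Predicted default} & \text{Predicted non-default} \\
\hline
\text{Actual default} & TP & FN \\
\text{Actual non-default} & FP & TN
\end{array}
\qquad
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```

For an imbalanced target such as loan status, accuracy alone can look high even when most defaulters are missed, because the many correctly classified non-defaulters dominate the numerator; this is why recall, precision and the F1-score are also reported.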
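• F1-score
  The F1-score, also called the balanced F-score, is the harmonic mean of precision and recall.

• Recall
  Recall is the fraction of actual positive samples (for example, actual defaulters, if default is the positive class) that the classifier correctly identifies: TP / (TP + FN).

• Precision
  Precision is the fraction of samples predicted as positive that are actually positive: TP / (TP + FP).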
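A small sketch computing these measures from actual and predicted labels, assuming default is encoded as 1 (the positive class); the function name and labels are hypothetical. Returning 0.0 when a denominator is zero is one common convention, chosen here to avoid division errors.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Count confusion-matrix cells and derive accuracy, precision, recall and F1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn  # binary case: everything else is a true negative
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = defaulter, 0 = non-defaulter.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
m = classification_metrics(y_true, y_pred)
print(m["tp"], m["fp"], m["fn"], m["tn"], m["accuracy"])  # prints: 2 1 1 4 0.75
```

Here precision and recall are both 2/3, so their harmonic mean (the F1-score) is also 2/3, while accuracy is 0.75.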
Efficiency Issues

• Recursive Feature Elimination (RFE) was used to select the 30 features most strongly related to the target variable, eliminating features step by step. This achieves a first dimensionality reduction by reducing the number of independent variables.

• The target variable ‘loan status’ is highly imbalanced: the number of normal (non-default) loans differs greatly from the number of default loans, which makes it harder for the model to learn the minority (default) class.
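Both techniques can be sketched in plain Python. The slides do not say which estimator drove the feature ranking, so this sketch assumes a logistic regression trained by gradient descent on standardized features: it removes the feature with the smallest absolute weight and refits until the requested number of features remain (30 in the project, 1 in the toy example). For the imbalance, it assumes random oversampling of the minority class, which is one common remedy and not necessarily the one the project used; it should be applied to the training set only. All names and data are hypothetical.

```python
import math
import random
from collections import Counter


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the log loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, row)) + b
            error = 1.0 / (1.0 + math.exp(-z)) - label  # predicted probability minus label
            for j in range(d):
                grad_w[j] += error * row[j]
            grad_b += error
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def recursive_feature_elimination(X, y, n_keep):
    """Refit, drop the feature with the smallest |weight|, repeat; returns kept column indices."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        weights, _ = fit_logistic([[row[j] for j in kept] for row in X], y)
        weakest = min(range(len(kept)), key=lambda i: abs(weights[i]))
        kept.pop(weakest)
    return kept


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class has the majority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(members))
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: column 0 separates the classes; columns 1 and 2 are balanced noise.
X = [[1.0, 0.5, 0.3], [0.9, -0.5, -0.3], [1.1, 0.5, -0.3], [1.0, -0.5, 0.3],
     [-1.0, 0.5, 0.3], [-0.9, -0.5, -0.3], [-1.1, 0.5, -0.3], [-1.0, -0.5, 0.3]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(recursive_feature_elimination(X, y, n_keep=1))

X_bal, y_bal = random_oversample([[0], [1], [2], [3], [4]], [0, 0, 0, 0, 1])
print(Counter(y_bal))
```

Refitting after every elimination is what makes the procedure recursive: a feature's weight can change once correlated features are removed. Oversampling only the training data keeps the test and validation sets representative of the real default rate.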
Project Planning

Gantt chart: project timeline from 7/22/2021 to 3/29/2022, covering the following phases:
• Topic Searching and Paper Finding
• Project Topic Approval
• Problem Definition
• Literature Review
• Collecting Datasets
• Understanding Required Technique
• Implementation
• Testing Dataset
• Integration with Framework
• Testing and Deployment


THANK YOU !!
