1 PPPP

This case study analyzes loan application data from a bank to identify key factors that influence loan default. The analysis includes [1] cleaning the data by removing unnecessary columns and outliers, [2] identifying missing data and data imbalance, and [3] performing univariate, bivariate, and segmented univariate analysis. Key findings are that individuals with lower incomes, younger ages, and less work experience are more likely to default, as are those living in lower rated areas or with more family members. The top 10 predictors of default from the correlation analysis include income type, family size, age, employment duration, loan amounts, and external factors.

Uploaded by

hedator300

Bank Loan

Case Study
(Final Project – 2)
BY HARSHA YADAV
Project Description:

• This case study demonstrates the application of EDA in a real-world
business environment. In addition to applying the techniques learned in the
EDA module, it builds a basic grasp of risk analytics in banking and
financial services, and of how data is used to reduce the risk of losing
money when lending to consumers.
• The company wants to understand the driving factors (or driver variables) behind
loan default, i.e. the variables that are strong indicators of default.
APPROACH
❑ This case study has two very large data sets: the current application and the previous application. Each includes several
unneeded columns that are of no use for risk assessment, as well as many blank values. So, the first step is cleaning the
data.
❑ To evaluate this large set of data, I first cleaned it, located some outliers and deleted them, and then
performed univariate and bivariate analysis using pivot tables and charts.

TECH-STACK USED
Software used for the project:
1. MS Excel (for working, analysing and reporting insights)
2. Microsoft PowerPoint (for presenting the detailed analysis)
Data Understanding:

1. `application_data.csv` contains all the information about the client at the time of
application. The data indicates whether a client has payment difficulties.
2. `previous_application.csv` contains information about the client's previous
loan data. It records whether the previous application was
Approved, Cancelled, Refused or an Unused offer.
3. `columns_descrption.csv` is the data dictionary, which describes the meaning of
the variables.
Task 1 : Present the overall approach of the analysis. Mention the
problem statement and the analysis approach briefly

Both CSV files will be checked for unnecessary data and
unwanted columns/rows, which will be cleaned or removed where necessary.
The columns will then be checked for outliers and for skewness
that would affect the final visualizations and insights.
Data imbalance will be checked. Different types of analysis will be
carried out to understand the relationships between variables and
to find the driving factors, and different visualizations will be
examined to understand those relationships.
AFTER CLEANING THE TABLES
Task 2 : Identify the missing data and use appropriate method to deal
with it. (Remove columns/or replace it with an appropriate value)
In `application_data.csv`
Before cleaning, the numbers of columns and rows are 122 and 3075124 respectively.

Items removed from the original dataset are:

* Columns with more than 40% null data.

* More than 50 columns that are unwanted or not relevant to the analysis.
(Hint: Note that in EDA it is not necessary to replace missing values, but if you have to replace
them, what should the approach be? Clearly mention the approach.)
* There are columns with less than 40% null values. They can be handled in two ways. I can delete those
columns, but then I might lose information that matters for the analysis. I can retain them, but
then they need treatment, and imputing the missing values introduces bias. The decision to delete or retain
depends on the understanding of the problem statement, the usefulness of the variable, and the total
size of the available data. Here those columns could be removed, so I removed them.
A few columns with very few missing values remain; they will be treated if necessary or left as they
are.
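
The report carries out this step in MS Excel. Purely as an illustration, the 40% rule described above can be sketched in Python. The rows below are made-up toy data, and the column `OWN_CAR_AGE` and the helper names `null_fraction` and `drop_sparse_columns` are assumptions introduced for this example, not part of the original analysis.

```python
# Hypothetical toy rows standing in for application_data.csv.
# None marks a blank cell; the values are invented for illustration.
rows = [
    {"AMT_CREDIT": 450000, "OWN_CAR_AGE": None, "CNT_CHILDREN": 0},
    {"AMT_CREDIT": 270000, "OWN_CAR_AGE": None, "CNT_CHILDREN": 1},
    {"AMT_CREDIT": None,   "OWN_CAR_AGE": 12,   "CNT_CHILDREN": 2},
    {"AMT_CREDIT": 600000, "OWN_CAR_AGE": None, "CNT_CHILDREN": 0},
    {"AMT_CREDIT": 135000, "OWN_CAR_AGE": 3,    "CNT_CHILDREN": 0},
]


def null_fraction(rows):
    """Return the share of blank (None) cells for each column."""
    columns = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}


def drop_sparse_columns(rows, threshold=0.40):
    """Keep only the columns whose blank share does not exceed the threshold."""
    fractions = null_fraction(rows)
    keep = [c for c, f in fractions.items() if f <= threshold]
    return [{c: r[c] for c in keep} for r in rows]


fractions = null_fraction(rows)
cleaned = drop_sparse_columns(rows)

for column, share in fractions.items():
    print(f"{column}: {share:.0%} blank")
print("columns kept:", list(cleaned[0]))
```

In this toy data only the hypothetical `OWN_CAR_AGE` column exceeds the threshold, so it is the only one dropped. Columns below the threshold are kept here; whether to remove them as well is the judgment call discussed above.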
Task 3: Identify if there are outliers in the dataset. Also mention why
you think each one is an outlier. Again, remember that for this exercise, it is not
necessary to remove any data points.
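
The report does not state which rule it used to flag outliers. One common choice, shown here only as an assumption, is the interquartile-range (IQR) rule: a value is flagged when it lies more than 1.5 × IQR below the first quartile or above the third. The income figures and the helper name `iqr_outliers` are hypothetical.

```python
from statistics import quantiles

# Hypothetical income values; the last one is deliberately extreme.
incomes = [90000, 112500, 135000, 157500, 180000, 202500, 225000, 117000000]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


outliers = iqr_outliers(incomes)
print("flagged as outliers:", outliers)
```

The rule only flags candidates. Whether a flagged value is a data-entry error or a genuine extreme case still has to be judged from the business context.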
Task 4 : Identify if there is data imbalance in the data. Find the
ratio of data imbalance.
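
As a sketch of how the imbalance ratio can be computed, the example below counts the two classes of the target variable and divides the majority count by the minority count. The target values are invented, and the coding 1 = payment difficulties, 0 = all other cases is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical TARGET column: 1 = payment difficulties, 0 = all other cases.
targets = [0] * 23 + [1] * 2

counts = Counter(targets)
majority = max(counts.values())
minority = min(counts.values())

imbalance_ratio = majority / minority      # majority cases per minority case
minority_share = minority / len(targets)   # fraction of rows in the minority class

print(f"imbalance ratio: {imbalance_ratio:.1f} : 1")
print(f"minority share: {minority_share:.0%}")
```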
Task 5 : Explain the results of univariate, segmented univariate, bivariate
analysis, etc. in business terms.

UNIVARIATE ANALYSIS :
• Individuals with higher incomes are less likely to apply for loans.
• The credit amount of a bank loan is typically in the range of 45000 to 1045000.
• The majority of loan applications have come from people between the ages of 35 and 50.
• Those with 0 to 8 years of work experience are the most likely to seek loans.
• Individuals who own homes are more likely to apply for loans than others.
• Those who are married have taken out more loans.
• More loans have been requested by working people.
• Unaccompanied applicants have requested more loans.
SEGMENTED UNIVARIATE ANALYSIS
BIVARIATE ANALYSIS :
• Customers who live in low-rated areas have higher default rates.
• Individuals with lower incomes are more likely to default.
• Young people are more likely to default, and the share of defaulters
declines with age.
• Women are less likely than men to default.
• Clients on maternity leave or unemployed show more defaults.
• Customers with more than five family members are more likely to default
on their bank loan.
• Customers with fewer educational qualifications are more likely to default on a
bank loan.
• Customers with little work experience are more likely to default.
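
Findings such as the ones listed above come from comparing default rates across segments, which the report does with Excel pivot tables. A rough Python equivalent of such a pivot is sketched below. The (region rating, target) pairs are invented, and the helper name `default_rate_by_segment` is hypothetical.

```python
from collections import defaultdict

# Hypothetical (REGION_RATING_CLIENT, TARGET) pairs; TARGET 1 = default.
records = [
    (1, 0), (1, 0), (1, 0), (1, 0),
    (2, 0), (2, 0), (2, 0), (2, 1),
    (3, 0), (3, 1), (3, 0), (3, 1),
]


def default_rate_by_segment(records):
    """Return the share of defaults within each segment, like a pivot table."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for segment, target in records:
        totals[segment] += 1
        defaults[segment] += target
    return {s: defaults[s] / totals[s] for s in sorted(totals)}


rates = default_rate_by_segment(records)
for rating, rate in rates.items():
    print(f"region rating {rating}: {rate:.0%} default rate")
```

The same helper works for any categorical column (income type, education, gender) by changing the first element of each pair.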
Task 6 : Find the top 10 correlations for clients with
payment difficulties and for all other cases (Target variable).
Top 10 driving factors in current application.csv

1. Income type
2. Count of Family Members
3. Children count
4. External source
5. Region rating of client
6. Age
7. Months Employed
8. Amount credit
9. Amount Goods Price
10. Amount total income
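
The report does not describe how the ranking was produced. One plausible approach for numeric columns, shown here as an assumption, is to compute the Pearson correlation of each column with the target and sort by absolute value. The data and the column name `AGE_YEARS` are hypothetical. Categorical fields such as income type would first need a numeric encoding, which is not shown.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical numeric columns and target (1 = payment difficulties).
target = [0, 0, 0, 1, 1, 1]
features = {
    "AGE_YEARS": [55, 48, 60, 25, 30, 28],
    "CNT_FAM_MEMBERS": [2, 3, 2, 3, 2, 3],
}

# Rank columns by the strength (absolute value) of their correlation.
ranked = sorted(
    ((name, pearson(column, target)) for name, column in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

Sorting by absolute value keeps strongly negative relationships (for example, older clients defaulting less) near the top of the list alongside strongly positive ones.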
Insights
• NAME_EDUCATION_TYPE: Clients with an academic degree have fewer defaults.
• NAME_INCOME_TYPE: Students and Businessmen have no defaults.
• REGION_RATING_CLIENT: Rating 1 is safer.
• ORGANIZATION_TYPE: Clients with Trade type 4 and 5 and Industry type 8 have a default rate below 3%.
• DAYS_BIRTH: People above the age of 50 have a low probability of defaulting.
• DAYS_EMPLOYED: Clients with 40+ years of experience have a default rate below 1%.
• AMT_INCOME_TOTAL: Applicants with an income above 700,000 are less likely to default.
• NAME_CASH_LOAN_PURPOSE: Loans taken for hobbies or for buying a garage are mostly repaid.
• CNT_CHILDREN: People with zero to two children tend to repay their loans.
• CODE_GENDER: Men have a relatively higher default rate.
• NAME_FAMILY_STATUS: People in a civil marriage or who are single default often.
• NAME_EDUCATION_TYPE: People with Lower Secondary & Secondary education default more often.
• NAME_INCOME_TYPE: Clients who are on Maternity leave or Unemployed default often.
• REGION_RATING_CLIENT: People who live in Rating 3 areas have the highest defaults.
• OCCUPATION_TYPE: Avoid Low-skill Laborers, Drivers, Waiters/barmen staff, Security staff, Laborers and Cooking staff, as their default rate is very high.
Result
• After performing the analysis, we can assess whether a client is likely to
repay the loan or not.
• The people most likely to face problems in loan repayment are
labourers.
• People with Secondary / secondary special education might face
problems in loan repayment.
• Moreover, those who live in a house/apartment are facing
difficulty in loan repayment (possibly because of an additional home loan,
EMIs and so on).
***End of report***
