This case study analyzes loan application data from a bank to identify key factors that influence loan default. The analysis includes [1] cleaning the data by removing unnecessary columns and outliers, [2] identifying missing data and data imbalance, and [3] performing univariate, bivariate, and segmented univariate analysis. Key findings are that individuals with lower incomes, younger ages, and less work experience are more likely to default, as are those living in lower rated areas or with more family members. The top 10 predictors of default from the correlation analysis include income type, family size, age, employment duration, loan amounts, and external factors.


Bank Loan

Case Study
(Final Project – 2)
BY HARSHA YADAV
Project Description:

• This case study demonstrates the application of EDA in a real-world business environment. In addition to applying the techniques learned in the EDA module, it helps build a basic grasp of risk analytics in banking and financial services, and of how data is used to reduce the risk of losing money when lending to consumers.
• The company wants to understand the driving factors (or driver variables) behind
loan default, i.e. the variables which are strong indicators of default.
APPROACH
❑ This case study has two very large data sets: the current application and the previous application. Each included several unneeded columns that would be useless for risk assessment, as well as many blank values, so the first step is cleaning the data.
❑ To evaluate this large set of data, I first cleaned the data, located some outliers and deleted them, and then performed univariate and bivariate analysis using pivot tables and charts.

TECH-STACK USED
Software And The Version Used While Making The Project :
1. MS Excel (For working, analysing and reporting insights)
2. Microsoft PowerPoint (For presenting the detailed analysis)
Data Understanding:

1. `application_data.csv` contains all the information about the client at the time of application. The data indicates whether a client has payment difficulties.
2. `previous_application.csv` contains information about the client's previous loan data. It records whether the previous application was Approved, Cancelled, Refused or Unused offer.
3. `columns_descrption.csv` is the data dictionary, which describes the meaning of the variables.
Task 1 : Present the overall approach of the analysis. Mention the
problem statement and the analysis approach briefly

Both CSV files will be checked for unnecessary data and unwanted columns/rows, which will be cleaned or removed if necessary. They will then be checked for outliers, to find whether there is skewness in the given columns that would affect the final visualization and insight. Data imbalance will be checked. Different types of analysis will be done to understand the relationships between different variables and to find the driving factors. Different visualizations will be examined to understand these relationships.
AFTER CLEANING THE TABLES
Task 2 : Identify the missing data and use appropriate method to deal
with it. (Remove columns/or replace it with an appropriate value)
In application_data.csv
Before cleaning, the numbers of columns and rows are 122 and 3075124 respectively.

Items removed from the original dataset are :

* Columns having more than 40% null data.
* More than 50 unwanted columns, or columns not desirable for our analysis.
(Hint: Note that in EDA, since it is not necessary to replace the missing value, but if you have to replace
the missing value, what should be the approach. Clearly mention the approach.)
* There are columns with less than 40% null values. They can be treated in two ways. I can delete those columns, but then I might lose some important information required for my analysis. I can retain them, but then I will have to treat the missing values, and imputing them will introduce bias. The decision to delete or retain depends on the understanding of the problem statement, the usefulness of the variable, and the total size of the available data. Here it seems that those columns can be removed, so I have removed them. There are still some columns with very few missing values, which will be treated if necessary or left as they are.
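The 40% rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report itself was done in MS Excel, and the function names, the toy rows and the column names used here (`SK_ID_CURR`, `AMT_CREDIT`, `OWN_CAR_AGE`) are chosen for the example, not taken from the report.

```python
def null_fraction(rows, column):
    """Share of rows where `column` is missing (None or empty string)."""
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)


def drop_sparse_columns(rows, threshold=0.40):
    """Drop every column whose share of missing values exceeds `threshold`.

    Returns the cleaned rows and the list of dropped column names.
    """
    columns = list(rows[0].keys())
    dropped = [c for c in columns if null_fraction(rows, c) > threshold]
    cleaned = [{k: v for k, v in row.items() if k not in dropped} for row in rows]
    return cleaned, dropped


# Toy rows with invented values (hypothetical, for illustration only).
sample = [
    {"SK_ID_CURR": 1, "AMT_CREDIT": 450000, "OWN_CAR_AGE": None},
    {"SK_ID_CURR": 2, "AMT_CREDIT": 270000, "OWN_CAR_AGE": 12},
    {"SK_ID_CURR": 3, "AMT_CREDIT": None, "OWN_CAR_AGE": None},
    {"SK_ID_CURR": 4, "AMT_CREDIT": 135000, "OWN_CAR_AGE": None},
]

cleaned, dropped = drop_sparse_columns(sample)
print(dropped)
```

In this toy sample only the column that is 75% empty is dropped; the column with a single missing value stays and can be treated later or left as it is.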
Task 3: Identify if there are outliers in the dataset. Also, mention why you
think it is an outlier. Again, remember that for this exercise, it is not
necessary to remove any data points.
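The report does not state which rule was used to flag outliers. One common choice is the 1.5 × IQR rule, sketched below as an assumption rather than as the method actually used. The function name and the income values are invented for the example.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical income values: one entry is far above the rest.
incomes = [90000, 112500, 135000, 157500, 180000, 202500, 225000, 117000000]

outliers = iqr_outliers(incomes)
print(outliers)
```

A value flagged this way is a candidate outlier because it sits far outside the range where the middle half of the data lies. Whether it is a data-entry error or a genuine extreme case still needs a business judgement.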
Task 4 : Identify if there is data imbalance in the data. Find the
ratio of data imbalance.
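The imbalance ratio for Task 4 is the count of non-defaulters divided by the count of defaulters. A minimal sketch follows, assuming the target column is coded 1 for clients with payment difficulties and 0 for all other cases. The function name and the toy data are hypothetical, and the ratio printed is not the ratio found in the report.

```python
from collections import Counter


def imbalance_ratio(targets):
    """Ratio of non-defaulters (0) to defaulters (1) in the target column."""
    counts = Counter(targets)
    if counts[1] == 0:
        raise ValueError("no defaulters in the data, ratio is undefined")
    return counts[0] / counts[1]


# Toy target column: 11 clients without difficulties, 1 with difficulties.
toy_target = [0] * 11 + [1]

ratio = imbalance_ratio(toy_target)
print(f"{ratio:.1f} : 1")
```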
Task 5 : Explain the results of univariate, segmented univariate, bivariate
analysis, etc. in business terms.

UNIVARIATE ANALYSIS :
• Individuals with higher incomes are less likely to apply for loans.
• The credit amount of a bank loan is typically in the range of 45000 to 1045000.
• The majority of loan applications have come from people between the ages of 35 and 50.
• Those with 0 to 8 years of work experience are the most likely to apply for loans.
• Individuals who own homes are more likely to apply for loans than others.
• Those who are married have taken out more loans.
• More loans have been requested by working people.
• Unaccompanied applicants have requested more loans.
SEGMENTED UNIVARIATE ANALYSIS
BIVARIATE ANALYSIS :
• Customers who live in low-rated areas have higher default rates.
• Individuals with lower incomes are more likely to default.
• Young people are more likely to default, and the share of defaulters declines with age.
• Women are less likely than men to default.
• Clients on maternity leave or unemployed are more likely to default.
• Customers with more than five family members are more likely to default on their bank loan.
• Customers with lower educational qualifications are more likely to default on a bank loan.
• Customers with little work experience are more likely to default.
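Each statement above compares the default rate across the categories of one variable, which is what an Excel pivot table of the target against that variable produces. The same calculation can be sketched in Python. The function name, the `TARGET` and `CODE_GENDER` column names as used here and the toy rows are assumptions for illustration, not figures from the report.

```python
from collections import defaultdict


def default_rate_by(rows, column, target="TARGET"):
    """Share of defaulters (target == 1) within each category of `column`."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for row in rows:
        totals[row[column]] += 1
        defaults[row[column]] += row[target]
    return {category: defaults[category] / totals[category] for category in totals}


# Toy rows with invented values (hypothetical, for illustration only).
toy_rows = [
    {"CODE_GENDER": "M", "TARGET": 1},
    {"CODE_GENDER": "M", "TARGET": 0},
    {"CODE_GENDER": "F", "TARGET": 0},
    {"CODE_GENDER": "F", "TARGET": 0},
    {"CODE_GENDER": "F", "TARGET": 1},
    {"CODE_GENDER": "F", "TARGET": 0},
]

rates = default_rate_by(toy_rows, "CODE_GENDER")
print(rates)
```

Comparing rates rather than raw counts matters here, because a category with many applicants will show many defaults even when its default rate is low.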
Task 6 : Find the top 10 correlations for the clients with
payment difficulties and all other cases (Target variable).
Top 10 driving factors in current application.csv

1. Income type
2. Count of Family Members
3. Children count
4. External source
5. Region rating of client
6. Age
7. Months Employed
8. Amount credit
9. Amount Goods Price
10. Amount total income
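A ranking like the one above can be produced by computing the correlation of each numeric column with the target and sorting by absolute value. The sketch below uses the Pearson coefficient as an assumption, since the report does not say which measure was used, and it applies only to numeric columns (a categorical variable such as income type would need encoding first). All names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def top_correlations(columns, target, k=10):
    """Rank columns by the absolute value of their correlation with target."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)[:k]


# Toy data with invented values (hypothetical, for illustration only).
toy_target = [0, 0, 1, 1]
toy_columns = {
    "CNT_FAM_MEMBERS": [2, 2, 3, 6],
    "AGE_YEARS": [55, 48, 23, 30],
    "FLAG_CONSTANT": [1, 1, 1, 1],
}

top = top_correlations(toy_columns, toy_target, k=2)
for name, score in top:
    print(name, round(score, 3))
```

Sorting by absolute value keeps strongly negative relationships, such as age against default in this toy data, alongside the positive ones.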
Insights
• NAME_EDUCATION_TYPE: Applicants with an academic degree have fewer defaults.
• NAME_INCOME_TYPE: Students and businessmen have no defaults.
• REGION_RATING_CLIENT: Rating 1 is safer.
• ORGANIZATION_TYPE: Clients with Trade type 4 and 5 and Industry type 8 have default rates below 3%.
• DAYS_BIRTH: People above the age of 50 have a low probability of defaulting.
• DAYS_EMPLOYED: Clients with 40+ years of experience have a default rate below 1%.
• AMT_INCOME_TOTAL: Applicants with income above 700,000 are less likely to default.
• NAME_CASH_LOAN_PURPOSE: Loans taken for hobbies or for buying a garage are mostly repaid.
• CNT_CHILDREN: People with zero to two children tend to repay the loans.
• CODE_GENDER: Men have a relatively higher default rate.
• NAME_FAMILY_STATUS: People who are in a civil marriage or who are single default a lot.
• NAME_EDUCATION_TYPE: People with Lower Secondary & Secondary education are more likely to default.
• NAME_INCOME_TYPE: Clients who are either on maternity leave or unemployed default a lot.
• REGION_RATING_CLIENT: People who live in Rating 3 areas have the highest default rate.
• OCCUPATION_TYPE: Avoid Low-skill Laborers, Drivers, Waiters/barmen staff, Security staff, Laborers and Cooking staff, as their default rates are very high.
Result
• After performing the analysis, we can identify whether a client is likely to repay the loan or not.
• The people who are most likely to face problems in loan repayment are labourers.
• People with Secondary / secondary special education might face problems in loan repayment.
• Moreover, those who are living in a house/apartment are facing difficulty in loan repayment (possibly because of an extra home loan, EMIs and so on).
***End of report***