Trainity Project-6

The document outlines a case study on analyzing loan default factors using Excel for data analysis. Key tasks include identifying missing data, detecting outliers, analyzing data imbalance, performing variate analysis, and identifying correlations among variables. The findings highlight significant issues such as missing values, class imbalance, and strong correlations that can inform decision-making in loan approvals.

Uploaded by

Sarthak Bajaj

Project 6

Bank Loan Case Study
Tushar Aseri | [email protected]
Contents
Description
Data Analytics Task
Insights
Summary
Description
As a Data Analyst at a finance company, I need to draw meaningful insights from the dataset to identify the factors behind loan default.

Task
Use Exploratory Data Analysis (EDA) to analyze patterns in the data and ensure that capable applicants are not rejected.

Data Analysis
Identifying Missing Data
Identifying Outliers
Analyzing Data Imbalance
Performing Variate Analysis
Identifying Top Correlations
Data Analytics
Problem 1 Identifying Missing Data
Use various functions in Excel to find blank cells in the data and impute them accordingly

Steps
First I used TRANSPOSE to list all column names
Used COUNTBLANK to count the blank cells in each column
Then I categorized the columns for imputation (rows with blank values in text columns are removed)
Finally used the IF and ISBLANK functions (some blanks are removed and some are replaced)

Code-
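The Excel formulas themselves are not reproduced in this text. As a rough equivalent of the steps above, here is a minimal Python sketch using only the standard library. The column names, the sample values, the use of `None` for a blank cell, and the choice of the median as the fill value are all assumptions made for illustration; they are not taken from the workbook.

```python
from statistics import median

# Hypothetical sample: column name -> values, with None standing in for a blank cell.
data = {
    "income": [50000, None, 72000, 61000, None],
    "occupation": ["Clerk", None, "Driver", "Clerk", "Teacher"],
}

def count_blank(values):
    """Count blanks in one column (the role COUNTBLANK plays in Excel)."""
    return sum(1 for v in values if v is None)

def impute_numeric(values):
    """Replace blanks in a numeric column with the median of the non-blank values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def drop_rows_with_blank(table, column):
    """Drop every row whose value in `column` is blank (used here for text columns)."""
    keep = [i for i, v in enumerate(table[column]) if v is not None]
    return {name: [vals[i] for i in keep] for name, vals in table.items()}

blank_counts = {name: count_blank(vals) for name, vals in data.items()}
data["income"] = impute_numeric(data["income"])      # numeric column: fill blanks
cleaned = drop_rows_with_blank(data, "occupation")   # text column: remove rows
```

Whether the mean or the median is the better fill value depends on how skewed the column is; the median is used here only because it is less sensitive to extreme values.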
Data Analytics
Problem 2 Identifying Outliers
Detect and identify various outliers in the dataset using statistical functions

Steps
First I used TRANSPOSE to list all column names
Used QUARTILE.INC to find the first quartile (25%) and third quartile (75%) of each column and calculated the IQR
Then I calculated the lower & upper bounds
Then I used conditional formatting to identify outliers and highlight them

Codes-
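The IQR steps above can be sketched in Python as follows. `statistics.quantiles` with `method="inclusive"` uses the same interpolation as Excel's QUARTILE.INC. The multiplier of 1.5 is the conventional choice and is an assumption here, since the text does not state which multiplier was used; the sample values are hypothetical.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) outlier bounds: Q1 - k*IQR and Q3 + k*IQR."""
    # n=4 yields the three quartile cut points; "inclusive" mirrors QUARTILE.INC.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Return the values that fall outside the IQR bounds."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical column with one obviously extreme value.
sample = [12, 15, 14, 10, 13, 16, 11, 95]
bounds = iqr_bounds(sample)
outliers = find_outliers(sample)
```

For this sample, Q1 is 11.75 and Q3 is 15.25, so the bounds are 6.5 and 20.5 and only the value 95 is flagged.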
Data Analytics
Problem 3 Analyzing Data Imbalance
Detect and calculate the data imbalance ratio of the loan application dataset
Steps
First I used TRANSPOSE to list all column names
Then I used a formula to count all the zeros and ones in each of the columns
Then I calculated the ratio of 0 and 1
Finally used conditional formatting
Codes- Total count, Class of 0, Class of 1
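A minimal Python sketch of the same count-and-ratio calculation is shown here. The function name `class_ratio` and the sample target values are hypothetical; the ratio is taken as the count of 0s divided by the count of 1s, which is one reading of "ratio of 0 and 1".

```python
from collections import Counter

def class_ratio(labels):
    """Return (count of 0s, count of 1s, ratio of 0s to 1s) for a binary column."""
    counts = Counter(labels)
    zeros, ones = counts[0], counts[1]
    if ones == 0:
        raise ValueError("no rows with class 1; ratio is undefined")
    return zeros, ones, zeros / ones

# Hypothetical target column: 1 marks a defaulter, 0 a non-defaulter.
target = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
zeros, ones, ratio = class_ratio(target)
```

Here the column has eight 0s and two 1s, so the ratio is 4.0, meaning four non-defaulters for every defaulter.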
Data Analytics
Problem 4 Perform Variate Analysis
Analyze different variates using multiple Excel functions
Steps
First I used TRANSPOSE to list all column names
Then I calculated summary values for each column using the corresponding formula
Compared each column with the target variable and created a chart for it
Codes-
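The text does not say which summary formulas were used, so the following Python sketch is only one plausible reading: it computes a few common summary statistics for a column, then repeats them separately for each value of the target. The column `credit`, the target values, and the choice of statistics are assumptions.

```python
from statistics import mean, median

def summarize(values):
    """Basic univariate summary of one numeric column."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }

def summarize_by_target(values, target):
    """Split a column by the target value and summarize each group separately."""
    groups = {}
    for value, label in zip(values, target):
        groups.setdefault(label, []).append(value)
    return {label: summarize(group) for label, group in groups.items()}

# Hypothetical data: a credit amount column and a binary default flag.
credit = [100, 200, 300, 400, 500, 600]
target = [0, 0, 1, 0, 1, 0]

overall = summarize(credit)
by_target = summarize_by_target(credit, target)
```

Comparing `by_target[0]` with `by_target[1]` gives the numbers a chart of defaulters versus non-defaulters would be built from.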

Data Analytics
Problem 5 Identifying Correlations
Identify the top correlations for each segment using Excel functions
Steps
First I used TRANSPOSE to list all column names
Then I filtered the data into the 2 segments of the target column, i.e. 1 and 0
Then I used the CORREL function to find the relation between each pair of variables
Finally used the RANK function to assign them values; the heatmap is in the attached Excel sheet
Codes- Ranking, Correlation
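As a sketch of the segment-then-correlate steps, the Python code below splits the rows by target value, computes the Pearson correlation (the statistic CORREL returns) for every pair of columns, and sorts the pairs. Ranking by absolute value is an assumption, as are the column names and sample data; the text does not say how ties or negative correlations were ranked.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; undefined if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def segment(columns, target, label):
    """Keep only the rows whose target value equals `label`."""
    keep = [i for i, t in enumerate(target) if t == label]
    return {name: [vals[i] for i in keep] for name, vals in columns.items()}

def ranked_correlations(columns):
    """Correlate every pair of columns, strongest (by absolute value) first."""
    pairs = [((a, b), pearson(columns[a], columns[b]))
             for a, b in combinations(columns, 2)]
    return sorted(pairs, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical data: three numeric columns and a binary target.
columns = {
    "income": [10, 20, 30, 40, 10, 20, 30, 40],
    "credit": [20, 40, 60, 80, 40, 30, 20, 10],
    "age":    [30, 25, 40, 35, 30, 25, 40, 35],
}
target = [0, 0, 0, 0, 1, 1, 1, 1]

top_non_default = ranked_correlations(segment(columns, target, 0))[0]
top_default = ranked_correlations(segment(columns, target, 1))[0]
```

In this made-up data, income and credit are perfectly positively correlated among non-defaulters and perfectly negatively correlated among defaulters. A pair whose relationship changes between segments like this is the kind of difference the per-segment ranking is meant to surface.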
Insights

There are a lot of missing values in several columns
Some columns have binary values, which distorts the data and makes them imbalanced
Some rows are better deleted because they contain text, and blanks in some numeric columns are replaced with the average or median, depending on the circumstances
Insights

Used the IFERROR function for #N/A, #VALUE!, and #DIV/0! errors to keep formulas clean
Used the INDIRECT function to dynamically reference column names from another sheet
Applied MATCH and INDEX to retrieve specific column data based on selected headers
Made a heatmap by using conditional formatting
Highlighted columns with high missing data ratios for possible exclusion or attention.
Summary
Identified and handled missing data using Excel functions like ISBLANK and COUNTBLANK, and filled gaps using averages or medians
Detected outliers using the IQR method and flagged them for review with conditional formatting
Analyzed the distribution of the target variable and found significant class imbalance
Created a pie chart to visualize the imbalance of the target variable
Performed univariate analysis to explore the distribution of key variables
Compared variable distributions for defaulters and non-defaulters to spot major differences
Used the CORREL function to identify variables strongly correlated with the target variable
Summarized outlier thresholds (Q1, Q3, IQR, lower & upper bounds) for multiple columns in a structured format
Organized all insights visually and statistically to support decision-making and highlight key risk factors.
