PBA Assignment

INTRODUCTION

Project Overview: Comprehensive Loan Data Analysis and Visualization


This project aims to provide an in-depth analysis of a loan dataset, offering insights into various
financial variables, loan approval, risk assessment, and lending decisions. By employing data
visualization techniques, we seek to uncover underlying patterns, trends, and correlations that
can inform credit risk assessments, risk management strategies, and lending strategies.

Dataset overview:
The dataset consists of various financial variables, such as loan amounts, interest rates, debt-
to-income (DTI) ratios, annual incomes, loan verification status, loan terms, and loan purposes.
The project's primary goal is to identify and understand the relationships between these
variables and how they impact loan approval, loan amounts, and risk assessment.

Problem statement:
The dataset contains information on consumer loans, covering borrower details, loan
characteristics, and payment information. The objective is to analyze the factors that
influence loan status, examine indicators of default risk, and assess the relationship between
borrower characteristics and loan terms. Such analysis can help improve loan approval
processes, set interest rates, and devise risk management strategies.

Methodology:
Through the application of histograms, box plots, bar plots, scatter plots, pair plots, and
heatmaps, we will examine the distribution, correlation, and relationship between different
variables in the dataset. This visual approach will facilitate communication and understanding
of complex financial relationships, enabling better-informed decision-making for lenders and
financial institutions.

Rationale for Methodology: The chosen methodology allows for a comprehensive examination of
the dataset, highlighting essential patterns and trends. By visualizing the data, we can
identify outliers, correlations, and clusters that may not be immediately apparent in raw
numerical data.

In the following sections, we will present the analysis, interpretation, and implications of the
data visualizations, providing insights into loan approval, risk assessment, and lending
strategies.

DATA COLLECTION
Data Collection for Comprehensive Loan Information Dataset

Overview
The Comprehensive Loan Information for Credit Risk dataset provides detailed information
about loan applicants and their associated loan characteristics. This dataset is sourced from
Kaggle and is specifically designed for credit risk assessment.

Dataset Details
Source: Kaggle Comprehensive Loan Information Dataset

Time Period: Data spans from 2007 through the third quarter of 2020 (2020Q3)

Provider: LendingClub

Variables (Features)
Id: Unique identifier for each loan application.

Address State: State where the borrower resides.

Application Type: Indicates whether the application is an individual or joint application.

Employment Length: Length of employment (e.g., “5 years,” “10+ years”).

Employment Title: Borrower’s job title.

Home Ownership: Type of home ownership (e.g., “Own,” “Rent,” “Mortgage”).

Issue Date: Date when the loan was issued.

Last Credit Pull Date: Date of the last credit inquiry.

Last Payment Date: Date of the most recent loan payment.

Loan Status: Current status of the loan (e.g., “Current,” “Late,” “Fully Paid”).

Next Payment Date: Expected date of the next loan payment.

Member Id: Unique identifier for each borrower.

Sub-Grade: Loan sub-grade assigned by the lender.

Term: Loan term (e.g., “36 months,” “60 months”).

Verification Status: Indicates whether the borrower’s income was verified.

Annual Income: Borrower’s annual income.

Debt-to-Income Ratio (DTI): Ratio of debt payments to income.

Monthly Loan Payment Amount: Monthly payment amount for the loan.

Interest Rate of the Loan: Annual interest rate charged on the loan.

Loan Amount: Total loan amount requested.

Total Payment: Total amount paid by the borrower.

Significance
• The Comprehensive Loan Information dataset provides insights into borrowers’
financial profiles, loan characteristics, and repayment behavior.
• Researchers and financial institutions can use this data to develop credit risk models,
assess lending strategies, and improve decision-making.

This report outlines the data collection process for the Comprehensive Loan Information for
Credit Risk dataset sourced from Kaggle. The dataset’s features offer valuable information for
understanding loan applicants and their loan-related attributes.

RESULT AND OBSERVATION


LOAN STATUS BY OWNERSHIP

Analysis:
The graph visualizes the relationship between loan status and different home ownership
categories.
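
The counts behind a chart of this kind can be tabulated directly. The sketch below is only an
illustration in plain Python: the sample records and the field names `home_ownership` and
`loan_status` are assumptions, not rows taken from the dataset.

```python
from collections import Counter

# Hypothetical sample records; the values are illustrative, not from the dataset.
loans = [
    {"home_ownership": "RENT", "loan_status": "Current"},
    {"home_ownership": "RENT", "loan_status": "Fully Paid"},
    {"home_ownership": "MORTGAGE", "loan_status": "Current"},
    {"home_ownership": "MORTGAGE", "loan_status": "Charged Off"},
    {"home_ownership": "OWN", "loan_status": "Current"},
    {"home_ownership": "RENT", "loan_status": "Current"},
]


def status_by_ownership(records):
    """Count loans for every (home ownership, loan status) pair."""
    return Counter((r["home_ownership"], r["loan_status"]) for r in records)


counts = status_by_ownership(loans)

# Print one line per combination that occurs in the sample.
for (ownership, status), n in sorted(counts.items()):
    print(f"{ownership:<10} {status:<12} {n}")
```

Combinations that never occur simply report a count of zero, which is how a category such as
“NONE” would show up as an empty or near-empty bar.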

Key observations:
Home Ownership Categories:

Most loans are associated with individuals who either rent or have a mortgage. These two
categories have significantly higher counts than “OWN” or “OTHER.”

Almost no loans in this dataset fall into the “NONE” category.

Loan Status:

For all types of home ownership except “NONE” (which has almost no data), most of the loans
are current.

Charged Off loans are relatively low in number across all ownership categories.

Fully Paid loans are more common than Charged Off loans.

Interpretation:
Borrowers who rent or have a mortgage account for most of the loans in this dataset.

The “OWN” category has fewer total loans but follows a similar trend, with most loans being
current.

The lack of data for “NONE” indicates that very few individuals fall into this category.

Implications:

Risk Assessment: Understanding loan status by home ownership helps assess risk. For
example, if most “OWN” loans are current, it suggests that homeowners are generally reliable
borrowers.

Business Decisions: Lenders can tailor their strategies based on ownership types. For instance,
offering better terms to mortgage holders might attract more business.

Market Insights: The data provides insights into housing trends and loan behavior across
different ownership categories.

LOAN AMOUNT DISTRIBUTION BY GRADE

Analysis:
The boxplot displays the following information for each loan grade:

Median (Q2): The horizontal line inside each box represents the median loan amount for that
grade.

Interquartile Range (IQR): The box represents the middle 50% of loan amounts (from Q1 to
Q3).

Whiskers: The vertical lines extending from the box reach the most extreme data points that lie
within 1.5 times the IQR of the box edges (Q1 and Q3).

Outliers: Individual data points beyond the whiskers are considered outliers.
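
The quantities listed above can be computed by hand. The following is a minimal sketch using
only the Python standard library; the loan amounts are hypothetical, and the "inclusive"
quartile method is an assumption, since plotting tools differ in how they interpolate quartiles.

```python
import statistics


def box_plot_summary(values):
    """Return quartiles, whisker ends, and outliers using the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme points that are still inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical loan amounts for a single grade.
amounts = [5000, 6000, 8000, 10000, 12000, 15000, 16000, 20000, 60000]
summary = box_plot_summary(amounts)
print(summary)
```

In this made-up sample only the 60,000 loan lies beyond the upper fence, so it would be drawn
as an individual outlier point while the upper whisker stops at 20,000.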

Key observations:

Grade G has a higher median loan amount compared to other grades.

Grades A and B have relatively lower median loan amounts.

There are many outliers in all grades, especially in grade D, indicating that several loans
are far larger than is typical for their grade.

Interpretation:
The variation in median values across different grades suggests that individuals with different
grades receive loans of varying amounts.

The presence of many outliers could indicate inconsistencies or exceptions in how loans are
awarded. It might be worth investigating these outliers further.

The higher median loan amount in grade G could imply that borrowers with lower credit grades
seek larger loans.

Implications:
Lending Strategy: Lenders should investigate their grading system and its correlation with
loan amounts. Adjusting grading criteria might lead to more consistent loan amounts.

Risk Assessment: Understanding the distribution of loan amounts by grade helps assess risk.
Higher loan amounts in certain grades might indicate higher default risk.

Customer Segmentation: Consider segmenting borrowers based on their credit grades and
loan preferences to tailor services accordingly.

LOAN AMOUNT BY VERIFICATION STATUS

Analysis:
The bar plot displays the following information:

Each bar represents a verification status category.

The height of each bar corresponds to the average loan amount for that verification status.

Error bars are present on each bar, indicating variability in each category.
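
As a rough illustration of what such a bar plot summarizes, the sketch below computes the mean
loan amount per category together with the standard error of the mean. The data are
hypothetical, and using the standard error for the error bars is an assumption: the report does
not say what its error bars represent, and plotting libraries often draw confidence intervals
instead.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical (verification_status, loan_amount) pairs.
rows = [
    ("Verified", 18000), ("Verified", 22000), ("Verified", 20000),
    ("Source Verified", 14000), ("Source Verified", 16000),
    ("Not Verified", 9000), ("Not Verified", 11000), ("Not Verified", 10000),
]


def mean_and_standard_error(pairs):
    """Group amounts by category and return {category: (mean, standard error)}."""
    groups = defaultdict(list)
    for category, amount in pairs:
        groups[category].append(amount)
    result = {}
    for category, amounts in groups.items():
        mean = statistics.mean(amounts)
        # Standard error = sample standard deviation / sqrt(group size).
        sem = statistics.stdev(amounts) / math.sqrt(len(amounts))
        result[category] = (mean, sem)
    return result


bars = mean_and_standard_error(rows)
for category, (mean, sem) in bars.items():
    print(f"{category:<16} mean={mean:>8.0f}  error bar=+/-{sem:.0f}")
```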

Interpretation:

Verification Status and Loan Amount:

Borrowers whose income or other information has been verified tend to receive higher loan
amounts (as indicated by the “Verified” category).

Loans with “Source Verified” status have intermediate loan amounts.

Loans with “Not Verified” status have the lowest average loan amounts.

Implications:

Risk Assessment: Lenders might consider the verification status as an indicator of borrower
reliability. Verified borrowers may be perceived as less risky.

Lending Decisions: Offering larger loans to verified borrowers could attract more business.

Fairness and Consistency: Lenders should ensure that verification processes are consistent and
fair across all applicants.

DISTRIBUTION OF ANNUAL INCOME

Analysis:

The histogram shows the following information:

The distribution of annual incomes is skewed to the right.

There’s a peak near the lower end of the income scale, indicating a concentration of low-income
earners.

As income increases, the frequency decreases rapidly.

The KDE line confirms the peak near zero and the rapid decline afterward.
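
Right skew can also be checked numerically. The sketch below, on hypothetical incomes, computes
the moment-based (population) skewness coefficient and compares the mean with the median; a
positive coefficient and a mean above the median are both typical signs of a long right tail.

```python
import statistics


def skewness(values):
    """Population skewness: the average of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical annual incomes with one very high earner.
incomes = [25000, 30000, 32000, 35000, 40000, 45000, 60000, 250000]

skew = skewness(incomes)
mean_income = statistics.fmean(incomes)
median_income = statistics.median(incomes)
print(f"skewness={skew:.2f}  mean={mean_income:.0f}  median={median_income:.0f}")
```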

Interpretation:

Income Disparities:

The high frequency of low-income earners suggests income disparities within this dataset.

Many individuals have annual incomes close to zero or very low.

Wealth Distribution:

The sharp decline in frequency as income increases indicates that fewer people earn higher
incomes.

The long tail on the right side represents a smaller number of high-income earners.

Implications:

Social Equity: Addressing income disparities is crucial for promoting social equity.

Targeted Services: Financial institutions can tailor services based on income levels.

Risk Assessment: Understanding income distribution helps assess risk associated with loans or
credit decisions.

LOAN STATUS BY GRADE

LOAN AMOUNT DISTRIBUTION BY TERM

Analysis:
The box plot displays the following information:

Two distinct boxes represent two different loan terms: “36 months” and “60 months.”

The orange box represents the “36 months” term, and the blue box represents the “60 months”
term.

Inside each box:

The horizontal line represents the median loan amount for that term.

The box represents the interquartile range (IQR), covering the middle 50% of loan amounts.

The whiskers extend from the box to the minimum and maximum values within 1.5 times the
IQR.

Outliers are plotted as individual points beyond the whiskers.

Interpretation:

Loan Amount Distribution:

For 36-month loans:

The median loan amount is lower.

The IQR is narrower, indicating less variability in loan amounts.

Fewer outliers are present.

For 60-month loans:

The median loan amount is higher.

The IQR is wider, suggesting greater variability in loan sizes.

More outliers are visible, indicating larger loans.

Implications:
Risk and Duration: Longer-term loans (60 months) tend to have larger loan amounts and a wider
spread of amounts, which may increase a lender’s exposure over the longer repayment period.

Borrower Preferences: Borrowers might choose longer terms for larger loans, but lenders
should carefully assess risk associated with extended repayment periods.

Lending Strategy: Lenders can tailor their offerings based on borrower preferences and risk
tolerance for different loan terms.

LOAN AMOUNT BY PURPOSE

Analysis:
The bar plot displays the following information:

Each bar represents a loan purpose category.

The height of each bar corresponds to the average loan amount for that purpose.

Error bars indicate variability in the data.

Interpretation:

Loan Amount by Purpose:

“Major Purchase,” “Small Business,” and “Home Improvement” have the highest average loan
amounts.

“Credit Card” and “Debt Consolidation” have similar average loan amounts.

“Wedding” loans have the lowest average amount.

Implications:
Lending Strategies: Lenders can tailor their loan products based on specific purposes. For
example:

Offering larger loans for home improvements or major purchases.

Providing smaller loans for weddings or credit card debt consolidation.

Risk Assessment: Different loan purposes may carry varying levels of risk. Lenders should
consider this when evaluating loan applications.

Customer Segmentation: Understanding loan purposes helps segment borrowers and design
targeted marketing campaigns.

DISTRIBUTION OF DEBT-TO-INCOME RATIO

Analysis:
The histogram displays the distribution of debt-to-income (DTI) ratios.

The x-axis represents different ranges of DTI ratios, while the y-axis represents the frequency
(number of occurrences) of each DTI ratio.

The KDE (Kernel Density Estimation) line provides a smoothed estimate of the underlying
probability density function.
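
To make the idea of a KDE line concrete, here is a minimal Gaussian kernel density estimate
written in plain Python. The DTI values are hypothetical and the bandwidth of 0.03 is a
hand-picked assumption; plotting libraries normally choose a bandwidth automatically.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, each centred on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical DTI ratios.
dti = [0.08, 0.11, 0.12, 0.14, 0.15, 0.15, 0.17, 0.19, 0.22, 0.31]
density = gaussian_kde(dti, bandwidth=0.03)

# Riemann sum over a wide grid: a density estimate should integrate to about 1.
step = 0.001
grid = [i * step for i in range(-500, 1000)]
area = sum(density(x) for x in grid) * step
print(f"area under curve = {area:.4f}")
print(f"density at 0.15 = {density(0.15):.2f}, at 0.40 = {density(0.40):.2f}")
```

The curve is highest where the samples are densest, which is why the KDE line follows the
shape of the histogram bars.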

Interpretation:

Most individuals in the dataset have DTI ratios clustered around 0.10 to 0.20.

There are very few individuals with extremely low or high DTI ratios.

The KDE line shows that the data is skewed toward lower DTI ratios.

Implications:
Risk Assessment:

A higher DTI ratio indicates that an individual has more debt relative to their income.

Lenders should be cautious when approving loans for individuals with high DTI ratios.

Financial Health:

Understanding the distribution of DTI ratios helps financial institutions assess the financial
health of their customers.

It can guide lending decisions and risk management strategies.

LOAN AMOUNT VS. ANNUAL INCOME

Analysis:
The scatter plot displays individual data points, each representing a borrower.

The x-axis represents “Annual Income,” while the y-axis represents “Loan Amount.”

Data points are color-coded based on their “loan_status”: Charged Off (green), Fully Paid
(orange), and Current (blue).

Most data points cluster toward the lower end of annual income, indicating that many loans are
taken by individuals with lower incomes.

There’s a concentration of fully paid loans (orange) at lower annual incomes and smaller loan
amounts.

Interpretation:

Income and Loan Amount Relationship:

Individuals with lower annual incomes tend to take smaller loan amounts.

As annual income increases, some borrowers take larger loans.

Risk Assessment:

The presence of charged-off loans (green) among those with low annual income suggests
higher risk for default.

Lenders should be cautious when approving loans for individuals with low income.

Financial Health Insights:

Understanding this distribution helps financial institutions assess the financial health of their
customers.

It can guide lending decisions and risk management strategies.

Implications:
Risk Mitigation:

Lenders might consider implementing more stringent lending criteria for applicants with lower
annual incomes.

Monitoring loan performance based on income levels is crucial.

Tailored Services:

Offering financial literacy programs or personalized advice to borrowers with low income can
improve financial well-being.

Targeted loan products for specific income segments can enhance customer satisfaction.

PAIR PLOT OF KEY VARIABLES

Analysis:
The pair plot displays scatter plots for different combinations of two variables
(‘annual_income’ and ‘loan_amount’) and histograms or kernel density estimates for individual
variables.

The hue parameter colors the data points based on the ‘grade’ column.

The diagonal shows the distribution of each variable.

Interpretation:

Annual Income vs. Loan Amount:

There’s no clear linear relationship between annual income and loan amount.

Most data points cluster toward the lower end of annual income and loan amount.

As annual income increases, loan amounts spread over a wider range, but the points become
sparser.

Grade (Risk) Impact:

Different grades (A-G) are represented with different colors.

Certain grades seem associated with specific ranges of annual incomes and loan amounts.

For example, Grade A loans tend to have higher annual incomes and smaller loan amounts.

Implication:
Financial analysts or credit officers could use such visualizations to identify patterns or trends
that might inform credit risk assessments or lending decisions.

Understanding the relationships between these variables can guide lending strategies and risk
management.

CORRELATION MATRIX

Analysis:
The heatmap displays the pairwise correlation coefficients between five financial variables:
‘annual_income’, ‘loan_amount’, ‘dti’ (debt-to-income ratio), ‘total_acc’ (total accounts), and
‘int_rate’ (interest rate).

The color scale ranges from cool (blue) to warm (red), representing negative to positive
correlations.
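
Each cell of such a heatmap is a Pearson correlation coefficient. The sketch below shows one
way to compute a small correlation matrix in plain Python. The column names follow the ones
quoted above, but the numbers are hypothetical and will not reproduce the coefficients reported
in this section.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {name_a: {name_b: r}} for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values for three of the variables.
data = {
    "annual_income": [40000, 55000, 62000, 80000, 120000],
    "loan_amount": [8000, 12000, 10000, 20000, 25000],
    "int_rate": [0.14, 0.11, 0.12, 0.09, 0.07],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name.ljust(14), "  ".join(f"{r:+.2f}" for r in row.values()))
```

The diagonal is always 1 and the matrix is symmetric, so only the cells above or below the
diagonal carry information.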

Interpretation:

Annual Income and Loan Amount (0.27):

There’s a weak positive correlation (0.27), indicating that loan amounts tend to rise with
annual income, though only loosely.

Annual Income and DTI (-0.12):

A weak negative correlation (-0.12) suggests that higher income may be associated with a lower
debt-to-income ratio.

Loan Amount and Interest Rate (0.31):

A modest positive correlation (0.31) indicates that larger loans tend to carry higher interest
rates.

Total Accounts and DTI (0.23):

A weak positive correlation (0.23) indicates that individuals with more accounts tend to have
higher debt-to-income ratios.

Interest Rate and Total Accounts (-0.042):

A very weak negative correlation (-0.042), almost negligible.

Implications:
Risk Assessment:

Lenders could consider these correlations when assessing loan applications.

For example, applicants with higher incomes might be eligible for larger loans.

Interest Rate Determination:

The positive correlation between loan amount and interest rate suggests that risk assessment
varies with loan size; larger loans might be considered riskier.

Portfolio Diversification:

Understanding these correlations can guide investment decisions, especially when considering
diversification across different loan types.

DISTRIBUTION OF INTEREST RATES

Analysis:
The histogram shows the frequency of different interest rate values in the dataset.

The KDE (Kernel Density Estimation) line provides a smoothed estimate of the underlying
probability density function.

Interest rates around 0.075 and 0.125 are most common, as indicated by their higher
frequencies.
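
One simple way to look for peaks like these is to bin the rates and flag bins that are taller
than both neighbours. The sketch below does this on hypothetical rates with an assumed bin
width of 0.02; real histograms are noisier, so strict local maxima are a simplification.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count values per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    return Counter(int(v // bin_width) for v in values)


def local_peaks(counts):
    """Return bins whose count is strictly higher than both neighbouring bins."""
    return sorted(
        b for b, c in counts.items()
        if c > counts.get(b - 1, 0) and c > counts.get(b + 1, 0)
    )


# Hypothetical interest rates, expressed as fractions.
rates = [
    0.055, 0.065, 0.07, 0.075, 0.077, 0.09, 0.095, 0.105,
    0.11, 0.125, 0.127, 0.13, 0.135, 0.15, 0.17, 0.19,
]

bin_width = 0.02
counts = histogram(rates, bin_width)
peaks = local_peaks(counts)
for b in peaks:
    low, high = b * bin_width, (b + 1) * bin_width
    print(f"peak in bin {low:.2f}-{high:.2f} with {counts[b]} loans")
```

With this sample the two flagged bins are 0.06-0.08 and 0.12-0.14, mirroring the kind of
two-peak shape described above.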

Interpretation:

Interest Rate Distribution:

The distribution is right-skewed, meaning there are some loans with relatively high interest
rates, but they are less frequent.

Most data points cluster toward the lower end of interest rates.

Interest Rate Clusters:

There are two prominent peaks in this distribution, indicating that there are two groups or
clusters within this data with distinct average interest rates.

These clusters might correspond to different loan products or risk categories.

Implications:
Risk and Return Profiles:

Lenders can use this information to assess risk and return profiles associated with different
interest rates.

Borrowers can gauge typical interest rates they might expect when seeking loans.

Product Offerings:

Financial institutions can design loan products tailored to specific interest rate ranges based on
customer preferences and risk tolerance.

IMPLEMENTATION

CONCLUSION
