Data Analysis Powerpoint

The document discusses various techniques for analyzing creditworthiness data, including credit scores, logistic regression, decision trees, and linear discriminant analysis. The key points are: - Credit scores like FICO range from 300-850 and are based on credit report data from the major bureaus. Age and gender affect creditworthiness. - A logistic regression model for creditability found all determinants to be significant. Important factors included relationship status and loan amounts. - In a decision tree for creditability, age was found to be an 80% predictor. - A linear discriminant analysis of loan status found those with balances below 200M had the highest creditworthiness indicators, while higher balances had lower

Credit 85

Machine Learning
Factors affecting customer Creditability
Credit Score
• The most widely used credit score is the FICO Score, which was developed by the Fair Isaac Corporation.
• FICO Scores range from 300 to 850, with higher scores indicating better creditworthiness.
• The FICO Score is calculated from credit report data held by the three major credit bureaus: Experian, Equifax, and TransUnion.
• In addition to the FICO Score, there are other credit scores available, such as the VantageScore, which was developed jointly by the three major credit bureaus.
• Like the FICO Score, the VantageScore ranges from 300 to 850, with higher scores indicating better creditworthiness.
Data Used in the analysis

• The output gives a glimpse of the data used in the analysis.
• The table below shows the top 5 records of the data used in this analysis.
Factors affecting customer Creditability

Age/Gender
• The figure to the left shows creditability by age, grouped by gender.
• Younger adults are more likely to be creditable than older adults.
• Across all ages, single males are more likely to have a bad credit rating than the other gender groups.
Factors affecting customer Creditability
Status
• Customers with a higher average number of existing credits score higher in creditability than those with a lower number.
• Status 1 (balance from 0 to below 200) scores high in creditworthiness.
Logistic Regression
Logistic regression is a type of statistical model that is commonly used for classification and predictive analytics. It is a variation of linear regression, but instead of predicting a continuous numerical value, it predicts the probability of an event occurring based on a set of input variables.

Logistic regression is used when the dependent variable, also known as the outcome or response variable, is binary, meaning it can take one of two possible values, such as yes or no, or 0 or 1. The independent variables, also known as predictors or features, can be continuous, categorical, or binary.

The logistic regression model estimates the probability of the dependent variable being in a certain category based on the values of the independent variables. The model does this by applying a logistic function, also known as a sigmoid function, to a linear combination of the input variables.
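
As a rough illustration of the description above, the following Python sketch passes a linear combination of the inputs through the sigmoid function to obtain a probability. The intercept, coefficients, feature meanings, and the 0.5 cut-off are all hypothetical values chosen for the example; they are not taken from the model in these slides.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(intercept, coefficients, features):
    """Apply the sigmoid to a linear combination of the input variables."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(z)

# Hypothetical coefficients and one hypothetical applicant
# (features assumed to be: age in decades, loan amount in thousands).
intercept = 0.5
coefficients = [0.3, -0.2]
applicant = [3.5, 4.0]

p = predict_probability(intercept, coefficients, applicant)
label = 1 if p >= 0.5 else 0  # 0.5 is a conventional cut-off, assumed here
print(f"probability={p:.3f}, predicted class={label}")
```

A positive coefficient pushes the probability toward 1 as its feature grows, and a negative coefficient pushes it toward 0, which is how the sign of each fitted coefficient is usually read.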
Logistic Regression
• When running the logistic regression for Creditability, the coefficient of all determinants is significant at -3.38.
• Other important variables are Sex: Female-Single and Male-Single.
• The amount, guarantor, and car loan are the most significant factors in the glm regression output.
• The AIC for the model is 3171, which suggests a poor logistic regression model. AIC is a relative measure, so this is best confirmed by comparing against alternative models fitted to the same data.
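
For context on the AIC figure, AIC is defined as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximised likelihood; a lower value indicates a better trade-off between fit and complexity. The Python sketch below computes it for two competing models. The labels, fitted probabilities, and parameter counts are hypothetical and are unrelated to the 3171 value reported on the slide.

```python
import math

def log_likelihood(y_true, p_pred):
    """Bernoulli log-likelihood of 0/1 labels given predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, p_pred)
    )

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_lik

# Hypothetical labels and fitted probabilities from two competing models.
y = [1, 0, 1, 1, 0]
p_model_a = [0.9, 0.2, 0.8, 0.7, 0.1]  # assumed to use 3 parameters
p_model_b = [0.6, 0.5, 0.6, 0.5, 0.4]  # assumed to use 2 parameters

aic_a = aic(log_likelihood(y, p_model_a), n_params=3)
aic_b = aic(log_likelihood(y, p_model_b), n_params=2)
better = "A" if aic_a < aic_b else "B"
print(f"AIC A={aic_a:.2f}, AIC B={aic_b:.2f}, preferred model: {better}")
```

Because AIC grows with the size of the dataset, a single value is hard to judge in isolation; the comparison between models is what carries the information.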
Decision Trees
A decision tree is a type of supervised learning algorithm used in machine learning that can be used for both classification and regression tasks.

The algorithm builds a tree-like model of decisions and their possible consequences, with each decision leading to a new set of decisions or an outcome.

The decision tree begins with a single node, known as the root node, which represents the entire dataset.

The algorithm then splits the dataset into smaller subsets based on the values of a chosen feature or attribute.
Decision Trees
This process is repeated recursively for each subset until the algorithm reaches a leaf node, which represents the final decision or outcome.

The decision tree algorithm uses a set of rules to determine the optimal split of each subset.

These rules are typically based on measures such as information gain or Gini index, which evaluate the purity of the subsets.
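
The two purity measures named above can be sketched in a few lines of Python. The example computes Gini impurity and entropy, then searches for the threshold on a single numeric feature that gives the highest information gain. The ages and labels are hypothetical toy data, and the midpoint search is a simplified version of what a tree library does at one node.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 = pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits (0 = pure)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; keep the highest gain."""
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = information_gain(labels, left, right)
        if gain > best[1]:
            best = (t, gain)
    return best

# Hypothetical toy data: applicant ages and a good/bad credit label.
ages = [22, 25, 30, 35, 40, 50]
labels = ["good", "good", "good", "bad", "bad", "bad"]

threshold, gain = best_threshold(ages, labels)
print(f"Gini of root: {gini(labels):.2f}, best split: age <= {threshold}, gain: {gain:.2f}")
```

A full tree repeats this search over every feature at every node and recurses into the resulting subsets until a stopping rule is met.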

Decision trees have several advantages over other machine learning algorithms.

They are easy to interpret and visualize, which makes them useful for explaining the decision-making process to non-technical stakeholders.

Additionally, decision trees are computationally efficient, making them suitable for large datasets.
Decision Trees
However, decision trees can be prone to overfitting, which occurs when the model is too complex and fits the training data too closely, resulting in poor generalization to new data.

Techniques such as pruning or ensemble methods like random forests can be used to mitigate this issue.

In summary, decision trees are a versatile and powerful machine learning algorithm that can be used for both classification and regression tasks.

They are easy to interpret and computationally efficient, but can be prone to overfitting.
Decision Trees

From the results, the single most important factor in predicting creditability is age. Age is an 80% predictor for Creditability.
Linear Discriminant Analysis

• The following shows the partial result of the Linear Discriminant Analysis.
Linear Discriminant Analysis
• The Linear Discriminant Analysis was run on loan status.
• Individuals with a loan status below 200M have the highest indicators of creditability, at 47.04%, while those with a loan status between 200 and 400 million are at 40.6%.
• Higher loan statuses have the lowest percentage of indicators, at 12.3%.
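
The slides do not show the model specification behind these figures, so the following is only a generic Python sketch of how Linear Discriminant Analysis classifies with a single predictor. It estimates class priors, class means, and a pooled variance, then assigns each observation to the class with the largest linear discriminant score. The balances, labels, and the two-class setup are hypothetical and do not reproduce the analysis in the deck.

```python
import math
from collections import defaultdict

def fit_lda_1d(xs, ys):
    """Estimate class priors, class means, and the pooled variance for one predictor."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    n, k = len(xs), len(groups)
    priors = {c: len(v) / n for c, v in groups.items()}
    means = {c: sum(v) / len(v) for c, v in groups.items()}
    pooled_var = sum(
        (x - means[c]) ** 2 for c, v in groups.items() for x in v
    ) / (n - k)
    return priors, means, pooled_var

def predict_lda_1d(x, priors, means, pooled_var):
    """Return the class with the largest linear discriminant score."""
    def score(c):
        return (
            x * means[c] / pooled_var
            - means[c] ** 2 / (2 * pooled_var)
            + math.log(priors[c])
        )
    return max(priors, key=score)

# Hypothetical toy data: account balance and a good/bad credit label.
balances = [50, 80, 120, 150, 300, 350, 420, 480]
labels = ["good", "good", "good", "good", "bad", "bad", "bad", "bad"]

priors, means, pooled_var = fit_lda_1d(balances, labels)
for b in (100, 400):
    print(b, predict_lda_1d(b, priors, means, pooled_var))
```

The class priors estimated here are simply the share of observations in each group, which is one common kind of percentage reported in LDA output.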
Linear Discriminant Analysis
• The result is shown to the right.
• The largest cluster is cluster 1, followed by clusters 2 and 3. The green part is payment history, which forms the largest cluster, followed by purpose of the loan.
• Credit report and credit score are important factors in
determining creditworthiness and can impact your ability
to secure loans, credit cards, and other financial products.
• Payment history: This refers to whether you have made
your credit card, loan, or other debt payments on time.
Late or missed payments can have a negative impact on
your credit score.
Linear Discriminant Analysis
• Length of credit history: This takes into account how long you've had credit accounts open. A
longer credit history can show lenders that you have a track record of managing credit responsibly.
• New credit: This looks at how many new credit accounts you've opened recently. Opening several
new accounts in a short period of time can suggest to lenders that you're taking on too much debt
at once.
• The result of the Linear Discriminant Analysis therefore shows that loan status can be used to predict an individual's credit score.
R code

The R code can be found in the Notes section of the slides.
