Credit Risk Modeling in R

Here are the key steps for building a logistic regression model in R:
1. Split the data into training and test sets, so the model can be trained on one portion of the data and validated on held-out data.
2. Fit a null model with only the intercept term; this serves as a baseline to compare subsequent models against.
3. Fit models adding variables one by one, check their significance levels, and remove non-significant variables.
4. Compare the residual deviance and null deviance of each model; lower residual deviance indicates a better fit after accounting for variables.
5. Check the Akaike Information Criterion (AIC) values and prefer models with lower AIC.

Uploaded by Arjun Khosla

Credit Risk Modeling in R

Credit risk modelling is the best way for lenders to understand how likely a particular loan is to be repaid. In other words, it's a tool to understand the credit risk of a borrower. This is especially important because a borrower's credit risk profile keeps changing with time and circumstances.
What is Credit Risk?

Credit risk refers to the chance that a borrower will be unable to make their payments on time and will default on their debt; in other words, the risk that a lender may not receive the interest due or the principal lent on time.

This results in an interruption of cash flows for the lender and increases the cost of collection. In extreme cases, some part of the loan, or even the entire loan, may have to be written off, resulting in a loss for the lender.

It is extremely difficult and complex to pinpoint exactly how likely a person is to default on their loan. At the same
time, properly assessing credit risk can reduce the likelihood of losses from default and delayed repayment.
What is Credit Risk?

Interest payments from the borrower are the lender's reward for bearing credit risk. If the credit risk is higher, the lender or investor will either charge a higher interest rate or forego the lending opportunity altogether.

For example, a loan applicant with a superior credit history and steady income will be charged a lower interest rate for the same loan than an applicant with a poor credit history.
What is Credit Risk Modelling?

Credit risk modelling refers to the process of using data models to find out two important things.

1. The first is the probability of the borrower defaulting on the loan.

2. The second is the impact on the financials of the lender if this default occurs.

Financial institutions rely on credit risk models to determine the credit risk of potential borrowers. Based on the model's output, they make decisions on whether or not to sanction a loan as well as on the interest rate of the loan.
Which Factors Affect Credit Risk Modelling?

There are several major factors to consider while determining credit risk, ranging from the financial health of the borrower and the consequences of default for both the borrower and the creditor to a variety of macroeconomic considerations. Here are three major factors affecting the credit risk of a borrower.

(i) The Probability of Default (PD)

This refers to the likelihood that a borrower will default on their loans and is obviously the most important part of a credit risk model. For individuals, this score is based on their debt-to-income ratio and existing credit score.

The PD generally determines the interest rate and amount of down payment needed.
Which Factors Affect Credit Risk Modelling?

(ii) Loss Given Default

This refers to the total loss that the lender will suffer if the debt is not repaid, and it is a critical component in credit risk modeling. For instance, two borrowers with the same credit score and a similar debt-to-income ratio will present two very different credit risk profiles if one is borrowing a much larger amount.

That’s because the loss to the lender in case of default is much higher when the amount is larger. This again plays a big role in
determining interest rates and down payments. If the borrower is willing to offer collateral then that has a big impact on the interest
rate offered.

(iii) Exposure at Default

This is a measure of the total exposure a lender faces at any given point in time. It also affects credit risk because it is an indicator of the risk appetite of the lender. It is calculated by multiplying each loan by a certain percentage depending on the particulars of the loan.
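These three factors are often combined into a single expected-loss figure, EL = PD × LGD × EAD. A minimal sketch in R; all numbers below are purely illustrative, not from the slides:

```r
# Expected loss combines the three components: EL = PD * LGD * EAD.
# All figures are illustrative, not from the deck.
pd  <- 0.04     # probability of default (4%)
lgd <- 0.60     # loss given default (60% of the exposure is lost)
ead <- 250000   # exposure at default, in currency units

expected_loss <- pd * lgd * ead
expected_loss   # 6000
```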
Types of Credit Risk Rating Models

(i) The Models Based on Financial Statement Analysis

Examples of these models include the Altman Z-score and Moody's RiskCalc. These models are based on an analysis of the financial statements of borrowing institutions. They chiefly take into account well-known financial ratios that can be useful in determining credit risk. For instance, the Altman Z-score combines ratios such as EBIT/total assets and sales/total assets in different proportions to determine the likelihood of a company going bankrupt.
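As a rough sketch, the classic Z-score for a public manufacturing firm can be computed directly from five ratios. The coefficients below are the standard textbook ones; the input ratios are hypothetical:

```r
# Classic Altman Z-score (public manufacturing firm):
# Z = 1.2*X1 + 1.4*X2 + 3.3*X3 + 0.6*X4 + 1.0*X5
altman_z <- function(wc_ta, re_ta, ebit_ta, mve_tl, sales_ta) {
  1.2 * wc_ta +    # X1: working capital / total assets
  1.4 * re_ta +    # X2: retained earnings / total assets
  3.3 * ebit_ta +  # X3: EBIT / total assets
  0.6 * mve_tl +   # X4: market value of equity / total liabilities
  1.0 * sales_ta   # X5: sales / total assets
}

z <- altman_z(wc_ta = 0.25, re_ta = 0.30, ebit_ta = 0.15,
              mve_tl = 1.10, sales_ta = 1.40)
z   # 3.275: above the conventional "safe" cutoff of roughly 2.99
```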
Types of Credit Risk Rating Models

(ii) Machine Learning Models

The introduction of machine learning and big data to credit risk modeling has made it possible to create credit risk
models that are far more scientific and accurate.

Big data and analytics are enabling credit risk modelling to become more scientific, as it is now based more on past data than on guesswork. In fact, credit risk modeling using R, Python, and other programming languages is becoming mainstream.
Credit Risk Modelling in R

Understand the dataset …… loaddata.rds
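Loading and inspecting the data might look like this. The file name loaddata.rds comes from the slide; the toy data frame below merely stands in for the real dataset so the snippet is self-contained:

```r
# In the course, loan <- readRDS("loaddata.rds") loads the real data.
# A toy data frame stands in here so the snippet runs on its own.
loan <- data.frame(loan_status = c(0, 1, 0, 0),
                   int_rate    = c(10.6, 13.1, NA, 7.9),
                   grade       = factor(c("B", "C", "A", "A")))
saveRDS(loan, "loaddata.rds")          # in practice the file already exists
loan <- readRDS("loaddata.rds")

str(loan)      # variable names, types, and first few values
summary(loan)  # per-column summaries, including NA counts
```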
Important Step

Use contingency tables to understand the dataset. Contingency tables provide a way to display the frequencies and relative frequencies of observations, which are classified according to two categorical variables. The elements of one category are displayed across the columns; the elements of the other category are displayed over the rows.

Key arguments (of CrossTable() in the gmodels package):
prop.r - if TRUE, row proportions will be included
prop.c - if TRUE, column proportions will be included
prop.t - if TRUE, table proportions will be included
prop.chisq - if TRUE, the chi-square contribution of each cell will be included
chisq - if TRUE, the results of a chi-square test will be included
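The slides rely on gmodels::CrossTable() with the arguments above; the dependency-free base-R sketch below shows the same quantities (toy data, hypothetical column names):

```r
# Base-R equivalent of gmodels::CrossTable(x, y, prop.r=, prop.c=, prop.t=, chisq=)
loan <- data.frame(loan_status = c(0, 1, 0, 0, 1, 0),
                   grade       = factor(c("A", "C", "A", "B", "C", "B")))

tab <- table(loan$grade, loan$loan_status)
tab                 # raw frequencies
prop.table(tab, 1)  # row proportions    (prop.r)
prop.table(tab, 2)  # column proportions (prop.c)
prop.table(tab)     # table proportions  (prop.t)
chisq.test(tab)     # chi-square test    (chisq = TRUE)
```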
Removing the outlier record

The code gives the row id of the outlier. EDA is performed on int_rate and all NA values are removed.

Note – the result is saved to a different data frame, loan2 (the original dataset loan still has NA values).
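A hedged sketch of that step; the 30% cutoff and the column values are made up, since the slide only shows the general pattern:

```r
# Find the row id of the outlier in int_rate, drop it, then remove NA rows,
# saving into loan2 so the original loan keeps its NA values.
loan <- data.frame(int_rate = c(10.6, 13.1, NA, 7.9, 99.9))  # 99.9 is the outlier
index_outlier <- which(loan$int_rate > 30)  # row id of the outlier
index_outlier                               # 5

loan2 <- loan[-index_outlier, , drop = FALSE]
loan2 <- loan2[!is.na(loan2$int_rate), , drop = FALSE]       # drop NA int_rate rows

nrow(loan2)           # 3 rows survive
anyNA(loan$int_rate)  # TRUE: the original still has NA values
```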
EDA ends.

Decision Tree
cp: complexity parameter.

Any split that does not decrease the overall lack of fit by a factor of cp is not attempted. In other words, the complexity parameter is the threshold value for a decrease in overall lack of fit for any split; if cp is not met, further splits will no longer be pursued.

cp's default value is 0.01, but for complex problems it is advised to relax cp. Setting cp = -1 means the tree will be fully grown. When cp = 0, the decision tree has no restriction on what a split must add, and it will produce the most complex tree possible.

method:

One of "anova", "poisson", "class" or "exp". If method is missing, the routine tries to make an intelligent guess: if y is a survival object, method = "exp" is assumed; if y has 2 columns, method = "poisson" is assumed; if y is a factor, method = "class" is assumed; otherwise method = "anova" is assumed. It is wisest to specify the method directly, especially as more criteria may be added to the function in future.
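Putting method and cp together, a minimal rpart() call might look like this. The data is simulated; variable names such as loan_status follow the deck's loan example:

```r
library(rpart)

# Toy data standing in for the loan dataset
set.seed(1)
loan <- data.frame(
  loan_status = factor(sample(c(0, 1), 200, replace = TRUE, prob = c(0.8, 0.2))),
  int_rate    = runif(200, 5, 25),
  grade       = factor(sample(LETTERS[1:5], 200, replace = TRUE))
)

tree <- rpart(loan_status ~ int_rate + grade, data = loan,
              method  = "class",                     # classification tree
              control = rpart.control(cp = 0.001))   # relaxed cp for a fuller tree
class(tree)  # "rpart"
```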
Changing prior probabilities

Prior probability: the proportion of events and non-events in an imbalanced data set.

Changing the prior probabilities in a data set indirectly adjusts the importance of incorrectly classifying each class. By making the prior probability for "loan status = 0" bigger, we put more weight on misclassifications involving that class (false positives and false negatives).

How to change prior probabilities:

Applying prior probabilities in rpart is very easy. We use parms, an additional argument inside rpart(), which specifically deals with unbalanced class sizes. Inside the parms argument, define the percentage proportions we want to apply; for the example below, we start with proportions of 70% for loan status = 0 and 30% for loan status = 1.

Note: the prior proportions in the parms argument should always sum up to 1.
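A minimal sketch of the prior change. Note that parms is a list-valued argument of rpart(), not a separate function; the data here is simulated:

```r
library(rpart)

set.seed(1)
loan <- data.frame(
  loan_status = factor(sample(c(0, 1), 300, replace = TRUE, prob = c(0.9, 0.1))),
  int_rate    = runif(300, 5, 25)
)

# 70% prior on loan_status = 0, 30% on loan_status = 1; must sum to 1
tree_prior <- rpart(loan_status ~ int_rate, data = loan, method = "class",
                    parms   = list(prior = c(0.7, 0.3)),
                    control = rpart.control(cp = 0.001))
tree_prior
```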


Prior Probabilities ….
So, what is the best Cp value?
We want to choose the Cp value that produces the lowest amount of cross validation error.

Cross-Validation is a technique used in model selection to better estimate how our decision tree will perform.

The idea behind cross-validation is to create a number of partitions of sample observations, known as the validation sets, from the training data set, then
we measure the performance against each validation set, and then calculate the average error. It gives us a better assessment of how the model will perform
when asked to predict for new observations.

The functions printcp() and plotcp() will help us validate and identify the best Cp value for our model.

The printcp() function generates a list of cp values, and we use this list to find the value with the least cross-validated error (xerror). That cp value will generate a decision tree with the most efficient number of decision nodes for our data set.

For our example below, we print the cp list of tree_prior (the decision tree with changed prior probabilities).
The cp table is a long list; check it in R.
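The cp selection itself can be sketched like this (toy simulated data again; tree_prior follows the slide's naming):

```r
library(rpart)

# Simulated stand-in where a higher int_rate raises default probability
set.seed(1)
int_rate <- runif(500, 5, 25)
loan <- data.frame(
  int_rate    = int_rate,
  loan_status = factor(rbinom(500, 1, plogis(-4 + 0.25 * int_rate)))
)

tree_prior <- rpart(loan_status ~ ., data = loan, method = "class",
                    control = rpart.control(cp = 0.001))

printcp(tree_prior)  # cp table, including the cross-validated error (xerror)
plotcp(tree_prior)   # the same information as a plot

# Choose the cp with the lowest xerror and prune the tree to it
cp_best <- tree_prior$cptable[which.min(tree_prior$cptable[, "xerror"]), "CP"]
tree_pruned <- prune(tree_prior, cp = cp_best)
```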
Credit Modelling Using Logistic Regression in R
Null deviance: the deviance of the model against the actual values of the dataset when only the intercept is used.
🡪 The lower the value, the better the model.

Residual deviance: the deviance once the independent variables (B, C, ...) are included.
🡪 The lower the value, the better the model.

Check the significance levels: look for variables that can be removed.

AIC (Akaike Information Criterion): similar to adjusted R², AIC is a measure of fit that penalizes the model for the number of model coefficients.
🡪 Preference: the model with the minimum AIC value.

Optimize the model (remove the insignificant part): AIC should decrease, whereas the residual deviance should remain roughly the same.
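A minimal glm() sketch tying these pieces together. The data is simulated and the names (loan_status, int_rate, annual_inc) follow the deck's example, not a real dataset:

```r
# Simulate data where a higher interest rate raises default probability
set.seed(1)
n <- 400
loan <- data.frame(int_rate   = runif(n, 5, 25),
                   annual_inc = rlnorm(n, 11, 0.5))
loan$loan_status <- rbinom(n, 1, plogis(-4 + 0.25 * loan$int_rate))

log_model <- glm(loan_status ~ int_rate + annual_inc,
                 family = binomial, data = loan)
summary(log_model)  # coefficients, null/residual deviance, AIC

# Residual deviance drops below the null deviance once predictors are added
log_model$deviance < log_model$null.deviance  # TRUE
AIC(log_model)
```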
A threshold of 0.15 is taken to convert the predicted probabilities into binomial 0 or 1.

Note – compare with the confusion matrix and accuracy of the decision tree.
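Applying the 0.15 cutoff might look like this, continuing with simulated data (the deck's real model and test set are not reproduced here):

```r
set.seed(2)
n <- 400
loan <- data.frame(int_rate = runif(n, 5, 25))
loan$loan_status <- rbinom(n, 1, plogis(-4 + 0.25 * loan$int_rate))
log_model <- glm(loan_status ~ int_rate, family = binomial, data = loan)

pred_prob  <- predict(log_model, newdata = loan, type = "response")
pred_class <- ifelse(pred_prob > 0.15, 1, 0)   # threshold of 0.15

conf_mat <- table(actual = loan$loan_status, predicted = pred_class)
conf_mat
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```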
Comparative Analysis using ROC

Understanding the AUC - ROC Curve

Understanding the Confusion Matrix

It is a performance measurement for machine learning classification problems where the output can be two or more classes. It is a table with 4 different combinations of predicted and actual values.

True Positive: You predicted positive and it’s true.

True Negative: You predicted negative and it’s true.

False Positive (Type 1 Error): You predicted positive and it's false.
False Negative (Type 2 Error): You predicted negative and it's false.
What is AUC - ROC Curve?

• In machine learning, performance measurement is an essential task, so when it comes to a classification problem, we can count on the AUC - ROC curve. When we need to check or visualize the performance of a classification model, we use the AUC (Area Under the Curve) ROC (Receiver Operating Characteristics) curve. It is one of the most important evaluation metrics for checking any classification model's performance, and is also written as AUROC (Area Under the Receiver Operating Characteristics).

• The AUC - ROC curve is a performance measurement for classification problems at various threshold settings. ROC is a probability curve and AUC represents the degree or measure of separability: it tells how capable the model is of distinguishing between classes. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s.

TPR (True Positive Rate) = TP / (TP + FN); FPR (False Positive Rate) = FP / (FP + TN).

This classification threshold can be adjusted to tune the behavior of the model for a specific problem. Each threshold value produces one (FPR, TPR) point; by continuously changing the threshold we derive further points, eventually reaching the situation where every observation falls on the same side of the cutoff. Plotting all of these points traces out the ROC curve.
Key Pointers:

The Receiver Operating Characteristic (ROC) curve is an evaluation metric for binary classification problems. It is a probability curve that plots the TPR against the FPR at various threshold values and essentially separates the ‘signal’ from the ‘noise’.

The Area Under the Curve (AUC) is the measure of the ability of a classifier to distinguish between classes and is used as a
summary of the ROC curve.

The higher the AUC, the better the performance of the model at distinguishing between the positive and negative classes.
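In R the pROC package (its roc() and auc() functions) is the usual tool for this; the dependency-free sketch below computes the same curve by hand, sweeping the threshold to get one (FPR, TPR) point per cutoff (simulated scores):

```r
# Simulate scores and outcomes that are genuinely related
set.seed(3)
n <- 500
score  <- runif(n)               # stand-in for predicted probabilities
actual <- rbinom(n, 1, score)    # outcomes correlated with the score

# One (FPR, TPR) point per threshold, from strictest to loosest
thresholds <- sort(unique(score), decreasing = TRUE)
tpr <- sapply(thresholds, function(t) mean(score[actual == 1] >= t))
fpr <- sapply(thresholds, function(t) mean(score[actual == 0] >= t))

# AUC via the trapezoidal rule over the curve (prepend the (0, 0) corner)
x <- c(0, fpr); y <- c(0, tpr)
auc <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
auc  # well above 0.5: the score separates the classes
```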
