Credit Risk Modeling in R
Credit risk modelling is the best way for lenders to understand how likely a particular loan is to be repaid. In other words, it is a tool for understanding the credit risk of a borrower. This is especially important because a borrower's credit risk profile keeps changing with time and circumstances.
What is Credit Risk?
Credit risk refers to the chance that a borrower will be unable to make their payments on time and default on their
debt. It refers to the risk that a lender may not receive their interest due or the principal lent on time.
This results in an interruption of cash flows for the lender and increases the cost of collection. In extreme cases, some
part of the loan or even the entire loan may have to be written off resulting in a loss for the lender.
It is extremely difficult and complex to pinpoint exactly how likely a person is to default on their loan. At the same
time, properly assessing credit risk can reduce the likelihood of losses from default and delayed repayment.
What is Credit Risk Modelling?
Credit risk modelling refers to the process of using data models to find out two important things.
1. The first is the probability that the borrower will default on the loan.
2. The second is the impact on the financials of the lender if this default occurs.
Financial institutions rely on credit risk models to determine the credit risk of potential borrowers. They make decisions on whether or not to sanction a loan, as well as on the interest rate of the loan, based on the credit risk model's assessment.
Which Factors Affect Credit Risk Modelling?
There are several major factors to consider while determining credit risk, from the financial health of the borrower and the consequences of default for both the borrower and the creditor to a variety of macroeconomic considerations. Here are three major factors affecting the credit risk of a borrower.
Probability of Default (PD)
This refers to the likelihood that a borrower will default on their loans and is obviously the most important part of a credit risk model. For individuals, this score is based on their debt-to-income ratio and existing credit score.
The PD generally determines the interest rate and the amount of down payment needed.
Loss Given Default (LGD)
This refers to the total loss that the lender will suffer if the debt is not repaid. This is a critical component in credit risk modeling.
For instance, two borrowers with the same credit score and a similar debt-to-income ratio will present two very different credit risk profiles if one is borrowing a much larger amount.
That’s because the loss to the lender in case of default is much higher when the amount is larger. This again plays a big role in
determining interest rates and down payments. If the borrower is willing to offer collateral then that has a big impact on the interest
rate offered.
Exposure at Default (EAD)
This is a measure of the total exposure a lender faces at any given point in time. It also affects credit risk because it is an indicator of the risk appetite of the lender. It is calculated by multiplying each loan by a certain percentage depending on the particulars of the loan.
Types of Credit Risk Rating Models
Examples of these models include the Altman Z score and Moody's RiskCalc. These models are based on an analysis of the financial statements of borrowing institutions. They chiefly take into account well-known financial ratios that can be useful in determining credit risk. For instance, the Altman Z score combines financial ratios such as EBIT/total assets and sales/total assets in different proportions to determine the likelihood of a company going bankrupt.
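For reference, the original 1968 formulation of the Altman Z score for publicly traded manufacturing firms (later variants use different coefficients) is roughly:

Z = 1.2 (working capital / total assets) + 1.4 (retained earnings / total assets) + 3.3 (EBIT / total assets) + 0.6 (market value of equity / total liabilities) + 1.0 (sales / total assets)

with lower Z values indicating a higher likelihood of bankruptcy.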
The introduction of machine learning and big data to credit risk modeling has made it possible to create credit risk
models that are far more scientific and accurate.
Big data and analytics are enabling credit risk modelling to become more scientific, as it is now based more on past data than on guesswork. In fact, credit risk modelling using R, Python, and other programming languages is becoming more mainstream.
Credit Risk Modelling in R
Understand the dataset: loaddata.rds
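As a minimal sketch (assuming loaddata.rds sits in the working directory and contains the prepared loan data frame), the data can be loaded and inspected like this:

# Load the prepared loan data set from the .rds file
loan_data <- readRDS("loaddata.rds")

# Inspect structure, the first few rows and a quick summary
str(loan_data)
head(loan_data)
summary(loan_data)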
Contingency tables provide a way to display the frequencies and relative frequencies of observations that are cross-classified according to two categorical variables.
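For example (a loan_status column coded 0/1 and a categorical grade column are assumed names), absolute and relative frequencies can be tabulated with table() and prop.table():

# Absolute frequencies of loan status (0 = non-default, 1 = default)
table(loan_data$loan_status)

# Relative frequencies (proportions)
prop.table(table(loan_data$loan_status))

# Cross-tabulation of a categorical variable such as grade against loan status
table(loan_data$grade, loan_data$loan_status)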
Cp (complexity parameter): any split that does not decrease the overall lack of fit by a factor of cp is not attempted.
Method: one of "anova", "poisson", "class" or "exp". If method is missing, the routine tries to make an intelligent guess. If y is a survival object, then method = "exp" is assumed; if y has 2 columns, then method = "poisson" is assumed; if y is a factor, then method = "class" is assumed; otherwise method = "anova" is assumed. It is wisest to specify the method directly, especially as more criteria may be added to the function in the future.
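As a sketch of how these arguments appear in practice (loan_data and loan_status are assumed names, and the cp value is only illustrative), a classification tree for loan default could be grown like this:

library(rpart)

# loan_status is a binary outcome, so method = "class";
# a small cp lets the tree grow relatively deep before pruning
fit_tree <- rpart(loan_status ~ ., data = loan_data,
                  method = "class",
                  control = rpart.control(cp = 0.001))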
Changing prior probabilities
Prior probability: the proportion of events and non-events in an imbalanced data set.
Changing the prior probabilities in a data set indirectly adjusts the importance of incorrectly classifying each class.
By making the prior probability for "loan status = 0" bigger, we put more weight on predictions for that class that turn out to be misclassified (false positives or false negatives).
We use parms, an additional argument of rpart(), which specifically deals with unbalanced class sizes.
Inside the parms argument we define the prior proportions we want to apply; for the example below, we start with 70% for loan status = 0 and 30% for loan status = 1.
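A minimal sketch of that call (loan_data and the response loan_status are assumed names; tree_prior is the object referenced in the next slides):

library(rpart)

# Grow a classification tree with adjusted prior probabilities:
# 70% for loan status = 0 and 30% for loan status = 1
tree_prior <- rpart(loan_status ~ ., data = loan_data,
                    method = "class",
                    parms = list(prior = c(0.7, 0.3)),
                    control = rpart.control(cp = 0.001))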
Cross-validation is a technique used in model selection to better estimate how our decision tree will perform.
The idea behind cross-validation is to create a number of partitions of the training data, known as validation sets, measure the model's performance against each validation set, and then calculate the average error. This gives us a better assessment of how the model will perform when asked to predict new observations.
The functions printcp() and plotcp() will help us validate and identify the best Cp value for our model.
printcp() generates a list of Cp values, and we use this list to find the value with the least cross-validated error (xerror). The Cp value with the least cross-validated error will generate a decision tree with the most efficient number of decision nodes for our data set.
For our example below, we print the Cp list of tree_prior (the decision tree grown with changed prior probabilities).
The full Cp list is long; check the output in R.
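A sketch of that workflow for tree_prior (the pruning step with prune() is one common follow-up rather than something stated above; object names are illustrative):

# Print and plot the Cp table with cross-validated errors
printcp(tree_prior)
plotcp(tree_prior)

# Pick the Cp value with the lowest cross-validated error (xerror)
best_index <- which.min(tree_prior$cptable[, "xerror"])
best_cp <- tree_prior$cptable[best_index, "CP"]

# Prune the tree back to that Cp value
ptree_prior <- prune(tree_prior, cp = best_cp)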
Credit Modelling Using Logistic Regression in R
Null deviance: the deviance we get when the model is fit with only the intercept against the actual values of the data set (no predictors).
→ The lower the deviance of the fitted model relative to the null deviance, the better the model.
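A minimal sketch of fitting the logistic regression and reading the deviances (loan_data and loan_status are assumed names):

# Fit a logistic regression for default (loan_status = 1) on all predictors
log_model <- glm(loan_status ~ ., data = loan_data,
                 family = binomial(link = "logit"))

# summary() reports the null deviance (intercept-only model) and the
# residual deviance of the fitted model; a large drop suggests a better fit
summary(log_model)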
ROC
Understanding the Confusion Matrix
False Positive (Type 1 error): the model predicted positive, but the actual value is negative.
False Negative (Type 2 error): the model predicted negative, but the actual value is positive.
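As an illustration (the 0.5 cutoff is an assumption, and log_model is the logistic regression from the sketch above), a confusion matrix can be built from the predicted probabilities:

# Predicted default probabilities (ideally on a held-out test set)
pred_prob <- predict(log_model, type = "response")

# Turn probabilities into 0/1 predictions with a 50% cutoff
pred_class <- ifelse(pred_prob > 0.5, 1, 0)

# Confusion matrix: rows = actual loan status, columns = predicted class
table(actual = loan_data$loan_status, predicted = pred_class)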
What is AUC - ROC Curve?
• In machine learning, performance measurement is an essential task, so when it comes to a classification problem we can count on an AUC - ROC curve. When we need to check or visualize the performance of a multi-class classification problem, we use the AUC (Area Under the Curve) ROC (Receiver Operating Characteristics) curve. It is one of the most important evaluation metrics for checking any classification model's performance. It is also written as AUROC (Area Under the Receiver Operating Characteristics).
• The AUC - ROC curve is a performance measurement for classification problems at various threshold settings. ROC is a probability curve and AUC represents the degree or measure of separability. It tells how capable the model is of distinguishing between classes. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s.
The Receiver Operating Characteristic (ROC) curve is an evaluation metric for binary classification problems. It is a probability curve that plots the true positive rate (TPR) against the false positive rate (FPR) at various threshold values and essentially separates the 'signal' from the 'noise'.
The Area Under the Curve (AUC) measures the ability of a classifier to distinguish between classes and is used as a summary of the ROC curve.
The higher the AUC, the better the performance of the model at distinguishing between the positive and negative classes.
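A short sketch using the pROC package (one common choice; ROCR is another), reusing the predicted probabilities from the confusion-matrix example above:

library(pROC)

# Build the ROC curve from the actual outcomes and the predicted probabilities
roc_obj <- roc(response = loan_data$loan_status, predictor = pred_prob)

# Plot the curve and report the area under it
plot(roc_obj)
auc(roc_obj)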