Model evaluation and implementation
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
Comparing classification reports
Create the reports with classification_report() and compare
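A minimal sketch of such a comparison, assuming two sets of predictions on the same test labels. The arrays and the names preds_model_a and preds_model_b are made up for illustration; they are not from the course data.

```python
import numpy as np
from sklearn.metrics import classification_report

# Hypothetical test labels and predictions (0 = non-default, 1 = default)
y_test = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
preds_model_a = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])
preds_model_b = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])

target_names = ['Non-Default', 'Default']

# output_dict=True returns the report as a nested dict, which is easier to compare
report_a = classification_report(y_test, preds_model_a,
                                 target_names=target_names, output_dict=True)
report_b = classification_report(y_test, preds_model_b,
                                 target_names=target_names, output_dict=True)

for name, report in [('Model A', report_a), ('Model B', report_b)]:
    print(name,
          'default recall:', round(report['Default']['recall'], 2),
          'macro F1:', round(report['macro avg']['f1-score'], 2))
```

Default recall is often the number to watch in this comparison, since a missed default is an accepted loan that goes bad.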
ROC and AUC analysis
Models with better performance will have more lift
More lift means the AUC score is higher
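As a rough illustration, the AUC of two models can be compared with roc_auc_score(). The labels and probabilities below are invented for the example; prob_a and prob_b stand in for the predicted probabilities of default from two hypothetical models.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical test labels and predicted probabilities of default
y_test = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
prob_a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.35, 0.55, 0.7, 0.8])
prob_b = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.65, 0.7, 0.8, 0.9])

# Area under the ROC curve for each model
auc_a = roc_auc_score(y_test, prob_a)
auc_b = roc_auc_score(y_test, prob_b)

# roc_curve() returns the false positive rate, true positive rate and thresholds
fallout, sensitivity, thresholds = roc_curve(y_test, prob_b)

print('AUC model A:', round(auc_a, 3))
print('AUC model B:', round(auc_b, 3))
```

Model B ranks every default above every non-default, so its curve sits furthest from the diagonal and its AUC is the higher of the two.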
Model calibration
We want our probabilities of default to accurately represent the model's confidence level
The probability of default has a degree of uncertainty in its predictions
For a sample of loans, the average predicted probability of default should be close to the percentage of actual defaults in that sample
Sample of loans   Average predicted PD   Sample percentage of actual defaults   Calibrated?
10                0.12                   0.12                                   Yes
10                0.25                   0.65                                   No
1 http://datascienceassn.org/sites/default/files/Predicting%20good%20probabilities%20with%20supervised%20learning.
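The check in the table can be sketched for a single sample as follows. The ten loans and the 0.05 tolerance are assumptions chosen for illustration; the source does not state how close is close enough.

```python
import numpy as np

# Hypothetical sample of 10 loans: predicted PDs and what actually happened
predicted_pd = np.array([0.05, 0.08, 0.10, 0.10, 0.12, 0.12, 0.15, 0.15, 0.16, 0.17])
actual_default = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])

avg_predicted_pd = predicted_pd.mean()        # average predicted PD for the sample
actual_default_rate = actual_default.mean()   # share of loans that defaulted

# Assumed tolerance: treat the sample as calibrated if the gap is small
tolerance = 0.05
is_calibrated = abs(avg_predicted_pd - actual_default_rate) <= tolerance

print(round(avg_predicted_pd, 2), actual_default_rate, is_calibrated)
```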
Calculating calibration
Shows percentage of true defaults for each predicted probability
Essentially a line plot of the results of calibration_curve()
from sklearn.calibration import calibration_curve
calibration_curve(y_test, probabilities_of_default, n_bins = 5)
# Fraction of positives
(array([0.09602649, 0.19521012, 0.62035996, 0.67361111]),
# Average probability
array([0.09543535, 0.29196742, 0.46898465, 0.65512207]))
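To make the two arrays less opaque, here is a NumPy-only sketch of roughly what calibration_curve() computes with equal-width bins. The helper binned_calibration and its data are hypothetical and are not the library's implementation. Empty bins are dropped, which may be why n_bins = 5 returns only four values above.

```python
import numpy as np

def binned_calibration(y_true, y_prob, n_bins=5):
    """Fraction of positives and mean predicted probability per equal-width bin."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to a bin index between 0 and n_bins - 1
    bin_ids = np.searchsorted(edges[1:-1], y_prob)
    bin_total = np.bincount(bin_ids, minlength=n_bins)
    bin_true = np.bincount(bin_ids, weights=y_true, minlength=n_bins)
    bin_prob = np.bincount(bin_ids, weights=y_prob, minlength=n_bins)
    nonempty = bin_total > 0  # bins without any loans are dropped
    return bin_true[nonempty] / bin_total[nonempty], bin_prob[nonempty] / bin_total[nonempty]

# Hypothetical labels and predicted probabilities; no prediction falls in the 0.8-1.0 bin
y_true = [0, 0, 1, 0, 1, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.30, 0.35, 0.50, 0.55, 0.70, 0.75]

frac_pos, mean_pred = binned_calibration(y_true, y_prob, n_bins=5)
print(frac_pos)
print(mean_pred)
```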
Plotting calibration curves
import matplotlib.pyplot as plt
plt.plot(mean_predicted_value, fraction_of_positives, label="Example Model")
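A fuller plotting sketch, reusing the calibration_curve() values shown earlier and adding the diagonal that represents perfect calibration. The axis labels, marker style and the Agg backend are choices made for this example, not taken from the slides.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Values copied from the calibration_curve() output shown earlier
fraction_of_positives = [0.09602649, 0.19521012, 0.62035996, 0.67361111]
mean_predicted_value = [0.09543535, 0.29196742, 0.46898465, 0.65512207]

fig, ax = plt.subplots()
# Reference line: predicted probability equals observed default rate
ax.plot([0, 1], [0, 1], 'k:', label='Perfectly calibrated')
# The model's calibration curve
ax.plot(mean_predicted_value, fraction_of_positives, 's-', label='Example Model')
ax.set_xlabel('Average predicted probability')
ax.set_ylabel('Fraction of positives')
ax.set_title('Calibration curve')
ax.legend()
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() to display the figure.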
Checking calibration curves
As an example, two events selected (above and below perfect line)
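One way to read points above and below the perfect line, sketched with the calibration_curve() values shown earlier. The helper describe_bin and its 0.05 tolerance are hypothetical. A point above the line means more loans defaulted than predicted; a point below means fewer did.

```python
# Values copied from the calibration_curve() output shown earlier
fraction_of_positives = [0.09602649, 0.19521012, 0.62035996, 0.67361111]
mean_predicted_value = [0.09543535, 0.29196742, 0.46898465, 0.65512207]

def describe_bin(predicted, observed, tolerance=0.05):
    """Label a calibration point relative to the perfect (diagonal) line."""
    gap = observed - predicted
    if abs(gap) <= tolerance:
        return 'close to the perfect line'
    if gap > 0:
        return 'above the line: defaults are underestimated'
    return 'below the line: defaults are overestimated'

labels = [describe_bin(p, o)
          for p, o in zip(mean_predicted_value, fraction_of_positives)]
for p, label in zip(mean_predicted_value, labels):
    print(round(p, 2), label)
```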
Calibration curve interpretation
Let's practice!
Credit acceptance rates
Thresholds and loan status
Previously we set a threshold for a range of prob_default values
This was used to change the predicted loan_status of the loan
preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > 0.4 else 0)
Loan   prob_default   threshold   loan_status
1      0.25           0.4         0
2      0.42           0.4         1
3      0.75           0.4         1
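The same assignment can be written as a vectorized comparison instead of apply() with a lambda. This sketch rebuilds the three loans from the table; it is an equivalent alternative, not the course's own code.

```python
import pandas as pd

# The three loans from the table above
preds_df = pd.DataFrame({'prob_default': [0.25, 0.42, 0.75]}, index=[1, 2, 3])

threshold = 0.4
# True/False comparison cast to 1/0: 1 = predicted default, 0 = predicted non-default
preds_df['loan_status'] = (preds_df['prob_default'] > threshold).astype(int)

print(preds_df)
```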
Thresholds and acceptance rate
Use model predictions to set better thresholds
Can also be used to approve or deny new loans
For all new loans, we want to deny probable defaults
Use the test data as an example of new loans
Acceptance rate: the percentage of new loans that are accepted, chosen to keep the number of defaults in a portfolio low
Accepted loans which are defaults have an impact similar to false negatives
Understanding acceptance rate
Example: Accept 85% of loans with the lowest prob_default
Calculating the threshold
Calculate the threshold value for an 85% acceptance rate
import numpy as np
# Compute the threshold for 85% acceptance rate
threshold = np.quantile(prob_default, 0.85)
0.804
Loan   prob_default   Threshold   Predicted loan_status   Accept or Reject
1      0.65           0.804       0                       Accept
2      0.85           0.804       1                       Reject
Implementing the calculated threshold
Reassign loan_status values using the new threshold
# Reassign loan_status using the threshold for an 85% acceptance rate
preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > 0.804 else 0)
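A small end-to-end sketch of the two steps: compute the threshold with np.quantile(), apply it, and confirm that the share of accepted loans matches the target. The 100 evenly spaced probabilities are invented for illustration, so the threshold here differs from the 0.804 in the slides.

```python
import numpy as np
import pandas as pd

# Hypothetical predicted probabilities of default for 100 new loans
preds_df = pd.DataFrame({'prob_default': np.linspace(0.01, 1.0, 100)})

accept_rate = 0.85
# The 85th percentile of prob_default: 85% of loans fall at or below it
threshold = np.quantile(preds_df['prob_default'], accept_rate)

# Loans above the threshold are predicted defaults (1) and would be rejected
preds_df['loan_status'] = (preds_df['prob_default'] > threshold).astype(int)

# Share of loans predicted as non-default, i.e. accepted
accepted_share = (preds_df['loan_status'] == 0).mean()
print(round(threshold, 4), accepted_share)
```

Computing the threshold from the data, rather than hard-coding a number, keeps the acceptance rate stable when the predictions change.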
Bad Rate
Even with a calculated threshold, some of the accepted loans will be defaults
These are loans with prob_default values around where our model is not well calibrated
Bad rate calculation
# Calculate the bad rate
np.sum(accepted_loans['true_loan_status']) / accepted_loans['true_loan_status'].count()
If non-default is 0 and default is 1, then the sum() is the count of defaults
The .count() of a single column is the same as the row count for the data frame (when the column has no missing values)
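A worked example of the bad rate on a tiny, made-up test set. Because true_loan_status is coded 0/1, the mean of the column over the accepted loans gives the same number as the sum divided by the count. The 0.5 threshold is arbitrary.

```python
import pandas as pd

# Hypothetical predictions and true outcomes for eight loans
test_pred_df = pd.DataFrame({
    'prob_default':     [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.85, 0.90],
    'true_loan_status': [0,    0,    1,    0,    0,    1,    1,    1],
})

threshold = 0.5  # arbitrary threshold for illustration
test_pred_df['pred_loan_status'] = (test_pred_df['prob_default'] > threshold).astype(int)

# Accepted loans are the predicted non-defaults
accepted_loans = test_pred_df[test_pred_df['pred_loan_status'] == 0]

# Bad rate: share of accepted loans that actually defaulted
bad_rate = accepted_loans['true_loan_status'].mean()
print(len(accepted_loans), bad_rate)
```

Here five loans are accepted and one of them defaults, so the bad rate is 0.2.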
Let's practice!
Credit strategy and minimum expected loss
Selecting acceptance rates
First acceptance rate was set to 85%, but other rates might be selected as well
Two options to test different rates:
Calculate the threshold, bad rate, and losses manually
Automatically create a table of these values and select an acceptance rate
The table of all the possible values is called a strategy table
Setting up the strategy table
Set up arrays or lists to store each value
# Set all the acceptance rates to test
accept_rates = [1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55,
0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]
# Create lists to store thresholds and bad rates
thresholds = []
bad_rates = []
Calculating the table values
Calculate the threshold and bad rate for all acceptance rates
for rate in accept_rates:
    # Calculate the threshold for this acceptance rate
    thresh = np.quantile(test_pred_df['prob_default'], rate).round(3)
    # Store the threshold value in a list
    thresholds.append(thresh)
    # Apply the threshold to reassign loan_status
    test_pred_df['pred_loan_status'] = \
        test_pred_df['prob_default'].apply(lambda x: 1 if x > thresh else 0)
    # Create the accepted loans set of predicted non-defaults
    accepted_loans = test_pred_df[test_pred_df['pred_loan_status'] == 0]
    # Calculate and store the bad rate
    bad_rates.append((np.sum(accepted_loans['true_loan_status'])
                      / accepted_loans['true_loan_status'].count()).round(3))
Strategy table interpretation
strat_df = pd.DataFrame(zip(accept_rates, thresholds, bad_rates),
columns = ['Acceptance Rate','Threshold','Bad Rate'])
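The steps above can be sketched end to end on simulated data. The random probabilities and outcomes below are assumptions made so the example runs on its own; the column names follow the slides, and a shorter list of acceptance rates keeps the output small.

```python
import numpy as np
import pandas as pd

# Simulated test set: higher predicted PD means a default is more likely
rng = np.random.default_rng(0)
n = 1000
prob_default = rng.uniform(0, 1, n)
true_loan_status = (rng.uniform(0, 1, n) < prob_default).astype(int)
test_pred_df = pd.DataFrame({'prob_default': prob_default,
                             'true_loan_status': true_loan_status})

accept_rates = [1.0, 0.85, 0.7, 0.55, 0.4, 0.25, 0.1]

rows = []
for rate in accept_rates:
    # Threshold: the quantile of prob_default at this acceptance rate
    thresh = np.quantile(test_pred_df['prob_default'], rate)
    # Accepted loans: predicted non-defaults (prob_default not above the threshold)
    accepted_loans = test_pred_df[test_pred_df['prob_default'] <= thresh]
    # Bad rate: share of accepted loans that actually defaulted
    bad_rate = accepted_loans['true_loan_status'].mean()
    rows.append((rate, round(thresh, 3), round(bad_rate, 3)))

strat_df = pd.DataFrame(rows, columns=['Acceptance Rate', 'Threshold', 'Bad Rate'])
print(strat_df)
```

Each row answers one question: if this share of loans is accepted, what probability cutoff does that imply, and what share of the accepted loans go bad?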
Adding accepted loans
The number of loans accepted for each acceptance rate
Can use len() or .count()
Adding average loan amount
Average loan_amnt from the test set data
Estimating portfolio value
Average value of accepted loan non-defaults minus average value of accepted defaults
Assumes each default is a loss of the loan_amnt
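A sketch of how the three extra columns (number of accepted loans, average loan amount, estimated value) might be added. The ten loans are invented, and the estimated value formula is one reading of the description above: the value of accepted non-defaults minus the value of accepted defaults, both priced at the average loan_amnt. The column names are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical test set of ten loans
test_pred_df = pd.DataFrame({
    'prob_default':     [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.70, 0.80, 0.95],
    'true_loan_status': [0,    0,    0,    1,    0,    0,    1,    0,    1,    1],
    'loan_amnt':        [5000, 8000, 12000, 6000, 10000, 15000, 7000, 9000, 20000, 8000],
})

rows = []
for rate in [1.0, 0.8, 0.6]:
    thresh = np.quantile(test_pred_df['prob_default'], rate)
    accepted_loans = test_pred_df[test_pred_df['prob_default'] <= thresh]
    rows.append({
        'Acceptance Rate': rate,
        'Bad Rate': accepted_loans['true_loan_status'].mean(),
        'Num Accepted Loans': len(accepted_loans),  # len() of the accepted set
    })
strat_df = pd.DataFrame(rows)

# Average loan amount from the whole test set
strat_df['Avg Loan Amnt'] = test_pred_df['loan_amnt'].mean()

# Assumed formula: non-default value minus default value, each default a total loss
good_value = strat_df['Num Accepted Loans'] * (1 - strat_df['Bad Rate']) * strat_df['Avg Loan Amnt']
bad_value = strat_df['Num Accepted Loans'] * strat_df['Bad Rate'] * strat_df['Avg Loan Amnt']
strat_df['Estimated Value'] = good_value - bad_value
print(strat_df)
```

In this toy data, accepting everything is worth less than accepting 80% or 60%, because the extra loans bring in more defaults than good loans.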
Total expected loss
How much we expect to lose on the defaults in our portfolio
# Probability of default (PD)
test_pred_df['prob_default']
# Exposure at default = loan amount (EAD)
test_pred_df['loan_amnt']
# Loss given default = 1.0 for total loss (LGD)
test_pred_df['loss_given_default']
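The three columns combine into the total expected loss as the sum over loans of PD x EAD x LGD. A minimal sketch with three made-up loans:

```python
import pandas as pd

# Hypothetical loans: PD, exposure at default (loan amount), and loss given default
test_pred_df = pd.DataFrame({
    'prob_default':       [0.10, 0.50, 0.25],
    'loan_amnt':          [10000, 4000, 8000],
    'loss_given_default': [1.0, 1.0, 1.0],  # 1.0 = the full amount is lost on default
})

# Expected loss per loan: PD x EAD x LGD
test_pred_df['expected_loss'] = (test_pred_df['prob_default']
                                 * test_pred_df['loan_amnt']
                                 * test_pred_df['loss_given_default'])

# Total expected loss for the portfolio
total_expected_loss = test_pred_df['expected_loss'].sum()
print(round(total_expected_loss, 2))
```

The per-loan values here are 1000, 2000 and 2000, so the total is 5000.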
Let's practice!
Course wrap up
Your journey...so far
Prepare credit data for machine learning models
Important to understand the data
Improving the data allows for high performing simple models
Develop, score, and understand logistic regressions and gradient boosted trees
Analyze the performance of models by changing the data
Understand the financial impact of results
Implement the model with an understanding of strategy
Risk modeling techniques
The models and framework in this course:
Discrete-time hazard model (point in time): the probability of default is a point-in-time event
Structural model framework: the model explains the default event based on other factors
Other techniques
Through-the-cycle model (continuous time): macro-economic conditions and other effects are used, but the risk is seen as an independent event
Reduced-form model framework: a statistical approach estimating probability of default as an independent Poisson-based event
Choosing models
Many machine learning models available, but logistic regression and tree models were used
These models are simple and explainable
Their performance on probabilities is acceptable
Many financial sectors prefer model interpretability
Complex or "black-box" models are a risk because the business cannot fully explain their decisions
Deep neural networks are often too complex
Tips from me to you
Focus on the data
Gather as much data as possible
Use many different techniques to prepare and enhance the data
Learn about the business
Increase value through data
Model complexity can be a two-edged sword
Really complex models may perform well, but are seen as a "black-box"
In many cases, business users will not accept a model they cannot understand
Complex models can be very large and difficult to put into production
Thank you!