Credit Risk Modeling in Python Chapter2

The document discusses using logistic regression for predicting the probability of default on loans. It covers training a logistic regression model, interpreting the model coefficients, preprocessing categorical variables, evaluating model performance using metrics like accuracy and the ROC curve, and selecting classification thresholds.

Logistic regression for probability of default
CREDIT RISK MODELING IN PYTHON

Michael Crabtree
Data Scientist, Ford Motor Company
Probability of default
The probability of default (PD) is the likelihood that a borrower will default on a loan

It is a probability value between 0 and 1, such as 0.86

In the data, loan_status is 1 for a default and 0 for a non-default

Probability of default | Interpretation | Predicted loan_status
0.40 | Unlikely to default | 0
0.90 | Very likely to default | 1
0.10 | Very unlikely to default | 0

Predicting probabilities
The probability of default can be the output of a machine learning model
The model learns from the data in the columns (features)

Classification models predict a class (default or non-default) along with its probability

Two of the most common models:

Logistic regression

Decision tree

Logistic regression
Similar to linear regression, but the weighted sum of the features is passed through the logistic (sigmoid) function, so it only produces values between 0 and 1
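A minimal sketch of the logistic (sigmoid) function that does this squashing, assuming NumPy; `sigmoid` and the sample scores are illustrative names and values, not part of the course code:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1 / (1 + np.exp(-z))

# A linear combination of features can be any real number...
linear_scores = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
# ...but after the sigmoid every value lies strictly between 0 and 1
probabilities = sigmoid(linear_scores)
print(probabilities.round(3))
```

Larger scores map to larger probabilities, and a score of 0 maps to exactly 0.5.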

Training a logistic regression
Logistic regression is available in the scikit-learn package

import numpy as np
from sklearn.linear_model import LogisticRegression

Create the model by calling LogisticRegression(), with or without parameters

clf_logistic = LogisticRegression(solver='lbfgs')

Use the .fit() method to train it; np.ravel() flattens the single-column labels into the 1-D array scikit-learn expects

clf_logistic.fit(training_columns, np.ravel(training_labels))

Training columns: all of the columns in our data except loan_status

Labels: loan_status (0 or 1)
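A self-contained sketch of the training step above. Because the course's loan data is not included here, it assumes a hypothetical randomly generated stand-in that reuses three of the slide's column names, plus a made-up rule linking interest rate to default:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in data: column names follow the slides,
# but the values are randomly generated, not the course's loan data
rng = np.random.default_rng(123)
n = 1_000
training_columns = pd.DataFrame({
    "loan_int_rate": rng.uniform(5, 20, n),
    "person_emp_length": rng.integers(0, 30, n),
    "person_income": rng.uniform(10_000, 150_000, n),
})
# Assumed toy rule: higher interest rates raise the chance of default
p = 1 / (1 + np.exp(-(training_columns["loan_int_rate"] - 12.5) / 2))
training_labels = pd.DataFrame({"loan_status": (rng.uniform(size=n) < p).astype(int)})

clf_logistic = LogisticRegression(solver="lbfgs")
# np.ravel turns the one-column label frame into a 1-D array
clf_logistic.fit(training_columns, np.ravel(training_labels))
print(clf_logistic.intercept_, clf_logistic.coef_)
```

On unscaled features such as person_income, the lbfgs solver may emit a ConvergenceWarning; scaling the features or raising max_iter are the usual remedies.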

Training and testing
The entire data set is usually split into two parts

Data subset | Usage | Portion
Train | Learn from the data to generate predictions | 60%
Test | Test the learning on new, unseen data | 40%

Creating the training and test sets
Separate the data into training columns and labels

X = cr_loan.drop('loan_status', axis = 1)
y = cr_loan[['loan_status']]

Use the train_test_split() function from scikit-learn

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=123)

test_size: the proportion of the data used for the test set (.4 = 40%)

random_state: a random seed value for reproducibility
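A runnable sketch of the split, assuming a tiny hypothetical cr_loan with made-up values. With 10 rows and test_size=.4, four rows go to the test set, and reusing random_state=123 reproduces the same split:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy version of cr_loan: 10 rows with random values
rng = np.random.default_rng(0)
cr_loan = pd.DataFrame({
    "loan_int_rate": rng.uniform(5, 20, 10),
    "person_income": rng.uniform(10_000, 150_000, 10),
    "loan_status": [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
})

X = cr_loan.drop("loan_status", axis=1)  # training columns
y = cr_loan[["loan_status"]]             # labels, kept as a one-column DataFrame

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=123)
print(len(X_train), len(X_test))
```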

Let's practice!
Predicting the probability of default

Logistic regression coefficients
# Model intercept (clf_logistic.intercept_)
array([-3.30582292e-10])
# Coefficients (clf_logistic.coef_) for ['loan_int_rate','person_emp_length','person_income']
array([[ 1.28517496e-09, -2.27622202e-09, -2.17211991e-05]])

# Calculating probability of default from the rounded intercept and coefficients

int_coef_sum = (-3.3e-10
    + (1.29e-09 * loan_int_rate) + (-2.28e-09 * person_emp_length) + (-2.17e-05 * person_income))
prob_default = 1 / (1 + np.exp(-int_coef_sum))
prob_nondefault = 1 - prob_default
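The same calculation can be checked against scikit-learn. This sketch assumes hypothetical toy data and a model named clf (not the course's fitted model); it computes the intercept-plus-coefficients sum by hand and compares the result with predict_proba:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two features and a noisy 0/1 label
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)

clf = LogisticRegression(solver="lbfgs").fit(X, y)

# Intercept plus the sum of each coefficient times its feature value
int_coef_sum = clf.intercept_[0] + X @ clf.coef_[0]
prob_default = 1 / (1 + np.exp(-int_coef_sum))
prob_nondefault = 1 - prob_default

# The manual calculation reproduces the model's own predict_proba output
print(np.allclose(prob_default, clf.predict_proba(X)[:, 1]))
```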

Interpreting coefficients
# Intercept
intercept = -1.02
# Coefficient for employment length
person_emp_length_coef = -0.056

For every 1-year increase in person_emp_length, the log-odds of default fall by 0.056 (the odds are multiplied by exp(-0.056) ≈ 0.95), so the person is less likely to default

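Intercept | person_emp_length | Value * coef | Probability of default
-1.02 | 10 | (10 * -0.06) | .17
-1.02 | 11 | (11 * -0.06) | .16
-1.02 | 12 | (12 * -0.06) | .15

The coefficient is rounded to -0.06 here; each probability is 1 / (1 + exp(-(intercept + value * coef)))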
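The table can be reproduced with a few lines, using the rounded coefficient of -0.06 from the table rather than the exact -0.056:

```python
import numpy as np

intercept = -1.02
person_emp_length_coef = -0.06  # the rounded coefficient used in the table

probs = {}
for person_emp_length in (10, 11, 12):
    log_odds = intercept + person_emp_length * person_emp_length_coef
    prob_default = 1 / (1 + np.exp(-log_odds))
    probs[person_emp_length] = round(float(prob_default), 2)

print(probs)  # {10: 0.17, 11: 0.16, 12: 0.15}
```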

Using non-numeric columns
Numeric: loan_int_rate, person_emp_length, person_income

Non-numeric:

cr_loan_clean['loan_intent']

EDUCATION
MEDICAL
VENTURE
PERSONAL
DEBTCONSOLIDATION
HOMEIMPROVEMENT

Scikit-learn models expect numeric input, so these columns will cause errors unless they are processed first

One-hot encoding
Represent a string with a number

Each value becomes a 0 or 1 in a new column named column_VALUE, e.g. loan_intent_EDUCATION

Get dummies
Use the get_dummies() function from pandas

import pandas as pd
# Separate the numeric columns
cred_num = cr_loan.select_dtypes(exclude=['object'])
# Separate non-numeric columns
cred_cat = cr_loan.select_dtypes(include=['object'])
# One-hot encode the non-numeric columns only
cred_cat_onehot = pd.get_dummies(cred_cat)
# Union the numeric columns with the one-hot encoded columns
cr_loan = pd.concat([cred_num, cred_cat_onehot], axis=1)
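A small sketch of the encoding step on a hypothetical three-row sample (values made up for illustration). The string column is created with dtype object explicitly so that select_dtypes picks it up the same way it does in the slide code:

```python
import pandas as pd

# Hypothetical three-row sample with one numeric and one non-numeric column
cr_loan = pd.DataFrame({
    "person_income": [59000, 9600, 65500],
    "loan_intent": pd.Series(["PERSONAL", "EDUCATION", "MEDICAL"], dtype="object"),
})

cred_num = cr_loan.select_dtypes(exclude=["object"])
cred_cat = cr_loan.select_dtypes(include=["object"])
cred_cat_onehot = pd.get_dummies(cred_cat)
cr_loan = pd.concat([cred_num, cred_cat_onehot], axis=1)

print(cr_loan.columns.tolist())
# ['person_income', 'loan_intent_EDUCATION', 'loan_intent_MEDICAL', 'loan_intent_PERSONAL']
```

Each row has exactly one 1 across the loan_intent_ columns; recent pandas versions store these dummy columns as booleans rather than integers.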

Predicting the future, probably
Use the .predict_proba() method from scikit-learn

# Train the model
clf_logistic.fit(X_train, np.ravel(y_train))
# Predict using the model
clf_logistic.predict_proba(X_test)

Returns an array with one row per loan: the probability of non-default, then the probability of default

# Probabilities: [[non-default, default]]

array([[0.55, 0.45]])
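A sketch of what predict_proba returns, assuming hypothetical random data in place of X_train and X_test. The columns follow clf_logistic.classes_, so column 1 holds the probability of default:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for X_train / X_test
rng = np.random.default_rng(1)
X_train = rng.normal(size=(300, 2))
y_train = (X_train[:, 0] + rng.normal(size=300) > 0).astype(int)
X_test = rng.normal(size=(5, 2))

clf_logistic = LogisticRegression(solver="lbfgs").fit(X_train, y_train)
preds = clf_logistic.predict_proba(X_test)

print(clf_logistic.classes_)  # column order of preds: [0 1]
prob_default = preds[:, 1]    # second column: probability of default (loan_status = 1)
print(prob_default.round(2))
```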

Let's practice!
Credit model performance

Model accuracy scoring
Calculate accuracy: the share of predictions that match the true loan_status

Use the .score() method from scikit-learn

# Check the accuracy against the test data
clf_logistic1.score(X_test, y_test)

0.81

81% of the loan_status values in the test set were predicted correctly
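Accuracy as computed by .score() can be reproduced by hand. This sketch assumes hypothetical random data and a simple 400/100 split:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data and a fixed 400/100 train/test split
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)
X_train, X_test, y_train, y_test = X[:400], X[400:], y[:400], y[400:]

clf_logistic = LogisticRegression(solver="lbfgs").fit(X_train, y_train)

# For a classifier, .score() is the share of predictions equal to the true label
accuracy = clf_logistic.score(X_test, y_test)
manual_accuracy = np.mean(clf_logistic.predict(X_test) == y_test)
print(accuracy, manual_accuracy)
```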

ROC curve charts
Receiver Operating Characteristic curve
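Plots the true positive rate (sensitivity) against the false positive rate (fall-out) at every threshold; prob_default holds the predicted probabilities of default, i.e. predict_proba(X_test)[:, 1]

from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt
fallout, sensitivity, thresholds = roc_curve(y_test, prob_default)
plt.plot(fallout, sensitivity, color = 'darkorange')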

Analyzing ROC charts
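Area Under Curve (AUC): the area under the ROC curve. Random predictions follow the diagonal and give an AUC of 0.5; a perfect model gives 1.0, so the further the curve rises above the diagonal, the better the model separates defaults from non-defaults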
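A quick way to check those two reference points is roc_auc_score from scikit-learn. This sketch assumes random labels, a hypothetical perfect scorer, and a hypothetical random scorer:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_test = rng.integers(0, 2, 1_000)

# Scores that perfectly separate defaults (1) from non-defaults (0)
perfect_scores = y_test.astype(float)
# A "model" that ignores the data and guesses at random
random_scores = rng.uniform(size=1_000)

auc_perfect = roc_auc_score(y_test, perfect_scores)
auc_random = roc_auc_score(y_test, random_scores)
print(auc_perfect, auc_random)  # 1.0, then a value close to 0.5
```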

Default thresholds
Threshold: the probability above which a loan is labeled a default

Setting the threshold
Relabel loans based on a threshold of 0.5: a probability of default above 0.5 becomes loan_status 1, anything else 0

import pandas as pd
preds = clf_logistic.predict_proba(X_test)
preds_df = pd.DataFrame(preds[:,1], columns = ['prob_default'])
preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > 0.5 else 0)
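A vectorized alternative to the .apply() call, sketched with made-up probabilities. The relabel helper is a hypothetical name used only here; np.where gives the same labels as the lambda, and lowering the threshold labels more loans as defaults:

```python
import numpy as np
import pandas as pd

# Hypothetical predicted probabilities of default (stand-in for preds[:, 1])
preds_df = pd.DataFrame({"prob_default": [0.10, 0.35, 0.45, 0.55, 0.90]})

def relabel(prob_default, threshold):
    """Hypothetical helper: label a loan a default (1) when its probability exceeds the threshold."""
    return np.where(prob_default > threshold, 1, 0)

preds_df["loan_status"] = relabel(preds_df["prob_default"], 0.5)
print(preds_df["loan_status"].tolist())                  # [0, 0, 0, 1, 1]

# Lowering the threshold flags more loans as defaults
print(relabel(preds_df["prob_default"], 0.4).tolist())   # [0, 0, 1, 1, 1]
```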

Credit classification reports
Use classification_report() from scikit-learn

from sklearn.metrics import classification_report
target_names = ['Non-Default', 'Default']  # display names for classes 0 and 1
print(classification_report(y_test, preds_df['loan_status'], target_names=target_names))

Selecting classification metrics
Select and store specific components from the classification_report()

Use the precision_recall_fscore_support() function from scikit-learn

from sklearn.metrics import precision_recall_fscore_support

# Returns (precision, recall, fscore, support); [1][1] is the recall for class 1 (default)
precision_recall_fscore_support(y_test,preds_df['loan_status'])[1][1]
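To see what the [1][1] indexing picks out, here is a sketch on eight hypothetical loans; precision_recall_fscore_support returns one array per metric, each with one entry per class in sorted label order:

```python
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical true and predicted loan_status values for eight loans
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

precision, recall, fscore, support = precision_recall_fscore_support(y_true, y_pred)
# Index 0 = non-default, index 1 = default
print(recall[1])  # default recall: 2 of the 4 true defaults were caught -> 0.5
# The slide's indexing returns the same value
print(precision_recall_fscore_support(y_true, y_pred)[1][1])
```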

Let's practice!
Model discrimination and impact

Confusion matrices
Shows the number of correct and incorrect predictions for each loan_status
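A minimal sketch with scikit-learn's confusion_matrix on eight hypothetical loans; for labels 0 and 1, rows are the true loan_status and columns the predicted one:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted loan_status values
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
# [[true non-defaults predicted 0, true non-defaults predicted 1],
#  [true defaults predicted 0,     true defaults predicted 1]]
print(cm)  # [[3 1]
           #  [2 2]]

# Unpack the four cells: true negatives, false positives, false negatives, true positives
tn, fp, fn, tp = cm.ravel()
```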

Default recall for loan status
Default recall (or sensitivity) is the proportion of actual defaults that the model correctly predicts as defaults
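Default recall can be read straight off the confusion matrix as true positives / (true positives + false negatives). This sketch uses ten hypothetical loans and checks the hand calculation against recall_score:

```python
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical true and predicted loan_status values for ten loans
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
# Share of actual defaults that were predicted as defaults
default_recall = tp / (tp + fn)
print(default_recall, recall_score(y_true, y_pred))
```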

Recall portfolio impact
Classification report: underperforming logistic regression model

Number of true defaults: 50,000

Loan amount | Defaults predicted / not predicted | Estimated loss on defaults
$50 | .04 / .96 | (50,000 x .96) x $50 = $2,400,000
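The estimated loss in the table is simple arithmetic; this sketch restates it with the slide's figures (50,000 true defaults, $50 loans, default recall of .04):

```python
# Figures from the slide: 50,000 true defaults, $50 per loan,
# and a default recall of .04, so .96 of defaults go unpredicted
num_true_defaults = 50_000
loan_amount = 50
default_recall = 0.04

missed_defaults = num_true_defaults * (1 - default_recall)
estimated_loss = missed_defaults * loan_amount
print(f"${estimated_loss:,.0f}")  # $2,400,000
```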

Recall, precision, and accuracy
It is difficult to maximize all three at once because improving one usually comes at the expense of another
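A sketch of the trade-off on ten hypothetical loans with made-up probabilities of default: as the threshold rises, default recall falls while precision rises (accuracy happens to stay at 0.8 for these particular numbers):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical true loan_status values and predicted probabilities of default
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
prob_default = np.array([0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.40, 0.55, 0.70, 0.90])

results = {}
for threshold in (0.3, 0.5, 0.65):
    # Label a loan as a default when its probability exceeds the threshold
    y_pred = (prob_default > threshold).astype(int)
    results[threshold] = (
        recall_score(y_true, y_pred),
        precision_score(y_true, y_pred),
        accuracy_score(y_true, y_pred),
    )
    print(threshold, [round(float(v), 2) for v in results[threshold]])
```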

Let's practice!