Credit Risk Modeling in Python, Chapter 2
Logistic regression for probability of default
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
Probability of default
The likelihood that someone will default on a loan is the probability of default
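import numpy as np
from sklearn.linear_model import LogisticRegression
# Fit a logistic regression; np.ravel flattens the labels into the 1-D array that fit() expects
clf_logistic = LogisticRegression(solver='lbfgs')
clf_logistic.fit(training_columns, np.ravel(training_labels))
# Separate the features (X) from the loan_status target (y)
X = cr_loan.drop('loan_status', axis = 1)
y = cr_loan[['loan_status']]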
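The fit call above presumes that training features and labels already exist. The following is a minimal sketch of how they might be produced, assuming scikit-learn's train_test_split and a small synthetic stand-in for cr_loan. The synthetic data, the 40% test share, and the random seeds are illustrative assumptions, not values taken from the slides.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for cr_loan (assumption: the real data has more columns)
rng = np.random.default_rng(0)
n = 500
cr_loan = pd.DataFrame({
    'loan_int_rate': rng.uniform(5, 20, n),
    'person_emp_length': rng.integers(0, 30, n).astype(float),
})
# Illustrative rule only: higher interest rates make default more likely
p = 1 / (1 + np.exp(-(cr_loan['loan_int_rate'] - 12) / 2))
cr_loan['loan_status'] = (rng.uniform(0, 1, n) < p).astype(int)

# Separate features and target, as on the slide
X = cr_loan.drop('loan_status', axis=1)
y = cr_loan[['loan_status']]

# Hold out part of the data for testing (the 40% share is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=123)

# Fit on the training portion only
clf_logistic = LogisticRegression(solver='lbfgs')
clf_logistic.fit(X_train, np.ravel(y_train))
print(clf_logistic.intercept_, clf_logistic.coef_)
```

Keeping a held-out test set means later accuracy figures describe loans the model has not seen during fitting.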
Logistic regression coefficients
# Model Intercept
array([-3.30582292e-10])
# Coefficients for ['loan_int_rate','person_emp_length','person_income']
array([[ 1.28517496e-09, -2.27622202e-09, -2.17211991e-05]])
For every 1-year increase in person_emp_length, the person is less likely to default, because the coefficient for person_emp_length is negative
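To make the link between coefficients and probability concrete, here is a minimal sketch that applies the standard logistic (sigmoid) function to the intercept and coefficients printed above. The applicant values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Intercept and coefficients as printed above, in the order
# ['loan_int_rate', 'person_emp_length', 'person_income']
intercept = -3.30582292e-10
coefs = np.array([1.28517496e-09, -2.27622202e-09, -2.17211991e-05])

def prob_default(features):
    """Sigmoid of the intercept plus the weighted sum of the features."""
    log_odds = intercept + np.dot(coefs, features)
    return 1.0 / (1.0 + np.exp(-log_odds))

# Hypothetical applicant: 10% rate, 5 years employed, income of 50,000
base = np.array([10.0, 5.0, 50000.0])
# The same applicant with one more year of employment
one_more_year = base + np.array([0.0, 1.0, 0.0])

p_base = prob_default(base)
p_more = prob_default(one_more_year)
print(p_base, p_more, p_more < p_base)
```

With these particular coefficients the change from one extra year is very small, because the person_income term dominates the log-odds.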
Non-numeric:
cr_loan_clean['loan_intent']
EDUCATION
MEDICAL
VENTURE
PERSONAL
DEBTCONSOLIDATION
HOMEIMPROVEMENT
Will cause errors with machine learning models in Python unless processed
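One common way to process such a column is one-hot encoding. The sketch below assumes pandas' get_dummies and a tiny made-up frame. The cr_loan_prep name and the sample values are illustrative, not taken from the slides.

```python
import pandas as pd

# Tiny illustrative frame; the categories mirror some of those listed above
cr_loan_clean = pd.DataFrame({
    'loan_int_rate': [11.1, 7.9, 13.5],
    'loan_intent': ['EDUCATION', 'MEDICAL', 'VENTURE'],
})

# Replace loan_intent with one indicator column per category
cr_loan_prep = pd.get_dummies(cr_loan_clean, columns=['loan_intent'])
print(cr_loan_prep.columns.tolist())
```

Each category becomes its own indicator column, so the resulting frame no longer contains text values.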
Model accuracy scoring
Calculate accuracy
0.81
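Accuracy is the share of predictions that match the true loan_status. The sketch below computes it by hand and with scikit-learn's accuracy_score on ten hypothetical loans. The label arrays are invented for illustration and are unrelated to the 0.81 figure above.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted loan_status values for ten test loans
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

# Accuracy by hand: correct predictions divided by all predictions
manual_accuracy = np.mean(y_true == y_pred)

# The same figure from scikit-learn
sk_accuracy = accuracy_score(y_true, y_pred)
print(manual_accuracy, sk_accuracy)
```

For a fitted scikit-learn classifier, the score method called with test features and test labels returns this same mean accuracy.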
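import pandas as pd
# Column 1 of predict_proba holds the predicted probability of default (loan_status = 1)
preds = clf_logistic.predict_proba(X_test)
preds_df = pd.DataFrame(preds[:,1], columns = ['prob_default'])
# Label a loan as a default when its predicted probability exceeds 0.5
preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > 0.5 else 0)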
Confusion matrices
Shows the number of correct and incorrect predictions for each loan_status
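A minimal sketch of building such a matrix with scikit-learn's confusion_matrix follows. The two label arrays are hypothetical and stand in for true and predicted loan_status values.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted loan_status values for ten test loans
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)

# For binary labels, ravel() yields the four cells in this order
tn, fp, fn, tp = cm.ravel()
print(cm)
print('true negatives:', tn, 'false positives:', fp)
print('false negatives:', fn, 'true positives:', tp)
```

Reading the cells separately shows which kind of mistake the model makes: here the hypothetical predictions miss more defaults (false negatives) than they wrongly flag non-defaults (false positives).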