
CART Loan Delinquent Student File 0.1 - New - Ipynb Colaboratory

The document discusses building a decision tree model to predict loan delinquency using the CART technique. It loads and cleans loan data, encodes categorical variables, splits the data into training and test sets, builds a decision tree classifier, tunes hyperparameters, and evaluates model performance using metrics such as AUC, accuracy and confusion matrices. Key findings are that the model achieves similarly high accuracy on both the training and test sets, and that FICO, term and gender are the most important predictors of delinquency.

Uploaded by SHEKHAR SWAMI


Problem Statement

Based on the given loan data, can we identify the major factors or characteristics of a borrower that lead them into the delinquent stage?

• Delinquency is a major metric in assessing risk: as more customers become delinquent, the risk of customers defaulting also increases.

• The main objective is to minimize this risk. To do so, you need to build a decision tree model using the CART technique that identifies the risk and non-risk attributes of borrowers getting into the delinquent stage.

Importing libraries and Loading data

import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

ld_df = pd.read_csv("Loan Delinquent Dataset.csv")

Checking the data

ld_df.head()

Dropping unwanted variables

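Only one of the two target columns is needed. `Sdelinquent` can be dropped instead of `delinquent`, but the later cells use `Sdelinquent` as the target and would need to change accordingly.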

ld_df=ld_df.drop(["ID","delinquent"],axis=1)

ld_df.head()

ld_df.shape

ld_df.info()

Many columns are of type object, i.e. strings. These need to be converted to a numeric (ordinal) type.

Getting unique counts of all object columns

print('term \n',ld_df.term.value_counts())
print('\n')
print('gender \n',ld_df.gender.value_counts())
print('\n')
print('purpose \n',ld_df.purpose.value_counts())
print('\n')
print('home_ownership \n',ld_df.home_ownership.value_counts())
print('\n')
print('age \n',ld_df.age.value_counts())
print('\n')
print('FICO \n',ld_df.FICO.value_counts())

Note:
The scikit-learn decision tree accepts only numeric input. It cannot take string / object columns.
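A small sketch of why the encoding step is needed, using a hypothetical one-column frame (`X_raw`, `y_raw`, `X_enc` are illustrative names, not from the notebook): fitting on the raw strings fails when scikit-learn converts the input to floats, while the same data fits once the column is replaced by integer category codes.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy frame with one string column (not the real loan file)
X_raw = pd.DataFrame({"term": ["36 months", "60 months", "36 months", "60 months"]})
y_raw = [0, 1, 0, 1]

# Fitting on raw strings fails when scikit-learn converts the input to floats
try:
    DecisionTreeClassifier().fit(X_raw, y_raw)
    raised = False
except (ValueError, TypeError) as err:
    raised = True
    print("fit failed:", err)

# Replacing the strings with integer category codes makes the fit work
X_enc = X_raw.assign(term=pd.Categorical(X_raw["term"]).codes)
model = DecisionTreeClassifier().fit(X_enc, y_raw)
print(model.predict(X_enc))
```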

The following code loops through each column and, if the column type is object, converts it to a categorical with each distinct value becoming a category, then replaces the column with the integer category codes.

for feature in ld_df.columns:
    if ld_df[feature].dtype == 'object':
        print('\n')
        print('feature:',feature)
        print(pd.Categorical(ld_df[feature].unique()))
        print(pd.Categorical(ld_df[feature].unique()).codes)
        ld_df[feature] = pd.Categorical(ld_df[feature]).codes

For each feature, match the list of unique values (the line after 'feature:') with the array of codes printed below it to get the encoding mapping. Ignore the line starting with 'Categories'.
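Instead of reading the mapping off the printout, it can be collected into a dictionary. This is a sketch on hypothetical toy data; the `mappings` dict is an illustrative helper, not part of the original notebook. `pd.Categorical` sorts the distinct values, so code `i` refers to `categories[i]`. The check uses `is_numeric_dtype` (an assumption on my part) so that it also catches pandas string dtypes, not only `object`.

```python
import pandas as pd

# Hypothetical toy data standing in for the loan file (not the real dataset)
df = pd.DataFrame({
    "term": ["36 months", "60 months", "36 months"],
    "gender": ["Male", "Female", "Male"],
    "FICO": [">500", "300-500", ">500"],
})

mappings = {}
for feature in df.columns:
    if not pd.api.types.is_numeric_dtype(df[feature]):
        cat = pd.Categorical(df[feature])
        # categories are sorted; code i stands for cat.categories[i]
        mappings[feature] = {value: code for code, value in enumerate(cat.categories)}
        df[feature] = cat.codes

print(mappings)
print(df)
```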

Comparing the unique counts from above

ld_df.info()

ld_df.head()

Label encoding is done and all columns are now numeric.

Proportion of 1s and 0s

ld_df.Sdelinquent.value_counts(normalize=True)

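counts = ld_df.Sdelinquent.value_counts()
print(counts)
# Share of each class computed from the counts (7721 ones and 3827 zeros in this file)
print('% of 1s:', counts[1] / counts.sum())
print('% of 0s:', counts[0] / counts.sum())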

Extracting the target column into a separate vector

# Features: every column except the target
X = ld_df.drop("Sdelinquent", axis=1)

# Target: pop() returns the column and removes it from ld_df
y = ld_df.pop("Sdelinquent")

X.head()

Splitting data into training and test set

from sklearn.model_selection import train_test_split

X_train, X_test, train_labels, test_labels = train_test_split(X, y, test_size=.30, random_state=1)
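The split above is purely random. A possible refinement, not used in the original notebook, is `stratify=y`, which keeps the roughly two-thirds share of 1s the same in both sets. The sketch below uses a synthetic target with a 67/33 split (an assumption that mimics the class proportions reported above) to compare the two options.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic target with about the same 1s/0s balance as the loan file
y = np.array([1] * 670 + [0] * 330)
X = np.arange(len(y)).reshape(-1, 1)

# Plain random split versus a split stratified on the target
_, _, y_tr_plain, y_te_plain = train_test_split(X, y, test_size=0.30, random_state=1)
_, _, y_tr_strat, y_te_strat = train_test_split(X, y, test_size=0.30, random_state=1, stratify=y)

print("overall ", y.mean())
print("plain   ", y_tr_plain.mean(), y_te_plain.mean())
print("stratify", y_tr_strat.mean(), y_te_strat.mean())
```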

Checking the dimensions of the training and test data

print('X_train',X_train.shape)
print('X_test',X_test.shape)
print('train_labels',train_labels.shape)
print('test_labels',test_labels.shape)
print('Total Obs', X_train.shape[0] + X_test.shape[0])

Building a Decision Tree Classifier

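# Initialise a Decision Tree Classifier (Gini criterion; random_state fixed for reproducibility)
dt_model = DecisionTreeClassifier(criterion='gini', random_state=1)
# Fit the model
dt_model.fit(X_train, train_labels)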

from sklearn import tree

train_char_label = ['No', 'Yes']
ld_Tree_File = open('ld_Tree_File.dot','w')
dot_data = tree.export_graphviz(dt_model,
                                out_file=ld_Tree_File,
                                feature_names = list(X_train),
                                class_names = list(train_char_label))

ld_Tree_File.close()

The code above saves a .dot file in your working directory. WebGraphviz is Graphviz in the browser: paste the contents of the file into http://webgraphviz.com/ to get the visualization.
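If pasting into WebGraphviz is inconvenient, scikit-learn's `export_text` prints a fitted tree as indented text inside the notebook. The sketch below fits a small tree on hypothetical encoded data (`X_demo`, `y_demo`, `demo_tree` are illustrative names); with the notebook's objects you would pass `dt_model` and `list(X_train)` instead.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical encoded features standing in for X_train
rng = np.random.default_rng(1)
X_demo = pd.DataFrame({
    "FICO": rng.integers(0, 2, 200),
    "term": rng.integers(0, 2, 200),
})
# Synthetic target driven mostly by FICO, with a little noise
y_demo = ((X_demo["FICO"] == 0) | (rng.random(200) < 0.1)).astype(int)

demo_tree = DecisionTreeClassifier(max_depth=2, random_state=1).fit(X_demo, y_demo)

# export_text returns the fitted tree as an indented plain-text string
tree_text = export_text(demo_tree, feature_names=list(X_demo.columns))
print(tree_text)
```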

Variable Importance

print (pd.DataFrame(dt_model.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values('Imp',ascending=False))
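`feature_importances_` is the Gini (impurity-based) importance: for each feature, the total weighted impurity decrease over all splits that use it, normalised to sum to 1. The sketch below recomputes it from the fitted tree's internal arrays on synthetic data from `make_classification` (not the loan file), as a check on that definition.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, not the loan file
X, y = make_classification(n_samples=300, n_features=4, random_state=1)
clf = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)

t = clf.tree_
w = t.weighted_n_node_samples
imp = np.zeros(X.shape[1])
for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:  # leaf node: no split, no impurity decrease
        continue
    # weighted impurity of the parent minus that of its two children
    decrease = w[node] * t.impurity[node] - w[left] * t.impurity[left] - w[right] * t.impurity[right]
    imp[t.feature[node]] += decrease

imp /= imp.sum()  # normalise so the importances sum to 1
print(np.round(imp, 4))
print(np.round(clf.feature_importances_, 4))
```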

Predicting Test Data

y_predict = dt_model.predict(X_test)
y_predict.shape

Regularising the Decision Tree

Adding Tuning Parameters

reg_dt_model = DecisionTreeClassifier(criterion = 'gini', max_depth = 30,min_samples_leaf=100,min_samples_split=1000, random_state=1)
reg_dt_model.fit(X_train, train_labels)
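The values max_depth=30, min_samples_leaf=100 and min_samples_split=1000 are chosen by hand. One common alternative, sketched here and not the notebook's own method, is a cross-validated grid search with `GridSearchCV`. The grid values, the AUC scoring choice and the synthetic data are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded loan data
X, y = make_classification(n_samples=2000, n_features=6, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=1)

# Candidate values around the hand-picked settings
param_grid = {
    "max_depth": [5, 10, 30],
    "min_samples_leaf": [20, 50, 100],
    "min_samples_split": [100, 500, 1000],
}
grid = GridSearchCV(
    DecisionTreeClassifier(criterion="gini", random_state=1),
    param_grid,
    scoring="roc_auc",  # rank settings by cross-validated AUC
    cv=5,
)
grid.fit(X_tr, y_tr)

print(grid.best_params_)
print(round(grid.best_score_, 3))
```

`grid.best_estimator_` is refitted on the whole training set and could then take the place of `reg_dt_model` in the evaluation cells.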

Generating New Tree

ld_tree_regularized = open('ld_tree_regularized.dot','w')
dot_data = tree.export_graphviz(reg_dt_model, out_file=ld_tree_regularized, feature_names=list(X_train), class_names=list(train_char_label))

ld_tree_regularized.close()
dot_data

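Variable Importance

print(pd.DataFrame(reg_dt_model.feature_importances_, columns=["Imp"], index=X_train.columns).sort_values('Imp', ascending=False))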

Predicting on Training and Test dataset

# Predicted classes from the regularised tree
ytrain_predict = reg_dt_model.predict(X_train)
ytest_predict = reg_dt_model.predict(X_test)
print('ytrain_predict',ytrain_predict.shape)
print('ytest_predict',ytest_predict.shape)

Getting the Predicted Classes

ytest_predict

Getting the Predicted Probabilities

ytest_predict_prob=reg_dt_model.predict_proba(X_test)
ytest_predict_prob

pd.DataFrame(ytest_predict_prob).head()
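How the two outputs relate: a fitted tree's `predict` returns, for each row, the class with the highest `predict_proba`, and every sample that lands in the same leaf receives the same probability. A sketch on synthetic data (not the loan file) checking both points:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, not the loan file
X, y = make_classification(n_samples=500, n_features=5, random_state=1)
model = DecisionTreeClassifier(min_samples_leaf=50, random_state=1).fit(X, y)

proba = model.predict_proba(X)
# predict() picks the class with the largest probability in each row
pred_from_proba = model.classes_[proba.argmax(axis=1)]
print(np.array_equal(pred_from_proba, model.predict(X)))

# apply() gives the leaf index of each sample; one probability per leaf
leaf_ids = model.apply(X)
print(len(np.unique(leaf_ids)), "leaves,", len(np.unique(proba[:, 1])), "distinct probabilities")
```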

Model Evaluation

Measuring AUC-ROC Curve

import matplotlib.pyplot as plt

AUC and ROC for the training data

# predict probabilities
probs = reg_dt_model.predict_proba(X_train)
# keep probabilities for the positive outcome only
probs = probs[:, 1]
# calculate AUC
from sklearn.metrics import roc_auc_score
auc = roc_auc_score(train_labels, probs)
print('AUC: %.3f' % auc)
# calculate roc curve
from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(train_labels, probs)
plt.plot([0, 1], [0, 1], linestyle='--')
# plot the roc curve for the model
plt.plot(fpr, tpr, marker='.')
# show the plot
plt.show()

AUC and ROC for the test data

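# predict probabilities
probs = reg_dt_model.predict_proba(X_test)
# keep probabilities for the positive outcome only
probs = probs[:, 1]
# calculate AUC
auc = roc_auc_score(test_labels, probs)
print('AUC: %.3f' % auc)
# calculate roc curve
fpr, tpr, thresholds = roc_curve(test_labels, probs)
plt.plot([0, 1], [0, 1], linestyle='--')
# plot the roc curve for the model
plt.plot(fpr, tpr, marker='.')
# show the plot
plt.show()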
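AUC can be read as the probability that a randomly chosen delinquent borrower receives a higher score than a randomly chosen non-delinquent one, with ties counted as half. Ties matter for decision trees, because all samples in a leaf share one probability. A small hand-made example (the labels and scores are invented for illustration) comparing the pairwise count with `roc_auc_score`:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Invented labels and scores; two samples share the score 0.4, as in a tree leaf
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.4, 0.8, 0.2, 0.9])

pos = scores[y_true == 1]
neg = scores[y_true == 0]
# compare every positive with every negative; ties count as half a win
diff = pos[:, None] - neg[None, :]
manual_auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

print(manual_auc, roc_auc_score(y_true, scores))
```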

Confusion Matrix for the training data

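from sklearn.metrics import classification_report,confusion_matrix

# Confusion matrix: rows = actual class, columns = predicted class
confusion_matrix(train_labels, ytrain_predict)

#Train Data Accuracy
reg_dt_model.score(X_train,train_labels)

# Same accuracy from the confusion matrix cells: (TN + TP) / total
print((1985+4742)/(1985+650+706+4742))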

print(classification_report(train_labels, ytrain_predict))

Confusion Matrix for test data

confusion_matrix(test_labels, ytest_predict)

#Test Data Accuracy
reg_dt_model.score(X_test,test_labels)

# Same accuracy from the confusion matrix cells: (TN + TP) / total
print((922+1941)/(922+270+332+1941))

print(classification_report(test_labels, ytest_predict))
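The accuracy, precision and recall for the delinquent class (label 1) can be recomputed by hand from the test-set cell counts used in the accuracy calculation above. This sketch assumes the cells are arranged as the conclusion implies (332 false negatives), i.e. scikit-learn's binary layout [[TN, FP], [FN, TP]], so `ravel()` yields the four cells in that order.

```python
import numpy as np

# Test-set confusion matrix: rows = actual, columns = predicted (layout assumed as above)
cm = np.array([[922, 270],
               [332, 1941]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)  # of predicted delinquents, share truly delinquent
recall = tp / (tp + fn)     # of true delinquents, share the model catches
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```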

Conclusion

Accuracy on the Training Data: 83.2%

Accuracy on the Test Data: 82.6%

AUC on the Training Data: 87.9%

AUC on the Test Data: 88.1%

Accuracy, AUC, precision and recall on the test data are almost in line with the training data. This indicates that the model is not overfitting, and overall it is a good model for this classification task.

Recall is the more important metric here because we don't want to miss customers who are likely to become delinquent; being able to catch delinquencies early would help banks be more proactive. The confusion matrix for the test data shows that the model misclassified 332 customers as non-delinquent who are in fact delinquent (false negatives), which corresponds to a recall of 1941/(1941+332) ≈ 85.4% for the delinquent class.

FICO, term and gender (in that order of importance) are the most important variables in determining whether a borrower will get into the delinquent stage.
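Since recall on the delinquent class is the priority, one option not explored in the notebook is to lower the probability cut-off used to label a borrower as delinquent. Lowering the cut-off can only keep or raise recall, usually at some cost in precision. A sketch on synthetic data with a class balance roughly like the loan file (the data, the 0.3 cut-off and the `recalls` dict are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data, roughly two-thirds positives
X, y = make_classification(n_samples=3000, n_features=6, weights=[0.35, 0.65], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=1)
model = DecisionTreeClassifier(min_samples_leaf=100, random_state=1).fit(X_tr, y_tr)

prob_1 = model.predict_proba(X_te)[:, 1]
recalls = {}
for threshold in (0.5, 0.3):
    # label as delinquent (1) when the leaf probability reaches the cut-off
    pred = (prob_1 >= threshold).astype(int)
    recalls[threshold] = recall_score(y_te, pred)
    print(threshold, "recall", round(recalls[threshold], 3),
          "precision", round(precision_score(y_te, pred), 3))
```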
