3 - Analysis of Default - Ipynb - Colab

The document is a Jupyter notebook analyzing a German credit dataset containing 5000 entries and 23 columns, focusing on customer credit behavior and default rates. It includes data loading, exploratory data analysis, and visualizations using the Plotnine library to examine various factors influencing payment defaults. The analysis highlights the need for proper data type handling and the identification of high event rates in specific categories.

Uploaded by

sambha7896

1/4/25, 11:39 AM 3_Analysis of Default.ipynb - Colab

import pandas as pd
import matplotlib.pyplot as plt

gc=pd.read_csv("/Users/nitinsaraswat/Documents/AON/decision trees/data/german_credit_data.csv")

gc.head()

   Customer_ID  Status_Checking_Acc  Duration_in_Months  Credit_History  Purposre_Credit_Taken  Credit_Amount  Savings_Acc  Years_At
0       100001                  A11                   6             A34                    A43           1169          A65
1       100002                  A12                  48             A32                    A43           5951          A61
2       100003                  A14                  12             A34                    A46           2096          A61
3       100004                  A11                  42             A32                    A42           7882          A61
4       100005                  A11                  24             A33                    A40           4870          A61

5 rows × 23 columns

gc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 23 columns):
Customer_ID 5000 non-null int64
Status_Checking_Acc 5000 non-null object
Duration_in_Months 5000 non-null int64
Credit_History 5000 non-null object
Purposre_Credit_Taken 5000 non-null object
Credit_Amount 5000 non-null int64
Savings_Acc 5000 non-null object
Years_At_Present_Employment 5000 non-null object
Inst_Rt_Income 5000 non-null int64
Marital_Status_Gender 5000 non-null object
Other_Debtors_Guarantors 5000 non-null object
Current_Address_Yrs 5000 non-null int64
Property 5000 non-null object
Age 5000 non-null int64
Other_Inst_Plans 5000 non-null object
Housing 5000 non-null object
Num_CC 5000 non-null int64
Job 5000 non-null object
Dependents 5000 non-null int64
Telephone 5000 non-null object
Foreign_Worker 5000 non-null object
Default_On_Payment 5000 non-null int64
Count 5000 non-null int64
dtypes: int64(10), object(13)
memory usage: 898.5+ KB

gc.describe()

         Customer_ID  Duration_in_Months  Credit_Amount  Inst_Rt_Income  Current_Address_Yrs          Age       Num_CC   Dependents  D
count    5000.000000         5000.000000    5000.000000     5000.000000          5000.000000  5000.000000  5000.000000  5000.000000
mean   102500.500000           20.903000    3271.258000        2.973000             2.845000    35.546000     1.407000     1.155000
std      1443.520003           12.053989    2821.607329        1.118267             1.103276    11.370917     0.577423     0.361941
min    100001.000000            4.000000     250.000000        1.000000             1.000000    19.000000     1.000000     1.000000
25%    101250.750000           12.000000    1365.500000        2.000000             2.000000    27.000000     1.000000     1.000000
50%    102500.500000           18.000000    2319.500000        3.000000             3.000000    33.000000     1.000000     1.000000
75%    103750.250000           24.000000    3972.250000        4.000000             4.000000    42.000000     2.000000     1.000000
max    105000.000000           72.000000   18424.000000        4.000000             4.000000    75.000000     4.000000     2.000000

#Exploratory data analysis


!pip install plotnine

Requirement already satisfied: plotnine in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (0.5.1)


Requirement already satisfied: mizani>=0.5.2 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from plotnine) (0.5.3)
Requirement already satisfied: patsy>=0.4.1 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from plotnine) (0.5.0)
Requirement already satisfied: numpy in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from plotnine) (1.17.2)
Requirement already satisfied: pandas>=0.23.4 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from plotnine) (0.23.4
Requirement already satisfied: statsmodels>=0.8.0 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from plotnine) (0.1
Requirement already satisfied: descartes>=1.1.0 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from plotnine) (1.1.0

https://colab.research.google.com/drive/1q4mVLgoQySfROIe0pubvF9IVBxM7XLLo#printMode=true 1/16
Requirement already satisfied: scipy>=1.0.0 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from plotnine) (1.3.1)
Requirement already satisfied: matplotlib>=3.0.0 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from plotnine) (3.0
Requirement already satisfied: palettable in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from mizani>=0.5.2->plotnin
Requirement already satisfied: six in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from patsy>=0.4.1->plotnine) (1.11
Requirement already satisfied: python-dateutil>=2.5.0 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from pandas>=0
Requirement already satisfied: pytz>=2011k in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from pandas>=0.23.4->plotn
Requirement already satisfied: cycler>=0.10 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from matplotlib>=3.0.0->p
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-package
Requirement already satisfied: kiwisolver>=1.0.1 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from matplotlib>=3.0
Requirement already satisfied: setuptools in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from kiwisolver>=1.0.1->mat
WARNING: You are using pip version 19.3; however, version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

!pip install --upgrade pip

Collecting pip
Downloading https://files.pythonhosted.org/packages/d8/f3/413bab4ff08e1fc4828dfc59996d721917df8e8583ea85385d51125dceff/pip-19.0.3
100% |████████████████████████████████| 1.4MB 785kB/s ta 0:00:01
Installing collected packages: pip
Found existing installation: pip 19.0.2
Uninstalling pip-19.0.2:
Successfully uninstalled pip-19.0.2
Successfully installed pip-19.0.3

#Draw plot side by side

gc['Default_On_Payment'].value_counts().plot(kind="bar")

<matplotlib.axes._subplots.AxesSubplot at 0x102bdeb00>

import plotnine as pn

(pn.ggplot(gc, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Status_Checking_Acc'), position = "dodge")
)

#A11 event rate looks high
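The "event rate looks high" reading taken from the bar charts can also be checked numerically. A minimal sketch, assuming `Default_On_Payment` is coded 0/1 with 1 meaning default; the `toy` frame and its values are made up for illustration and are not the notebook's data:

```python
import pandas as pd

# Toy stand-in for the notebook's `gc` frame (values are invented for illustration).
toy = pd.DataFrame({
    "Status_Checking_Acc": ["A11", "A11", "A11", "A12", "A12", "A14", "A14", "A14"],
    "Default_On_Payment":  [1, 1, 0, 1, 0, 0, 0, 0],
})

# The mean of a 0/1 target within a category is the share of defaults (event rate).
event_rate = (
    toy.groupby("Status_Checking_Acc")["Default_On_Payment"]
       .agg(customers="count", defaults="sum", event_rate="mean")
       .sort_values("event_rate", ascending=False)
)
print(event_rate)
```

On the real data the same `groupby` could be run on `gc` for each categorical column, which gives a rate per category rather than a visual impression.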

pd.__version__

'0.23.4'

(pn.ggplot(gc, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Credit_History'), position = "dodge")
)

#A30 event rate looks high

(pn.ggplot(gc, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Purposre_Credit_Taken'), position = "dodge")
)

#A40 event rate looks somewhat high

(pn.ggplot(gc, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Savings_Acc'), position = "dodge")
)

#A61 event rate looks reasonable

(pn.ggplot(gc, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Years_At_Present_Employment'), position = "dodge")
)

#The graph below does not split the bars by Inst_Rt_Income, because the column is stored as an integer rather than as a category - what should be done?

(pn.ggplot(gc, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Inst_Rt_Income'), position = "dodge")
)

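#Note: gc2=gc does not copy the data; gc2 and gc are the same DataFrame, so the conversions below also change gc
gc2=gc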

gc2['Inst_Rt_Income']=gc2['Inst_Rt_Income'].apply(str)
gc2['Current_Address_Yrs']=gc2['Current_Address_Yrs'].apply(str)
gc2['Dependents']=gc2['Dependents'].apply(str)

#Do this for other such variables also - Num_CC

gc2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 23 columns):
Customer_ID 5000 non-null int64
Status_Checking_Acc 5000 non-null object
Duration_in_Months 5000 non-null int64
Credit_History 5000 non-null object
Purposre_Credit_Taken 5000 non-null object
Credit_Amount 5000 non-null int64
Savings_Acc 5000 non-null object
Years_At_Present_Employment 5000 non-null object
Inst_Rt_Income 5000 non-null object
Marital_Status_Gender 5000 non-null object
Other_Debtors_Guarantors 5000 non-null object
Current_Address_Yrs 5000 non-null object
Property 5000 non-null object
Age 5000 non-null int64
Other_Inst_Plans 5000 non-null object
Housing 5000 non-null object
Num_CC 5000 non-null int64
Job 5000 non-null object
Dependents 5000 non-null object
Telephone 5000 non-null object
Foreign_Worker 5000 non-null object
Default_On_Payment 5000 non-null int64
Count 5000 non-null int64
dtypes: int64(7), object(16)
memory usage: 898.5+ KB
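Because plain assignment only binds a second name to the same DataFrame, the string conversion above also shows up in `gc`. A small sketch of the difference between an alias and `DataFrame.copy()`, using a made-up `toy` frame (the names `alias` and `independent` are illustrative, not from the notebook):

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

toy = pd.DataFrame({"Inst_Rt_Income": [1, 2, 4], "Age": [25, 40, 31]})

alias = toy               # same object: edits made through `alias` also change `toy`
independent = toy.copy()  # separate object: `toy` is left untouched

# Convert the ordinal code to strings only on the independent copy.
independent["Inst_Rt_Income"] = independent["Inst_Rt_Income"].astype(str)

print(alias is toy)                                # the alias is the very same object
print(is_integer_dtype(toy["Inst_Rt_Income"]))     # original column is still integer
print(independent["Inst_Rt_Income"].tolist())      # strings in the copy
```

Whether a copy is wanted here is a design choice: the later cells in this notebook rely on `gc` already holding the converted columns.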

(pn.ggplot(gc2, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Inst_Rt_Income'), position = "dodge")
)

(pn.ggplot(gc, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Marital_Status_Gender'), position = "dodge")
)

#Same problem as Inst_Rt_Income

(pn.ggplot(gc2, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Current_Address_Yrs'), position = "dodge")
)

(pn.ggplot(gc, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Property'), position = "dodge")
)

(pn.ggplot(gc, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Other_Inst_Plans '), position = "dodge")
)
#There is a space in Other_Inst_Plans variable name
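A trailing space in a column name is easy to miss and has to be typed exactly every time the column is referenced. One possible clean-up, sketched on a made-up `toy` frame, is to strip whitespace from all column names once after loading. This is an optional alternative, not what the notebook does: the later cells keep the name with the space, so they would need updating if the columns were renamed.

```python
import pandas as pd

toy = pd.DataFrame({
    "Other_Inst_Plans ": ["A141", "A143"],   # note the trailing space
    "Housing": ["A151", "A152"],
})

# repr() makes leading or trailing whitespace visible.
print([repr(name) for name in toy.columns])

# Apply str.strip to every column name and return a renamed copy.
cleaned = toy.rename(columns=str.strip)
print([repr(name) for name in cleaned.columns])
```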

(pn.ggplot(gc, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Housing'), position = "dodge")
)

#gc2=gc

#gc2['Num_CC']=gc2['Num_CC'].apply(str)

(pn.ggplot(gc2, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Num_CC'), position = "dodge")
)
#Same problem as Inst_Rt_Income - Num_CC is still an integer column because the conversion above is commented out

(pn.ggplot(gc2, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Job'), position = "dodge")
)

(pn.ggplot(gc2, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Dependents'), position = "dodge")
)

(pn.ggplot(gc, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Telephone'), position = "dodge")
)

(pn.ggplot(gc, pn.aes('Default_On_Payment'))
 + pn.geom_bar(pn.aes(fill = 'Foreign_Worker'), position = "dodge")
)

plt.rcParams['figure.figsize'] = [10, 7]

#Box plot of Credit Amount

gc.boxplot(column=['Credit_Amount'], by=['Default_On_Payment'])

<matplotlib.axes._subplots.AxesSubplot at 0x120c29c18>

from IPython.display import Image

Image("/Users/nitinsaraswat/Desktop/bp.png")

## IQR is the interquartile range: Q3 - Q1 (75th percentile minus 25th percentile)

## The top line (upper whisker) is the largest value that is not an outlier, i.e. at most Q3 + 1.5 x IQR

## The bottom line (lower whisker) is the smallest value that is not an outlier, i.e. at least Q1 - 1.5 x IQR

## Points below the bottom line are outliers (less than Q1 - 1.5 x IQR)

## Points above the top line are outliers (more than Q3 + 1.5 x IQR)
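The outlier rule used by a standard box plot (Tukey fences at 1.5 times the interquartile range, which is also matplotlib's default `whis=1.5`) can be worked through by hand. A minimal sketch on made-up credit amounts; the `amounts` values are illustrative only:

```python
import pandas as pd

# Illustrative credit amounts, not the notebook's data.
amounts = pd.Series([250, 1365, 2320, 3972, 4870, 5951, 7882, 18424])

q1 = amounts.quantile(0.25)   # 25th percentile
q3 = amounts.quantile(0.75)   # 75th percentile
iqr = q3 - q1                 # interquartile range

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences is drawn as an individual outlier point.
outliers = amounts[(amounts < lower_fence) | (amounts > upper_fence)]
print(q1, q3, iqr, lower_fence, upper_fence)
print(outliers.tolist())
```

The whiskers themselves stop at the most extreme observations that still lie inside the fences, so they are usually shorter than the fence values.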

gc.boxplot(column=['Duration_in_Months'], by=['Default_On_Payment'])

<matplotlib.axes._subplots.AxesSubplot at 0x120c53588>

from IPython.display import Image


Image(url='/Users/nitinsaraswat/Documents/AON/box-plot-explained.gif', width=400, height=400)

#Prepare the data for train test split

X=gc.drop(columns=['Customer_ID','Default_On_Payment'],axis=1)
y=gc['Default_On_Payment']

X.head()

   Status_Checking_Acc  Duration_in_Months  Credit_History  Purposre_Credit_Taken  Credit_Amount  Savings_Acc  Years_At_Present_Empl
0                  A11                   6             A34                    A43           1169          A65
1                  A12                  48             A32                    A43           5951          A61
2                  A14                  12             A34                    A46           2096          A61
3                  A11                  42             A32                    A42           7882          A61
4                  A11                  24             A33                    A40           4870          A61

5 rows × 21 columns

y.head()

0 0
1 0
2 0
3 0
4 1
Name: Default_On_Payment, dtype: int64

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
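With no `test_size` given, `train_test_split` holds out 25% of the rows, which matches the 3750 training rows reported below. When the target is imbalanced it can help to keep the default rate equal in both parts. A sketch of a stratified split on made-up data; `toy_X`, `toy_y`, and the 30% default share are assumptions for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data: 40 customers, 12 of them (30%) defaulted.
toy_X = pd.DataFrame({"Age": list(range(20, 60))})
toy_y = pd.Series([1] * 12 + [0] * 28, name="Default_On_Payment")

# stratify=toy_y keeps the share of defaults the same in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.25, stratify=toy_y, random_state=1
)

print(len(X_tr), len(X_te))
print(y_tr.mean(), y_te.mean())
```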

X_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3750 entries, 3186 to 235
Data columns (total 21 columns):
Status_Checking_Acc 3750 non-null object
Duration_in_Months 3750 non-null int64
Credit_History 3750 non-null object
Purposre_Credit_Taken 3750 non-null object
Credit_Amount 3750 non-null int64
Savings_Acc 3750 non-null object
Years_At_Present_Employment 3750 non-null object
Inst_Rt_Income 3750 non-null object
Marital_Status_Gender 3750 non-null object
Other_Debtors_Guarantors 3750 non-null object
Current_Address_Yrs 3750 non-null object
Property 3750 non-null object
Age 3750 non-null int64
Other_Inst_Plans 3750 non-null object
Housing 3750 non-null object
Num_CC 3750 non-null int64
Job 3750 non-null object
Dependents 3750 non-null object
Telephone 3750 non-null object
Foreign_Worker 3750 non-null object

Count 3750 non-null int64
dtypes: int64(5), object(16)
memory usage: 644.5+ KB

from sklearn import tree

model = tree.DecisionTreeClassifier()

model

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False, random_state=None,
                       splitter='best')

model.fit(X_train, y_train)

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-51-d768f88d541e> in <module>()
----> 1 model.fit(X_train, y_train)

2 frames
~/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy,
force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
431 force_all_finite)
432 else:
--> 433 array = np.array(array, dtype=dtype, order=order, copy=copy)
434
435 if ensure_2d:

ValueError: could not convert string to float: 'A12'

gc4=pd.get_dummies(gc)

#The above error shows that Decision Tree Algorithm expects the categorical variables to be properly encoded
#Create categorical variables using get_dummies

gc3=pd.get_dummies(gc,columns=['Status_Checking_Acc','Credit_History','Purposre_Credit_Taken','Savings_Acc',
                               'Years_At_Present_Employment','Marital_Status_Gender','Other_Debtors_Guarantors',
                               'Property','Other_Inst_Plans ','Housing','Job','Telephone','Foreign_Worker',
                               'Inst_Rt_Income','Num_CC','Dependents','Current_Address_Yrs'],drop_first=True)
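What `get_dummies` does with `columns=` and `drop_first=True` is easier to see on a tiny frame. A minimal sketch with made-up rows; `toy` is illustrative and only borrows two of the notebook's column names:

```python
import pandas as pd

toy = pd.DataFrame({
    "Status_Checking_Acc": ["A11", "A12", "A14", "A12"],
    "Dependents": [1, 2, 1, 1],      # numeric code treated as a category
    "Age": [25, 40, 31, 52],         # left as a plain numeric feature
})

# Columns named in `columns=` are one-hot encoded even if they are numeric.
# drop_first=True drops the first level of each (A11 and 1 here), which then
# acts as the baseline category.
encoded = pd.get_dummies(
    toy, columns=["Status_Checking_Acc", "Dependents"], drop_first=True
)
print(list(encoded.columns))
```

Untouched columns come first in the result, followed by the new indicator columns named `<column>_<level>`.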

gc3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 57 columns):
Customer_ID 5000 non-null int64
Duration_in_Months 5000 non-null int64
Credit_Amount 5000 non-null int64
Age 5000 non-null int64
Default_On_Payment 5000 non-null int64
Count 5000 non-null int64
Status_Checking_Acc_A12 5000 non-null uint8
Status_Checking_Acc_A13 5000 non-null uint8
Status_Checking_Acc_A14 5000 non-null uint8
Credit_History_A31 5000 non-null uint8
Credit_History_A32 5000 non-null uint8
Credit_History_A33 5000 non-null uint8
Credit_History_A34 5000 non-null uint8
Purposre_Credit_Taken_A41 5000 non-null uint8
Purposre_Credit_Taken_A410 5000 non-null uint8
Purposre_Credit_Taken_A42 5000 non-null uint8
Purposre_Credit_Taken_A43 5000 non-null uint8
Purposre_Credit_Taken_A44 5000 non-null uint8
Purposre_Credit_Taken_A45 5000 non-null uint8
Purposre_Credit_Taken_A46 5000 non-null uint8
Purposre_Credit_Taken_A48 5000 non-null uint8
Purposre_Credit_Taken_A49 5000 non-null uint8
Savings_Acc_A62 5000 non-null uint8
Savings_Acc_A63 5000 non-null uint8
Savings_Acc_A64 5000 non-null uint8
Savings_Acc_A65 5000 non-null uint8
Years_At_Present_Employment_A72 5000 non-null uint8
Years_At_Present_Employment_A73 5000 non-null uint8
Years_At_Present_Employment_A74 5000 non-null uint8
Years_At_Present_Employment_A75 5000 non-null uint8
Marital_Status_Gender_A92 5000 non-null uint8
Marital_Status_Gender_A93 5000 non-null uint8

Marital_Status_Gender_A94 5000 non-null uint8
Other_Debtors_Guarantors_A102 5000 non-null uint8
Other_Debtors_Guarantors_A103 5000 non-null uint8
Property_A122 5000 non-null uint8
Property_A123 5000 non-null uint8
Property_A124 5000 non-null uint8
Other_Inst_Plans _A142 5000 non-null uint8
Other_Inst_Plans _A143 5000 non-null uint8
Housing_A152 5000 non-null uint8
Housing_A153 5000 non-null uint8
Job_A172 5000 non-null uint8
Job_A173 5000 non-null uint8
Job_A174 5000 non-null uint8
Telephone_A192 5000 non-null uint8
Foreign_Worker_A202 5000 non-null uint8
Inst_Rt_Income_2 5000 non-null uint8
Inst_Rt_Income_3 5000 non-null uint8
Inst_Rt_Income_4 5000 non-null uint8
Num_CC_2 5000 non-null uint8
Num_CC_3 5000 non-null uint8
Num_CC_4 5000 non-null uint8
Dependents_2 5000 non-null uint8
Current_Address_Yrs_2 5000 non-null uint8

#Prepare the data for train test split

X=gc3.drop(columns=['Customer_ID','Default_On_Payment'],axis=1)
y=gc3['Default_On_Payment']

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

from sklearn import tree

model = tree.DecisionTreeClassifier(max_depth=10,max_features=7)

#Try different variations of these parameters and see how it works

model.fit(X_train, y_train)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
                       max_features=7, max_leaf_nodes=None, min_impurity_decrease=0.0,
                       min_impurity_split=None, min_samples_leaf=1,
                       min_samples_split=2, min_weight_fraction_leaf=0.0,
                       presort=False, random_state=None, splitter='best')
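Trying different values of `max_depth` and `max_features` by hand can be replaced with a small cross-validated grid search. A sketch on synthetic data from `make_classification`; the grid values, the `roc_auc` scoring choice, and the names `X_demo`, `y_demo`, `param_grid`, and `search` are assumptions for illustration, not settings taken from the notebook:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded credit data.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=12, n_informative=5, random_state=1
)

# Candidate settings to compare; None means "use all features".
param_grid = {"max_depth": [3, 5, 10], "max_features": [4, 7, None]}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=1),  # fixed seed so runs are repeatable
    param_grid,
    cv=5,                 # 5-fold cross-validation on the training data
    scoring="roc_auc",
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```

On the notebook's data the search would be fitted on `X_train` and `y_train`, leaving `X_test` for a final check.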

model1=model.fit(X_train, y_train)

y_predict = model.predict(X_test)

from sklearn.metrics import accuracy_score

accuracy_score(y_test, y_predict)

0.82

from sklearn.metrics import confusion_matrix

pd.DataFrame(
    confusion_matrix(y_test, y_predict),
    #Rows and columns follow the sorted labels: 0 first, then 1 (assuming Default_On_Payment = 1 marks a default)
    columns=['Predicted Non-Default', 'Predicted Default'],
    index=['Actual Non-Default', 'Actual Default']
)

                    Predicted Non-Default  Predicted Default
Actual Non-Default                    803                 98
Actual Default                        127                222
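The four counts in the confusion matrix are enough to recompute the accuracy and to derive recall and precision for the default class. scikit-learn's `confusion_matrix` orders rows and columns by sorted label, so the first row and column belong to class 0. The sketch below assumes that `Default_On_Payment == 1` means the customer defaulted; the variable names are illustrative:

```python
# Counts copied from the confusion matrix output above.
# Assumption: class 1 = default, class 0 = non-default.
tn, fp = 803, 98     # actual 0: predicted 0, predicted 1
fn, tp = 127, 222    # actual 1: predicted 0, predicted 1

total = tn + fp + fn + tp
accuracy = (tn + tp) / total      # share of all predictions that are correct
recall = tp / (tp + fn)           # share of actual defaults that were caught
precision = tp / (tp + fp)        # share of predicted defaults that were real
specificity = tn / (tn + fp)      # share of non-defaults correctly cleared

print(total, round(accuracy, 4))
print(round(recall, 4), round(precision, 4), round(specificity, 4))
```

Under that assumption the model reaches 0.82 accuracy overall but catches only about 64% of the actual defaults, which accuracy alone does not show.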

from sklearn.metrics import roc_curve, auc


false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_predict)
roc_auc = auc(false_positive_rate, true_positive_rate)

roc_auc

0.7636675581731855

y_true=y_test

y_probas=model.predict_proba(X_test)
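The AUC computed earlier was based on hard 0/1 predictions, which gives the ROC curve a single interior point; in that case the area equals the average of the true positive rate and the true negative rate. Scoring with the class-1 probabilities from `predict_proba` uses every threshold instead. A sketch on synthetic data; `X_demo`, `y_demo`, `clf`, and the tree settings are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded credit data.
X_demo, y_demo = make_classification(n_samples=600, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=1)
clf = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X_tr, y_tr)

# AUC from hard labels: only one operating point is available.
auc_labels = roc_auc_score(y_te, clf.predict(X_te))

# AUC from probabilities: column 1 holds the predicted probability of class 1.
auc_scores = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

# The hard-label identity applied to the notebook's confusion matrix counts.
hard_label_auc = (222 / 349 + 803 / 901) / 2

print(round(auc_labels, 4), round(auc_scores, 4), round(hard_label_auc, 4))
```

The last value reproduces the 0.7637 reported above, which confirms that figure reflects a single threshold rather than the full ranking quality of the model.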

!pip install scikit-plot

Collecting scikit-plot
Downloading https://files.pythonhosted.org/packages/7c/47/32520e259340c140a4ad27c1b97050dd3254fdc517b1d59974d47037510e/scikit_plot
Requirement already satisfied: scipy>=0.9 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from scikit-plot) (1.1.0)
Requirement already satisfied: matplotlib>=1.4.0 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from scikit-plot) (3
Requirement already satisfied: scikit-learn>=0.18 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from scikit-plot)
Collecting joblib>=0.10 (from scikit-plot)
Downloading https://files.pythonhosted.org/packages/cd/c1/50a758e8247561e58cb87305b1e90b171b8c767b15b12a1734001f41d356/joblib-0.13
100% |████████████████████████████████| 286kB 489kB/s ta 0:00:01
Requirement already satisfied: kiwisolver>=1.0.1 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from matplotlib>=1.4
Requirement already satisfied: python-dateutil>=2.1 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from matplotlib>=
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-package
Requirement already satisfied: cycler>=0.10 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from matplotlib>=1.4.0->s
Requirement already satisfied: numpy>=1.10.0 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from matplotlib>=1.4.0->
Requirement already satisfied: setuptools in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from kiwisolver>=1.0.1->mat
Requirement already satisfied: six>=1.5 in /Users/nitinsaraswat/anaconda3/lib/python3.7/site-packages (from python-dateutil>=2.1->ma
Installing collected packages: joblib, scikit-plot
Successfully installed joblib-0.13.2 scikit-plot-0.3.7

import scikitplot as skplt

skplt.metrics.plot_roc(y_true, y_probas)
plt.show()
