Credit Defaulters Prediction Using Logistic Regression

Uploaded by

ahamedyunusasuss


diction-using-logostic-regression

September 22, 2024

#Importing Libraries
[5]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

[6]: import seaborn as sns

#Load Dataset
[7]: df = pd.read_csv('german_credit_data.csv')

The selected attributes are:

Age (numeric)
Sex (text: male, female)
Job (numeric: 0 - unskilled and non-resident, 1 - unskilled and resident, 2 - skilled, 3 - highly skilled)
Housing (text: own, rent, or free)
Saving accounts (text: little, moderate, quite rich, rich)
Checking account (listed in the dataset description as numeric, in DM - Deutsche Mark; in this file it is stored as text categories such as little and moderate)
Credit amount (numeric, in DM)
Duration (numeric, in months)
Purpose (text: car, furniture/equipment, radio/TV, domestic appliances, repairs, education, business, vacation/others)
#Data Preparation
[8]: df.head(10)

[8]:    Unnamed: 0  Age     Sex  Job  Housing  Saving accounts  Checking account  \
     0           0   67    male    2      own              NaN            little
     1           1   22  female    2      own           little          moderate
     2           2   49    male    1      own           little               NaN
     3           3   45    male    2     free           little            little
     4           4   53    male    2     free           little            little
     5           5   35    male    1     free              NaN               NaN
     6           6   53    male    2      own       quite rich               NaN
     7           7   35    male    3     rent           little          moderate
     8           8   61    male    1      own             rich               NaN
     9           9   28    male    3      own           little          moderate

        Credit amount  Duration              Purpose
     0           1169         6             radio/TV
     1           5951        48             radio/TV
     2           2096        12            education
     3           7882        42  furniture/equipment
     4           4870        24                  car
     5           9055        36            education
     6           2835        24  furniture/equipment
     7           6948        36                  car
     8           3059        12             radio/TV
     9           5234        30                  car

[9]: df.shape

[9]: (1000, 10)

[10]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Unnamed: 0        1000 non-null   int64
 1   Age               1000 non-null   int64
 2   Sex               1000 non-null   object
 3   Job               1000 non-null   int64
 4   Housing           1000 non-null   object
 5   Saving accounts   817 non-null    object
 6   Checking account  606 non-null    object
 7   Credit amount     1000 non-null   int64
 8   Duration          1000 non-null   int64
 9   Purpose           1000 non-null   object
dtypes: int64(5), object(5)
memory usage: 78.2+ KB
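
The df.info() output above shows 817 and 606 non-null entries out of 1000 for Saving accounts and Checking account, i.e. 183 and 394 missing values. A minimal sketch of how such gaps could be counted directly is given below. The `sample` frame and its values are hypothetical stand-ins, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real DataFrame: two columns with gaps, one without
sample = pd.DataFrame({
    "Saving accounts": [np.nan, "little", "little", "rich"],
    "Checking account": ["little", "moderate", np.nan, np.nan],
    "Credit amount": [1169, 5951, 2096, 7882],
})

# Absolute number and share of missing values per column
missing_count = sample.isna().sum()
missing_share = sample.isna().mean()

print(pd.DataFrame({"missing": missing_count, "share": missing_share}))
```

Run against the notebook's `df`, the same two calls would list the missing counts for all ten columns at once.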

[11]: df.describe(include='all')

[11]:          Unnamed: 0          Age   Sex          Job  Housing  Saving accounts  \
      count   1000.000000  1000.000000  1000  1000.000000     1000              817
      unique          NaN          NaN     2          NaN        3                4
      top             NaN          NaN  male          NaN      own           little
      freq            NaN          NaN   690          NaN      713              603
      mean     499.500000    35.546000   NaN     1.904000      NaN              NaN
      std      288.819436    11.375469   NaN     0.653614      NaN              NaN
      min        0.000000    19.000000   NaN     0.000000      NaN              NaN
      25%      249.750000    27.000000   NaN     2.000000      NaN              NaN
      50%      499.500000    33.000000   NaN     2.000000      NaN              NaN
      75%      749.250000    42.000000   NaN     2.000000      NaN              NaN
      max      999.000000    75.000000   NaN     3.000000      NaN              NaN

              Checking account  Credit amount     Duration  Purpose
      count                606    1000.000000  1000.000000     1000
      unique                 3            NaN          NaN        8
      top               little            NaN          NaN      car
      freq                 274            NaN          NaN      337
      mean                 NaN    3271.258000    20.903000      NaN
      std                  NaN    2822.736876    12.058814      NaN
      min                  NaN     250.000000     4.000000      NaN
      25%                  NaN    1365.500000    12.000000      NaN
      50%                  NaN    2319.500000    18.000000      NaN
      75%                  NaN    3972.250000    24.000000      NaN
      max                  NaN   18424.000000    72.000000      NaN

#EDA
[12]: df.hist(figsize=(20, 20))
plt.show()

[13]: df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

[14]: df.to_csv('german_credit_data.csv', index=False)  # index=False keeps the row index from returning as 'Unnamed: 0' on reload

[15]: w=df['Credit amount'].max()

print(w)

18424

[16]: def determine_creditability(row):
    # Example criterion (adjust the threshold to your requirements)
    if row['Credit amount'] < 5000:
        return 1  # Creditworthy
    else:
        return 0  # Not creditworthy

# Apply the function to each row
df['Creditability'] = df.apply(determine_creditability, axis=1)

# Save the updated DataFrame
df.to_csv('updated_file.csv', index=False)

[17]: df1 = pd.read_csv('updated_file.csv')
df1.head(2)

[17]:    Age     Sex  Job  Housing  Saving accounts  Checking account  Credit amount  \
      0   67    male    2      own              NaN            little           1169
      1   22  female    2      own           little          moderate           5951

         Duration   Purpose  Creditability
      0         6  radio/TV              1
      1        48  radio/TV              0

Here 1 = creditworthy and 0 = not creditworthy.
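
The row-wise `apply` in cell [16] can also be written as a single vectorized expression. The sketch below uses `np.where` on a small hypothetical frame (`sample` and `THRESHOLD` are illustrative names, and the values are made up); it is meant as an equivalent formulation of the same rule, not as the notebook's own code.

```python
import numpy as np
import pandas as pd

THRESHOLD = 5000  # same illustrative cut-off as in cell [16]

# Hypothetical credit amounts, including one exactly at the threshold
sample = pd.DataFrame({"Credit amount": [1169, 5951, 2096, 7882, 5000]})

# 1 = creditworthy (amount strictly below the threshold), 0 = not creditworthy
sample["Creditability"] = np.where(sample["Credit amount"] < THRESHOLD, 1, 0)

print(sample)
```

An amount exactly equal to the threshold is labelled 0, matching the strict `<` comparison in the original function.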

[18]: sns.set_style('darkgrid')
plt.ylabel("total no of credit defaulters")
dx=sns.countplot(x='Creditability', data=df1)
plt.show()

[19]: plt.figure(figsize=(20,10))
sns.countplot(x='Duration', hue='Creditability', data=df1)
plt.xlabel('Duration')
plt.ylabel('total no of credit defaulters')
plt.show()

[20]: gender_df=df1.groupby(["Sex", "Creditability"])["Purpose"].value_counts()
gender_df

[20]: Sex     Creditability  Purpose
      female  0              car                     21
                             radio/TV                 8
                             furniture/equipment      7
                             business                 4
                             education                3
                             vacation/others          2
              1              radio/TV                77
                             car                     73
                             furniture/equipment     67
                             education               21
                             business                15
                             domestic appliances      6
                             repairs                  5
                             vacation/others          1
      male    0              car                     68
                             business                22
                             radio/TV                19
                             furniture/equipment     15
                             education                9
                             vacation/others          6
                             repairs                  4
              1              radio/TV               176
                             car                    175
                             furniture/equipment     92
                             business                56
                             education               26
                             repairs                 13
                             domestic appliances      6
                             vacation/others          3
      Name: count, dtype: int64

[21]: plt.figure(figsize=(20,10))
ax=sns.countplot(x='Sex', hue='Job', data=df1)
plt.legend(title='Job',loc='upper right')
plt.show()

[22]: creditability_counts = df1.groupby(['Sex', 'Creditability']).size().unstack(fill_value=0)

colors = ['#FF9999', '#66B3FF']

# Plot pie charts for each gender
for gender in creditability_counts.index:
    plt.figure(figsize=(8, 8))
    plt.pie(creditability_counts.loc[gender],
            labels=['Not Creditworthy (0)', 'Creditworthy (1)'],
            autopct='%1.1f%%', colors=colors)
    plt.title(f'Creditability Distribution for {gender.capitalize()}')
    plt.show()
[23]: plt.figure(figsize=(30,10))
plt.ylabel("total no of credit defaulters")
dx=sns.countplot(x='Purpose', hue='Creditability', data=df1)
plt.show()

1: Good customers, 0: Bad customers
[24]: plt.figure(figsize=(15,8))
plt.ylabel("total no of credit defaulters")
dx=sns.countplot(x='Creditability', hue='Housing', data=df1)
plt.show()

Correlation
[25]: corr = df1[['Creditability', 'Credit amount']].corr()

# Create a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt='.2f', square=True)
plt.title('Correlation Heatmap')
plt.show()

Converting categorical values into numerical values
[26]: df1['Sex'] = df1['Sex'].replace({'male': 1, 'female': 0})

[27]: df1['Housing'] = df1['Housing'].replace({'own': 0, 'free': 1, 'rent': 2})
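
The integer codes above place the Housing categories on a scale (own = 0, free = 1, rent = 2), although housing type has no natural order. One alternative, which this notebook does not use, is one-hot encoding. The sketch below applies `pd.get_dummies` to a small hypothetical frame; `sample` and `encoded` are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for the Housing column
sample = pd.DataFrame({"Housing": ["own", "free", "rent", "own"]})

# One indicator column per category; dtype=int gives 0/1 instead of booleans
encoded = pd.get_dummies(sample, columns=["Housing"], dtype=int)

print(encoded)
```

Each row then has exactly one indicator set to 1, so a linear model does not treat "rent" as twice "free".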

[28]: # Missing values are filled by sampling uniformly from these four levels.
# The comma after 'little' matters: adjacent string literals are concatenated,
# so 'little' 'moderate' would become the single option 'littlemoderate'.
options = ['little', 'moderate', 'rich', 'quite rich']

nan_mask = df1['Saving accounts'].isna()
df1.loc[nan_mask, 'Saving accounts'] = np.random.choice(options, size=nan_mask.sum())

[29]: nan_mask = df1['Checking account'].isna()
df1.loc[nan_mask, 'Checking account'] = np.random.choice(options, size=nan_mask.sum())

[30]: df1['Saving accounts'] = df1['Saving accounts'].replace({'little': 0, 'moderate': 1, 'rich': 2, 'quite rich': 3})
df1['Checking account'] = df1['Checking account'].replace({'little': 0, 'moderate': 1, 'rich': 2, 'quite rich': 3})
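
The random fill above is not seeded, so the imputed values (and therefore the downstream scores) can change from run to run. A minimal sketch of a reproducible variant follows, assuming a seeded NumPy generator and an explicit code table. The seed value 42, the `sample` frame, and the names `levels` and `level_codes` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed, chosen only for reproducibility

levels = ["little", "moderate", "rich", "quite rich"]
level_codes = {"little": 0, "moderate": 1, "rich": 2, "quite rich": 3}

# Hypothetical stand-in for the 'Saving accounts' column
sample = pd.DataFrame({"Saving accounts": [np.nan, "little", "rich", np.nan, "moderate"]})

# Fill missing entries by sampling uniformly from the known levels
nan_mask = sample["Saving accounts"].isna()
sample.loc[nan_mask, "Saving accounts"] = rng.choice(levels, size=int(nan_mask.sum()))

# Map text levels to integer codes; any unmapped value would surface as NaN
sample["Saving accounts"] = sample["Saving accounts"].map(level_codes)

print(sample)
```

Using `map` with an explicit dictionary makes a stray category visible as NaN instead of leaving a text value in a numeric column.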

[32]: df1.drop(['Purpose'],axis=1,inplace=True)

#preparation of datasets
[33]: Predictor = df1[df1.columns[df1.columns != 'Creditability']]  # All columns except 'Creditability'

Target = df1['Creditability']

Splitting data for training and testing

[34]: from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(Predictor, Target, test_size=0.3, random_state=1)

[35]: print("X train",X_train.shape)

print("Y train",y_train.shape)
print("X test",X_test.shape)
print("y test",y_test.shape)

X train (700, 8)
Y train (700,)
X test (300, 8)
y test (300,)
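
The split above samples rows at random, so the share of each class can differ slightly between the training and test sets. As a sketch of one option, `train_test_split` accepts a `stratify` argument that keeps class proportions equal in both parts. The arrays below are synthetic (an assumed 20/80 class mix), not the credit data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 20 of class 0 and 80 of class 1
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 20 + [1] * 80)

# stratify=y preserves the 20/80 class mix in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

print("share of class 1 in train:", y_tr.mean())
print("share of class 1 in test: ", y_te.mean())
```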
#Model Training
[36]: from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

[37]: logr=LogisticRegression()
logr.fit(X_train,y_train)
y_Pred=logr.predict(X_test)

[38]: y_Pred

[38]: array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1,
1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0,
1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1,
1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1,
1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1,
1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1])

[39]: conf_matrix=confusion_matrix(y_test,y_Pred)
conf_matrix

[39]: array([[ 56,   6],
             [  8, 230]])

Confusion matrix structure

[40]: # For labels 0 and 1, scikit-learn returns (rows = actual, columns = predicted):
# [[True Negative (TN), False Positive (FP)],
#  [False Negative (FN), True Positive (TP)]]
# With class 1 (creditworthy) as the positive class: TN=56, FP=6, FN=8, TP=230.

# True Positives (TP): This value should be high. It is the number of actual positives that were correctly identified by the model.

# True Negatives (TN): This value should also be high. It is the number of actual negatives that were correctly identified.

# False Positives (FP): This value should be low. It is the number of actual negatives that were incorrectly classified as positives.

# False Negatives (FN): This value should be low. It is the number of actual positives that were incorrectly classified as negatives.
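
scikit-learn's `confusion_matrix` puts actual classes on the rows and predicted classes on the columns, ordered by label, so for labels 0 and 1 the flattened matrix reads TN, FP, FN, TP when class 1 is taken as positive. The toy labels below are made up purely to check that layout by hand.

```python
from sklearn.metrics import confusion_matrix

# Hand-checkable toy labels (not the credit data)
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)

# ravel() flattens row by row: [TN, FP, FN, TP] for labels [0, 1]
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```

Of the three actual negatives, two are predicted 0 (TN) and one is predicted 1 (FP); of the five actual positives, four are predicted 1 (TP) and one is predicted 0 (FN).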

[41]: sns.heatmap(pd.DataFrame(conf_matrix), annot=True, cmap="YlGnBu" ,fmt='g')

plt.title('Confusion matrix')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')

[41]: Text(0.5, 23.52222222222222, 'Predicted label')

Accuracy = (TP+TN)/(TP+TN+FP+FN)

Precision = (TP)/(TP+FP)

Recall = (TP)/(TP+FN)

F1 score = 2 x (Precision x Recall)/(Precision + Recall)
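
As a worked example, the formulas above can be evaluated by hand from the confusion matrix reported in cell [39]. The sketch assumes scikit-learn's row/column convention (rows = actual, columns = predicted, labels sorted) and class 1 as the positive class, which gives TN=56, FP=6, FN=8, TP=230.

```python
# Counts read from the reported matrix [[56, 6], [8, 230]],
# assuming class 1 (creditworthy) is the positive class
tn, fp, fn, tp = 56, 6, 8, 230

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f}")
print(f"precision={precision:.4f}")
print(f"recall={recall:.4f}")
print(f"f1={f1:.4f}")
```

These values agree with the class 1 row of the classification report (0.97, 0.97, 0.97) and with the reported accuracy of about 0.95.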

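[42]: from sklearn.metrics import precision_score, recall_score, f1_score

# logr.score() returns accuracy, so precision, recall and F1 need their own functions
print('Accuracy', accuracy_score(y_test, y_Pred))
print('Precision', round(precision_score(y_test, y_Pred), 4))
print('Recall', round(recall_score(y_test, y_Pred), 4))
print('F1 score', round(f1_score(y_test, y_Pred), 4))

Accuracy 0.9533333333333334
Precision 0.9746
Recall 0.9664
F1 score 0.9705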

[43]: print(classification_report(y_test,y_Pred))

              precision    recall  f1-score   support

           0       0.88      0.90      0.89        62
           1       0.97      0.97      0.97       238

    accuracy                           0.95       300
   macro avg       0.92      0.93      0.93       300
weighted avg       0.95      0.95      0.95       300

[46]: from sklearn import metrics

[48]: y_pred_proba=logr.predict_proba(X_test)[:,1]
fpr, tpr, _ = metrics.roc_curve(y_test, y_pred_proba)
auc = metrics.roc_auc_score(y_test, y_pred_proba)

[51]: plt.plot(fpr,tpr,label="data 1, auc="+str(auc))

plt.legend(loc=4)
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.show()
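
The AUC shown in the plot legend can be read as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative. The toy example below (made-up labels and scores) is small enough to check by hand: of the four positive/negative pairs, three are ranked correctly, so the AUC is 3/4.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up labels and predicted scores, small enough to verify manually
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)

# Pairs (positive, negative): (0.35, 0.1) ok, (0.35, 0.4) wrong,
# (0.8, 0.1) ok, (0.8, 0.4) ok, so 3 of 4 pairs are ordered correctly
print("AUC:", auc)
```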

[53]: from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

[55]: kfold=KFold(n_splits=3, random_state=7, shuffle=True)

result = cross_val_score(logr, X_train, y_train, cv=kfold, scoring='accuracy')
print(result.mean())

0.9528447232309892
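The model reaches about 95% accuracy, both on the held-out test set (0.953) and in 3-fold cross-validation on the training set (mean 0.953). Because Creditability was derived directly from Credit amount (the < 5000 rule in cell [16]) and Credit amount remains among the predictors, this score mainly reflects how well the model recovers that threshold rule.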
[58]: df1.to_csv('preprocessed_data.csv', index=False)
