Kunal Assignment 3
4517  57  self-employed  married  tertiary  yes  -3313  yes
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4521 entries, 0 to 4520
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 4521 non-null int64
1 job 4521 non-null object
2 marital 4521 non-null object
3 education 4521 non-null object
4 default 4521 non-null object
5 balance 4521 non-null int64
6 housing-loan 4521 non-null object
7 personal-loan 4521 non-null object
8 current-campaign 4521 non-null int64
9 previous-campaign 4521 non-null int64
10 subscribed 4521 non-null object
dtypes: int64(4), object(7)
memory usage: 388.7+ KB
Out[7]: 3 (duplicate rows)
(4518, 11) (shape after dropping duplicates)
subscribed
count 4518
unique 2
top no
freq 3997
df_cleaned['subscribed'].value_counts()
Out[13]: subscribed
no 3997
yes 521
Name: count, dtype: int64
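The counts above show a heavy class imbalance (3,997 "no" vs 521 "yes"), which motivates the later use of `class_weight='balanced'` and threshold tuning. A quick sketch of the majority-class baseline these counts imply:

```python
# Class counts taken from the value_counts() output above
n_no, n_yes = 3997, 521
total = n_no + n_yes

# A classifier that always predicts "no" would already score:
baseline_accuracy = n_no / total
imbalance_ratio = n_no / n_yes

print(f"Majority-class baseline accuracy: {baseline_accuracy:.4f}")
print(f"Imbalance ratio (no:yes): {imbalance_ratio:.1f}:1")
```

Any model has to beat roughly 88% accuracy before it is doing better than guessing "no" every time, which is why accuracy alone is a poor metric here.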
plt.figure(figsize=(8, 4))
sns.boxplot(data=df_cleaned, x='subscribed', hue='subscribed', y='balance')
plt.title("Balance vs Subscribed")
plt.show()
4. Consider a subset of the ‘bank’ data with the variables
‘age’, ‘marital’, ‘education’, ‘default’, ‘balance’,
‘housing-loan’, ‘personal-loan’, and ‘subscribed’. Name
this new data bank_new
In [15]: # Creating subset as per assignment instructions
bank_new = df_cleaned[['age', 'marital', 'education', 'default', 'balance',
                       'housing-loan', 'personal-loan', 'subscribed']]
<class 'pandas.core.frame.DataFrame'>
Index: 4518 entries, 0 to 4520
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 4518 non-null int64
1 marital 4518 non-null object
2 education 4518 non-null object
3 default 4518 non-null object
4 balance 4518 non-null int64
5 housing-loan 4518 non-null object
6 personal-loan 4518 non-null object
7 subscribed 4518 non-null object
dtypes: int64(2), object(6)
memory usage: 317.7+ KB
Categorical columns:
['marital', 'education', 'default', 'housing-loan', 'personal-loan', 'subscribed']
Continuous columns:
['age', 'balance']
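The encoding step that produces bank_encoded is not shown in the export; the boolean dummy columns in Out[20] are consistent with `pd.get_dummies` plus a yes/no → 0/1 mapping of the target. A sketch under that assumption (the toy frame below is illustrative; with the full data's three marital levels, `drop_first=True` would keep both `marital_married` and `marital_single`, as seen in Out[20]):

```python
import pandas as pd

# Toy frame with the same column layout as bank_new (values are illustrative)
bank_new = pd.DataFrame({
    'age': [59, 35],
    'balance': [0, 1200],
    'marital': ['married', 'single'],
    'education': ['secondary', 'tertiary'],
    'default': ['no', 'no'],
    'housing-loan': ['yes', 'no'],
    'personal-loan': ['no', 'no'],
    'subscribed': ['no', 'yes'],
})

# Map the target to 0/1, then one-hot encode the remaining categoricals;
# drop_first=True drops one redundant dummy level per column
bank_new['subscribed'] = bank_new['subscribed'].map({'no': 0, 'yes': 1})
cat_cols = ['marital', 'education', 'default', 'housing-loan', 'personal-loan']
bank_encoded = pd.get_dummies(bank_new, columns=cat_cols, drop_first=True)

print(bank_encoded.columns.tolist())
```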
Out[20]: first rows of bank_encoded (columns truncated in the export): age, balance, subscribed (0/1), and one-hot dummies such as marital_married, marital_single, education_secondary, …
In [21]: b_cc = bank_encoded.corr()
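Once the full correlation matrix b_cc is computed, the view that usually matters is each feature's correlation with the target, ranked by magnitude. A minimal sketch on a small stand-in frame (values are illustrative):

```python
import pandas as pd

# Small numeric frame standing in for bank_encoded
bank_encoded = pd.DataFrame({
    'age':        [59, 35, 41, 28],
    'balance':    [0, 1200, -300, 450],
    'subscribed': [0, 1, 0, 1],
})

b_cc = bank_encoded.corr()

# Correlation of every feature with the target, strongest first
target_corr = (b_cc['subscribed']
               .drop('subscribed')
               .sort_values(key=abs, ascending=False))
print(target_corr)
```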
In [22]: # Split into input features (X) and target variable (y)
X = bank_encoded.drop('subscribed', axis=1)
y = bank_encoded['subscribed']
# Split the data into training and testing sets (80-20 split, stratified)
X_train, X_test, y_train, y_test = train_test_split(
X, y,
test_size=0.2,
random_state=42,
stratify=y # Preserves class distribution in both sets
)
In [24]: # Checking the shape of the resulting splits
print(f"Train-Test Split Complete:\nTraining Samples: {X_train.shape[0]}\n"
      f"Testing Samples: {X_test.shape[0]}")
Out[27]: LogisticRegression(class_weight='balanced', random_state=42)
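The fitted estimator in Out[27] is later queried for `y_pred` and `y_proba`, though those lines are not shown in the export. A self-contained sketch of that step on synthetic data with a similar class imbalance (the `make_classification` parameters are illustrative, not the assignment's data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Stand-in data with roughly the same 88/12 class skew as the bank set
X, y = make_classification(n_samples=1000, weights=[0.88, 0.12],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# class_weight='balanced' reweights classes inversely to their frequency,
# counteracting the heavy "no" majority seen earlier
model = LogisticRegression(class_weight='balanced', random_state=42)
model.fit(X_train, y_train)

# Hard labels at the default 0.5 threshold, and positive-class probabilities
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]
```

`y_pred` feeds the confusion-matrix metrics below, while `y_proba` is what the AUC and the later threshold tuning operate on.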
Out[33]: first rows of the feature matrix (columns truncated in the export): age, balance, marital_married, marital_single, education_secondary, …
• Accuracy
• Precision
• Recall
• Sensitivity
• Specificity
• F1 score
• AUC (Area under ROC curve)
# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()
# Metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred)
sensitivity = tp / (tp + fn)  # same as recall for the positive class
specificity = tn / (tn + fp)
f1 = f1_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_proba)
# Print metrics
print(" Model Evaluation Metrics")
print(f"Accuracy : {accuracy:.4f}")
print(f"Precision : {precision:.4f}")
print(f"Recall : {recall:.4f}")
print(f"Sensitivity : {sensitivity:.4f}")
print(f"Specificity : {specificity:.4f}")
print(f"F1 Score : {f1:.4f}")
print(f"AUC Score : {auc:.4f}")
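The next block evaluates `y_pred_threshold`, whose construction is not shown above. Presumably it applies a custom cut-off to `y_proba` instead of the default 0.5; a sketch with an assumed threshold of 0.3 (the assignment's actual cut-off is not visible in the export):

```python
import numpy as np

# Positive-class probabilities (illustrative values)
y_proba = np.array([0.10, 0.25, 0.35, 0.60, 0.85])

# Lowering the threshold below 0.5 trades precision for recall,
# a common move on imbalanced targets like 'subscribed'
threshold = 0.3  # assumed value
y_pred_threshold = (y_proba >= threshold).astype(int)

print(y_pred_threshold)  # → [0 0 1 1 1]
```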
# Metrics at the adjusted threshold
cm = confusion_matrix(y_test, y_pred_threshold)
tn, fp, fn, tp = cm.ravel()
accuracy = accuracy_score(y_test, y_pred_threshold)
precision = precision_score(y_test, y_pred_threshold, zero_division=0)
recall = recall_score(y_test, y_pred_threshold)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
f1 = f1_score(y_test, y_pred_threshold)
auc = roc_auc_score(y_test, y_proba)
# Print metrics
print(" Model Evaluation Metrics")
print(f"Accuracy : {accuracy:.4f}")
print(f"Precision : {precision:.4f}")
print(f"Recall : {recall:.4f}")
print(f"Sensitivity : {sensitivity:.4f}")
print(f"Specificity : {specificity:.4f}")
print(f"F1 Score : {f1:.4f}")
print(f"AUC Score : {auc:.4f}")
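The threshold plot below needs one metric series per threshold. A sketch of the sweep that could produce those series, using the same sklearn metric functions as above (the labels and probabilities here are illustrative):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Illustrative test labels and positive-class probabilities
y_test = np.array([0, 0, 0, 1, 1, 0, 1, 0])
y_proba = np.array([0.1, 0.2, 0.4, 0.55, 0.7, 0.35, 0.9, 0.15])

thresholds = np.arange(0.1, 1.0, 0.1)
scores = {'Accuracy': [], 'Precision': [], 'Recall': [], 'F1': []}

for t in thresholds:
    y_pred_t = (y_proba >= t).astype(int)
    scores['Accuracy'].append(accuracy_score(y_test, y_pred_t))
    scores['Precision'].append(precision_score(y_test, y_pred_t,
                                               zero_division=0))
    scores['Recall'].append(recall_score(y_test, y_pred_t))
    scores['F1'].append(f1_score(y_test, y_pred_t, zero_division=0))

# Each list now holds one value per threshold,
# ready for plt.plot(thresholds, scores[name], label=name)
```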
# Plot setup
plt.figure(figsize=(14, 8))
# Final touches
plt.title('Model Evaluation Metrics at Different Thresholds', fontsize=16)
plt.xlabel('Threshold', fontsize=12)
plt.ylabel('Score', fontsize=12)
plt.xticks(thresholds)
plt.ylim(0, 1.05)
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend(loc='lower center', bbox_to_anchor=(0.5, -0.25), ncol=3)
plt.tight_layout()
plt.show()